java.lang.Object
- org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction

All Implemented Interfaces:

Action<RewriteDataFiles,RewriteDataFiles.Result>, RewriteDataFiles, SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>
```
public class RewriteDataFilesSparkAction
extends java.lang.Object
implements RewriteDataFiles
```

Nested Class Summary
- Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteDataFiles
  RewriteDataFiles.FileGroupInfo, RewriteDataFiles.FileGroupRewriteResult, RewriteDataFiles.Result

Field Summary

Fields
Modifier and Type	Field	Description
`protected static java.lang.String`	`CONTENT_FILE`
`protected static java.lang.String`	`FILE_PATH`
`protected static java.lang.String`	`FILE_TYPE`
`protected static java.lang.String`	`LAST_MODIFIED`
`protected static java.lang.String`	`MANIFEST`
`protected static java.lang.String`	`MANIFEST_LIST`
`protected static java.lang.String`	`OTHERS`

Fields inherited from interface org.apache.iceberg.actions.RewriteDataFiles
MAX_CONCURRENT_FILE_GROUP_REWRITES, MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT, MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, PARTIAL_PROGRESS_ENABLED, PARTIAL_PROGRESS_ENABLED_DEFAULT, PARTIAL_PROGRESS_MAX_COMMITS, PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT, REWRITE_JOB_ORDER, REWRITE_JOB_ORDER_DEFAULT, TARGET_FILE_SIZE_BYTES, USE_STARTING_SEQUENCE_NUMBER, USE_STARTING_SEQUENCE_NUMBER_DEFAULT

Method Summary

Modifier and Type	Method	Description
`RewriteDataFilesSparkAction`	`binPack()`	Choose BINPACK as a strategy for this rewrite operation
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildAllReachableOtherMetadataFileDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildManifestFileDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildManifestListDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildOtherMetadataFileDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildValidContentFileDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildValidContentFileWithTypeDF(Table table)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`buildValidMetadataFileDF(Table table)`
`protected void`	`commit(SnapshotUpdate<?> update)`
`RewriteDataFiles.Result`	`execute()`	Executes this action.
`RewriteDataFilesSparkAction`	`filter(Expression expression)`	A user-provided filter for determining which files will be considered by the rewrite strategy.
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(Table table, MetadataTableType type)`
`protected JobGroupInfo`	`newJobGroupInfo(java.lang.String groupId, java.lang.String desc)`
`protected Table`	`newStaticTable(TableMetadata metadata, FileIO io)`
`ThisT`	`option(java.lang.String name, java.lang.String value)`
`protected java.util.Map<java.lang.String,java.lang.String>`	`options()`
`ThisT`	`options(java.util.Map<java.lang.String,java.lang.String> newOptions)`
`protected RewriteDataFilesSparkAction`	`self()`
`ThisT`	`snapshotProperty(java.lang.String property, java.lang.String value)`
`RewriteDataFilesSparkAction`	`sort()`	Choose SORT as a strategy for this rewrite operation using the table's sortOrder
`RewriteDataFilesSparkAction`	`sort(SortOrder sortOrder)`	Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use
`protected org.apache.spark.sql.SparkSession`	`spark()`
`protected org.apache.spark.api.java.JavaSparkContext`	`sparkContext()`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds, java.lang.String type)`
`protected <T> T`	`withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)`
`RewriteDataFilesSparkAction`	`zOrder(java.lang.String... columnNames)`	Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.iceberg.actions.Action
option, options

Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty

Field Detail

CONTENT_FILE

protected static final java.lang.String CONTENT_FILE

See Also: Constant Field Values

MANIFEST

protected static final java.lang.String MANIFEST

See Also: Constant Field Values

MANIFEST_LIST

protected static final java.lang.String MANIFEST_LIST

See Also: Constant Field Values

OTHERS

protected static final java.lang.String OTHERS

See Also: Constant Field Values

FILE_PATH

protected static final java.lang.String FILE_PATH

See Also: Constant Field Values

FILE_TYPE

protected static final java.lang.String FILE_TYPE

See Also: Constant Field Values

LAST_MODIFIED

protected static final java.lang.String LAST_MODIFIED

See Also: Constant Field Values

Method Detail

self

protected RewriteDataFilesSparkAction self()

binPack
```
public RewriteDataFilesSparkAction binPack()
```
Description copied from interface: RewriteDataFiles

Choose BINPACK as a strategy for this rewrite operation

Specified by:

binPack in interface RewriteDataFiles

Returns:

this for method chaining
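
As a rough illustration of what a bin-pack style rewrite groups together, the following is a minimal sketch of first-fit packing of file sizes into groups capped at a maximum group size. It is not the code Iceberg uses; the class `BinPackSketch`, the method `pack`, and the sample sizes are all hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the strategy code used by Iceberg.
final class BinPackSketch {

  // Greedily places each file size (in bytes) into the first group whose
  // running total stays within maxGroupBytes. A file larger than
  // maxGroupBytes ends up alone in a group of its own.
  static List<List<Long>> pack(List<Long> fileSizes, long maxGroupBytes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> totals = new ArrayList<>();
    for (long size : fileSizes) {
      int target = -1;
      for (int i = 0; i < groups.size(); i++) {
        if (totals.get(i) + size <= maxGroupBytes) {
          target = i;
          break;
        }
      }
      if (target < 0) {
        // No existing group has room, so open a new one.
        groups.add(new ArrayList<>());
        totals.add(0L);
        target = groups.size() - 1;
      }
      groups.get(target).add(size);
      totals.set(target, totals.get(target) + size);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Prints [[40, 30], [70], [60]]
    System.out.println(pack(List.of(40L, 70L, 30L, 60L), 100L));
  }
}
```

In this sketch the group cap plays a role loosely analogous to the inherited `MAX_FILE_GROUP_SIZE_BYTES` option, though how that option is actually applied is not described on this page.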

sort
```
public RewriteDataFilesSparkAction sort(SortOrder sortOrder)
```
Description copied from interface: RewriteDataFiles

Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use

Specified by:

sort in interface RewriteDataFiles

Parameters:

sortOrder - user defined sortOrder

Returns:

this for method chaining

sort
```
public RewriteDataFilesSparkAction sort()
```
Description copied from interface: RewriteDataFiles

Choose SORT as a strategy for this rewrite operation using the table's sortOrder

Specified by:

sort in interface RewriteDataFiles

Returns:

this for method chaining

zOrder
```
public RewriteDataFilesSparkAction zOrder(java.lang.String... columnNames)
```
Description copied from interface: RewriteDataFiles

Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use

Specified by:

zOrder in interface RewriteDataFiles

Parameters:

columnNames - Columns to be used to generate Z-Values

Returns:

this for method chaining
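
For readers unfamiliar with Z-values, the general idea is to interleave the bits of several column values so that rows close together in every column end up close together in the combined ordering. The sketch below shows this for two `int` values. It is an assumption-laden illustration, not Iceberg's implementation, which handles arbitrary column types and counts; `ZValueSketch` and `interleave` are hypothetical names.

```java
// Hypothetical illustration of bit interleaving; not Iceberg's Z-order code.
final class ZValueSketch {

  // Builds a 64-bit Z-value from two 32-bit inputs:
  // bit i of x goes to bit 2i, bit i of y goes to bit 2i + 1.
  static long interleave(int x, int y) {
    long z = 0L;
    for (int i = 0; i < 32; i++) {
      z |= ((long) (x >>> i) & 1L) << (2 * i);
      z |= ((long) (y >>> i) & 1L) << (2 * i + 1);
    }
    return z;
  }

  public static void main(String[] args) {
    // x = 3 (binary 011), y = 5 (binary 101) interleave to binary 100111.
    System.out.println(interleave(3, 5)); // prints 39
  }
}
```

Sorting rows by such a combined value is what lets a single file ordering serve predicates on either input column.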

filter
```
public RewriteDataFilesSparkAction filter(Expression expression)
```
Description copied from interface: RewriteDataFiles

A user-provided filter for determining which files will be considered by the rewrite strategy. It is applied in addition to whatever rules the rewrite strategy generates. For example, it can be used to restrict the rewrite to a specific partition.

Specified by:

filter in interface RewriteDataFiles

Parameters:

expression - An Iceberg expression used to determine which files will be considered for rewriting

Returns:

this for method chaining
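
The way a user filter combines with strategy-generated rules can be pictured as a logical AND of two predicates. The following is a minimal sketch under that assumption, using plain Java predicates rather than Iceberg `Expression` objects. `RewriteFilterSketch`, `DataFileInfo`, and `candidates` are hypothetical names invented for this example, and the size-based rule merely stands in for whatever a real strategy would generate.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; Iceberg evaluates Expression objects instead.
final class RewriteFilterSketch {

  // Simplified stand-in for a data file's metadata.
  static final class DataFileInfo {
    final String path;
    final String partition;
    final long sizeBytes;

    DataFileInfo(String path, String partition, long sizeBytes) {
      this.path = path;
      this.partition = partition;
      this.sizeBytes = sizeBytes;
    }
  }

  // A file is a rewrite candidate only if it passes BOTH the user filter
  // and the rule produced by the strategy.
  static List<String> candidates(
      List<DataFileInfo> files,
      Predicate<DataFileInfo> userFilter,
      Predicate<DataFileInfo> strategyRule) {
    return files.stream()
        .filter(userFilter.and(strategyRule))
        .map(f -> f.path)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<DataFileInfo> files = List.of(
        new DataFileInfo("a.parquet", "day=1", 10L),
        new DataFileInfo("b.parquet", "day=1", 500L),
        new DataFileInfo("c.parquet", "day=2", 10L));
    // Restrict to one partition, then keep only small files.
    System.out.println(
        candidates(files, f -> f.partition.equals("day=1"), f -> f.sizeBytes < 100L));
  }
}
```

Here only `a.parquet` qualifies: `b.parquet` fails the stand-in strategy rule and `c.parquet` fails the user filter.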

execute
```
public RewriteDataFiles.Result execute()
```
Description copied from interface: Action

Executes this action.

Specified by:

execute in interface Action<RewriteDataFiles,RewriteDataFiles.Result>

Returns:

the result of this action

snapshotProperty

public ThisT snapshotProperty(java.lang.String property,
                              java.lang.String value)

commit

protected void commit(SnapshotUpdate<?> update)

spark

protected org.apache.spark.sql.SparkSession spark()

sparkContext

protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option

public ThisT option(java.lang.String name,
                    java.lang.String value)

options

public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)

options

protected java.util.Map<java.lang.String,java.lang.String> options()

withJobGroupInfo

protected <T> T withJobGroupInfo(JobGroupInfo info,
                                 java.util.function.Supplier<T> supplier)

newJobGroupInfo

protected JobGroupInfo newJobGroupInfo(java.lang.String groupId,
                                       java.lang.String desc)

newStaticTable

protected Table newStaticTable(TableMetadata metadata,
                               FileIO io)

buildValidContentFileWithTypeDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileWithTypeDF(Table table)

buildValidContentFileDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileDF(Table table)

buildManifestFileDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)

buildManifestListDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)

buildOtherMetadataFileDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)

buildAllReachableOtherMetadataFileDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildAllReachableOtherMetadataFileDF(Table table)

buildValidMetadataFileDF

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)

withFileType

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds,
                                                                              java.lang.String type)

loadMetadataTable

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table,
                                                                                   MetadataTableType type)
