RewriteManifestsSparkAction

java.lang.Object
- org.apache.iceberg.spark.actions.RewriteManifestsSparkAction

All Implemented Interfaces:

Action<RewriteManifests,RewriteManifests.Result>, RewriteManifests, SnapshotUpdate<RewriteManifests,RewriteManifests.Result>
```
public class RewriteManifestsSparkAction
extends java.lang.Object
implements RewriteManifests
```
An action that rewrites manifests in a distributed manner and co-locates metadata for partitions.
By default, this action rewrites all manifests for the current partition spec and writes the result to the metadata folder. To change this behavior, pass a custom predicate to rewriteIf(Predicate) and a custom spec id to specId(int). To write the new manifests to a custom location, use stagingLocation(String).

Nested Class Summary
- Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteManifests
  RewriteManifests.Result

Field Summary

Fields
Modifier and Type	Field and Description
`protected static org.apache.iceberg.relocated.com.google.common.base.Joiner`	`COMMA_JOINER`
`protected static org.apache.iceberg.relocated.com.google.common.base.Splitter`	`COMMA_SPLITTER`
`protected static java.lang.String`	`FILE_PATH`
`protected static java.lang.String`	`LAST_MODIFIED`
`protected static java.lang.String`	`MANIFEST`
`protected static java.lang.String`	`MANIFEST_LIST`
`protected static java.lang.String`	`OTHERS`
`protected static java.lang.String`	`STATISTICS_FILES`
`static java.lang.String`	`USE_CACHING`
`static boolean`	`USE_CACHING_DEFAULT`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`protected org.apache.spark.sql.Dataset<FileInfo>`	`allReachableOtherMetadataFileDS(Table table)`
`protected void`	`commit(SnapshotUpdate<?> update)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)` Deletes files and keeps track of how many files were removed for each file type.
`protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary`	`deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)`
`RewriteManifests.Result`	`execute()` Executes this action.
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(Table table, MetadataTableType type)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected JobGroupInfo`	`newJobGroupInfo(java.lang.String groupId, java.lang.String desc)`
`protected Table`	`newStaticTable(TableMetadata metadata, FileIO io)`
`ThisT`	`option(java.lang.String name, java.lang.String value)`
`protected java.util.Map<java.lang.String,java.lang.String>`	`options()`
`ThisT`	`options(java.util.Map<java.lang.String,java.lang.String> newOptions)`
`protected org.apache.spark.sql.Dataset<FileInfo>`	`otherMetadataFileDS(Table table)`
`RewriteManifestsSparkAction`	`rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)` Rewrites only manifests that match the given predicate.
`protected RewriteManifestsSparkAction`	`self()`
`ThisT`	`snapshotProperty(java.lang.String property, java.lang.String value)`
`protected org.apache.spark.sql.SparkSession`	`spark()`
`protected org.apache.spark.api.java.JavaSparkContext`	`sparkContext()`
`RewriteManifestsSparkAction`	`specId(int specId)` Rewrites manifests for a given spec id.
`RewriteManifestsSparkAction`	`stagingLocation(java.lang.String newStagingLocation)` Passes a location where the staged manifests should be written.
`protected org.apache.spark.sql.Dataset<FileInfo>`	`statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)`
`protected <T> T`	`withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty

Methods inherited from interface org.apache.iceberg.actions.Action
option, options

Field Detail

USE_CACHING

public static final java.lang.String USE_CACHING

See Also: Constant Field Values

USE_CACHING_DEFAULT
```
public static final boolean USE_CACHING_DEFAULT
```
See Also: Constant Field Values

MANIFEST

protected static final java.lang.String MANIFEST

See Also: Constant Field Values

MANIFEST_LIST

protected static final java.lang.String MANIFEST_LIST

See Also: Constant Field Values

STATISTICS_FILES

protected static final java.lang.String STATISTICS_FILES

See Also: Constant Field Values

OTHERS

protected static final java.lang.String OTHERS

See Also: Constant Field Values

FILE_PATH

protected static final java.lang.String FILE_PATH

See Also: Constant Field Values

LAST_MODIFIED

protected static final java.lang.String LAST_MODIFIED

See Also: Constant Field Values

COMMA_SPLITTER

protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER

protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER

Method Detail

self

protected RewriteManifestsSparkAction self()
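
The signatures on this page suggest a self-typed fluent pattern: inherited methods such as option and snapshotProperty return ThisT, while self() returns the concrete action type. The following is a minimal sketch of that pattern, not the actual Iceberg implementation. FluentActionSketch, RewriteSketch, describe, and the option key are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a generic base action; not Iceberg's class.
abstract class FluentActionSketch<ThisT> {
  private final Map<String, String> options = new HashMap<>();

  // Each concrete subclass returns itself, typed as the subclass.
  protected abstract ThisT self();

  // Declared once on the base class, yet returns the subclass type.
  public ThisT option(String name, String value) {
    options.put(name, value);
    return self();
  }

  protected Map<String, String> options() {
    return options;
  }
}

// Hypothetical concrete action with one subclass-only setter.
final class RewriteSketch extends FluentActionSketch<RewriteSketch> {
  private int specId = 0;

  @Override
  protected RewriteSketch self() {
    return this;
  }

  public RewriteSketch specId(int newSpecId) {
    this.specId = newSpecId;
    return this;
  }

  public String describe() {
    return "specId=" + specId + ", options=" + options();
  }

  public static void main(String[] args) {
    // Because option(...) returns RewriteSketch rather than the base type,
    // the subclass-only specId(...) can follow it in the same chain.
    RewriteSketch action = new RewriteSketch().option("some-option", "false").specId(3);
    System.out.println(action.describe());
  }
}
```

Without the type parameter, option(...) would return the base type and the chain would need a cast before calling a subclass-specific setter.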

specId
```
public RewriteManifestsSparkAction specId(int specId)
```
Description copied from interface: RewriteManifests

Rewrites manifests for a given spec id.
If not set, defaults to the table's default spec ID.

Specified by:

specId in interface RewriteManifests

Parameters:

specId - a spec id

Returns:

this for method chaining

rewriteIf
```
public RewriteManifestsSparkAction rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
```
Description copied from interface: RewriteManifests

Rewrites only manifests that match the given predicate.
If not set, all manifests will be rewritten.

Specified by:

rewriteIf in interface RewriteManifests

Parameters:

newPredicate - a predicate

Returns:

this for method chaining
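
As an illustration of how a rewriteIf predicate narrows the set of manifests, here is a minimal, self-contained sketch. It does not use the Iceberg classes: ManifestStub is a hypothetical stand-in for ManifestFile, and selectForRewrite and the 10 MiB threshold are assumptions made for the example. With the real action, the predicate would receive ManifestFile instances instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RewriteIfSketch {

  // Hypothetical stand-in for a manifest: just a path and a size in bytes.
  static final class ManifestStub {
    final String path;
    final long length;

    ManifestStub(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Returns the paths of the manifests the predicate accepts. In this
  // sketch, manifests that do not match are simply left out of the result.
  static List<String> selectForRewrite(
      List<ManifestStub> manifests, Predicate<ManifestStub> rewriteIf) {
    List<String> selected = new ArrayList<>();
    for (ManifestStub manifest : manifests) {
      if (rewriteIf.test(manifest)) {
        selected.add(manifest.path);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<ManifestStub> manifests = Arrays.asList(
        new ManifestStub("m1.avro", 2L * 1024 * 1024),
        new ManifestStub("m2.avro", 64L * 1024 * 1024),
        new ManifestStub("m3.avro", 512L * 1024));

    // Example predicate: target only manifests smaller than 10 MiB.
    Predicate<ManifestStub> smallOnly = m -> m.length < 10L * 1024 * 1024;

    System.out.println(selectForRewrite(manifests, smallOnly));
  }
}
```

A size-based predicate like this is one way to compact many small manifests while leaving large ones untouched.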

stagingLocation
```
public RewriteManifestsSparkAction stagingLocation(java.lang.String newStagingLocation)
```
Description copied from interface: RewriteManifests

Passes a location where the staged manifests should be written.
If not set, defaults to the table's metadata location.

Specified by:

stagingLocation in interface RewriteManifests

Parameters:

newStagingLocation - a staging location

Returns:

this for method chaining

execute
```
public RewriteManifests.Result execute()
```
Description copied from interface: Action

Executes this action.

Specified by:

execute in interface Action<RewriteManifests,RewriteManifests.Result>

Returns:

the result of this action

snapshotProperty

public ThisT snapshotProperty(java.lang.String property,
                              java.lang.String value)

commit

protected void commit(SnapshotUpdate<?> update)

spark

protected org.apache.spark.sql.SparkSession spark()

sparkContext

protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option

public ThisT option(java.lang.String name,
                    java.lang.String value)

options

public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)

options

protected java.util.Map<java.lang.String,java.lang.String> options()

withJobGroupInfo

protected <T> T withJobGroupInfo(JobGroupInfo info,
                                 java.util.function.Supplier<T> supplier)

newJobGroupInfo

protected JobGroupInfo newJobGroupInfo(java.lang.String groupId,
                                       java.lang.String desc)

newStaticTable

protected Table newStaticTable(TableMetadata metadata,
                               FileIO io)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)

contentFileDS

protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table,
                                                               java.util.Set<java.lang.Long> snapshotIds)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)

manifestDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table,
                                                            java.util.Set<java.lang.Long> snapshotIds)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)

manifestListDS

protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table,
                                                                java.util.Set<java.lang.Long> snapshotIds)

statisticsFileDS

protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table,
                                                                  java.util.Set<java.lang.Long> snapshotIds)

otherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS

protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)

loadMetadataTable

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table,
                                                                                   MetadataTableType type)

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService,
                                                                                     java.util.function.Consumer<java.lang.String> deleteFunc,
                                                                                     java.util.Iterator<FileInfo> files)

Deletes files and keeps track of how many files were removed for each file type.

Parameters:

executorService - an executor service to use for parallel deletes

deleteFunc - a delete function

files - an iterator of Spark rows of the structure (path: String, type: String)

Returns:

stats on which files were deleted
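
The idea of deleting in parallel while counting removals per file type can be sketched as follows. This is an assumption-laden illustration, not the BaseSparkAction code: FileStub stands in for FileInfo, deleteAndCount and the type strings are hypothetical, a plain map replaces DeleteSummary, and failure handling is reduced to rethrowing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class DeleteSummarySketch {

  // Hypothetical stand-in for a (path, type) row.
  static final class FileStub {
    final String path;
    final String type;

    FileStub(String path, String type) {
      this.path = path;
      this.type = type;
    }
  }

  // Submits one delete task per file and counts successful deletes per type.
  static Map<String, Long> deleteAndCount(
      ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileStub> files) {
    Map<String, AtomicLong> counts = new ConcurrentHashMap<>();
    List<Future<?>> pending = new ArrayList<>();
    while (files.hasNext()) {
      final FileStub file = files.next();
      pending.add(executorService.submit(() -> {
        deleteFunc.accept(file.path);
        // Reached only if the delete did not throw.
        counts.computeIfAbsent(file.type, t -> new AtomicLong()).incrementAndGet();
      }));
    }
    try {
      for (Future<?> task : pending) {
        task.get(); // wait for every delete to finish
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    }
    Map<String, Long> summary = new TreeMap<>();
    for (Map.Entry<String, AtomicLong> entry : counts.entrySet()) {
      summary.put(entry.getKey(), entry.getValue().get());
    }
    return summary;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // The type strings below are placeholders, not the constants' values.
      List<FileStub> files = Arrays.asList(
          new FileStub("a.avro", "manifest"),
          new FileStub("b.avro", "manifest"),
          new FileStub("snap.avro", "manifest-list"));
      Consumer<String> deleteFunc = path -> System.out.println("deleting " + path);
      System.out.println(deleteAndCount(pool, deleteFunc, files.iterator()));
    } finally {
      pool.shutdown();
    }
  }
}
```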

deleteFiles

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io,
                                                                                     java.util.Iterator<FileInfo> files)
