Package org.apache.iceberg.spark.actions
Class RewriteManifestsSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.RewriteManifestsSparkAction
-
- All Implemented Interfaces:
Action<RewriteManifests,RewriteManifests.Result>,RewriteManifests,SnapshotUpdate<RewriteManifests,RewriteManifests.Result>
public class RewriteManifestsSparkAction extends java.lang.Object implements RewriteManifests
An action that rewrites manifests in a distributed manner and co-locates metadata for partitions.By default, this action rewrites all manifests for the current partition spec and writes the result to the metadata folder. The behavior can be modified by passing a custom predicate to
rewriteIf(Predicate)and a custom spec ID tospecId(int). In addition, there is a way to configure a custom location for staged manifests viastagingLocation(String). The provided staging location will be ignored if snapshot ID inheritance is enabled. In such cases, the manifests are always written to the metadata folder and committed without staging.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteManifests
RewriteManifests.Result
-
-
Field Summary
Fields Modifier and Type Field Description protected static org.apache.iceberg.relocated.com.google.common.base.JoinerCOMMA_JOINERprotected static org.apache.iceberg.relocated.com.google.common.base.SplitterCOMMA_SPLITTERprotected static java.lang.StringFILE_PATHprotected static java.lang.StringLAST_MODIFIEDprotected static java.lang.StringMANIFESTprotected static java.lang.StringMANIFEST_LISTprotected static java.lang.StringOTHERSprotected static java.lang.StringSTATISTICS_FILESstatic java.lang.StringUSE_CACHINGstatic booleanUSE_CACHING_DEFAULT
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.spark.sql.Dataset<FileInfo>allReachableOtherMetadataFileDS(Table table)protected voidcommit(SnapshotUpdate<?> update)protected java.util.Map<java.lang.String,java.lang.String>commitSummary()protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)RewriteManifests.Resultexecute()Executes this action.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type)protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected JobGroupInfonewJobGroupInfo(java.lang.String groupId, java.lang.String desc)protected TablenewStaticTable(TableMetadata metadata, FileIO io)ThisToption(java.lang.String name, java.lang.String value)protected java.util.Map<java.lang.String,java.lang.String>options()ThisToptions(java.util.Map<java.lang.String,java.lang.String> newOptions)protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table)RewriteManifestsSparkActionrewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)Rewrites only manifests that match the given predicate.protected RewriteManifestsSparkActionself()ThisTsnapshotProperty(java.lang.String property, java.lang.String value)protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextsparkContext()RewriteManifestsSparkActionspecId(int specId)Rewrites manifests for a given spec id.RewriteManifestsSparkActionstagingLocation(java.lang.String newStagingLocation)Passes a location where the staged manifests should be written.protected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected <T> TwithJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty
-
-
-
-
Field Detail
-
USE_CACHING
public static final java.lang.String USE_CACHING
- See Also:
- Constant Field Values
-
USE_CACHING_DEFAULT
public static final boolean USE_CACHING_DEFAULT
- See Also:
- Constant Field Values
-
MANIFEST
protected static final java.lang.String MANIFEST
- See Also:
- Constant Field Values
-
MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
- See Also:
- Constant Field Values
-
STATISTICS_FILES
protected static final java.lang.String STATISTICS_FILES
- See Also:
- Constant Field Values
-
OTHERS
protected static final java.lang.String OTHERS
- See Also:
- Constant Field Values
-
FILE_PATH
protected static final java.lang.String FILE_PATH
- See Also:
- Constant Field Values
-
LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
- See Also:
- Constant Field Values
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
-
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Detail
-
self
protected RewriteManifestsSparkAction self()
-
specId
public RewriteManifestsSparkAction specId(int specId)
Description copied from interface:RewriteManifestsRewrites manifests for a given spec id.If not set, defaults to the table's default spec ID.
- Specified by:
specIdin interfaceRewriteManifests- Parameters:
specId- a spec id- Returns:
- this for method chaining
-
rewriteIf
public RewriteManifestsSparkAction rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
Description copied from interface:RewriteManifestsRewrites only manifests that match the given predicate.If not set, all manifests will be rewritten.
- Specified by:
rewriteIfin interfaceRewriteManifests- Parameters:
newPredicate- a predicate- Returns:
- this for method chaining
-
stagingLocation
public RewriteManifestsSparkAction stagingLocation(java.lang.String newStagingLocation)
Description copied from interface:RewriteManifestsPasses a location where the staged manifests should be written.If not set, defaults to the table's metadata location.
- Specified by:
stagingLocationin interfaceRewriteManifests- Parameters:
newStagingLocation- a staging location- Returns:
- this for method chaining
-
execute
public RewriteManifests.Result execute()
Description copied from interface:ActionExecutes this action.- Specified by:
executein interfaceAction<RewriteManifests,RewriteManifests.Result>- Returns:
- the result of this action
-
snapshotProperty
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
-
commit
protected void commit(SnapshotUpdate<?> update)
-
commitSummary
protected java.util.Map<java.lang.String,java.lang.String> commitSummary()
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
-
allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)Deletes files and keeps track of how many files were removed for each file type.- Parameters:
executorService- an executor service to use for parallel deletesdeleteFunc- a delete funcfiles- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
-
-