Package org.apache.iceberg.spark.actions
Class RewriteManifestsSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.RewriteManifestsSparkAction
-
- All Implemented Interfaces:
Action<RewriteManifests,RewriteManifests.Result>,RewriteManifests,SnapshotUpdate<RewriteManifests,RewriteManifests.Result>
public class RewriteManifestsSparkAction extends java.lang.Object implements RewriteManifests
An action that rewrites manifests in a distributed manner and co-locates metadata for partitions.By default, this action rewrites all manifests for the current partition spec and writes the result to the metadata folder. The behavior can be modified by passing a custom predicate to
rewriteIf(Predicate)and a custom spec id tospecId(int). In addition, there is a way to configure a custom location for new manifests viastagingLocation.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteManifests
RewriteManifests.Result
-
-
Field Summary
Fields Modifier and Type Field Description protected static java.lang.StringCONTENT_FILEprotected static java.lang.StringFILE_PATHprotected static java.lang.StringFILE_TYPEprotected static java.lang.StringLAST_MODIFIEDprotected static java.lang.StringMANIFESTprotected static java.lang.StringMANIFEST_LISTprotected static java.lang.StringOTHERSstatic java.lang.StringUSE_CACHINGstatic booleanUSE_CACHING_DEFAULT
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildAllReachableOtherMetadataFileDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildManifestFileDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildManifestListDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildOtherMetadataFileDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildValidContentFileDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildValidContentFileWithTypeDF(Table table)protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>buildValidMetadataFileDF(Table table)protected voidcommit(SnapshotUpdate<?> update)RewriteManifests.Resultexecute()Executes this action.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type)protected JobGroupInfonewJobGroupInfo(java.lang.String groupId, java.lang.String desc)protected TablenewStaticTable(TableMetadata metadata, FileIO io)ThisToption(java.lang.String name, java.lang.String value)protected java.util.Map<java.lang.String,java.lang.String>options()ThisToptions(java.util.Map<java.lang.String,java.lang.String> newOptions)RewriteManifestsSparkActionrewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)Rewrites only manifests that match the given predicate.protected RewriteManifestsSparkActionself()ThisTsnapshotProperty(java.lang.String property, java.lang.String value)protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextsparkContext()RewriteManifestsSparkActionspecId(int specId)Rewrites manifests for a given spec id.RewriteManifestsSparkActionstagingLocation(java.lang.String newStagingLocation)Passes a location where the staged manifests should be written.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds, java.lang.String type)protected <T> TwithJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty
-
-
-
-
Field Detail
-
USE_CACHING
public static final java.lang.String USE_CACHING
- See Also:
- Constant Field Values
-
USE_CACHING_DEFAULT
public static final boolean USE_CACHING_DEFAULT
- See Also:
- Constant Field Values
-
CONTENT_FILE
protected static final java.lang.String CONTENT_FILE
- See Also:
- Constant Field Values
-
MANIFEST
protected static final java.lang.String MANIFEST
- See Also:
- Constant Field Values
-
MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
- See Also:
- Constant Field Values
-
OTHERS
protected static final java.lang.String OTHERS
- See Also:
- Constant Field Values
-
FILE_PATH
protected static final java.lang.String FILE_PATH
- See Also:
- Constant Field Values
-
FILE_TYPE
protected static final java.lang.String FILE_TYPE
- See Also:
- Constant Field Values
-
LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
- See Also:
- Constant Field Values
-
-
Method Detail
-
self
protected RewriteManifestsSparkAction self()
-
specId
public RewriteManifestsSparkAction specId(int specId)
Description copied from interface:RewriteManifestsRewrites manifests for a given spec id.If not set, defaults to the table's default spec ID.
- Specified by:
specIdin interfaceRewriteManifests- Parameters:
specId- a spec id- Returns:
- this for method chaining
-
rewriteIf
public RewriteManifestsSparkAction rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
Description copied from interface:RewriteManifestsRewrites only manifests that match the given predicate.If not set, all manifests will be rewritten.
- Specified by:
rewriteIfin interfaceRewriteManifests- Parameters:
newPredicate- a predicate- Returns:
- this for method chaining
-
stagingLocation
public RewriteManifestsSparkAction stagingLocation(java.lang.String newStagingLocation)
Description copied from interface:RewriteManifestsPasses a location where the staged manifests should be written.If not set, defaults to the table's metadata location.
- Specified by:
stagingLocationin interfaceRewriteManifests- Parameters:
newStagingLocation- a staging location- Returns:
- this for method chaining
-
execute
public RewriteManifests.Result execute()
Description copied from interface:ActionExecutes this action.- Specified by:
executein interfaceAction<RewriteManifests,RewriteManifests.Result>- Returns:
- the result of this action
-
snapshotProperty
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
-
commit
protected void commit(SnapshotUpdate<?> update)
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
buildValidContentFileWithTypeDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileWithTypeDF(Table table)
-
buildValidContentFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileDF(Table table)
-
buildManifestFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
-
buildManifestListDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
-
buildOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
-
buildAllReachableOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildAllReachableOtherMetadataFileDF(Table table)
-
buildValidMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
-
withFileType
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds, java.lang.String type)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
-