Package org.apache.iceberg.spark.actions
Class BaseRewriteManifestsSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.BaseRewriteManifestsSparkAction
-
- All Implemented Interfaces:
Action<RewriteManifests,RewriteManifests.Result>
,RewriteManifests
,SnapshotUpdate<RewriteManifests,RewriteManifests.Result>
public class BaseRewriteManifestsSparkAction extends java.lang.Object implements RewriteManifests
An action that rewrites manifests in a distributed manner and co-locates metadata for partitions.By default, this action rewrites all manifests for the current partition spec and writes the result to the metadata folder. The behavior can be modified by passing a custom predicate to
rewriteIf(Predicate)
and a custom spec id tospecId(int)
. In addition, there is a way to configure a custom location for new manifests viastagingLocation
.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteManifests
RewriteManifests.Result
-
-
Constructor Summary
Constructors Constructor Description BaseRewriteManifestsSparkAction(org.apache.spark.sql.SparkSession spark, Table table)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestListDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidDataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidMetadataFileDF(Table table)
protected void
commit(SnapshotUpdate<?> update)
RewriteManifests.Result
execute()
Executes this action.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadMetadataTable(Table table, MetadataTableType type)
protected JobGroupInfo
newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table
newStaticTable(TableMetadata metadata, FileIO io)
ThisT
option(java.lang.String name, java.lang.String value)
Configures this action with an extra option.protected java.util.Map<java.lang.String,java.lang.String>
options()
ThisT
options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Configures this action with extra options.RewriteManifests
rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
Rewrites only manifests that match the given predicate.protected RewriteManifests
self()
ThisT
snapshotProperty(java.lang.String property, java.lang.String value)
Sets a summary property in the snapshot produced by this action.protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
sparkContext()
RewriteManifests
specId(int specId)
Rewrites manifests for a given spec id.RewriteManifests
stagingLocation(java.lang.String newStagingLocation)
Passes a location where the staged manifests should be written.protected <T> T
withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty
-
-
-
-
Constructor Detail
-
BaseRewriteManifestsSparkAction
public BaseRewriteManifestsSparkAction(org.apache.spark.sql.SparkSession spark, Table table)
-
-
Method Detail
-
self
protected RewriteManifests self()
-
specId
public RewriteManifests specId(int specId)
Description copied from interface:RewriteManifests
Rewrites manifests for a given spec id.If not set, defaults to the table's default spec ID.
- Specified by:
specId
in interfaceRewriteManifests
- Parameters:
specId
- a spec id- Returns:
- this for method chaining
-
rewriteIf
public RewriteManifests rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
Description copied from interface:RewriteManifests
Rewrites only manifests that match the given predicate.If not set, all manifests will be rewritten.
- Specified by:
rewriteIf
in interfaceRewriteManifests
- Parameters:
newPredicate
- a predicate- Returns:
- this for method chaining
-
stagingLocation
public RewriteManifests stagingLocation(java.lang.String newStagingLocation)
Description copied from interface:RewriteManifests
Passes a location where the staged manifests should be written.If not set, defaults to the table's metadata location.
- Specified by:
stagingLocation
in interfaceRewriteManifests
- Parameters:
newStagingLocation
- a staging location- Returns:
- this for method chaining
-
execute
public RewriteManifests.Result execute()
Description copied from interface:Action
Executes this action.- Specified by:
execute
in interfaceAction<RewriteManifests,RewriteManifests.Result>
- Returns:
- the result of this action
-
snapshotProperty
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
Description copied from interface:SnapshotUpdate
Sets a summary property in the snapshot produced by this action.- Specified by:
snapshotProperty
in interfaceSnapshotUpdate<ThisT,R>
- Parameters:
property
- a snapshot property namevalue
- a snapshot property value- Returns:
- this for method chaining
-
commit
protected void commit(SnapshotUpdate<?> update)
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
Description copied from interface:Action
Configures this action with an extra option.Certain actions allow users to control internal details of their execution via options.
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Description copied from interface:Action
Configures this action with extra options.Certain actions allow users to control internal details of their execution via options.
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
buildValidDataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(Table table)
-
buildManifestFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
-
buildManifestListDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
-
buildOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
-
buildValidMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
-