public class BaseRewriteManifestsSparkAction extends java.lang.Object implements RewriteManifests
By default, this action rewrites all manifests for the current partition spec and writes the result
to the metadata folder. The behavior can be modified by passing a custom predicate to rewriteIf(Predicate)
and a custom spec id to specId(int)
. In addition, there is a way to configure a custom location
for new manifests via stagingLocation
.
RewriteManifests.Result
Constructor and Description |
---|
BaseRewriteManifestsSparkAction(org.apache.spark.sql.SparkSession spark,
Table table) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
buildManifestFileDF(Table table) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
buildManifestListDF(Table table) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
buildOtherMetadataFileDF(Table table) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
buildValidDataFileDF(Table table) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
buildValidMetadataFileDF(Table table) |
protected void |
commit(SnapshotUpdate<?> update) |
RewriteManifests.Result |
execute()
Executes this action.
|
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
loadMetadataTable(Table table,
MetadataTableType type) |
protected JobGroupInfo |
newJobGroupInfo(java.lang.String groupId,
java.lang.String desc) |
protected Table |
newStaticTable(TableMetadata metadata,
FileIO io) |
ThisT |
option(java.lang.String name,
java.lang.String value)
Configures this action with an extra option.
|
protected java.util.Map<java.lang.String,java.lang.String> |
options() |
ThisT |
options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Configures this action with extra options.
|
RewriteManifests |
rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
Rewrites only manifests that match the given predicate.
|
protected RewriteManifests |
self() |
ThisT |
snapshotProperty(java.lang.String property,
java.lang.String value)
Sets a summary property in the snapshot produced by this action.
|
protected org.apache.spark.sql.SparkSession |
spark() |
protected org.apache.spark.api.java.JavaSparkContext |
sparkContext() |
RewriteManifests |
specId(int specId)
Rewrites manifests for a given spec id.
|
RewriteManifests |
stagingLocation(java.lang.String newStagingLocation)
Passes a location where the staged manifests should be written.
|
protected <T> T |
withJobGroupInfo(JobGroupInfo info,
java.util.function.Supplier<T> supplier) |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
snapshotProperty
public BaseRewriteManifestsSparkAction(org.apache.spark.sql.SparkSession spark, Table table)
protected RewriteManifests self()
public RewriteManifests specId(int specId)
RewriteManifests
If not set, defaults to the table's default spec ID.
specId
in interface RewriteManifests
specId
- a spec idpublic RewriteManifests rewriteIf(java.util.function.Predicate<ManifestFile> newPredicate)
RewriteManifests
If not set, all manifests will be rewritten.
rewriteIf
in interface RewriteManifests
newPredicate
- a predicatepublic RewriteManifests stagingLocation(java.lang.String newStagingLocation)
RewriteManifests
If not set, defaults to the table's metadata location.
stagingLocation
in interface RewriteManifests
newStagingLocation
- a staging locationpublic RewriteManifests.Result execute()
Action
execute
in interface Action<RewriteManifests,RewriteManifests.Result>
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
SnapshotUpdate
snapshotProperty
in interface SnapshotUpdate<ThisT,R>
property
- a snapshot property namevalue
- a snapshot property valueprotected void commit(SnapshotUpdate<?> update)
protected org.apache.spark.sql.SparkSession spark()
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
public ThisT option(java.lang.String name, java.lang.String value)
Action
Certain actions allow users to control internal details of their execution via options.
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Action
Certain actions allow users to control internal details of their execution via options.
protected java.util.Map<java.lang.String,java.lang.String> options()
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table newStaticTable(TableMetadata metadata, FileIO io)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)