Package org.apache.iceberg.spark.actions
Class BaseRewriteDataFilesSpark3Action
- java.lang.Object
-
- org.apache.iceberg.spark.actions.BaseRewriteDataFilesSpark3Action
-
- All Implemented Interfaces:
Action<RewriteDataFiles,RewriteDataFiles.Result>
,RewriteDataFiles
,SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>
public class BaseRewriteDataFilesSpark3Action extends java.lang.Object
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteDataFiles
RewriteDataFiles.FileGroupInfo, RewriteDataFiles.FileGroupRewriteResult, RewriteDataFiles.Result
-
-
Field Summary
-
Fields inherited from interface org.apache.iceberg.actions.RewriteDataFiles
MAX_CONCURRENT_FILE_GROUP_REWRITES, MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT, MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, PARTIAL_PROGRESS_ENABLED, PARTIAL_PROGRESS_ENABLED_DEFAULT, PARTIAL_PROGRESS_MAX_COMMITS, PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT, TARGET_FILE_SIZE_BYTES, USE_STARTING_SEQUENCE_NUMBER, USE_STARTING_SEQUENCE_NUMBER_DEFAULT
-
-
Constructor Summary
Constructors Modifier Constructor Description protected
BaseRewriteDataFilesSpark3Action(org.apache.spark.sql.SparkSession spark, Table table)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description RewriteDataFiles
binPack()
Choose BINPACK as a strategy for this rewrite operationprotected BinPackStrategy
binPackStrategy()
The framework specificBinPackStrategy
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestListDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidDataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidMetadataFileDF(Table table)
protected void
commit(SnapshotUpdate<?> update)
RewriteDataFiles.Result
execute()
Executes this action.RewriteDataFiles
filter(Expression expression)
A user provided filter for determining which files will be considered by the rewrite strategy.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadMetadataTable(Table table, MetadataTableType type)
protected JobGroupInfo
newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table
newStaticTable(TableMetadata metadata, FileIO io)
ThisT
option(java.lang.String name, java.lang.String value)
Configures this action with an extra option.protected java.util.Map<java.lang.String,java.lang.String>
options()
ThisT
options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Configures this action with extra options.protected RewriteDataFiles
self()
ThisT
snapshotProperty(java.lang.String property, java.lang.String value)
Sets a summary property in the snapshot produced by this action.RewriteDataFiles
sort()
Choose SORT as a strategy for this rewrite operation using the table's sortOrderRewriteDataFiles
sort(SortOrder sortOrder)
Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to useprotected SortStrategy
sortStrategy()
The framework specificSortStrategy
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
sparkContext()
protected Table
table()
protected <T> T
withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty
-
-
-
-
Constructor Detail
-
BaseRewriteDataFilesSpark3Action
protected BaseRewriteDataFilesSpark3Action(org.apache.spark.sql.SparkSession spark, Table table)
-
-
Method Detail
-
binPackStrategy
protected BinPackStrategy binPackStrategy()
The framework specificBinPackStrategy
-
sortStrategy
protected SortStrategy sortStrategy()
The framework specificSortStrategy
-
self
protected RewriteDataFiles self()
-
table
protected Table table()
-
binPack
public RewriteDataFiles binPack()
Description copied from interface:RewriteDataFiles
Choose BINPACK as a strategy for this rewrite operation- Specified by:
binPack
in interfaceRewriteDataFiles
- Returns:
- this for method chaining
-
sort
public RewriteDataFiles sort(SortOrder sortOrder)
Description copied from interface:RewriteDataFiles
Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use- Specified by:
sort
in interfaceRewriteDataFiles
- Parameters:
sortOrder
- user defined sortOrder- Returns:
- this for method chaining
-
sort
public RewriteDataFiles sort()
Description copied from interface:RewriteDataFiles
Choose SORT as a strategy for this rewrite operation using the table's sortOrder- Specified by:
sort
in interfaceRewriteDataFiles
- Returns:
- this for method chaining
-
filter
public RewriteDataFiles filter(Expression expression)
Description copied from interface:RewriteDataFiles
A user provided filter for determining which files will be considered by the rewrite strategy. This will be used in addition to whatever rules the rewrite strategy generates. For example this would be used for providing a restriction to only run rewrite on a specific partition.- Specified by:
filter
in interfaceRewriteDataFiles
- Parameters:
expression
- An iceberg expression used to determine which files will be considered for rewriting- Returns:
- this for chaining
-
execute
public RewriteDataFiles.Result execute()
Description copied from interface:Action
Executes this action.- Specified by:
execute
in interfaceAction<RewriteDataFiles,RewriteDataFiles.Result>
- Returns:
- the result of this action
-
snapshotProperty
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
Description copied from interface:SnapshotUpdate
Sets a summary property in the snapshot produced by this action.- Specified by:
snapshotProperty
in interfaceSnapshotUpdate<ThisT,R>
- Parameters:
property
- a snapshot property namevalue
- a snapshot property value- Returns:
- this for method chaining
-
commit
protected void commit(SnapshotUpdate<?> update)
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
Description copied from interface:Action
Configures this action with an extra option.Certain actions allow users to control internal details of their execution via options.
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Description copied from interface:Action
Configures this action with extra options.Certain actions allow users to control internal details of their execution via options.
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
buildValidDataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(Table table)
-
buildManifestFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
-
buildManifestListDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
-
buildOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
-
buildValidMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
-