Package org.apache.iceberg.spark.actions
Class RewriteDataFilesSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction
-
- All Implemented Interfaces:
Action<RewriteDataFiles,RewriteDataFiles.Result>, RewriteDataFiles, SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>
public class RewriteDataFilesSparkAction extends java.lang.Object implements RewriteDataFiles
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteDataFiles
RewriteDataFiles.FileGroupInfo, RewriteDataFiles.FileGroupRewriteResult, RewriteDataFiles.Result
-
-
Field Summary
Fields
Modifier and Type	Field
protected static java.lang.String	CONTENT_FILE
protected static java.lang.String	FILE_PATH
protected static java.lang.String	FILE_TYPE
protected static java.lang.String	LAST_MODIFIED
protected static java.lang.String	MANIFEST
protected static java.lang.String	MANIFEST_LIST
protected static java.lang.String	OTHERS
-
Fields inherited from interface org.apache.iceberg.actions.RewriteDataFiles
MAX_CONCURRENT_FILE_GROUP_REWRITES, MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT, MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, PARTIAL_PROGRESS_ENABLED, PARTIAL_PROGRESS_ENABLED_DEFAULT, PARTIAL_PROGRESS_MAX_COMMITS, PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT, REWRITE_JOB_ORDER, REWRITE_JOB_ORDER_DEFAULT, TARGET_FILE_SIZE_BYTES, USE_STARTING_SEQUENCE_NUMBER, USE_STARTING_SEQUENCE_NUMBER_DEFAULT
-
-
Method Summary
All Methods  Instance Methods  Concrete Methods
Modifier and Type	Method	Description
RewriteDataFilesSparkAction	binPack()	Choose BINPACK as a strategy for this rewrite operation
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildAllReachableOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildManifestFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildManifestListDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildValidContentFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildValidContentFileWithTypeDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	buildValidMetadataFileDF(Table table)
protected void	commit(SnapshotUpdate<?> update)
RewriteDataFiles.Result	execute()	Executes this action.
RewriteDataFilesSparkAction	filter(Expression expression)	A user-provided filter for determining which files will be considered by the rewrite strategy.
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	loadMetadataTable(Table table, MetadataTableType type)
protected JobGroupInfo	newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table	newStaticTable(TableMetadata metadata, FileIO io)
ThisT	option(java.lang.String name, java.lang.String value)
protected java.util.Map<java.lang.String,java.lang.String>	options()
ThisT	options(java.util.Map<java.lang.String,java.lang.String> newOptions)
protected RewriteDataFilesSparkAction	self()
ThisT	snapshotProperty(java.lang.String property, java.lang.String value)
RewriteDataFilesSparkAction	sort()	Choose SORT as a strategy for this rewrite operation using the table's sortOrder
RewriteDataFilesSparkAction	sort(SortOrder sortOrder)	Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use
protected org.apache.spark.sql.SparkSession	spark()
protected org.apache.spark.api.java.JavaSparkContext	sparkContext()
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>	withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds, java.lang.String type)
protected <T> T	withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
RewriteDataFilesSparkAction	zOrder(java.lang.String... columnNames)	Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate
snapshotProperty
-
-
-
-
Field Detail
-
CONTENT_FILE
protected static final java.lang.String CONTENT_FILE
- See Also:
- Constant Field Values
-
MANIFEST
protected static final java.lang.String MANIFEST
- See Also:
- Constant Field Values
-
MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
- See Also:
- Constant Field Values
-
OTHERS
protected static final java.lang.String OTHERS
- See Also:
- Constant Field Values
-
FILE_PATH
protected static final java.lang.String FILE_PATH
- See Also:
- Constant Field Values
-
FILE_TYPE
protected static final java.lang.String FILE_TYPE
- See Also:
- Constant Field Values
-
LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
- See Also:
- Constant Field Values
-
-
Method Detail
-
self
protected RewriteDataFilesSparkAction self()
-
binPack
public RewriteDataFilesSparkAction binPack()
Description copied from interface: RewriteDataFiles
Choose BINPACK as a strategy for this rewrite operation
- Specified by:
- binPack in interface RewriteDataFiles
- Returns:
- this for method chaining
-
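As an illustrative sketch (the table name `db.events`, the option value, and the preexisting `SparkSession spark` are assumptions, not part of this API's contract), a bin-pack rewrite is typically driven through `SparkActions`:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.actions.SparkActions;

// Load the Iceberg table; "db.events" is a placeholder name.
Table table = Spark3Util.loadIcebergTable(spark, "db.events");

// Compact small files toward ~512 MB targets with the BINPACK strategy.
RewriteDataFiles.Result result =
    SparkActions.get(spark)
        .rewriteDataFiles(table)
        .binPack()
        .option("target-file-size-bytes", Long.toString(512L * 1024 * 1024))
        .execute();
```

BINPACK only regroups existing rows into fewer, better-sized files; it does not change the sort characteristics of the data.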
sort
public RewriteDataFilesSparkAction sort(SortOrder sortOrder)
Description copied from interface: RewriteDataFiles
Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use
- Specified by:
- sort in interface RewriteDataFiles
- Parameters:
- sortOrder - user defined sortOrder
- Returns:
- this for method chaining
-
sort
public RewriteDataFilesSparkAction sort()
Description copied from interface: RewriteDataFiles
Choose SORT as a strategy for this rewrite operation using the table's sortOrder
- Specified by:
- sort in interface RewriteDataFiles
- Returns:
- this for method chaining
-
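A sketch of the two SORT overloads, assuming a loaded `Table table` and `SparkSession spark` as above (the column names `category` and `event_ts` are placeholders):

```java
import org.apache.iceberg.SortOrder;
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.actions.SparkActions;

// Build an explicit sort order over the table's schema.
SortOrder sortOrder = SortOrder.builderFor(table.schema())
    .asc("category")
    .desc("event_ts")
    .build();

RewriteDataFiles.Result result =
    SparkActions.get(spark)
        .rewriteDataFiles(table)
        .sort(sortOrder)   // or .sort() to reuse the table's own sort order
        .execute();
```

The no-argument `sort()` overload requires the table to already have a sort order defined; otherwise pass one explicitly as shown.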
zOrder
public RewriteDataFilesSparkAction zOrder(java.lang.String... columnNames)
Description copied from interface: RewriteDataFiles
Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use
- Specified by:
- zOrder in interface RewriteDataFiles
- Parameters:
- columnNames - Columns to be used to generate Z-Values
- Returns:
- this for method chaining
-
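Z-ordering interleaves the bits of several columns so that files cluster rows that are close in all of them at once, which helps when queries filter on more than one column. A minimal sketch under the same assumptions as above (the column names are illustrative):

```java
import org.apache.iceberg.spark.actions.SparkActions;

// Rewrite data files clustered along the Z-order curve of two columns.
SparkActions.get(spark)
    .rewriteDataFiles(table)
    .zOrder("device_id", "event_ts")
    .execute();
```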
filter
public RewriteDataFilesSparkAction filter(Expression expression)
Description copied from interface: RewriteDataFiles
A user-provided filter for determining which files will be considered by the rewrite strategy. This is applied in addition to whatever rules the rewrite strategy generates. For example, it can be used to restrict the rewrite to a specific partition.
- Specified by:
- filter in interface RewriteDataFiles
- Parameters:
- expression - an Iceberg expression used to determine which files will be considered for rewriting
- Returns:
- this for chaining
-
execute
public RewriteDataFiles.Result execute()
Description copied from interface: Action
Executes this action.
- Specified by:
- execute in interface Action<RewriteDataFiles,RewriteDataFiles.Result>
- Returns:
- the result of this action
-
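Filtering and execution combine naturally in one chain. A sketch, again assuming `spark` and `table` are in scope (the partition column `event_date` and its value are placeholders):

```java
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.actions.SparkActions;

// Restrict the rewrite to a single partition's files.
RewriteDataFiles.Result result =
    SparkActions.get(spark)
        .rewriteDataFiles(table)
        .filter(Expressions.equal("event_date", "2023-01-01"))
        .binPack()
        .execute();

// The result summarizes what the action changed.
System.out.println(result.rewrittenDataFilesCount() + " data files rewritten");
```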
snapshotProperty
public ThisT snapshotProperty(java.lang.String property, java.lang.String value)
-
commit
protected void commit(SnapshotUpdate<?> update)
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
buildValidContentFileWithTypeDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileWithTypeDF(Table table)
-
buildValidContentFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidContentFileDF(Table table)
-
buildManifestFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
-
buildManifestListDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
-
buildOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
-
buildAllReachableOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildAllReachableOtherMetadataFileDF(Table table)
-
buildValidMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
-
withFileType
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> withFileType(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> ds, java.lang.String type)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
-