Package org.apache.iceberg.actions
Interface RewriteDataFiles
All Superinterfaces:
- Action<RewriteDataFiles,RewriteDataFiles.Result>
- SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>

All Known Implementing Classes:
- RewriteDataFilesSparkAction
 
public interface RewriteDataFiles extends SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>

An action for rewriting data files according to a rewrite strategy. Generally used for optimizing the sizing and layout of data files within a table.
Nested Class Summary

Nested Classes (Modifier and Type, Interface, Description):
- static interface RewriteDataFiles.FileGroupInfo: A description of a file group, when it was processed, and within which partition.
- static interface RewriteDataFiles.FileGroupRewriteResult: For a particular file group, the number of files which are newly created and the number of files which were formerly part of the table but have been rewritten.
- static interface RewriteDataFiles.Result: A map of file group information to the results of rewriting that file group.
Field Summary

Fields (Modifier and Type, Field, Description):
- static java.lang.String MAX_CONCURRENT_FILE_GROUP_REWRITES: The max number of file groups to be simultaneously rewritten by the rewrite strategy.
- static int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT
- static java.lang.String MAX_FILE_GROUP_SIZE_BYTES: The entire rewrite operation is broken down into pieces based on partitioning and within partitions based on size into groups.
- static long MAX_FILE_GROUP_SIZE_BYTES_DEFAULT
- static java.lang.String PARTIAL_PROGRESS_ENABLED: Enable committing groups of files (see max-file-group-size-bytes) prior to the entire rewrite completing.
- static boolean PARTIAL_PROGRESS_ENABLED_DEFAULT
- static java.lang.String PARTIAL_PROGRESS_MAX_COMMITS: The maximum number of Iceberg commits that this rewrite is allowed to produce if partial progress is enabled.
- static int PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT
- static java.lang.String REWRITE_JOB_ORDER: Forces the rewrite job order based on the value.
- static java.lang.String REWRITE_JOB_ORDER_DEFAULT
- static java.lang.String TARGET_FILE_SIZE_BYTES: The output file size that this rewrite strategy will attempt to generate when rewriting files.
- static java.lang.String USE_STARTING_SEQUENCE_NUMBER: Whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of using the sequence number of the newly produced snapshot.
- static boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT
Method Summary

Methods (Modifier and Type, Method, Description):
- default RewriteDataFiles binPack(): Choose BINPACK as a strategy for this rewrite operation
- RewriteDataFiles filter(Expression expression): A user provided filter for determining which files will be considered by the rewrite strategy.
- default RewriteDataFiles sort(): Choose SORT as a strategy for this rewrite operation using the table's sortOrder
- default RewriteDataFiles sort(SortOrder sortOrder): Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use
- default RewriteDataFiles zOrder(java.lang.String... columns): Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use

Methods inherited from interface org.apache.iceberg.actions.SnapshotUpdate:
- snapshotProperty
 
Field Detail

PARTIAL_PROGRESS_ENABLED
static final java.lang.String PARTIAL_PROGRESS_ENABLED
Enable committing groups of files (see max-file-group-size-bytes) before the entire rewrite completes. This produces additional commits but allows progress even if some groups fail to commit. This setting does not change the correctness of the rewrite operation, because file groups can be compacted independently.
The default is false, which produces a single commit when the entire job has completed.
See Also:
- Constant Field Values
 
PARTIAL_PROGRESS_ENABLED_DEFAULT
static final boolean PARTIAL_PROGRESS_ENABLED_DEFAULT
See Also:
- Constant Field Values
 
PARTIAL_PROGRESS_MAX_COMMITS
static final java.lang.String PARTIAL_PROGRESS_MAX_COMMITS
The maximum number of Iceberg commits that this rewrite is allowed to produce if partial progress is enabled. This setting has no effect if partial progress is disabled.
See Also:
- Constant Field Values
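
To make the effect of this limit concrete, the following plain-Java sketch shows one way a known number of file groups could be spread over at most the configured number of commits, by rounding up the number of groups per commit. It does not call Iceberg. The names CommitBatchSketch, groupsPerCommit, and commitCount are hypothetical, and the rounding rule is an assumption made for illustration rather than a description of any particular implementation.

```java
// Hypothetical illustration only; not part of the Iceberg API.
final class CommitBatchSketch {
  private CommitBatchSketch() {}

  // Number of file groups bundled into each commit so that the total
  // number of commits does not exceed maxCommits (ceiling division).
  static int groupsPerCommit(int totalGroups, int maxCommits) {
    if (totalGroups < 0 || maxCommits <= 0) {
      throw new IllegalArgumentException("totalGroups must be >= 0 and maxCommits > 0");
    }
    return (totalGroups + maxCommits - 1) / maxCommits;
  }

  // Number of commits actually produced with that batch size.
  static int commitCount(int totalGroups, int maxCommits) {
    int perCommit = groupsPerCommit(totalGroups, maxCommits);
    return perCommit == 0 ? 0 : (totalGroups + perCommit - 1) / perCommit;
  }

  public static void main(String[] args) {
    // 25 file groups with a limit of 10 commits.
    System.out.println("groups per commit: " + groupsPerCommit(25, 10));
    System.out.println("commits produced:  " + commitCount(25, 10));
  }
}
```

Under this assumed rule the commit count can come in below the limit, since whole groups are never split across commits.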
 
PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT
static final int PARTIAL_PROGRESS_MAX_COMMITS_DEFAULT
See Also:
- Constant Field Values
 
MAX_FILE_GROUP_SIZE_BYTES
static final java.lang.String MAX_FILE_GROUP_SIZE_BYTES
The entire rewrite operation is broken down into pieces based on partitioning and, within partitions, based on size into groups. These sub-units of the rewrite are referred to as file groups. The largest amount of data that should be compacted in a single group is controlled by MAX_FILE_GROUP_SIZE_BYTES. This helps with breaking down the rewriting of very large partitions which may not be rewritable otherwise due to the resource constraints of the cluster. For example, a sort based rewrite may not scale to terabyte sized partitions; those partitions need to be worked on in small subsections to avoid exhaustion of resources.
When grouping files, the underlying rewrite strategy will use this value to limit the files which will be included in a single file group. A group will be processed by a single framework "action". For example, in Spark this means that each group would be rewritten in its own Spark action. A group will never contain files for multiple output partitions.
See Also:
- Constant Field Values
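
The grouping described above can be sketched in plain Java as a simple greedy pass over the file sizes of one partition. This is an illustration under stated assumptions, not Iceberg's planning code: FileGroupSketch and planGroups are hypothetical names, files are represented only by their sizes in bytes, and the greedy in-order packing is a stand-in for whatever packing the rewrite strategy actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg API.
final class FileGroupSketch {
  private FileGroupSketch() {}

  // Splits the files of ONE partition into groups whose total size stays
  // at or below maxGroupSizeBytes. A single file larger than the limit
  // still gets a group of its own, because files are never split here.
  static List<List<Long>> planGroups(List<Long> fileSizes, long maxGroupSizeBytes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0L;
    for (long size : fileSizes) {
      // Close the current group when adding this file would exceed the limit.
      if (!current.isEmpty() && currentBytes + size > maxGroupSizeBytes) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0L;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // Four files in one partition, limit of 100 bytes per group.
    System.out.println(planGroups(Arrays.asList(40L, 40L, 40L, 10L), 100L));
  }
}
```

Calling planGroups once per partition keeps each group within a single output partition, which mirrors the constraint stated above.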
 
MAX_FILE_GROUP_SIZE_BYTES_DEFAULT
static final long MAX_FILE_GROUP_SIZE_BYTES_DEFAULT
See Also:
- Constant Field Values
 
MAX_CONCURRENT_FILE_GROUP_REWRITES
static final java.lang.String MAX_CONCURRENT_FILE_GROUP_REWRITES
The max number of file groups to be simultaneously rewritten by the rewrite strategy. The structure and contents of each group are determined by the rewrite strategy. Each file group will be rewritten independently and asynchronously.
See Also:
- Constant Field Values
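
As a rough picture of what a concurrency limit like this means, the sketch below submits one task per file group to a fixed-size thread pool and records the highest number of tasks seen running at once. It is a generic java.util.concurrent example, not Iceberg code: ConcurrentRewriteSketch and rewriteAll are hypothetical names, and the short sleep is only a stand-in for the actual rewrite work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not part of the Iceberg API.
final class ConcurrentRewriteSketch {
  private ConcurrentRewriteSketch() {}

  // Runs one task per group on a pool of maxConcurrent threads and returns
  // the peak number of groups observed running at the same time.
  static int rewriteAll(int groupCount, int maxConcurrent) {
    ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (int i = 0; i < groupCount; i++) {
        futures.add(pool.submit(() -> {
          int now = running.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(10); // stand-in for rewriting one file group
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            running.decrementAndGet();
          }
        }));
      }
      // Wait for every group to finish before reporting the peak.
      for (Future<?> future : futures) {
        try {
          future.get();
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException("file group task failed", e);
        }
      }
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }

  public static void main(String[] args) {
    System.out.println("peak concurrency: " + rewriteAll(8, 3));
  }
}
```

The pool size bounds the peak, so the reported value never exceeds the second argument.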
 
MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT
static final int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT
See Also:
- Constant Field Values
 
TARGET_FILE_SIZE_BYTES
static final java.lang.String TARGET_FILE_SIZE_BYTES
The output file size that this rewrite strategy will attempt to generate when rewriting files. By default this uses the value of "write.target-file-size-bytes" in the table properties of the table being updated.
See Also:
- Constant Field Values
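
The fallback described above amounts to a two-level lookup, which the plain-Java sketch below illustrates with ordinary maps. It does not use Iceberg classes: TargetFileSizeSketch and resolve are hypothetical names, the option key "target-file-size-bytes" is assumed here for illustration, and the final fallback value is passed in by the caller rather than taken from any library default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Iceberg API.
final class TargetFileSizeSketch {
  private TargetFileSizeSketch() {}

  // Prefers the action-level option, then the table property, then the
  // caller-supplied fallback.
  static long resolve(Map<String, String> actionOptions,
                      Map<String, String> tableProperties,
                      long fallbackBytes) {
    String value = actionOptions.get("target-file-size-bytes"); // assumed option key
    if (value == null) {
      value = tableProperties.get("write.target-file-size-bytes");
    }
    return value == null ? fallbackBytes : Long.parseLong(value);
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    Map<String, String> properties = new HashMap<>();
    properties.put("write.target-file-size-bytes", "268435456");
    // No action option set, so the table property is used.
    System.out.println(resolve(options, properties, 1L));
  }
}
```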
 
USE_STARTING_SEQUENCE_NUMBER
static final java.lang.String USE_STARTING_SEQUENCE_NUMBER
Whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of using the sequence number of the newly produced snapshot.
This avoids commit conflicts with updates that add newer equality deletes at a higher sequence number. Defaults to true.
See Also:
- Constant Field Values
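
A small worked example may help show why the starting sequence number matters. The sketch below relies on the Iceberg table spec rule that an equality delete applies to a data file only when the data file's data sequence number is strictly less than the delete's. The class SequenceNumberSketch, its method names, and the sequence numbers 5, 6, and 7 are hypothetical and chosen purely for illustration.

```java
// Hypothetical illustration only; not part of the Iceberg API.
final class SequenceNumberSketch {
  private SequenceNumberSketch() {}

  // Spec rule assumed here: an equality delete applies only to data files
  // with a strictly smaller data sequence number.
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  // Sequence number assigned to the rewritten data files.
  static long rewrittenFileSeq(long startingSeq, long newSnapshotSeq, boolean useStartingSeq) {
    return useStartingSeq ? startingSeq : newSnapshotSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 5L;    // snapshot the compaction started from
    long deleteSeq = 6L;      // equality delete committed while the compaction ran
    long newSnapshotSeq = 7L; // snapshot produced by the compaction itself

    // With the starting sequence number, the concurrent delete still applies
    // to the rewritten files.
    System.out.println(equalityDeleteApplies(
        rewrittenFileSeq(startingSeq, newSnapshotSeq, true), deleteSeq));

    // With the new snapshot's sequence number, the delete would no longer
    // apply to the rewritten files, so the two commits conflict.
    System.out.println(equalityDeleteApplies(
        rewrittenFileSeq(startingSeq, newSnapshotSeq, false), deleteSeq));
  }
}
```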
 
USE_STARTING_SEQUENCE_NUMBER_DEFAULT
static final boolean USE_STARTING_SEQUENCE_NUMBER_DEFAULT
See Also:
- Constant Field Values
 
REWRITE_JOB_ORDER
static final java.lang.String REWRITE_JOB_ORDER
Forces the rewrite job order based on the value.
- If rewrite-job-order=bytes-asc, then rewrite the smallest job groups first.
- If rewrite-job-order=bytes-desc, then rewrite the largest job groups first.
- If rewrite-job-order=files-asc, then rewrite the job groups with the least files first.
- If rewrite-job-order=files-desc, then rewrite the job groups with the most files first.
- If rewrite-job-order=none, then rewrite job groups in the order they were planned (no specific ordering).
Defaults to none.
See Also:
- Constant Field Values
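
The five values listed above map naturally onto comparators over the planned groups. The plain-Java sketch below shows that mapping. It is an illustration only: JobOrderSketch, Group, order, and names are hypothetical, and a group is reduced to a label, a byte count, and a file count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg API.
final class JobOrderSketch {
  private JobOrderSketch() {}

  // Minimal stand-in for a planned file group.
  static final class Group {
    final String name;
    final long bytes;
    final int files;

    Group(String name, long bytes, int files) {
      this.name = name;
      this.bytes = bytes;
      this.files = files;
    }
  }

  // Returns a copy of the planned groups ordered per the rewrite-job-order value.
  static List<Group> order(List<Group> planned, String rewriteJobOrder) {
    List<Group> result = new ArrayList<>(planned);
    switch (rewriteJobOrder) {
      case "bytes-asc":
        result.sort(Comparator.comparingLong((Group g) -> g.bytes));
        break;
      case "bytes-desc":
        result.sort(Comparator.comparingLong((Group g) -> g.bytes).reversed());
        break;
      case "files-asc":
        result.sort(Comparator.comparingInt((Group g) -> g.files));
        break;
      case "files-desc":
        result.sort(Comparator.comparingInt((Group g) -> g.files).reversed());
        break;
      case "none":
        break; // keep the planned order
      default:
        throw new IllegalArgumentException("Unknown rewrite-job-order: " + rewriteJobOrder);
    }
    return result;
  }

  // Joins group names with commas, for easy printing.
  static String names(List<Group> groups) {
    StringBuilder joined = new StringBuilder();
    for (Group g : groups) {
      if (joined.length() > 0) {
        joined.append(',');
      }
      joined.append(g.name);
    }
    return joined.toString();
  }

  public static void main(String[] args) {
    List<Group> planned = Arrays.asList(
        new Group("a", 300L, 1), new Group("b", 100L, 5), new Group("c", 200L, 3));
    System.out.println(names(order(planned, "bytes-asc")));
    System.out.println(names(order(planned, "files-desc")));
  }
}
```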
 
REWRITE_JOB_ORDER_DEFAULT
static final java.lang.String REWRITE_JOB_ORDER_DEFAULT
 
Method Detail

binPack
default RewriteDataFiles binPack()
Choose BINPACK as a strategy for this rewrite operation.
Returns:
- this for method chaining
 
sort
default RewriteDataFiles sort()
Choose SORT as a strategy for this rewrite operation using the table's sortOrder.
Returns:
- this for method chaining
 
sort
default RewriteDataFiles sort(SortOrder sortOrder)
Choose SORT as a strategy for this rewrite operation and manually specify the sortOrder to use.
Parameters:
- sortOrder - user defined sortOrder
Returns:
- this for method chaining
 
zOrder
default RewriteDataFiles zOrder(java.lang.String... columns)
Choose Z-ORDER as a strategy for this rewrite operation with a specified list of columns to use.
Parameters:
- columns - Columns to be used to generate Z-Values
Returns:
- this for method chaining
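
For readers unfamiliar with Z-values, the sketch below shows the general idea of a Z-order (Morton) value for two integer columns: the bits of the two values are interleaved so that rows close in both columns tend to get nearby Z-values. This is a generic textbook illustration, not the encoding Iceberg uses for its supported column types; ZValueSketch and interleave are hypothetical names.

```java
// Hypothetical illustration only; not part of the Iceberg API.
final class ZValueSketch {
  private ZValueSketch() {}

  // Interleaves the bits of two 32-bit values into one 64-bit value:
  // bit i of x lands at bit 2i+1 and bit i of y lands at bit 2i.
  static long interleave(int x, int y) {
    long z = 0L;
    for (int i = 0; i < 32; i++) {
      long xBit = (x >>> i) & 1L;
      long yBit = (y >>> i) & 1L;
      z |= xBit << (2 * i + 1);
      z |= yBit << (2 * i);
    }
    return z;
  }

  public static void main(String[] args) {
    // x = 0b10, y = 0b01 interleave to 0b1001.
    System.out.println(Long.toBinaryString(interleave(2, 1)));
  }
}
```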
 
filter
RewriteDataFiles filter(Expression expression)
A user provided filter for determining which files will be considered by the rewrite strategy. This will be used in addition to whatever rules the rewrite strategy generates. For example, this could be used to restrict the rewrite to a specific partition.
Parameters:
- expression - An Iceberg expression used to determine which files will be considered for rewriting
Returns:
- this for chaining
 
 