Package org.apache.iceberg.actions
Class BinPackRewriteFilePlanner
java.lang.Object
org.apache.iceberg.actions.SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
org.apache.iceberg.actions.BinPackRewriteFilePlanner
All Implemented Interfaces:
FileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
public class BinPackRewriteFilePlanner
extends SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
Groups specified data files in the Table into RewriteFileGroups. The files are grouped by partition based on their size using fixed-size bins. Extends SizeBasedFileRewritePlanner with delete file number and delete ratio thresholds and with job order (RewriteDataFiles.REWRITE_JOB_ORDER) handling.
-
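As a rough illustration of size-based grouping with fixed-size bins, the sketch below packs the file sizes of a single partition into bins with a first-fit strategy. It is a minimal sketch under stated assumptions, not the planner's actual implementation: the class FixedBinSketch, the method pack, and the choice of first-fit are all hypothetical and are not part of the Iceberg API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the Iceberg API.
final class FixedBinSketch {
  // Packs file sizes (in bytes) into bins whose total size stays at or below maxBinBytes.
  // Each file goes into the first bin that still has room (first-fit).
  // A file larger than maxBinBytes ends up in a bin of its own.
  static List<List<Long>> pack(List<Long> fileSizes, long maxBinBytes) {
    List<List<Long>> bins = new ArrayList<>();
    List<Long> binTotals = new ArrayList<>();
    for (long size : fileSizes) {
      int target = -1;
      for (int i = 0; i < bins.size(); i++) {
        if (binTotals.get(i) + size <= maxBinBytes) {
          target = i;
          break;
        }
      }
      if (target < 0) {
        // No existing bin has room, so open a new one.
        bins.add(new ArrayList<>());
        binTotals.add(0L);
        target = bins.size() - 1;
      }
      bins.get(target).add(size);
      binTotals.set(target, binTotals.get(target) + size);
    }
    return bins;
  }

  public static void main(String[] args) {
    // Four files packed into bins of at most 100 bytes each.
    System.out.println(pack(List.of(60L, 50L, 40L, 30L), 100L));
  }
}
```

In this sketch the bin capacity plays a role loosely comparable to a maximum file group size: each resulting bin would correspond to one group of files that is rewritten together.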
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.iceberg.actions.SizeBasedFileRewritePlanner
SizeBasedFileRewritePlanner.RewriteExecutionContext
-
Field Summary
Fields
Modifier and Type | Field | Description
static final String | DELETE_FILE_THRESHOLD | The minimum number of deletes that needs to be associated with a data file for it to be considered for rewriting.
static final int | DELETE_FILE_THRESHOLD_DEFAULT |
static final String | DELETE_RATIO_THRESHOLD | The ratio of the deleted rows in a data file for it to be considered for rewriting.
static final double | DELETE_RATIO_THRESHOLD_DEFAULT |
Fields inherited from class org.apache.iceberg.actions.SizeBasedFileRewritePlanner
MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT, TARGET_FILE_SIZE_BYTES
-
Constructor Summary
Constructors
Constructor | Description
BinPackRewriteFilePlanner(Table table) |
BinPackRewriteFilePlanner(Table table, Expression filter) |
BinPackRewriteFilePlanner(Table table, Expression filter, Long snapshotId, boolean caseSensitive) | Creates the planner for the given table.
-
Method Summary
Modifier and Type | Method | Description
protected long | defaultTargetFileSize() | Expected target file size before configuration.
protected Iterable<List<FileScanTask>> | filterFileGroups(List<List<FileScanTask>> groups) | Additional filter for groups.
protected Iterable<FileScanTask> | filterFiles(Iterable<FileScanTask> tasks) | Additional filter for tasks before grouping.
void | init(Map<String,String> options) | Initializes this planner using provided options.
FileRewritePlan<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup> | plan() | Generates the plan for rewrite.
Set<String> | validOptions() | Returns a set of supported options for this planner.
Methods inherited from class org.apache.iceberg.actions.SizeBasedFileRewritePlanner
enoughContent, enoughInputFiles, expectedOutputFiles, inputSize, inputSplitSize, outputSpecId, outsideDesiredFileSizeRange, planFileGroups, table, tooMuchContent, writeMaxFileSize
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.iceberg.actions.FileRewritePlanner
description
-
Field Details
-
DELETE_FILE_THRESHOLD
public static final String DELETE_FILE_THRESHOLD
The minimum number of deletes that need to be associated with a data file for it to be considered for rewriting. If a data file has this number of deletes or more, it will be rewritten regardless of the file size limits set by SizeBasedFileRewritePlanner.MIN_FILE_SIZE_BYTES and SizeBasedFileRewritePlanner.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the file group will be rewritten regardless of the minimum number of files per group set by SizeBasedFileRewritePlanner.MIN_INPUT_FILES.
Defaults to Integer.MAX_VALUE, which means this feature is not enabled by default.
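The rule described for this threshold can be sketched as a simple predicate: a file qualifies either because its size is outside the configured range or because it has at least the threshold number of deletes. This is an illustrative sketch only; the class DeleteThresholdSketch, the method shouldRewrite, and its parameter names are hypothetical and do not reflect the planner's internal code.

```java
// Hypothetical illustration; not part of the Iceberg API.
final class DeleteThresholdSketch {
  // Returns true when a file should be considered for rewriting, either because its
  // size is outside [minFileSizeBytes, maxFileSizeBytes] or because the number of
  // deletes associated with it reaches the threshold.
  static boolean shouldRewrite(
      long fileSizeBytes,
      int deleteFileCount,
      long minFileSizeBytes,
      long maxFileSizeBytes,
      int deleteFileThreshold) {
    boolean outsideSizeRange =
        fileSizeBytes < minFileSizeBytes || fileSizeBytes > maxFileSizeBytes;
    boolean tooManyDeletes = deleteFileCount >= deleteFileThreshold;
    return outsideSizeRange || tooManyDeletes;
  }

  public static void main(String[] args) {
    // A well-sized file with 3 deletes and a threshold of 2 qualifies.
    System.out.println(shouldRewrite(100L, 3, 50L, 200L, 2));
    // With a threshold of Integer.MAX_VALUE the delete check is effectively off.
    System.out.println(shouldRewrite(100L, 3, 50L, 200L, Integer.MAX_VALUE));
  }
}
```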
-
DELETE_FILE_THRESHOLD_DEFAULT
public static final int DELETE_FILE_THRESHOLD_DEFAULT
-
DELETE_RATIO_THRESHOLD
public static final String DELETE_RATIO_THRESHOLD
The ratio of deleted rows in a data file at which it is considered for rewriting. If the deletion ratio of a data file is greater than or equal to this value, it will be rewritten regardless of the file size limits set by SizeBasedFileRewritePlanner.MIN_FILE_SIZE_BYTES and SizeBasedFileRewritePlanner.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the file group will be rewritten regardless of the minimum number of files per group set by SizeBasedFileRewritePlanner.MIN_INPUT_FILES.
Defaults to 0.3, which means that a file is rewritten once 30% or more of its records are deleted.
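The ratio check described above amounts to comparing the fraction of deleted rows against the threshold. The following is a minimal sketch under the assumption that both counts are already known for a file; the class DeleteRatioSketch and the method meetsDeleteRatio are hypothetical names, and how the planner actually obtains the deleted row count is not shown here.

```java
// Hypothetical illustration; not part of the Iceberg API.
final class DeleteRatioSketch {
  // Returns true when deletedRows / totalRows is greater than or equal to threshold.
  // Files without rows never qualify, which also avoids dividing by zero.
  static boolean meetsDeleteRatio(long deletedRows, long totalRows, double threshold) {
    if (totalRows <= 0) {
      return false;
    }
    double ratio = (double) deletedRows / totalRows;
    return ratio >= threshold;
  }

  public static void main(String[] args) {
    // 40 of 100 rows deleted, compared against the documented default of 0.3.
    System.out.println(meetsDeleteRatio(40L, 100L, 0.3));
  }
}
```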
-
DELETE_RATIO_THRESHOLD_DEFAULT
public static final double DELETE_RATIO_THRESHOLD_DEFAULT
-
-
Constructor Details
-
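BinPackRewriteFilePlanner
public BinPackRewriteFilePlanner(Table table)
-
BinPackRewriteFilePlanner
public BinPackRewriteFilePlanner(Table table, Expression filter)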
-
BinPackRewriteFilePlanner
public BinPackRewriteFilePlanner(Table table, Expression filter, Long snapshotId, boolean caseSensitive)
Creates the planner for the given table.
Parameters:
table - to plan for
filter - used to remove files from the plan
snapshotId - a snapshot ID used for planning and as the starting snapshot ID for commit validation when replacing the files
caseSensitive - property used for scanning
-
-
Method Details
-
validOptions
public Set<String> validOptions()
Description copied from interface: FileRewritePlanner
Returns a set of supported options for this planner. Only options specified in this set will be accepted at runtime. Any other options will be rejected.
Specified by:
validOptions in interface FileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
Overrides:
validOptions in class SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
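The accept-or-reject behavior described for validOptions can be pictured as a set difference between the supplied option keys and the supported ones. The sketch below is a standalone illustration and not the library's validation code: the class OptionValidationSketch, the method validate, and the option keys used in main are placeholders.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of the Iceberg API.
final class OptionValidationSketch {
  // Throws IllegalArgumentException if any supplied option key is not in validOptions.
  static void validate(Map<String, String> options, Set<String> validOptions) {
    // TreeSet keeps the reported keys in a stable, sorted order.
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }

  public static void main(String[] args) {
    // "option-a" and "option-b" are placeholder keys, not real planner options.
    Set<String> supported = Set.of("option-a", "option-b");
    validate(Map.of("option-a", "1"), supported);
    System.out.println("accepted");
  }
}
```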
-
init
public void init(Map<String,String> options)
Description copied from interface: FileRewritePlanner
Initializes this planner using provided options.
Specified by:
init in interface FileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
Overrides:
init in class SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
Parameters:
options - options to initialize this planner
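Initialization from a string-to-string options map typically means reading each key, falling back to a default when it is absent, and converting the value to its numeric type. The following sketch shows that pattern in isolation. The class PlannerOptionsSketch, its methods, the key strings, and the (0, 1] range check are assumptions made for illustration and are not taken from the planner's source.

```java
import java.util.Map;

// Hypothetical illustration; not part of the Iceberg API.
final class PlannerOptionsSketch {
  // Reads an integer option, returning defaultValue when the key is absent.
  static int intOption(Map<String, String> options, String key, int defaultValue) {
    String value = options.get(key);
    return value == null ? defaultValue : Integer.parseInt(value);
  }

  // Reads a ratio option, returning defaultValue when the key is absent.
  // The (0, 1] range check is an assumed sanity check for this sketch.
  static double ratioOption(Map<String, String> options, String key, double defaultValue) {
    String value = options.get(key);
    double ratio = value == null ? defaultValue : Double.parseDouble(value);
    if (ratio <= 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException(key + " must be in (0, 1], got " + ratio);
    }
    return ratio;
  }

  public static void main(String[] args) {
    // Placeholder keys; only the first one is supplied, so the ratio uses its default.
    Map<String, String> options = Map.of("example-file-threshold", "2");
    int fileThreshold = intOption(options, "example-file-threshold", Integer.MAX_VALUE);
    double ratioThreshold = ratioOption(options, "example-ratio-threshold", 0.3);
    System.out.println(fileThreshold + " " + ratioThreshold);
  }
}
```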
-
filterFiles
protected Iterable<FileScanTask> filterFiles(Iterable<FileScanTask> tasks)
Description copied from class: SizeBasedFileRewritePlanner
Additional filter for tasks before grouping.
Specified by:
filterFiles in class SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
-
filterFileGroups
protected Iterable<List<FileScanTask>> filterFileGroups(List<List<FileScanTask>> groups)
Description copied from class: SizeBasedFileRewritePlanner
Additional filter for groups.
Specified by:
filterFileGroups in class SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
-
defaultTargetFileSize
protected long defaultTargetFileSize()
Description copied from class: SizeBasedFileRewritePlanner
Expected target file size before configuration.
Specified by:
defaultTargetFileSize in class SizeBasedFileRewritePlanner<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup>
-
plan
public FileRewritePlan<RewriteDataFiles.FileGroupInfo,FileScanTask,DataFile,RewriteFileGroup> plan()
Description copied from interface: FileRewritePlanner
Generates the plan for rewrite.
Returns:
the generated plan, which can be executed during the compaction
-