Package org.apache.iceberg.actions
Class SizeBasedDataRewriter
java.lang.Object
org.apache.iceberg.actions.SizeBasedFileRewriter<FileScanTask,DataFile>
org.apache.iceberg.actions.SizeBasedDataRewriter
All Implemented Interfaces:
FileRewriter<FileScanTask,DataFile>
Field Summary
Fields
static final String DELETE_FILE_THRESHOLD: The minimum number of deletes that must be associated with a data file for it to be considered for rewriting.
static final int DELETE_FILE_THRESHOLD_DEFAULT: The default value of DELETE_FILE_THRESHOLD.
static final String DELETE_RATIO_THRESHOLD: The minimum deletion ratio a data file must have to be considered for rewriting.
static final double DELETE_RATIO_THRESHOLD_DEFAULT: The default value of DELETE_RATIO_THRESHOLD.
Fields inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT, TARGET_FILE_SIZE_BYTES
Constructor Summary
Constructors
SizeBasedDataRewriter
Method Summary
protected long defaultTargetFileSize()
protected Iterable<List<FileScanTask>> filterFileGroups(List<List<FileScanTask>> groups)
protected Iterable<FileScanTask> filterFiles(Iterable<FileScanTask> tasks)
void init(Map<String,String> options): Initializes this rewriter using the provided options.
Set<String> validOptions(): Returns a set of supported options for this rewriter.
Methods inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
enoughContent, enoughInputFiles, inputSize, numOutputFiles, outputSpec, outputSpecId, planFileGroups, splitSize, table, tooMuchContent, writeMaxFileSize, wronglySized
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.iceberg.actions.FileRewriter
description, rewrite
Field Details
DELETE_FILE_THRESHOLD
public static final String DELETE_FILE_THRESHOLD
The minimum number of deletes that must be associated with a data file for it to be considered for rewriting. If a data file has this many deletes or more, it will be rewritten regardless of the size-based criteria set by SizeBasedFileRewriter.MIN_FILE_SIZE_BYTES and SizeBasedFileRewriter.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the group will be rewritten regardless of the minimum file count set by SizeBasedFileRewriter.MIN_INPUT_FILES.
Defaults to Integer.MAX_VALUE, which means this feature is disabled by default.
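The delete-count rule can be modeled as a standalone predicate. This is a minimal illustration, not Iceberg's implementation. The class `DeleteCountRule`, its method name, and the choice to pass the delete count and threshold as plain integers are all hypothetical. The example also shows why the `Integer.MAX_VALUE` default effectively disables the rule.

```java
/** Hypothetical sketch of the DELETE_FILE_THRESHOLD rule; not Iceberg's actual code. */
final class DeleteCountRule {
    // Mirrors the documented default: Integer.MAX_VALUE effectively disables the rule.
    static final int DEFAULT_THRESHOLD = Integer.MAX_VALUE;

    private DeleteCountRule() {}

    /**
     * Returns true when a data file has at least {@code threshold} associated deletes.
     * Such a file is selected for rewriting regardless of its size.
     */
    static boolean tooManyDeletes(int deleteCount, int threshold) {
        return deleteCount >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(tooManyDeletes(5, 3));                     // true: 5 >= 3
        System.out.println(tooManyDeletes(2, 3));                     // false: below the threshold
        System.out.println(tooManyDeletes(1_000, DEFAULT_THRESHOLD)); // false: rule disabled by default
    }
}
```

Because the comparison is `>=`, a file whose delete count equals the threshold also qualifies. This matches "this many deletes or more" in the description above.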
DELETE_FILE_THRESHOLD_DEFAULT
public static final int DELETE_FILE_THRESHOLD_DEFAULT
The default value of DELETE_FILE_THRESHOLD (Integer.MAX_VALUE).
DELETE_RATIO_THRESHOLD
public static final String DELETE_RATIO_THRESHOLD
The minimum deletion ratio (the fraction of a data file's records that are deleted) a data file must have to be considered for rewriting. If the deletion ratio of a data file is greater than or equal to this value, it will be rewritten regardless of the size-based criteria set by SizeBasedFileRewriter.MIN_FILE_SIZE_BYTES and SizeBasedFileRewriter.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the group will be rewritten regardless of the minimum file count set by SizeBasedFileRewriter.MIN_INPUT_FILES.
Defaults to 0.3, so a file whose deletion ratio reaches or exceeds 30% can trigger a rewrite.
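A minimal sketch of the ratio rule follows. It assumes the deletion ratio is the number of deleted records divided by the file's total record count. The class `DeleteRatioRule` and its signature are hypothetical. The sketch does not model how the deleted-record count is derived from delete files.

```java
/** Hypothetical sketch of the DELETE_RATIO_THRESHOLD rule; not Iceberg's actual code. */
final class DeleteRatioRule {
    // Mirrors the documented default of 0.3 (30%).
    static final double DEFAULT_THRESHOLD = 0.3;

    private DeleteRatioRule() {}

    /** Returns true when deletedRecords / totalRecords >= threshold. */
    static boolean tooHighDeleteRatio(long deletedRecords, long totalRecords, double threshold) {
        if (totalRecords <= 0) {
            return false; // an empty file has no meaningful deletion ratio
        }
        double ratio = (double) deletedRecords / totalRecords;
        return ratio >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(tooHighDeleteRatio(30, 100, DEFAULT_THRESHOLD)); // true: exactly 30%
        System.out.println(tooHighDeleteRatio(29, 100, DEFAULT_THRESHOLD)); // false: 29% < 30%
    }
}
```

The guard on `totalRecords` is a design choice of this sketch. It avoids dividing by zero for an empty file.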
DELETE_RATIO_THRESHOLD_DEFAULT
public static final double DELETE_RATIO_THRESHOLD_DEFAULT
The default value of DELETE_RATIO_THRESHOLD (0.3).
Constructor Details
SizeBasedDataRewriter
Method Details
validOptions
public Set<String> validOptions()
Description copied from interface: FileRewriter
Returns a set of supported options for this rewriter. Only options specified in this list will be accepted at runtime. Any other options will be rejected.
Specified by:
validOptions in interface FileRewriter<FileScanTask,DataFile>
Overrides:
validOptions in class SizeBasedFileRewriter<FileScanTask,DataFile>
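The contract described above can be illustrated with a small validator: only advertised options are accepted, and any other option is rejected. This is a hypothetical sketch. `OptionValidator`, `validate`, and the option keys in the example are assumptions for illustration. The page does not say how the rejection is reported, so throwing `IllegalArgumentException` is also an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of enforcing validOptions(); not Iceberg's actual code. */
final class OptionValidator {
    private OptionValidator() {}

    /** Throws IllegalArgumentException if any provided key is not in the supported set. */
    static void validate(Map<String, String> options, Set<String> validOptions) {
        // Sorted copy so the error message lists unknown keys deterministically.
        Set<String> unknown = new TreeSet<>(options.keySet());
        unknown.removeAll(validOptions);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException("Unsupported options: " + unknown);
        }
    }

    public static void main(String[] args) {
        // Hypothetical option keys, for illustration only.
        Set<String> supported = Set.of("example-threshold", "example-ratio");
        validate(Map.of("example-threshold", "5"), supported); // accepted silently
        try {
            validate(Map.of("typo-option", "1"), supported);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unsupported options: [typo-option]
        }
    }
}
```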
init
public void init(Map<String,String> options)
Description copied from interface: FileRewriter
Initializes this rewriter using the provided options.
Specified by:
init in interface FileRewriter<FileScanTask,DataFile>
Overrides:
init in class SizeBasedFileRewriter<FileScanTask,DataFile>
Parameters:
options - options to initialize this rewriter
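One plausible shape for `init` is to read each delete threshold from the options map and fall back to the documented defaults, `Integer.MAX_VALUE` and 0.3. This is a sketch under assumptions:

- The class `ThresholdConfig`, its fields, and the keys `delete-file-threshold` and `delete-ratio-threshold` are hypothetical. This page does not list the string values of `DELETE_FILE_THRESHOLD` and `DELETE_RATIO_THRESHOLD`.
- The range checks are illustrative choices, not documented behavior.

```java
import java.util.Map;

/** Hypothetical sketch of how init(options) might resolve the delete thresholds. */
final class ThresholdConfig {
    // Assumed option keys; verify against the actual constant values before relying on them.
    static final String DELETE_FILE_THRESHOLD_KEY = "delete-file-threshold";
    static final String DELETE_RATIO_THRESHOLD_KEY = "delete-ratio-threshold";

    final int deleteFileThreshold;
    final double deleteRatioThreshold;

    private ThresholdConfig(int deleteFileThreshold, double deleteRatioThreshold) {
        this.deleteFileThreshold = deleteFileThreshold;
        this.deleteRatioThreshold = deleteRatioThreshold;
    }

    /** Parses both thresholds, using the documented defaults when a key is absent. */
    static ThresholdConfig fromOptions(Map<String, String> options) {
        String fileValue = options.get(DELETE_FILE_THRESHOLD_KEY);
        String ratioValue = options.get(DELETE_RATIO_THRESHOLD_KEY);
        int fileThreshold = fileValue == null ? Integer.MAX_VALUE : Integer.parseInt(fileValue);
        double ratioThreshold = ratioValue == null ? 0.3 : Double.parseDouble(ratioValue);

        // Illustrative sanity checks: a count must be positive, a ratio must lie in (0, 1].
        if (fileThreshold < 1) {
            throw new IllegalArgumentException("delete file threshold must be >= 1: " + fileThreshold);
        }
        if (ratioThreshold <= 0 || ratioThreshold > 1) {
            throw new IllegalArgumentException("delete ratio threshold must be in (0, 1]: " + ratioThreshold);
        }
        return new ThresholdConfig(fileThreshold, ratioThreshold);
    }

    public static void main(String[] args) {
        ThresholdConfig defaults = fromOptions(Map.of());
        System.out.println(defaults.deleteFileThreshold + " " + defaults.deleteRatioThreshold);
        ThresholdConfig custom = fromOptions(Map.of(DELETE_FILE_THRESHOLD_KEY, "2"));
        System.out.println(custom.deleteFileThreshold); // 2
    }
}
```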
filterFiles
protected Iterable<FileScanTask> filterFiles(Iterable<FileScanTask> tasks)
Specified by:
filterFiles in class SizeBasedFileRewriter<FileScanTask,DataFile>
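The field descriptions above suggest a model for file-level selection. A file is kept if it is wrongly sized (smaller than the minimum or larger than the maximum file size) or if it meets either delete threshold. This is a hedged sketch, not Iceberg's `filterFiles`:

- `FileSelection` and `FileStats` are hypothetical.
- The exact combination of conditions is inferred from this page.
- Real code works on `FileScanTask` objects rather than a plain stats holder.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of file-level selection; not Iceberg's actual filterFiles. */
final class FileSelection {
    /** Minimal stand-in for the per-file facts the selection rules need (all names assumed). */
    static final class FileStats {
        final String path;
        final long sizeBytes;
        final long recordCount;
        final long deletedRecords;
        final int deleteCount;

        FileStats(String path, long sizeBytes, long recordCount, long deletedRecords, int deleteCount) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.recordCount = recordCount;
            this.deletedRecords = deletedRecords;
            this.deleteCount = deleteCount;
        }
    }

    private final long minFileSize;
    private final long maxFileSize;
    private final int deleteFileThreshold;
    private final double deleteRatioThreshold;

    FileSelection(long minFileSize, long maxFileSize, int deleteFileThreshold, double deleteRatioThreshold) {
        this.minFileSize = minFileSize;
        this.maxFileSize = maxFileSize;
        this.deleteFileThreshold = deleteFileThreshold;
        this.deleteRatioThreshold = deleteRatioThreshold;
    }

    /** Size-based rule: the file is smaller than the minimum or larger than the maximum. */
    boolean wronglySized(FileStats f) {
        return f.sizeBytes < minFileSize || f.sizeBytes > maxFileSize;
    }

    /** Delete-based rules: enough deletes, or a high enough fraction of deleted records. */
    boolean meetsDeleteThreshold(FileStats f) {
        boolean tooManyDeletes = f.deleteCount >= deleteFileThreshold;
        boolean tooHighRatio = f.recordCount > 0
                && (double) f.deletedRecords / f.recordCount >= deleteRatioThreshold;
        return tooManyDeletes || tooHighRatio;
    }

    /** Keeps every file that is wrongly sized or that meets either delete threshold. */
    List<FileStats> filterFiles(List<FileStats> files) {
        List<FileStats> selected = new ArrayList<>();
        for (FileStats f : files) {
            if (wronglySized(f) || meetsDeleteThreshold(f)) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        FileSelection selection = new FileSelection(100, 1_000, 3, 0.3);
        List<FileStats> files = List.of(
                new FileStats("small", 50, 10, 0, 0),          // below the minimum size
                new FileStats("ok", 500, 100, 0, 0),           // well sized, no deletes
                new FileStats("many-deletes", 500, 100, 5, 3), // meets the delete count threshold
                new FileStats("high-ratio", 500, 100, 40, 1)); // 40% of records deleted
        for (FileStats f : selection.filterFiles(files)) {
            System.out.println(f.path);
        }
    }
}
```

Running `main` prints `small`, `many-deletes` and `high-ratio`. Only the well-sized file without deletes is skipped.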
filterFileGroups
protected Iterable<List<FileScanTask>> filterFileGroups(List<List<FileScanTask>> groups)
Specified by:
filterFileGroups in class SizeBasedFileRewriter<FileScanTask,DataFile>
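At the group level, the field descriptions state one rule directly. A group that contains a file meeting a delete threshold is rewritten even if it has fewer files than MIN_INPUT_FILES requires. This sketch models only that rule plus a simplified input-file minimum:

- The inherited content-size checks (such as `enoughContent` and `tooMuchContent`) are deliberately left out.
- `GroupSelection` and its one-boolean-per-file representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of group-level selection; omits the inherited content-size checks. */
final class GroupSelection {
    private GroupSelection() {}

    /**
     * Each inner list holds one flag per file: true if that file meets a delete threshold.
     * A group is kept if it has at least minInputFiles files or if any file meets a threshold.
     */
    static List<List<Boolean>> filterFileGroups(List<List<Boolean>> groups, int minInputFiles) {
        List<List<Boolean>> kept = new ArrayList<>();
        for (List<Boolean> group : groups) {
            boolean enoughInputFiles = group.size() >= minInputFiles;
            boolean anyFileOverDeleteThreshold = group.contains(Boolean.TRUE);
            if (enoughInputFiles || anyFileOverDeleteThreshold) {
                kept.add(group);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Boolean>> groups = List.of(
                List.of(false, false, false, false, false), // kept: meets the file-count minimum
                List.of(false, true),                       // kept: one file meets a delete threshold
                List.of(false, false));                     // dropped: too few files, no delete trigger
        System.out.println(filterFileGroups(groups, 5).size()); // 2
    }
}
```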
defaultTargetFileSize
protected long defaultTargetFileSize()
Specified by:
defaultTargetFileSize in class SizeBasedFileRewriter<FileScanTask,DataFile>