Package org.apache.iceberg.actions
Class SizeBasedDataRewriter
java.lang.Object
org.apache.iceberg.actions.SizeBasedFileRewriter<FileScanTask,DataFile>
org.apache.iceberg.actions.SizeBasedDataRewriter
- All Implemented Interfaces:
FileRewriter<FileScanTask,DataFile>
-
Field Summary
Fields
Modifier and Type      Field                            Description
static final String    DELETE_FILE_THRESHOLD            The minimum number of deletes that needs to be associated with a data file for it to be considered for rewriting.
static final int       DELETE_FILE_THRESHOLD_DEFAULT
Fields inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT, TARGET_FILE_SIZE_BYTES
Constructor Summary
Constructors -
Method Summary
Modifier and Type                         Method                                               Description
protected long                            defaultTargetFileSize()
protected Iterable<List<FileScanTask>>    filterFileGroups(List<List<FileScanTask>> groups)
protected Iterable<FileScanTask>          filterFiles(Iterable<FileScanTask> tasks)
void                                      init                                                 Initializes this rewriter using provided options.
                                          validOptions                                         Returns a set of supported options for this rewriter.
Methods inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
enoughContent, enoughInputFiles, inputSize, numOutputFiles, outputSpec, outputSpecId, planFileGroups, splitSize, table, tooMuchContent, writeMaxFileSize, wronglySized
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.iceberg.actions.FileRewriter
description, rewrite
-
Field Details
-
DELETE_FILE_THRESHOLD
The minimum number of deletes that needs to be associated with a data file for it to be considered for rewriting. If a data file has this number of deletes or more, it will be rewritten regardless of its file size determined by SizeBasedFileRewriter.MIN_FILE_SIZE_BYTES and SizeBasedFileRewriter.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the file group will be rewritten regardless of the number of files in the file group determined by SizeBasedFileRewriter.MIN_INPUT_FILES.
Defaults to Integer.MAX_VALUE, which means this feature is not enabled by default.
- See Also:
-
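The rule described for DELETE_FILE_THRESHOLD can be illustrated with a minimal sketch. This is not the Iceberg implementation: FileInfo, DeleteThresholdSketch, and both method names are hypothetical stand-ins, and the sketch assumes that a file is "wrongly sized" when its size falls outside the range given by the minimum and maximum file size, and that a group otherwise qualifies only by its input file count.

```java
import java.util.List;

// Hypothetical stand-in for a data file and its associated delete count; not an Iceberg type.
final class FileInfo {
  final long sizeBytes;
  final int deleteCount;

  FileInfo(long sizeBytes, int deleteCount) {
    this.sizeBytes = sizeBytes;
    this.deleteCount = deleteCount;
  }
}

// Hypothetical helper that mirrors the documented rule, under the assumptions stated above.
final class DeleteThresholdSketch {

  // A file is selected when it is outside the size range OR has at least
  // deleteThreshold deletes, regardless of its size.
  static boolean shouldRewriteFile(FileInfo file, long minSize, long maxSize, int deleteThreshold) {
    boolean wronglySized = file.sizeBytes < minSize || file.sizeBytes > maxSize;
    boolean tooManyDeletes = file.deleteCount >= deleteThreshold;
    return wronglySized || tooManyDeletes;
  }

  // A group is selected when it has enough input files OR contains at least one
  // file that reaches the delete threshold, regardless of the group's file count.
  static boolean shouldRewriteGroup(List<FileInfo> group, int minInputFiles, int deleteThreshold) {
    boolean enoughInputFiles = group.size() >= minInputFiles;
    boolean anyTooManyDeletes = group.stream().anyMatch(f -> f.deleteCount >= deleteThreshold);
    return enoughInputFiles || anyTooManyDeletes;
  }
}
```

With the default threshold of Integer.MAX_VALUE, the delete-count condition is effectively never met, which is why the feature is described as disabled by default.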
DELETE_FILE_THRESHOLD_DEFAULT
public static final int DELETE_FILE_THRESHOLD_DEFAULT
- See Also:
-
-
Constructor Details
-
SizeBasedDataRewriter
-
-
Method Details
-
validOptions
Description copied from interface: FileRewriter
Returns a set of supported options for this rewriter. Only options specified in this list will be accepted at runtime. Any other options will be rejected.
- Specified by:
validOptions in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
validOptions in class SizeBasedFileRewriter<FileScanTask,DataFile>
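The accept-or-reject behavior described for validOptions can be sketched as follows. This only illustrates the stated contract and is not how Iceberg performs the check: OptionValidationSketch and validate are hypothetical names, and the sketch assumes options are passed as a map from option name to string value.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; illustrates rejecting any option outside the supported set.
final class OptionValidationSketch {

  // Throws if the caller supplied an option name that is not in validOptions.
  static void validate(Set<String> validOptions, Map<String, String> options) {
    // Collect unsupported names in sorted order so the error message is stable.
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }
}
```

A caller would typically run such a check before init, so that a misspelled option name fails fast instead of being silently ignored.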
-
init
Description copied from interface: FileRewriter
Initializes this rewriter using provided options.
- Specified by:
init in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
init in class SizeBasedFileRewriter<FileScanTask,DataFile>
- Parameters:
options - options to initialize this rewriter
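As a rough sketch of what initialization from options might involve for the delete threshold: the field summary shows DELETE_FILE_THRESHOLD as a String (an option name) and DELETE_FILE_THRESHOLD_DEFAULT as an int, so the value has to be parsed from the options and fall back to the default. Everything else here is an assumption: InitSketch and deleteFileThreshold are hypothetical names, the key string "delete-file-threshold" is assumed rather than taken from this page, options are assumed to be a string-to-string map, and the non-negative check is illustrative.

```java
import java.util.Map;

// Hypothetical sketch of reading the delete threshold during initialization.
final class InitSketch {
  // Assumed option name; this page does not state the constant's value.
  static final String DELETE_FILE_THRESHOLD = "delete-file-threshold";
  // Documented default: Integer.MAX_VALUE, meaning the feature is off.
  static final int DELETE_FILE_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

  // Returns the configured threshold, or the default when the option is absent.
  static int deleteFileThreshold(Map<String, String> options) {
    String raw = options.get(DELETE_FILE_THRESHOLD);
    int value = raw == null ? DELETE_FILE_THRESHOLD_DEFAULT : Integer.parseInt(raw);
    if (value < 0) {
      // Assumed validation: a negative delete count has no meaning.
      throw new IllegalArgumentException(
          "'" + DELETE_FILE_THRESHOLD + "' is set to " + value + " but must be >= 0");
    }
    return value;
  }
}
```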
-
filterFiles
- Specified by:
filterFiles in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
filterFileGroups
- Specified by:
filterFileGroups in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
defaultTargetFileSize
protected long defaultTargetFileSize()
- Specified by:
defaultTargetFileSize in class SizeBasedFileRewriter<FileScanTask,DataFile>
-