Package org.apache.iceberg.actions
Class SizeBasedDataRewriter
java.lang.Object
org.apache.iceberg.actions.SizeBasedFileRewriter<FileScanTask,DataFile>
org.apache.iceberg.actions.SizeBasedDataRewriter
- All Implemented Interfaces:
FileRewriter<FileScanTask,DataFile>
-
Field Summary
Modifier and Type    Field    Description
static final String    DELETE_FILE_THRESHOLD    The minimum number of deletes that must be associated with a data file for it to be considered for rewriting.
static final int    DELETE_FILE_THRESHOLD_DEFAULT
Fields inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
MAX_FILE_GROUP_SIZE_BYTES, MAX_FILE_GROUP_SIZE_BYTES_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT, TARGET_FILE_SIZE_BYTES
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
protected long    defaultTargetFileSize()
protected Iterable<List<FileScanTask>>    filterFileGroups(List<List<FileScanTask>> groups)
protected Iterable<FileScanTask>    filterFiles(Iterable<FileScanTask> tasks)
void    init(Map<String,String> options)    Initializes this rewriter using provided options.
Set<String>    validOptions()    Returns a set of supported options for this rewriter.
Methods inherited from class org.apache.iceberg.actions.SizeBasedFileRewriter
enoughContent, enoughInputFiles, inputSize, numOutputFiles, outputSpec, outputSpecId, planFileGroups, splitSize, table, tooMuchContent, writeMaxFileSize, wronglySized
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.iceberg.actions.FileRewriter
description, rewrite
-
Field Details
-
DELETE_FILE_THRESHOLD
public static final String DELETE_FILE_THRESHOLD
The minimum number of deletes that must be associated with a data file for it to be considered for rewriting. If a data file has this number of deletes or more, it will be rewritten regardless of its file size, which is otherwise checked against SizeBasedFileRewriter.MIN_FILE_SIZE_BYTES and SizeBasedFileRewriter.MAX_FILE_SIZE_BYTES. If a file group contains a file that satisfies this condition, the file group will be rewritten regardless of the number of files in the group, which is otherwise checked against SizeBasedFileRewriter.MIN_INPUT_FILES.
Defaults to Integer.MAX_VALUE, which means this feature is not enabled by default.
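The two rules described above can be illustrated with a minimal, self-contained sketch. It does not use Iceberg classes: FileInfo, shouldRewriteFile, and shouldRewriteGroup are hypothetical names introduced for illustration only, and the group rule is simplified to the input-file count and the delete threshold (the actual rewriter may weigh further conditions).

```java
import java.util.Arrays;
import java.util.List;

public class DeleteThresholdSketch {

  /** Hypothetical stand-in for a data file: its size and its number of associated deletes. */
  public static final class FileInfo {
    final long sizeBytes;
    final int deleteCount;

    public FileInfo(long sizeBytes, int deleteCount) {
      this.sizeBytes = sizeBytes;
      this.deleteCount = deleteCount;
    }
  }

  /** File-level rule: select a file if it is wrongly sized or has too many deletes. */
  public static boolean shouldRewriteFile(
      FileInfo file, long minFileSize, long maxFileSize, int deleteFileThreshold) {
    boolean wronglySized = file.sizeBytes < minFileSize || file.sizeBytes > maxFileSize;
    boolean tooManyDeletes = file.deleteCount >= deleteFileThreshold;
    return wronglySized || tooManyDeletes;
  }

  /** Group-level rule (simplified): enough input files, or any file with too many deletes. */
  public static boolean shouldRewriteGroup(
      List<FileInfo> group, int minInputFiles, int deleteFileThreshold) {
    boolean enoughInputFiles = group.size() >= minInputFiles;
    boolean anyTooManyDeletes =
        group.stream().anyMatch(file -> file.deleteCount >= deleteFileThreshold);
    return enoughInputFiles || anyTooManyDeletes;
  }

  public static void main(String[] args) {
    FileInfo wellSized = new FileInfo(200L, 3); // size within [100, 300], 3 deletes
    List<FileInfo> group = Arrays.asList(wellSized);

    // Threshold of 2: the file and its single-file group are both selected.
    System.out.println(shouldRewriteFile(wellSized, 100L, 300L, 2)); // true
    System.out.println(shouldRewriteGroup(group, 5, 2)); // true

    // Default threshold (Integer.MAX_VALUE): the delete rule never fires.
    System.out.println(shouldRewriteFile(wellSized, 100L, 300L, Integer.MAX_VALUE)); // false
    System.out.println(shouldRewriteGroup(group, 5, Integer.MAX_VALUE)); // false
  }
}
```

With the default of Integer.MAX_VALUE no realistic delete count reaches the threshold, which is why the feature is effectively off until the option is set.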
-
DELETE_FILE_THRESHOLD_DEFAULT
public static final int DELETE_FILE_THRESHOLD_DEFAULT
-
-
Constructor Details
-
SizeBasedDataRewriter
-
-
Method Details
-
validOptions
public Set<String> validOptions()
Description copied from interface: FileRewriter
Returns a set of supported options for this rewriter. Only options specified in this set will be accepted at runtime. Any other options will be rejected.
- Specified by:
validOptions in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
validOptions in class SizeBasedFileRewriter<FileScanTask,DataFile>
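The accept-or-reject behavior described for validOptions could be sketched as follows. This is not the Iceberg implementation: invalidKeys and validate are hypothetical helpers, the exception type is an assumption, and the option key strings are illustrative and should be checked against the rewriter's constants.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ValidOptionsSketch {

  /** Returns the option keys that are not in the supported set, sorted for stable messages. */
  public static Set<String> invalidKeys(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    return invalid;
  }

  /** Rejects the options if any key is unsupported (exception type is an assumption). */
  public static void validate(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = invalidKeys(validOptions, options);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }

  public static void main(String[] args) {
    // Illustrative key strings, assumed for this sketch.
    Set<String> valid = new HashSet<>(Arrays.asList("min-input-files", "delete-file-threshold"));

    Map<String, String> options = new HashMap<>();
    options.put("delete-file-threshold", "2");
    validate(valid, options); // accepted: every key is in the supported set

    options.put("no-such-option", "x");
    try {
      validate(valid, options);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Unsupported options: [no-such-option]
    }
  }
}
```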
-
init
public void init(Map<String,String> options)
Description copied from interface: FileRewriter
Initializes this rewriter using provided options.
- Specified by:
init in interface FileRewriter<FileScanTask,DataFile>
- Overrides:
init in class SizeBasedFileRewriter<FileScanTask,DataFile>
- Parameters:
options - options to initialize this rewriter
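As an illustration of what initialization from an options map might involve, the sketch below reads a delete threshold with a fallback to the documented default. It is a standalone approximation, not the library's code: the key string "delete-file-threshold", the helper deleteFileThreshold, and the non-negative check are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class InitOptionsSketch {

  // Assumed key string for illustration; the default mirrors the documented Integer.MAX_VALUE.
  public static final String DELETE_FILE_THRESHOLD = "delete-file-threshold";
  public static final int DELETE_FILE_THRESHOLD_DEFAULT = Integer.MAX_VALUE;

  /** Reads the threshold from the options, falling back to the default when the key is absent. */
  public static int deleteFileThreshold(Map<String, String> options) {
    String value = options.get(DELETE_FILE_THRESHOLD);
    int threshold = value == null ? DELETE_FILE_THRESHOLD_DEFAULT : Integer.parseInt(value);
    if (threshold < 0) {
      // Assumed sanity check: a negative threshold has no sensible meaning.
      throw new IllegalArgumentException(
          "'" + DELETE_FILE_THRESHOLD + "' must be >= 0, got " + threshold);
    }
    return threshold;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put(DELETE_FILE_THRESHOLD, "2");

    System.out.println(deleteFileThreshold(options)); // 2
    System.out.println(deleteFileThreshold(Collections.emptyMap()) == Integer.MAX_VALUE); // true
  }
}
```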
-
filterFiles
protected Iterable<FileScanTask> filterFiles(Iterable<FileScanTask> tasks)
- Specified by:
filterFiles in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
filterFileGroups
protected Iterable<List<FileScanTask>> filterFileGroups(List<List<FileScanTask>> groups)
- Specified by:
filterFileGroups in class SizeBasedFileRewriter<FileScanTask,DataFile>
-
defaultTargetFileSize
protected long defaultTargetFileSize()
- Specified by:
defaultTargetFileSize in class SizeBasedFileRewriter<FileScanTask,DataFile>
-