Package org.apache.iceberg.actions
Class SortStrategy
- java.lang.Object
  - org.apache.iceberg.actions.BinPackStrategy
    - org.apache.iceberg.actions.SortStrategy
- All Implemented Interfaces:
java.io.Serializable, RewriteStrategy
- Direct Known Subclasses:
Spark3SortStrategy
public abstract class SortStrategy extends BinPackStrategy
A rewrite strategy for data files which aims to reorder data within data files to lay them out optimally in relation to a column. For example, suppose the Sort strategy, ordered by column x, is used on a set of files that originally contains File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60). The strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40) and File C' (x: 41 - 60).
Currently there is no file overlap detection, and all files are rewritten if REWRITE_ALL is true (default: false). If this property is disabled, any files that would be chosen by BinPackStrategy will be rewrite candidates.
In the future, other algorithms for determining files to rewrite will be provided.
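The File A/B/C example above can be reproduced with a small stand-alone sketch. It is an illustration only: the class `SortLayoutSketch` and its `rewrite` method are hypothetical names, files are modeled as plain lists of x values, and nothing here reflects how Iceberg or `Spark3SortStrategy` actually performs the sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; none of these names belong to the Iceberg API.
class SortLayoutSketch {

  // Merges the x values of all input files, sorts them, and cuts the sorted
  // run into at most outputFiles contiguous chunks so that no two output
  // files cover overlapping ranges of x.
  static List<List<Integer>> rewrite(List<List<Integer>> inputFiles, int outputFiles) {
    if (outputFiles <= 0) {
      throw new IllegalArgumentException("outputFiles must be positive");
    }
    List<Integer> all = new ArrayList<>();
    for (List<Integer> file : inputFiles) {
      all.addAll(file);
    }
    Collections.sort(all);

    List<List<Integer>> result = new ArrayList<>();
    int chunk = (all.size() + outputFiles - 1) / outputFiles; // ceiling division
    for (int start = 0; start < all.size(); start += chunk) {
      int end = Math.min(start + chunk, all.size());
      result.add(new ArrayList<>(all.subList(start, end)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<Integer>> input = List.of(
        List.of(0, 50, 25),  // File A, x in [0, 50]
        List.of(10, 40),     // File B, x in [10, 40]
        List.of(30, 60));    // File C, x in [30, 60]
    // Print the min and max x of each rewritten file.
    for (List<Integer> file : rewrite(input, 3)) {
      System.out.println(file.get(0) + " - " + file.get(file.size() - 1));
    }
  }
}
```

The input files overlap heavily on x, while each output chunk covers a disjoint range, which is the layout the strategy aims for.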
- See Also:
- Serialized Form
-
-
Field Summary
Fields
static java.lang.String REWRITE_ALL
    Rewrites all files, regardless of their size.
static boolean REWRITE_ALL_DEFAULT
Fields inherited from class org.apache.iceberg.actions.BinPackStrategy
DELETE_FILE_THRESHOLD, DELETE_FILE_THRESHOLD_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT
-
-
Constructor Summary
Constructors
SortStrategy()
-
Method Summary
java.lang.String name()
    Returns the name of this rewrite strategy.
RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
    Sets options to be used with this strategy.
java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
    Groups file scans into lists which will be processed in a single executable unit.
java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
    Selects files which this strategy believes are valid targets to be rewritten.
protected SortOrder sortOrder()
SortStrategy sortOrder(SortOrder order)
    Sets the sort order to be used in this strategy when rewriting files.
protected void validateOptions()
java.util.Set<java.lang.String> validOptions()
    Returns a set of options which this rewrite strategy can use.
Methods inherited from class org.apache.iceberg.actions.BinPackStrategy
inputFileSize, maxGroupSize, numOutputFiles, splitSize, targetFileSize, writeMaxFileSize
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.RewriteStrategy
rewriteFiles, table
-
-
-
-
Field Detail
-
REWRITE_ALL
public static final java.lang.String REWRITE_ALL
Rewrites all files, regardless of their size. Defaults to false, rewriting only mis-sized files.
- See Also:
- Constant Field Values
-
REWRITE_ALL_DEFAULT
public static final boolean REWRITE_ALL_DEFAULT
- See Also:
- Constant Field Values
-
-
Method Detail
-
sortOrder
public SortStrategy sortOrder(SortOrder order)
Sets the sort order to be used in this strategy when rewriting files.
- Parameters:
order - the order to use
- Returns:
- this for method chaining
-
sortOrder
protected SortOrder sortOrder()
-
name
public java.lang.String name()
Description copied from interface: RewriteStrategy
Returns the name of this rewrite strategy.
- Specified by:
name in interface RewriteStrategy
- Overrides:
name in class BinPackStrategy
-
validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface: RewriteStrategy
Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not specified here will be rejected at runtime.
- Specified by:
validOptions in interface RewriteStrategy
- Overrides:
validOptions in class BinPackStrategy
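The allowed-list behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: `OptionAllowListSketch` and `validate` are hypothetical names, and the option keys used in the usage note are placeholders rather than the real constant values (see Constant Field Values for those).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not the validation code used by Iceberg.
class OptionAllowListSketch {

  // Throws if any supplied option key is missing from the allowed-list.
  static void validate(Set<String> validOptions, Map<String, String> options) {
    // TreeSet keeps the offending keys in a stable order for the message.
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }
}
```

For instance, `validate(Set.of("rewrite-all"), Map.of("unknown-option", "1"))` throws an `IllegalArgumentException`, while passing only keys that are in the set returns normally.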
-
options
public RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface: RewriteStrategy
Sets options to be used with this strategy.
- Specified by:
options in interface RewriteStrategy
- Overrides:
options in class BinPackStrategy
-
selectFilesToRewrite
public java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
Description copied from interface: RewriteStrategy
Selects files which this strategy believes are valid targets to be rewritten.
- Specified by:
selectFilesToRewrite in interface RewriteStrategy
- Overrides:
selectFilesToRewrite in class BinPackStrategy
- Parameters:
dataFiles - iterable of FileScanTasks for files in a given partition
- Returns:
- iterable containing only FileScanTasks to be rewritten
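As a rough sketch of the selection rule described for REWRITE_ALL (rewrite everything when it is true, otherwise only mis-sized files), the following models each file by its size in bytes. The class `SelectFilesSketch`, its `select` method, and the `minSize`/`maxSize` parameters are assumptions made for illustration. The real BinPackStrategy selection may apply additional criteria that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; files are represented only by their size in bytes.
class SelectFilesSketch {

  // Returns every file when rewriteAll is true; otherwise returns only the
  // files whose size falls outside the [minSize, maxSize] range.
  static List<Long> select(List<Long> fileSizes, boolean rewriteAll, long minSize, long maxSize) {
    if (rewriteAll) {
      return new ArrayList<>(fileSizes);
    }
    List<Long> selected = new ArrayList<>();
    for (long size : fileSizes) {
      if (size < minSize || size > maxSize) {
        selected.add(size);
      }
    }
    return selected;
  }
}
```

With sizes 10, 100 and 500 and a range of 50 to 200, the default behavior selects only the files of size 10 and 500, while `rewriteAll = true` selects all three.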
-
planFileGroups
public java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
Description copied from interface: RewriteStrategy
Groups file scans into lists which will be processed in a single executable unit. Each group will end up being committed as an independent set of changes. This creates the jobs which will eventually be run by the underlying Action.
- Specified by:
planFileGroups in interface RewriteStrategy
- Overrides:
planFileGroups in class BinPackStrategy
- Parameters:
dataFiles - iterable of FileScanTasks to be rewritten
- Returns:
- iterable of lists of FileScanTasks which will be processed together
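To illustrate what "groups file scans into lists" can look like, here is a simplified greedy grouping that bounds each group by a total size. It is a sketch only: `PlanGroupsSketch`, `plan`, and the `maxGroupSize` parameter are hypothetical, files are modeled by size in bytes, and the packing algorithm Iceberg actually uses may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; a sequential greedy grouping, not Iceberg's planner.
class PlanGroupsSketch {

  // Walks the files in order and starts a new group whenever adding the next
  // file would push the current group's total size above maxGroupSize.
  // A single file larger than maxGroupSize still gets a group of its own.
  static List<List<Long>> plan(List<Long> fileSizes, long maxGroupSize) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : fileSizes) {
      if (!current.isEmpty() && currentSize + size > maxGroupSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

Each returned list stands for one unit of work that would be processed and committed independently of the others.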
-
validateOptions
protected void validateOptions()
-
-