Package org.apache.iceberg.actions
Class SortStrategy
- java.lang.Object
  - org.apache.iceberg.actions.BinPackStrategy
    - org.apache.iceberg.actions.SortStrategy
- All Implemented Interfaces:
java.io.Serializable, RewriteStrategy
- Direct Known Subclasses:
Spark3SortStrategy
public abstract class SortStrategy extends BinPackStrategy
A rewrite strategy for data files which aims to reorder data within data files to lay them out optimally in relation to a column. For example, if the Sort strategy is used on a set of files ordered by column x, and the original files are File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60), this strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40) and File C' (x: 41 - 60).
Currently there is no file overlap detection, and all files are rewritten if REWRITE_ALL is true (default: false). If this property is disabled, any files that would be chosen by BinPackStrategy will be rewrite candidates.
In the future, other algorithms for determining files to rewrite will be provided.
- See Also:
- Serialized Form
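The File A / File B / File C example in the class description can be illustrated with a small, self-contained sketch. It does not use the Iceberg API: files are modelled as plain lists of values of column x, and the class SortRewriteSketch and its method rewriteSorted are hypothetical names chosen for illustration. The sketch only shows the general idea of sorting globally and cutting the result into contiguous, non-overlapping output files; how an actual engine performs the sort and sizes the files is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortRewriteSketch {

  // Hypothetical helper (not part of Iceberg): merges the x values of all
  // input files, sorts them, and splits the sorted run into numOutputFiles
  // contiguous chunks, so that the x ranges of the output files do not overlap.
  public static List<List<Integer>> rewriteSorted(List<List<Integer>> inputFiles, int numOutputFiles) {
    List<Integer> all = new ArrayList<>();
    for (List<Integer> file : inputFiles) {
      all.addAll(file);
    }
    Collections.sort(all);

    List<List<Integer>> output = new ArrayList<>();
    int base = all.size() / numOutputFiles;
    int remainder = all.size() % numOutputFiles;
    int start = 0;
    for (int i = 0; i < numOutputFiles; i++) {
      // Spread any remainder over the first few output files.
      int size = base + (i < remainder ? 1 : 0);
      output.add(new ArrayList<>(all.subList(start, start + size)));
      start += size;
    }
    return output;
  }

  public static void main(String[] args) {
    List<List<Integer>> input = List.of(
        List.of(0, 25, 50),    // File A (x: 0 - 50)
        List.of(10, 20, 40),   // File B (x: 10 - 40)
        List.of(30, 45, 60));  // File C (x: 30 - 60)

    // Print the min and max x of each rewritten file.
    for (List<Integer> file : rewriteSorted(input, 3)) {
      System.out.println(file.get(0) + " - " + file.get(file.size() - 1));
    }
  }
}
```

With overlapping inputs like these, each rewritten file covers its own slice of x, which is what lets a reader skip files whose range cannot match a filter on x.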
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static java.lang.String REWRITE_ALL: Rewrites all files, regardless of their size.
- static boolean REWRITE_ALL_DEFAULT
-
Fields inherited from class org.apache.iceberg.actions.BinPackStrategy
DELETE_FILE_THRESHOLD, DELETE_FILE_THRESHOLD_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT
-
-
Constructor Summary
Constructors (Constructor, Description):
- SortStrategy()
-
Method Summary
All Methods (Modifier and Type, Method, Description):
- java.lang.String name(): Returns the name of this rewrite strategy.
- RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options): Sets options to be used with this strategy.
- java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles): Groups file scans into lists which will be processed in a single executable unit.
- java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles): Selects files which this strategy believes are valid targets to be rewritten.
- protected SortOrder sortOrder()
- SortStrategy sortOrder(SortOrder order): Sets the sort order to be used in this strategy when rewriting files.
- protected void validateOptions()
- java.util.Set<java.lang.String> validOptions(): Returns a set of options which this rewrite strategy can use.
-
Methods inherited from class org.apache.iceberg.actions.BinPackStrategy
inputFileSize, maxGroupSize, numOutputFiles, splitSize, targetFileSize, writeMaxFileSize
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.actions.RewriteStrategy
rewriteFiles, table
-
-
-
-
Field Detail
-
REWRITE_ALL
public static final java.lang.String REWRITE_ALL
Rewrites all files, regardless of their size. Defaults to false, which rewrites only mis-sized files.
- See Also:
- Constant Field Values
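The effect of REWRITE_ALL on file selection can be sketched as follows. This is a simplified illustration, not the Iceberg implementation: the class RewriteAllSketch, the nested FileInfo type, and the minBytes / maxBytes parameters are hypothetical stand-ins, and "mis-sized" is assumed here to mean a file smaller than a minimum or larger than a maximum size. Any other criteria that BinPackStrategy may apply are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

public class RewriteAllSketch {

  // Hypothetical stand-in for a data file; only the size matters for this sketch.
  public static final class FileInfo {
    final String path;
    final long sizeBytes;

    public FileInfo(String path, long sizeBytes) {
      this.path = path;
      this.sizeBytes = sizeBytes;
    }
  }

  // Assumed selection rule: with rewriteAll enabled every file is a candidate;
  // otherwise only files outside the [minBytes, maxBytes] range are selected.
  public static List<FileInfo> select(
      List<FileInfo> files, boolean rewriteAll, long minBytes, long maxBytes) {
    if (rewriteAll) {
      return new ArrayList<>(files);
    }
    List<FileInfo> selected = new ArrayList<>();
    for (FileInfo file : files) {
      if (file.sizeBytes < minBytes || file.sizeBytes > maxBytes) {
        selected.add(file);
      }
    }
    return selected;
  }
}
```

Under these assumptions, leaving the option at its default of false limits the sort to files that would have been compacted anyway, while enabling it sorts every file passed in.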
-
REWRITE_ALL_DEFAULT
public static final boolean REWRITE_ALL_DEFAULT
- See Also:
- Constant Field Values
-
-
Method Detail
-
sortOrder
public SortStrategy sortOrder(SortOrder order)
Sets the sort order to be used in this strategy when rewriting files.
- Parameters:
order - the order to use
- Returns:
- this for method chaining
-
sortOrder
protected SortOrder sortOrder()
-
name
public java.lang.String name()
Description copied from interface: RewriteStrategy
Returns the name of this rewrite strategy.
- Specified by:
name in interface RewriteStrategy
- Overrides:
name in class BinPackStrategy
-
validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface: RewriteStrategy
Returns a set of options which this rewrite strategy can use. This is an allow-list: any option not specified here will be rejected at runtime.
- Specified by:
validOptions in interface RewriteStrategy
- Overrides:
validOptions in class BinPackStrategy
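The allow-list behaviour described for validOptions can be sketched without the Iceberg API. The class ValidOptionsSketch and its validate method are hypothetical, and the option keys in VALID_OPTIONS are illustrative placeholders rather than a statement of the keys this class actually accepts.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ValidOptionsSketch {

  // Illustrative allow-list; the key names are assumptions made for this sketch.
  static final Set<String> VALID_OPTIONS = Set.of("rewrite-all", "min-input-files");

  // Rejects any option whose key is not on the allow-list.
  public static void validate(Map<String, String> options) {
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(VALID_OPTIONS);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }
}
```

Checking keys against an allow-list means a misspelled option fails fast instead of being silently ignored.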
-
options
public RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface: RewriteStrategy
Sets options to be used with this strategy.
- Specified by:
options in interface RewriteStrategy
- Overrides:
options in class BinPackStrategy
-
selectFilesToRewrite
public java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
Description copied from interface: RewriteStrategy
Selects files which this strategy believes are valid targets to be rewritten.
- Specified by:
selectFilesToRewrite in interface RewriteStrategy
- Overrides:
selectFilesToRewrite in class BinPackStrategy
- Parameters:
dataFiles - iterable of FileScanTasks for files in a given partition
- Returns:
- iterable containing only FileScanTasks to be rewritten
-
planFileGroups
public java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
Description copied from interface: RewriteStrategy
Groups file scans into lists which will be processed in a single executable unit. Each group will end up being committed as an independent set of changes. This creates the jobs which will eventually be run by the underlying Action.
- Specified by:
planFileGroups in interface RewriteStrategy
- Overrides:
planFileGroups in class BinPackStrategy
- Parameters:
dataFiles - iterable of FileScanTasks to be rewritten
- Returns:
- iterable of lists of FileScanTasks which will be processed together
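One way to picture the grouping step is a greedy pass that closes a group once adding the next file would exceed a size cap. The sketch below is an assumption about the general shape of such planning, not the algorithm this class uses: files are reduced to their sizes in bytes, and the class PlanFileGroupsSketch, its method planGroups, and the maxGroupSizeBytes parameter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanFileGroupsSketch {

  // Hypothetical greedy grouping: walks the files in order and starts a new
  // group whenever adding the next file would push the group over the cap.
  // A single file larger than the cap still gets a group of its own.
  public static List<List<Long>> planGroups(List<Long> fileSizes, long maxGroupSizeBytes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;

    for (long size : fileSizes) {
      if (!current.isEmpty() && currentSize + size > maxGroupSizeBytes) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

Each returned list corresponds to one unit of work in the sense described above: it would be processed together and committed independently of the other groups.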
-
validateOptions
protected void validateOptions()
-
-