Package org.apache.iceberg.actions
Interface RewriteStrategy
- All Superinterfaces:
java.io.Serializable
- All Known Implementing Classes:
BinPackStrategy, SortStrategy, Spark3BinPackStrategy, Spark3SortStrategy
public interface RewriteStrategy extends java.io.Serializable
Method Summary
Modifier and Type, Method: Description
java.lang.String name(): Returns the name of this rewrite strategy.
RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options): Sets options to be used with this strategy.
java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles): Groups file scans into lists which will be processed in a single executable unit.
java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite): Method which will rewrite files based on this particular RewriteStrategy's algorithm.
java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles): Selects files which this strategy believes are valid targets to be rewritten.
Table table(): Returns the table being modified by this rewrite strategy.
java.util.Set<java.lang.String> validOptions(): Returns a set of options which this rewrite strategy can use.
Method Detail
-
name
java.lang.String name()
Returns the name of this rewrite strategy
-
table
Table table()
Returns the table being modified by this rewrite strategy
-
validOptions
java.util.Set<java.lang.String> validOptions()
Returns the set of options which this rewrite strategy can use. This is an allow-list: any option not listed here will be rejected at runtime.
-
options
RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Sets options to be used with this strategy
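A minimal sketch of how an implementation might enforce the allow-list returned by validOptions() when options(...) is called. It uses only plain Java collections. The class OptionCheck, its method names, and the option keys in the usage note are hypothetical and are not part of the Iceberg API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OptionCheck {
  // Returns the keys of `options` that are absent from `validOptions`,
  // sorted so that error messages are stable.
  static Set<String> invalidKeys(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    return invalid;
  }

  // Throws if any supplied option is not on the allow-list (assumed behavior
  // for "rejected at runtime"; a real strategy may report this differently).
  static void validate(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = invalidKeys(validOptions, options);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException(
          "Unsupported options: " + invalid + "; valid options are " + new TreeSet<>(validOptions));
    }
  }
}
```

With an allow-list of "min-size" and "max-size" (made-up keys), passing a map that contains "typo" would make validate throw an IllegalArgumentException naming that key.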
-
selectFilesToRewrite
java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
Selects files which this strategy believes are valid targets to be rewritten.
- Parameters:
dataFiles - iterable of FileScanTasks for files in a given partition
- Returns:
iterable containing only FileScanTasks to be rewritten
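As an illustration of what a selection rule could look like, the sketch below keeps only files whose size falls outside a desired range. The selection criterion, the FileTask stand-in for FileScanTask, and the minSize/maxSize parameters are assumptions made for this example. The documentation above does not say which rule any particular strategy applies.

```java
import java.util.ArrayList;
import java.util.List;

public class SizeBasedSelection {
  // Hypothetical stand-in for FileScanTask: just a path and a length in bytes.
  static final class FileTask {
    final String path;
    final long length;

    FileTask(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Keeps only the tasks whose file size lies outside [minSize, maxSize],
  // i.e. files that are too small or too large under this assumed rule.
  static List<FileTask> select(Iterable<FileTask> dataFiles, long minSize, long maxSize) {
    List<FileTask> selected = new ArrayList<>();
    for (FileTask task : dataFiles) {
      if (task.length < minSize || task.length > maxSize) {
        selected.add(task);
      }
    }
    return selected;
  }
}
```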
-
planFileGroups
java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
Groups file scans into lists which will be processed in a single executable unit. Each group will end up being committed as an independent set of changes. This creates the jobs which will eventually be run by the underlying Action.
- Parameters:
dataFiles - iterable of FileScanTasks to be rewritten
- Returns:
iterable of lists of FileScanTasks which will be processed together
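One simple way to form such groups is to accumulate files in order until a size cap would be exceeded. The sketch below shows that greedy grouping over file sizes alone. The cap parameter maxGroupSize and the use of Long values in place of FileScanTasks are assumptions for illustration, and this is not necessarily how the known implementing classes plan their groups.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyGrouping {
  // Walks the sizes in order and starts a new group whenever adding the next
  // file would push the current group past maxGroupSize. A single file larger
  // than the cap still gets a group of its own.
  static List<List<Long>> plan(Iterable<Long> fileSizes, long maxGroupSize) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : fileSizes) {
      if (!current.isEmpty() && currentSize + size > maxGroupSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

For sizes 40, 70, 30, 90 and a cap of 100, plan returns [[40], [70, 30], [90]]: three groups, each of which would be rewritten and committed on its own.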
-
rewriteFiles
java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
Rewrites files based on this particular RewriteStrategy's algorithm. The implementation will most likely be specific to the Action framework (Spark, Presto, Flink, ...).
- Parameters:
filesToRewrite - a group of files to be rewritten together
- Returns:
a set of newly written files
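The sketch below shows one plausible order in which a caller could combine the three methods: select candidate files, plan them into groups, then rewrite each group. MiniStrategy is a hypothetical generic stand-in that mirrors the three method signatures (T in place of FileScanTask, R in place of DataFile). The call order is an assumption drawn from the method descriptions, not a statement of how any specific Action is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RewriteFlowSketch {
  // Hypothetical stand-in mirroring three RewriteStrategy methods.
  interface MiniStrategy<T, R> {
    Iterable<T> selectFilesToRewrite(Iterable<T> dataFiles);

    Iterable<List<T>> planFileGroups(Iterable<T> dataFiles);

    Set<R> rewriteFiles(List<T> filesToRewrite);
  }

  // Runs select -> plan -> rewrite and returns the new files per group,
  // so that each group's result could be committed independently.
  static <T, R> List<Set<R>> run(MiniStrategy<T, R> strategy, Iterable<T> partitionFiles) {
    Iterable<T> selected = strategy.selectFilesToRewrite(partitionFiles);
    List<Set<R>> newFilesPerGroup = new ArrayList<>();
    for (List<T> group : strategy.planFileGroups(selected)) {
      newFilesPerGroup.add(strategy.rewriteFiles(group));
    }
    return newFilesPerGroup;
  }
}
```

Keeping one result set per group matches the note under planFileGroups that each group ends up committed as an independent set of changes.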