Package org.apache.iceberg.spark.actions
Class SparkSortStrategy
- java.lang.Object
-
- org.apache.iceberg.actions.BinPackStrategy
-
- org.apache.iceberg.actions.SortStrategy
-
- org.apache.iceberg.spark.actions.SparkSortStrategy
-
- All Implemented Interfaces:
java.io.Serializable
,RewriteStrategy
- Direct Known Subclasses:
SparkZOrderStrategy
public class SparkSortStrategy extends SortStrategy
- See Also:
- Serialized Form
-
-
Field Summary
Fields Modifier and Type Field Description static java.lang.String
COMPRESSION_FACTOR
The number of shuffle partitions and consequently the number of output files created by the Spark Sort is based on the size of the input data files used in this rewrite operation.-
Fields inherited from class org.apache.iceberg.actions.BinPackStrategy
DELETE_FILE_THRESHOLD, DELETE_FILE_THRESHOLD_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT
-
-
Constructor Summary
Constructors Constructor Description SparkSortStrategy(Table table, org.apache.spark.sql.SparkSession spark)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected FileScanTaskSetManager
manager()
RewriteStrategy
options(java.util.Map<java.lang.String,java.lang.String> options)
Sets options to be used with this strategyprotected FileRewriteCoordinator
rewriteCoordinator()
java.util.Set<DataFile>
rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
Method which will rewrite files based on this particular RewriteStrategy's algorithm.protected double
sizeEstimateMultiple()
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
sortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)
protected org.apache.spark.sql.SparkSession
spark()
Table
table()
Returns the table being modified by this rewrite strategyprotected SparkTableCache
tableCache()
java.util.Set<java.lang.String>
validOptions()
Returns a set of options which this rewrite strategy can use.-
Methods inherited from class org.apache.iceberg.actions.SortStrategy
name, sortOrder, sortOrder, validateOptions
-
Methods inherited from class org.apache.iceberg.actions.BinPackStrategy
inputFileSize, numOutputFiles, planFileGroups, selectFilesToRewrite, splitSize, targetFileSize, writeMaxFileSize
-
-
-
-
Field Detail
-
COMPRESSION_FACTOR
public static final java.lang.String COMPRESSION_FACTOR
The number of shuffle partitions and consequently the number of output files created by the Spark Sort is based on the size of the input data files used in this rewrite operation. Due to compression, the disk file sizes may not accurately represent the size of files in the output. This parameter lets the user adjust the file size used for estimating actual output data size. A factor greater than 1.0 would generate more files than we would expect based on the on-disk file size. A value less than 1.0 would create fewer files than we would expect due to the on-disk size.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SparkSortStrategy
public SparkSortStrategy(Table table, org.apache.spark.sql.SparkSession spark)
-
-
Method Detail
-
table
public Table table()
Description copied from interface:RewriteStrategy
Returns the table being modified by this rewrite strategy
-
validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface:RewriteStrategy
Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not specified here will be rejected at runtime.- Specified by:
validOptions
in interfaceRewriteStrategy
- Overrides:
validOptions
in classSortStrategy
-
options
public RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface:RewriteStrategy
Sets options to be used with this strategy- Specified by:
options
in interfaceRewriteStrategy
- Overrides:
options
in classSortStrategy
-
rewriteFiles
public java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
Description copied from interface:RewriteStrategy
Method which will rewrite files based on this particular RewriteStrategy's algorithm. This will most likely be Action framework specific (Spark/Presto/Flink ....).- Parameters:
filesToRewrite
- a group of files to be rewritten together- Returns:
- a set of newly written files
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sortPlan
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan sortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)
-
sizeEstimateMultiple
protected double sizeEstimateMultiple()
-
tableCache
protected SparkTableCache tableCache()
-
manager
protected FileScanTaskSetManager manager()
-
rewriteCoordinator
protected FileRewriteCoordinator rewriteCoordinator()
-
-