Package org.apache.iceberg.spark.actions
Class SparkSortStrategy
- java.lang.Object
-
- org.apache.iceberg.actions.BinPackStrategy
-
- org.apache.iceberg.actions.SortStrategy
-
- org.apache.iceberg.spark.actions.SparkSortStrategy
-
- All Implemented Interfaces:
java.io.Serializable,RewriteStrategy
- Direct Known Subclasses:
SparkZOrderStrategy
public class SparkSortStrategy extends SortStrategy
- See Also:
- Serialized Form
-
-
Field Summary
Fields Modifier and Type Field Description static java.lang.StringCOMPRESSION_FACTORThe number of shuffle partitions and consequently the number of output files created by the Spark Sort is based on the size of the input data files used in this rewrite operation.-
Fields inherited from class org.apache.iceberg.actions.BinPackStrategy
DELETE_FILE_THRESHOLD, DELETE_FILE_THRESHOLD_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT, REWRITE_ALL, REWRITE_ALL_DEFAULT
-
-
Constructor Summary
Constructors Constructor Description SparkSortStrategy(Table table, org.apache.spark.sql.SparkSession spark)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected FileScanTaskSetManagermanager()RewriteStrategyoptions(java.util.Map<java.lang.String,java.lang.String> options)Sets options to be used with this strategyprotected FileRewriteCoordinatorrewriteCoordinator()java.util.Set<DataFile>rewriteFiles(java.util.List<FileScanTask> filesToRewrite)Method which will rewrite files based on this particular RewriteStrategy's algorithm.protected doublesizeEstimateMultiple()protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlansortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)protected org.apache.spark.sql.SparkSessionspark()Tabletable()Returns the table being modified by this rewrite strategyprotected SparkTableCachetableCache()java.util.Set<java.lang.String>validOptions()Returns a set of options which this rewrite strategy can use.-
Methods inherited from class org.apache.iceberg.actions.SortStrategy
name, sortOrder, sortOrder, validateOptions
-
Methods inherited from class org.apache.iceberg.actions.BinPackStrategy
inputFileSize, numOutputFiles, planFileGroups, selectFilesToRewrite, splitSize, targetFileSize, writeMaxFileSize
-
-
-
-
Field Detail
-
COMPRESSION_FACTOR
public static final java.lang.String COMPRESSION_FACTOR
The number of shuffle partitions and consequently the number of output files created by the Spark Sort is based on the size of the input data files used in this rewrite operation. Due to compression, the disk file sizes may not accurately represent the size of files in the output. This parameter lets the user adjust the file size used for estimating actual output data size. A factor greater than 1.0 would generate more files than we would expect based on the on-disk file size. A value less than 1.0 would create fewer files than we would expect due to the on-disk size.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SparkSortStrategy
public SparkSortStrategy(Table table, org.apache.spark.sql.SparkSession spark)
-
-
Method Detail
-
table
public Table table()
Description copied from interface:RewriteStrategyReturns the table being modified by this rewrite strategy
-
validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface:RewriteStrategyReturns a set of options which this rewrite strategy can use. This is an allowed-list and any options not specified here will be rejected at runtime.- Specified by:
validOptionsin interfaceRewriteStrategy- Overrides:
validOptionsin classSortStrategy
-
options
public RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface:RewriteStrategySets options to be used with this strategy- Specified by:
optionsin interfaceRewriteStrategy- Overrides:
optionsin classSortStrategy
-
rewriteFiles
public java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
Description copied from interface:RewriteStrategyMethod which will rewrite files based on this particular RewriteStrategy's algorithm. This will most likely be Action framework specific (Spark/Presto/Flink ....).- Parameters:
filesToRewrite- a group of files to be rewritten together- Returns:
- a set of newly written files
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sortPlan
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan sortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)
-
sizeEstimateMultiple
protected double sizeEstimateMultiple()
-
tableCache
protected SparkTableCache tableCache()
-
manager
protected FileScanTaskSetManager manager()
-
rewriteCoordinator
protected FileRewriteCoordinator rewriteCoordinator()
-
-