Package org.apache.iceberg.spark.actions
Class Spark3SortStrategy
- java.lang.Object
  - org.apache.iceberg.actions.BinPackStrategy
    - org.apache.iceberg.actions.SortStrategy
      - org.apache.iceberg.spark.actions.Spark3SortStrategy

All Implemented Interfaces:
- java.io.Serializable, RewriteStrategy

public class Spark3SortStrategy extends SortStrategy

See Also:
- Serialized Form

Field Summary

Fields
- static java.lang.String COMPRESSION_FACTOR: The number of shuffle partitions, and consequently the number of output files created by the Spark sort, is based on the size of the input data files used in this rewrite operation.

Fields inherited from class org.apache.iceberg.actions.SortStrategy:
- REWRITE_ALL, REWRITE_ALL_DEFAULT

Fields inherited from class org.apache.iceberg.actions.BinPackStrategy:
- DELETE_FILE_THRESHOLD, DELETE_FILE_THRESHOLD_DEFAULT, MAX_FILE_SIZE_BYTES, MAX_FILE_SIZE_DEFAULT_RATIO, MIN_FILE_SIZE_BYTES, MIN_FILE_SIZE_DEFAULT_RATIO, MIN_INPUT_FILES, MIN_INPUT_FILES_DEFAULT

Constructor Summary

Constructors
- Spark3SortStrategy(Table table, org.apache.spark.sql.SparkSession spark)

Method Summary

All Methods (instance, concrete)
- RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options): Sets options to be used with this strategy.
- java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite): Rewrites files based on this particular RewriteStrategy's algorithm.
- protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan sortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)
- protected org.apache.spark.sql.SparkSession spark()
- Table table(): Returns the table being modified by this rewrite strategy.
- java.util.Set<java.lang.String> validOptions(): Returns a set of options which this rewrite strategy can use.

Methods inherited from class org.apache.iceberg.actions.SortStrategy:
- name, planFileGroups, selectFilesToRewrite, sortOrder, sortOrder, validateOptions

Methods inherited from class org.apache.iceberg.actions.BinPackStrategy:
- inputFileSize, maxGroupSize, numOutputFiles, splitSize, targetFileSize, writeMaxFileSize

Field Detail

COMPRESSION_FACTOR
public static final java.lang.String COMPRESSION_FACTOR
The number of shuffle partitions, and consequently the number of output files created by the Spark sort, is based on the size of the input data files used in this rewrite operation. Because of compression, the on-disk size of those files may not accurately reflect the size of the data once it is written out. This parameter lets the user adjust the file size used to estimate the actual output data size. A factor greater than 1.0 produces more output files than the on-disk file size alone would suggest; a factor less than 1.0 produces fewer.
See Also:
- Constant Field Values
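To make the arithmetic concrete, here is a minimal standalone sketch of how a compression factor could scale the estimated output size before that size is divided into output files. OutputFileEstimate and estimateOutputFiles are hypothetical names, not Iceberg APIs, and the plain ceiling division is an assumption; the numOutputFiles method this class inherits from BinPackStrategy may round the result differently.

```java
// Hypothetical sketch: a compression factor scales the on-disk input size
// before the shuffle-partition (output file) count is derived from it.
// OutputFileEstimate is illustrative only and is not part of Iceberg.
final class OutputFileEstimate {

  private OutputFileEstimate() {}

  /**
   * Estimates the number of output files for one rewrite group.
   *
   * @param inputBytes        total on-disk size of the input data files
   * @param compressionFactor multiplier applied to the on-disk size (1.0 = no adjustment)
   * @param targetFileBytes   desired size of each output file
   */
  static long estimateOutputFiles(long inputBytes, double compressionFactor, long targetFileBytes) {
    if (compressionFactor <= 0) {
      throw new IllegalArgumentException("compression factor must be > 0: " + compressionFactor);
    }
    if (targetFileBytes <= 0) {
      throw new IllegalArgumentException("target file size must be > 0: " + targetFileBytes);
    }
    // Adjust the on-disk size to approximate the size of the written output.
    long estimatedBytes = (long) (inputBytes * compressionFactor);
    // Ceiling division, never fewer than one output file.
    long files = (estimatedBytes + targetFileBytes - 1) / targetFileBytes;
    return Math.max(1, files);
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long input = 1024 * mb; // 1 GiB read from disk
    long target = 512 * mb; // 512 MiB target file size
    for (double factor : new double[] {0.5, 1.0, 2.0}) {
      System.out.println("factor " + factor + " -> "
          + estimateOutputFiles(input, factor, target) + " files");
    }
  }
}
```

With 1 GiB of input and a 512 MiB target, factors of 0.5, 1.0 and 2.0 yield 1, 2 and 4 files respectively, matching the direction described above: values above 1.0 produce more files and values below 1.0 produce fewer.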
 
 
Constructor Detail

Spark3SortStrategy
public Spark3SortStrategy(Table table, org.apache.spark.sql.SparkSession spark)
 
Method Detail

table
public Table table()
Description copied from interface: RewriteStrategy
Returns the table being modified by this rewrite strategy.

validOptions
public java.util.Set<java.lang.String> validOptions()
Description copied from interface: RewriteStrategy
Returns a set of options which this rewrite strategy can use. This is an allow-list: any option not specified here will be rejected at runtime.
Specified by:
- validOptions in interface RewriteStrategy
Overrides:
- validOptions in class SortStrategy
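The allow-list behavior described for validOptions can be sketched with the standard collections API. OptionAllowList is a hypothetical helper rather than Iceberg's implementation, and the option keys in main are placeholders; real code should use the public constants (such as COMPRESSION_FACTOR) instead of string literals.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of allow-list option validation; not Iceberg's code.
final class OptionAllowList {

  private OptionAllowList() {}

  /** Throws IllegalArgumentException if any key in options is not in validOptions. */
  static void validate(Map<String, String> options, Set<String> validOptions) {
    // Copy into a TreeSet so the error message lists keys in a stable order.
    Set<String> unknown = new TreeSet<>(options.keySet());
    unknown.removeAll(validOptions);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException(
          "Unsupported options " + unknown + "; valid options are " + new TreeSet<>(validOptions));
    }
  }

  public static void main(String[] args) {
    // Placeholder keys for illustration only.
    Set<String> valid = Set.of("compression-factor", "some-other-option");
    validate(Map.of("compression-factor", "1.5"), valid); // accepted silently
    try {
      validate(Map.of("compresion-factor", "1.5"), valid); // misspelled key
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Failing fast on unknown keys means a misspelled option surfaces as an error instead of being silently ignored.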
 
options
public RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
Description copied from interface: RewriteStrategy
Sets options to be used with this strategy.
Specified by:
- options in interface RewriteStrategy
Overrides:
- options in class SortStrategy
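An options(...) implementation typically has to turn string values into typed settings. The sketch below shows one way to read a compression factor from an options map, assuming the key "compression-factor" and a default of 1.0 (no size adjustment); both are assumptions for illustration, and CompressionFactorOption is a hypothetical class. Real code should reference Spark3SortStrategy.COMPRESSION_FACTOR rather than a literal key.

```java
import java.util.Map;

// Hypothetical sketch: parse a compression factor from a strategy's options map.
// The key and default below are assumptions for illustration only.
final class CompressionFactorOption {

  static final String KEY = "compression-factor"; // assumed option key
  static final double DEFAULT_FACTOR = 1.0;        // assumed default: no size adjustment

  private CompressionFactorOption() {}

  /** Returns the configured factor, or DEFAULT_FACTOR when the key is absent. */
  static double parse(Map<String, String> options) {
    String raw = options.get(KEY);
    if (raw == null) {
      return DEFAULT_FACTOR;
    }
    double value;
    try {
      value = Double.parseDouble(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid " + KEY + ": '" + raw + "'", e);
    }
    // Reject zero, negative, NaN and infinite values; none gives a usable size estimate.
    if (!(value > 0) || Double.isInfinite(value)) {
      throw new IllegalArgumentException(KEY + " must be a positive finite number, got " + raw);
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(parse(Map.of()));           // key absent: falls back to the default
    System.out.println(parse(Map.of(KEY, "1.5"))); // explicit value
  }
}
```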
 
rewriteFiles
public java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
Description copied from interface: RewriteStrategy
Rewrites files based on this particular RewriteStrategy's algorithm. The implementation is usually specific to an engine's action framework (Spark, Presto, Flink, and so on).
Parameters:
- filesToRewrite - a group of files to be rewritten together
Returns:
- a set of newly written files
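Conceptually, a sort-based rewrite reads the whole group of files, orders every row by the sort order, and splits the ordered rows into a number of output files so that each file covers a contiguous key range. The in-memory sketch below illustrates only that idea; SortAndSplit is hypothetical and does not model Spark's distributed shuffle, Iceberg's writers, or file-size targets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration of a sort-based rewrite:
// globally sort the rows, then cut them into numFiles contiguous ranges.
// It does not model Spark shuffles, Iceberg writers, or file-size targets.
final class SortAndSplit {

  private SortAndSplit() {}

  /** Sorts a copy of rows by order and splits it into numFiles contiguous chunks. */
  static <T> List<List<T>> sortAndSplit(List<T> rows, Comparator<? super T> order, int numFiles) {
    if (numFiles < 1) {
      throw new IllegalArgumentException("numFiles must be >= 1, got " + numFiles);
    }
    List<T> sorted = new ArrayList<>(rows); // leave the caller's list untouched
    sorted.sort(order);

    int n = sorted.size();
    List<List<T>> files = new ArrayList<>(numFiles);
    for (int i = 0; i < numFiles; i++) {
      // Boundaries use long arithmetic to avoid int overflow; chunk sizes differ by at most one.
      int from = (int) ((long) n * i / numFiles);
      int to = (int) ((long) n * (i + 1) / numFiles);
      files.add(new ArrayList<>(sorted.subList(from, to)));
    }
    return files;
  }

  public static void main(String[] args) {
    List<Integer> rows = List.of(42, 7, 19, 3, 88, 56, 21, 10);
    List<List<Integer>> files = sortAndSplit(rows, Comparator.<Integer>naturalOrder(), 3);
    for (int i = 0; i < files.size(); i++) {
      System.out.println("file " + i + ": " + files.get(i));
    }
  }
}
```

Running main prints file 0: [3, 7], file 1: [10, 19, 21] and file 2: [42, 56, 88]. The key ranges do not overlap, which is the property a global sort aims to give the rewritten files.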
 
spark
protected org.apache.spark.sql.SparkSession spark()

sortPlan
protected org.apache.spark.sql.catalyst.plans.logical.LogicalPlan sortPlan(org.apache.spark.sql.connector.distributions.Distribution distribution, org.apache.spark.sql.connector.expressions.SortOrder[] ordering, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan plan, org.apache.spark.sql.internal.SQLConf conf)
 