Package org.apache.iceberg.spark
Class SparkTableUtil
java.lang.Object
  org.apache.iceberg.spark.SparkTableUtil

public class SparkTableUtil
extends java.lang.Object

Java version of the original SparkTableUtil.scala:
https://github.com/apache/iceberg/blob/apache-iceberg-0.8.0-incubating/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala
Nested Class Summary

Modifier and Type    Class                            Description
static class         SparkTableUtil.SparkPartition    Class representing a table partition.
Method Summary

static java.util.List<SparkTableUtil.SparkPartition>
    filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions,
                     java.util.Map<java.lang.String,java.lang.String> partitionFilter)

static java.util.List<SparkTableUtil.SparkPartition>
    getPartitions(org.apache.spark.sql.SparkSession spark, java.lang.String table)
    Returns all partitions in the table.

static java.util.List<SparkTableUtil.SparkPartition>
    getPartitions(org.apache.spark.sql.SparkSession spark,
                  org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                  java.util.Map<java.lang.String,java.lang.String> partitionFilter)
    Returns all partitions in the table.

static java.util.List<SparkTableUtil.SparkPartition>
    getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table,
                          java.lang.String predicate)
    Returns partitions that match the specified 'predicate'.

static java.util.List<SparkTableUtil.SparkPartition>
    getPartitionsByFilter(org.apache.spark.sql.SparkSession spark,
                          org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                          org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)
    Returns partitions that match the specified 'predicate'.

static void
    importSparkPartitions(org.apache.spark.sql.SparkSession spark,
                          java.util.List<SparkTableUtil.SparkPartition> partitions,
                          Table targetTable, PartitionSpec spec, java.lang.String stagingDir)
    Import files from the given partitions to an Iceberg table.

static void
    importSparkTable(org.apache.spark.sql.SparkSession spark,
                     org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                     Table targetTable, java.lang.String stagingDir)
    Import files from an existing Spark table to an Iceberg table.

static void
    importSparkTable(org.apache.spark.sql.SparkSession spark,
                     org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                     Table targetTable, java.lang.String stagingDir,
                     java.util.Map<java.lang.String,java.lang.String> partitionFilter)
    Import files from an existing Spark table to an Iceberg table.

static java.util.List<DataFile>
    listPartition(SparkTableUtil.SparkPartition partition, PartitionSpec spec,
                  SerializableConfiguration conf, MetricsConfig metricsConfig)
    Deprecated.

static java.util.List<DataFile>
    listPartition(SparkTableUtil.SparkPartition partition, PartitionSpec spec,
                  SerializableConfiguration conf, MetricsConfig metricsConfig, NameMapping mapping)
    Deprecated.

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
    partitionDF(org.apache.spark.sql.SparkSession spark, java.lang.String table)
    Returns a DataFrame with a row for each partition in the table.

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
    partitionDFByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table,
                        java.lang.String expression)
    Returns a DataFrame with a row for each partition that matches the specified 'expression'.
Method Detail
partitionDF

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDF(org.apache.spark.sql.SparkSession spark,
                                                                                 java.lang.String table)

Returns a DataFrame with a row for each partition in the table. The DataFrame has three columns: the partition key (e.g. a=1/b=2), the partition location, and the format (avro or parquet).

Parameters:
    spark - a Spark session
    table - a table name and (optional) database
Returns:
    a DataFrame of the table's partitions
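
A minimal usage sketch; the session setup and the table name db.sample are assumptions used by this and the following examples, not part of the API:

    import org.apache.iceberg.spark.SparkTableUtil;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .appName("spark-table-util-examples")
        .enableHiveSupport()   // the source table is assumed to live in the session catalog
        .getOrCreate();

    // One row per partition: partition key, location, and file format.
    Dataset<Row> partitions = SparkTableUtil.partitionDF(spark, "db.sample");
    partitions.show(false);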
 
partitionDFByFilter

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDFByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                         java.lang.String table,
                                                                                         java.lang.String expression)

Returns a DataFrame with a row for each partition that matches the specified 'expression'.

Parameters:
    spark - a Spark session
    table - name of the table
    expression - the expression whose matching partitions are returned
Returns:
    a DataFrame of the table's partitions
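
Continuing the sketch above (ds is a hypothetical partition column of db.sample):

    // Only partitions satisfying the expression are returned.
    Dataset<Row> recent = SparkTableUtil.partitionDFByFilter(
        spark, "db.sample", "ds > '2021-01-01'");
    recent.show(false);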
 
getPartitions

public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark,
                                                                          java.lang.String table)

Returns all partitions in the table.

Parameters:
    spark - a Spark session
    table - a table name and (optional) database
Returns:
    all of the table's partitions
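
Continuing the sketch; each SparkPartition carries the partition values, storage location, and format:

    import java.util.List;
    import org.apache.iceberg.spark.SparkTableUtil.SparkPartition;

    List<SparkPartition> parts = SparkTableUtil.getPartitions(spark, "db.sample");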
 
getPartitions

public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark,
                                                                          org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                                                                          java.util.Map<java.lang.String,java.lang.String> partitionFilter)

Returns all partitions in the table.

Parameters:
    spark - a Spark session
    tableIdent - a table identifier
    partitionFilter - partition filter, or null if no filter
Returns:
    all of the table's partitions
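
A sketch of the identifier-based overload; the parser call and the ds partition column are assumptions:

    import java.util.Collections;
    import org.apache.spark.sql.catalyst.TableIdentifier;

    TableIdentifier ident =
        spark.sessionState().sqlParser().parseTableIdentifier("db.sample");
    // Keep only partitions where ds=2021-01-01; a null filter keeps everything.
    List<SparkPartition> filtered = SparkTableUtil.getPartitions(
        spark, ident, Collections.singletonMap("ds", "2021-01-01"));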
 
getPartitionsByFilter

public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                  java.lang.String table,
                                                                                  java.lang.String predicate)

Returns partitions that match the specified 'predicate'.

Parameters:
    spark - a Spark session
    table - a table name and (optional) database
    predicate - a predicate on partition columns
Returns:
    the table's matching partitions
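
The predicate is an unparsed SQL expression over partition columns (ds is hypothetical):

    List<SparkPartition> recentParts = SparkTableUtil.getPartitionsByFilter(
        spark, "db.sample", "ds >= '2021-01-01'");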
 
getPartitionsByFilter

public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                  org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                                                                                  org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)

Returns partitions that match the specified 'predicate'.

Parameters:
    spark - a Spark session
    tableIdent - a table identifier
    predicateExpr - a predicate expression on partition columns
Returns:
    the table's matching partitions
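
One convenient way to obtain a Catalyst Expression from Java is through a Column; this route is an assumption, not the only option:

    import org.apache.spark.sql.catalyst.expressions.Expression;
    import static org.apache.spark.sql.functions.expr;

    Expression predicate = expr("ds >= '2021-01-01'").expr();
    List<SparkPartition> matched =
        SparkTableUtil.getPartitionsByFilter(spark, ident, predicate);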
 
listPartition

@Deprecated
public static java.util.List<DataFile> listPartition(SparkTableUtil.SparkPartition partition,
                                                     PartitionSpec spec,
                                                     SerializableConfiguration conf,
                                                     MetricsConfig metricsConfig)

Deprecated.

Returns the data files in a partition by listing the partition location. For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics are set to null.

Parameters:
    partition - a partition
    spec - a partition spec
    conf - a serializable Hadoop conf
    metricsConfig - a metrics conf
Returns:
    a List of DataFile
 
listPartition

@Deprecated
public static java.util.List<DataFile> listPartition(SparkTableUtil.SparkPartition partition,
                                                     PartitionSpec spec,
                                                     SerializableConfiguration conf,
                                                     MetricsConfig metricsConfig,
                                                     NameMapping mapping)

Deprecated.

Returns the data files in a partition by listing the partition location. For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics are set to null.

Parameters:
    partition - a partition
    spec - a partition spec
    conf - a serializable Hadoop conf
    metricsConfig - a metrics conf
    mapping - a name mapping
Returns:
    a List of DataFile
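
Both overloads are deprecated; new code should prefer the import methods documented below. For completeness, a sketch of the name-mapping variant, assuming SerializableConfiguration here is Spark's org.apache.spark.util.SerializableConfiguration and assuming an existing Iceberg table at a hypothetical location:

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.MetricsConfig;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.hadoop.HadoopTables;
    import org.apache.iceberg.mapping.MappingUtil;
    import org.apache.iceberg.mapping.NameMapping;
    import org.apache.spark.util.SerializableConfiguration;

    Table icebergTable = new HadoopTables(spark.sessionState().newHadoopConf())
        .load("hdfs://nn:8020/warehouse/db/sample_iceberg");  // hypothetical path
    SerializableConfiguration conf =
        new SerializableConfiguration(spark.sessionState().newHadoopConf());
    NameMapping mapping = MappingUtil.create(icebergTable.schema());

    List<DataFile> files = SparkTableUtil.listPartition(
        parts.get(0), icebergTable.spec(), conf, MetricsConfig.getDefault(), mapping);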
 
importSparkTable

public static void importSparkTable(org.apache.spark.sql.SparkSession spark,
                                    org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                                    Table targetTable,
                                    java.lang.String stagingDir,
                                    java.util.Map<java.lang.String,java.lang.String> partitionFilter)

Import files from an existing Spark table to an Iceberg table. The import uses the Spark session to get table metadata. It assumes no concurrent operations on the source or target table, and is therefore not thread-safe.

Parameters:
    spark - a Spark session
    sourceTableIdent - an identifier of the source Spark table
    targetTable - the Iceberg table to import the data into
    stagingDir - a staging directory to store temporary manifest files
    partitionFilter - only import partitions whose values match those in the map; may be partially defined
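
A sketch of a filtered import, reusing spark, ident, and icebergTable from the earlier examples (/tmp/staging is a hypothetical writable directory):

    // Import only files belonging to the ds=2021-01-01 partition.
    SparkTableUtil.importSparkTable(
        spark, ident, icebergTable, "/tmp/staging",
        Collections.singletonMap("ds", "2021-01-01"));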
 
importSparkTable

public static void importSparkTable(org.apache.spark.sql.SparkSession spark,
                                    org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                                    Table targetTable,
                                    java.lang.String stagingDir)

Import files from an existing Spark table to an Iceberg table. The import uses the Spark session to get table metadata. It assumes no concurrent operations on the source or target table, and is therefore not thread-safe.

Parameters:
    spark - a Spark session
    sourceTableIdent - an identifier of the source Spark table
    targetTable - the Iceberg table to import the data into
    stagingDir - a staging directory to store temporary manifest files
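
The unfiltered overload imports all partitions of the source table (or all of its files, if the table is unpartitioned):

    SparkTableUtil.importSparkTable(spark, ident, icebergTable, "/tmp/staging");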
 
importSparkPartitions

public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark,
                                         java.util.List<SparkTableUtil.SparkPartition> partitions,
                                         Table targetTable,
                                         PartitionSpec spec,
                                         java.lang.String stagingDir)

Import files from the given partitions to an Iceberg table.

Parameters:
    spark - a Spark session
    partitions - partitions to import
    targetTable - the Iceberg table to import the data into
    spec - a partition spec
    stagingDir - a staging directory to store temporary manifest files
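
This pairs naturally with getPartitionsByFilter for selective backfills; reusing the target table's own spec is an assumption about the desired layout:

    List<SparkPartition> toImport = SparkTableUtil.getPartitionsByFilter(
        spark, "db.sample", "ds >= '2021-01-01'");
    SparkTableUtil.importSparkPartitions(
        spark, toImport, icebergTable, icebergTable.spec(), "/tmp/staging");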
 
filterPartitions

public static java.util.List<SparkTableUtil.SparkPartition> filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions,
                                                                             java.util.Map<java.lang.String,java.lang.String> partitionFilter)
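
A sketch of filtering an already-listed partition set in memory (same hypothetical ds column):

    List<SparkPartition> janFirst = SparkTableUtil.filterPartitions(
        parts, Collections.singletonMap("ds", "2021-01-01"));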
 