SparkTableUtil

java.lang.Object
- org.apache.iceberg.spark.SparkTableUtil

```
public class SparkTableUtil
extends java.lang.Object
```
Java version of the original SparkTableUtil.scala https://github.com/apache/iceberg/blob/apache-iceberg-0.8.0-incubating/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class SparkTableUtil.SparkPartition
Class representing a table partition.

Nested Classes
Modifier and Type	Class and Description
`static class`	`SparkTableUtil.SparkPartition` Class representing a table partition.

Method Summary

All Methods Static Methods Concrete Methods Deprecated Methods
Modifier and Type	Method and Description
`static java.util.List<SparkTableUtil.SparkPartition>`	`filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions, java.util.Map<java.lang.String,java.lang.String> partitionFilter)`
`static java.util.List<SparkTableUtil.SparkPartition>`	`getPartitions(org.apache.spark.sql.SparkSession spark, java.lang.String table)` Returns all partitions in the table.
`static java.util.List<SparkTableUtil.SparkPartition>`	`getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, java.util.Map<java.lang.String,java.lang.String> partitionFilter)` Returns all partitions in the table.
`static java.util.List<SparkTableUtil.SparkPartition>`	`getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String predicate)` Returns partitions that match the specified 'predicate'.
`static java.util.List<SparkTableUtil.SparkPartition>`	`getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)` Returns partitions that match the specified 'predicate'.
`static void`	`importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir)` Import files from given partitions to an Iceberg table.
`static void`	`importSparkPartitions(org.apache.spark.sql.SparkSession spark, java.util.List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, java.lang.String stagingDir, boolean checkDuplicateFiles)` Import files from given partitions to an Iceberg table.
`static void`	`importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir)` Import files from an existing Spark table to an Iceberg table.
`static void`	`importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, boolean checkDuplicateFiles)` Import files from an existing Spark table to an Iceberg table.
`static void`	`importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, java.lang.String stagingDir, java.util.Map<java.lang.String,java.lang.String> partitionFilter, boolean checkDuplicateFiles)` Import files from an existing Spark table to an Iceberg table.
`static java.util.List<DataFile>`	`listPartition(SparkTableUtil.SparkPartition partition, PartitionSpec spec, SerializableConfiguration conf, MetricsConfig metricsConfig)` Deprecated. use `TableMigrationUtil.listPartition(Map, String, String, PartitionSpec, Configuration, MetricsConfig, NameMapping)`
`static java.util.List<DataFile>`	`listPartition(SparkTableUtil.SparkPartition partition, PartitionSpec spec, SerializableConfiguration conf, MetricsConfig metricsConfig, NameMapping mapping)` Deprecated. use `TableMigrationUtil.listPartition(Map, String, String, PartitionSpec, Configuration, MetricsConfig, NameMapping)`
`static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadCatalogMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type)` Deprecated. since 0.14.0, will be removed in 0.15.0; use `loadMetadataTable(SparkSession, Table, MetadataTableType)`.
`static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type)`
`static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type, java.util.Map<java.lang.String,java.lang.String> extraOptions)`
`static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`partitionDF(org.apache.spark.sql.SparkSession spark, java.lang.String table)` Returns a DataFrame with a row for each partition in the table.
`static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`partitionDFByFilter(org.apache.spark.sql.SparkSession spark, java.lang.String table, java.lang.String expression)` Returns a DataFrame with a row for each partition that matches the specified 'expression'.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Detail

partitionDF
```
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDF(org.apache.spark.sql.SparkSession spark,
                                                                                 java.lang.String table)
```
Returns a DataFrame with a row for each partition in the table.
The DataFrame has 3 columns, partition key (a=1/b=2), partition location, and format (avro or parquet).

Parameters:

spark - a Spark session

table - a table name and (optional) database

Returns:

a DataFrame of the table's partitions

partitionDFByFilter

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDFByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                         java.lang.String table,
                                                                                         java.lang.String expression)

Returns a DataFrame with a row for each partition that matches the specified 'expression'.

Parameters:: spark - a Spark session.; table - name of the table.; expression - The expression whose matching partitions are returned.
Returns:: a DataFrame of the table partitions.

getPartitions

public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark,
                                                                          java.lang.String table)

Returns all partitions in the table.

Parameters:: spark - a Spark session; table - a table name and (optional) database
Returns:: all table's partitions

getPartitions

public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark,
                                                                          org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                                                                          java.util.Map<java.lang.String,java.lang.String> partitionFilter)

Returns all partitions in the table.

Parameters:: spark - a Spark session; tableIdent - a table identifier; partitionFilter - partition filter, or null if no filter
Returns:: all table's partitions

getPartitionsByFilter

public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                  java.lang.String table,
                                                                                  java.lang.String predicate)

Returns partitions that match the specified 'predicate'.

Parameters:: spark - a Spark session; table - a table name and (optional) database; predicate - a predicate on partition columns
Returns:: matching table's partitions

getPartitionsByFilter

public static java.util.List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark,
                                                                                  org.apache.spark.sql.catalyst.TableIdentifier tableIdent,
                                                                                  org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)

Returns partitions that match the specified 'predicate'.

Parameters:: spark - a Spark session; tableIdent - a table identifier; predicateExpr - a predicate expression on partition columns
Returns:: matching table's partitions

listPartition

@Deprecated
public static java.util.List<DataFile> listPartition(SparkTableUtil.SparkPartition partition,
                                                                 PartitionSpec spec,
                                                                 SerializableConfiguration conf,
                                                                 MetricsConfig metricsConfig)

Deprecated. use

TableMigrationUtil.listPartition(Map, String, String, PartitionSpec,
     Configuration, MetricsConfig, NameMapping)

Returns the data files in a partition by listing the partition location.

For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics are set to null.

Parameters:: partition - a partition; conf - a serializable Hadoop conf; metricsConfig - a metrics conf
Returns:: a List of DataFile

listPartition

@Deprecated
public static java.util.List<DataFile> listPartition(SparkTableUtil.SparkPartition partition,
                                                                 PartitionSpec spec,
                                                                 SerializableConfiguration conf,
                                                                 MetricsConfig metricsConfig,
                                                                 NameMapping mapping)

Deprecated. use

TableMigrationUtil.listPartition(Map, String, String, PartitionSpec,
     Configuration, MetricsConfig, NameMapping)

Returns the data files in a partition by listing the partition location.

For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics are set to null.

Parameters:: partition - a partition; conf - a serializable Hadoop conf; metricsConfig - a metrics conf; mapping - a name mapping
Returns:: a List of DataFile

importSparkTable
```
public static void importSparkTable(org.apache.spark.sql.SparkSession spark,
                                    org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                                    Table targetTable,
                                    java.lang.String stagingDir,
                                    java.util.Map<java.lang.String,java.lang.String> partitionFilter,
                                    boolean checkDuplicateFiles)
```
Import files from an existing Spark table to an Iceberg table.
The import uses the Spark session to get table metadata. It assumes no operation is going on the original and target table and thus is not thread-safe.

Parameters:

spark - a Spark session

sourceTableIdent - an identifier of the source Spark table

targetTable - an Iceberg table where to import the data

stagingDir - a staging directory to store temporary manifest files

partitionFilter - only import partitions whose values match those in the map, can be partially defined

checkDuplicateFiles - if true, throw exception if import results in a duplicate data file

importSparkTable
```
public static void importSparkTable(org.apache.spark.sql.SparkSession spark,
                                    org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                                    Table targetTable,
                                    java.lang.String stagingDir,
                                    boolean checkDuplicateFiles)
```
Import files from an existing Spark table to an Iceberg table.
The import uses the Spark session to get table metadata. It assumes no operation is going on the original and target table and thus is not thread-safe.

Parameters:

spark - a Spark session

sourceTableIdent - an identifier of the source Spark table

targetTable - an Iceberg table where to import the data

stagingDir - a staging directory to store temporary manifest files

checkDuplicateFiles - if true, throw exception if import results in a duplicate data file

importSparkTable
```
public static void importSparkTable(org.apache.spark.sql.SparkSession spark,
                                    org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent,
                                    Table targetTable,
                                    java.lang.String stagingDir)
```
Import files from an existing Spark table to an Iceberg table.
The import uses the Spark session to get table metadata. It assumes no operation is going on the original and target table and thus is not thread-safe.

Parameters:

spark - a Spark session

sourceTableIdent - an identifier of the source Spark table

targetTable - an Iceberg table where to import the data

stagingDir - a staging directory to store temporary manifest files

importSparkPartitions

public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark,
                                         java.util.List<SparkTableUtil.SparkPartition> partitions,
                                         Table targetTable,
                                         PartitionSpec spec,
                                         java.lang.String stagingDir,
                                         boolean checkDuplicateFiles)

Import files from given partitions to an Iceberg table.

Parameters:: spark - a Spark session; partitions - partitions to import; targetTable - an Iceberg table where to import the data; spec - a partition spec; stagingDir - a staging directory to store temporary manifest files; checkDuplicateFiles - if true, throw exception if import results in a duplicate data file

importSparkPartitions

public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark,
                                         java.util.List<SparkTableUtil.SparkPartition> partitions,
                                         Table targetTable,
                                         PartitionSpec spec,
                                         java.lang.String stagingDir)

Import files from given partitions to an Iceberg table.

Parameters:: spark - a Spark session; partitions - partitions to import; targetTable - an Iceberg table where to import the data; spec - a partition spec; stagingDir - a staging directory to store temporary manifest files

filterPartitions

public static java.util.List<SparkTableUtil.SparkPartition> filterPartitions(java.util.List<SparkTableUtil.SparkPartition> partitions,
                                                                             java.util.Map<java.lang.String,java.lang.String> partitionFilter)

loadCatalogMetadataTable

@Deprecated
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadCatalogMetadataTable(org.apache.spark.sql.SparkSession spark,
                                                                                                          Table table,
                                                                                                          MetadataTableType type)

Deprecated. since 0.14.0, will be removed in 0.15.0; use loadMetadataTable(SparkSession, Table, MetadataTableType).

Loads a metadata table.

loadMetadataTable

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark,
                                                                                       Table table,
                                                                                       MetadataTableType type)

loadMetadataTable

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark,
                                                                                       Table table,
                                                                                       MetadataTableType type,
                                                                                       java.util.Map<java.lang.String,java.lang.String> extraOptions)

Class SparkTableUtil

Nested Class Summary

Method Summary

Methods inherited from class java.lang.Object

Method Detail

partitionDF

partitionDFByFilter

getPartitions

getPartitions

getPartitionsByFilter

getPartitionsByFilter

listPartition

listPartition

importSparkTable

importSparkTable

importSparkTable

importSparkPartitions

importSparkPartitions

filterPartitions

loadCatalogMetadataTable

loadMetadataTable

loadMetadataTable