java.lang.Object

org.apache.iceberg.spark.SparkTableUtil

public class SparkTableUtil extends Object

Java version of the original SparkTableUtil.scala https://github.com/apache/iceberg/blob/apache-iceberg-0.8.0-incubating/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

SparkTableUtil.SparkPartition

Class representing a table partition.

Method Summary

Modifier and Type

Method

Description

static String

determineReadBranch(org.apache.spark.sql.SparkSession spark, SparkTable sparkTable, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

static String

determineReadBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

Determine the read branch.

static String

determineWriteBranch(org.apache.spark.sql.SparkSession spark, SparkTable sparkTable, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

static String

determineWriteBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

Determine the write branch.

static List<SparkTableUtil.SparkPartition>

filterPartitions(List<SparkTableUtil.SparkPartition> partitions, Map<String,String> partitionFilter)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static PartitionSpec

findCompatibleSpec(List<String> partitionNames, Table icebergTable)

Returns the first partition spec in an Iceberg table that shares the same names and ordering as the partition columns provided.

static List<SparkTableUtil.SparkPartition>

getPartitions(org.apache.spark.sql.SparkSession spark, String table)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static List<SparkTableUtil.SparkPartition>

getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, Map<String,String> partitionFilter)

Returns all partitions in the table.

static List<SparkTableUtil.SparkPartition>

getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, String table, String predicate)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static List<SparkTableUtil.SparkPartition>

getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, boolean ignoreMissingFiles, ExecutorService service)

Import files from given partitions to an Iceberg table.

static void

importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, int parallelism)

Import files from given partitions to an Iceberg table.

static void

importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, ExecutorService service)

Import files from given partitions to an Iceberg table.

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir)

Import files from an existing Spark table to an Iceberg table.

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, boolean checkDuplicateFiles)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, int parallelism)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, ExecutorService service)

Import files from an existing Spark table to an Iceberg table.

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, boolean ignoreMissingFiles, ExecutorService service)

Import files from an existing Spark table to an Iceberg table.

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, int parallelism)

Import files from an existing Spark table to an Iceberg table.

static void

importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, ExecutorService service)

Import files from an existing Spark table to an Iceberg table.

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type)

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type, Map<String,String> extraOptions)

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

loadTable(org.apache.spark.sql.SparkSession spark, Table table, long snapshotId)

static ExecutorService

migrationService(int parallelism)

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

partitionDF(org.apache.spark.sql.SparkSession spark, String table)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

partitionDFByFilter(org.apache.spark.sql.SparkSession spark, String table, String expression)

Deprecated.
since 1.11.0, will be removed in 1.12.0

static void

validatePartitionFilter(PartitionSpec spec, Map<String,String> partitionFilter, String tableName)

static void

validateReadBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

static void

validateWriteBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- partitionDF
  
  @Deprecated public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDF(org.apache.spark.sql.SparkSession spark, String table)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Returns a DataFrame with a row for each partition in the table.
  The DataFrame has three columns: partition key (for example, a=1/b=2), partition location, and format (avro or parquet).
  
  Parameters:
  
  spark - a Spark session
  
  table - a table name and (optional) database
  
  Returns:
  
  a DataFrame of the table's partitions
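  
  The partition key format mentioned above (a=1/b=2) can be split into column/value pairs. The following is a minimal sketch using only the JDK; PartitionKeySketch and parse are hypothetical names that are not part of SparkTableUtil, and the sketch assumes values contain no escaped characters.
  
  ```java
  import java.util.LinkedHashMap;
  import java.util.Map;
  
  // Hypothetical helper for illustration only; not part of SparkTableUtil.
  final class PartitionKeySketch {
    private PartitionKeySketch() {}
  
    // Splits a key such as "a=1/b=2" into column/value pairs, keeping column order.
    static Map<String, String> parse(String partitionKey) {
      Map<String, String> values = new LinkedHashMap<>();
      for (String part : partitionKey.split("/")) {
        int eq = part.indexOf('=');
        if (eq <= 0) {
          throw new IllegalArgumentException("Not a column=value pair: " + part);
        }
        // Assumption: values are used verbatim, with no unescaping.
        values.put(part.substring(0, eq), part.substring(eq + 1));
      }
      return values;
    }
  
    public static void main(String[] args) {
      System.out.println(parse("a=1/b=2")); // prints {a=1, b=2}
    }
  }
  ```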
- partitionDFByFilter
  
  @Deprecated public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> partitionDFByFilter(org.apache.spark.sql.SparkSession spark, String table, String expression)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Returns a DataFrame with a row for each partition that matches the specified 'expression'.
  
  Parameters:
  
  spark - a Spark session.
  
  table - name of the table.
  
  expression - The expression whose matching partitions are returned.
  
  Returns:
  
  a DataFrame of the table partitions.
- getPartitions
  
  @Deprecated public static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, String table)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Returns all partitions in the table.
  
  Parameters:
  
  spark - a Spark session
  
  table - a table name and (optional) database
  
  Returns:
  
  all partitions of the table
- getPartitions
  
  public static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, Map<String,String> partitionFilter)
  
  Returns all partitions in the table, or only those matching partitionFilter when a filter is given.
  
  Parameters:
  
  spark - a Spark session
  
  tableIdent - a table identifier
  
  partitionFilter - partition filter, or null if no filter
  
  Returns:
  
  all partitions of the table that pass the filter
- getPartitionsByFilter
  
  @Deprecated public static List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, String table, String predicate)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Returns partitions that match the specified 'predicate'.
  
  Parameters:
  
  spark - a Spark session
  
  table - a table name and (optional) database
  
  predicate - a predicate on partition columns
  
  Returns:
  
  the partitions of the table that match the predicate
- getPartitionsByFilter
  
  @Deprecated public static List<SparkTableUtil.SparkPartition> getPartitionsByFilter(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier tableIdent, org.apache.spark.sql.catalyst.expressions.Expression predicateExpr)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Returns partitions that match the specified 'predicateExpr'.
  
  Parameters:
  
  spark - a Spark session
  
  tableIdent - a table identifier
  
  predicateExpr - a predicate expression on partition columns
  
  Returns:
  
  the partitions of the table that match the predicate
- importSparkTable
  
  @Deprecated public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  partitionFilter - only import partitions whose values match those in the map; the map may name only a subset of the partition columns
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
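  
  A partially defined partitionFilter can be read as "every entry in the filter must match, and columns absent from the filter are unconstrained". The sketch below illustrates that reading with plain maps; PartitionFilterSketch is a hypothetical name, and the actual matching in SparkTableUtil may differ in details such as case sensitivity.
  
  ```java
  import java.util.List;
  import java.util.Map;
  import java.util.stream.Collectors;
  
  // Hypothetical illustration of partial partition-filter matching.
  final class PartitionFilterSketch {
    private PartitionFilterSketch() {}
  
    // True when every filter entry equals the partition's value for that column.
    static boolean matches(Map<String, String> partitionValues, Map<String, String> filter) {
      return filter.entrySet().stream()
          .allMatch(e -> e.getValue().equals(partitionValues.get(e.getKey())));
    }
  
    // Keeps only the partitions that satisfy the (possibly partial) filter.
    static List<Map<String, String>> select(
        List<Map<String, String>> partitions, Map<String, String> filter) {
      return partitions.stream()
          .filter(p -> matches(p, filter))
          .collect(Collectors.toList());
    }
  
    public static void main(String[] args) {
      List<Map<String, String>> partitions =
          List.of(Map.of("a", "1", "b", "2"), Map.of("a", "1", "b", "3"), Map.of("a", "2", "b", "2"));
      // Filtering on a=1 only leaves b unconstrained, so two partitions remain.
      System.out.println(select(partitions, Map.of("a", "1")).size());
    }
  }
  ```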
- importSparkTable
  
  @Deprecated public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, int parallelism)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  parallelism - number of threads to use for file reading
- importSparkTable
  
  public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, ExecutorService service)
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
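  
  The service contract above has a practical consequence: an executor passed in cannot be reused after the call. The following sketch mimics that contract with a hypothetical readAll helper (not an Iceberg API): work runs on the calling thread when the service is null, and otherwise on the service, which is shut down before returning.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutionException;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Future;
  import java.util.function.Function;
  
  // Hypothetical illustration of the "null means current thread, non-null is shut down" contract.
  final class ExecutorOwnershipSketch {
    private ExecutorOwnershipSketch() {}
  
    static <T> List<T> readAll(List<String> paths, Function<String, T> reader, ExecutorService service) {
      List<T> results = new ArrayList<>();
      if (service == null) {
        // No executor: read every file on the calling thread.
        for (String path : paths) {
          results.add(reader.apply(path));
        }
        return results;
      }
      try {
        List<Future<T>> futures = new ArrayList<>();
        for (String path : paths) {
          Callable<T> task = () -> reader.apply(path);
          futures.add(service.submit(task));
        }
        for (Future<T> future : futures) {
          results.add(future.get());
        }
        return results;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while reading files", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("File read failed", e.getCause());
      } finally {
        // The caller's executor is shut down here, so it must not be reused afterwards.
        service.shutdown();
      }
    }
  }
  ```
  
  A caller that needs the executor for other work would create a dedicated one for the import rather than pass in a shared pool.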
- importSparkTable
  
  public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, int parallelism)
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  partitionFilter - only import partitions whose values match those in the map; the map may name only a subset of the partition columns
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  parallelism - number of threads to use for file reading
- importSparkTable
  
  public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, ExecutorService service)
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  partitionFilter - only import partitions whose values match those in the map; the map may name only a subset of the partition columns
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
- importSparkTable
  
  public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, Map<String,String> partitionFilter, boolean checkDuplicateFiles, boolean ignoreMissingFiles, ExecutorService service)
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  partitionFilter - only import partitions whose values match those in the map; the map may name only a subset of the partition columns
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  ignoreMissingFiles - if true, ignore FileNotFoundException when running listPartition(org.apache.iceberg.spark.SparkTableUtil.SparkPartition, org.apache.iceberg.PartitionSpec, org.apache.iceberg.hadoop.SerializableConfiguration, org.apache.iceberg.MetricsConfig, org.apache.iceberg.mapping.NameMapping, boolean, java.util.concurrent.ExecutorService) for the Spark partitions
  
  service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
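  
  The effect of ignoreMissingFiles can be pictured as a try/catch around each partition listing. The sketch below is an assumption about the general shape, not the Iceberg implementation: PartitionLister and listAll are hypothetical names, and the sketch rethrows as UncheckedIOException purely to keep the example compact.
  
  ```java
  import java.io.FileNotFoundException;
  import java.io.UncheckedIOException;
  import java.util.ArrayList;
  import java.util.List;
  
  // Hypothetical illustration of skipping partitions whose location no longer exists.
  final class MissingFilesSketch {
    private MissingFilesSketch() {}
  
    // Stand-in for whatever lists the data files under a partition location.
    interface PartitionLister {
      List<String> list(String partitionLocation) throws FileNotFoundException;
    }
  
    static List<String> listAll(
        List<String> partitionLocations, PartitionLister lister, boolean ignoreMissingFiles) {
      List<String> files = new ArrayList<>();
      for (String location : partitionLocations) {
        try {
          files.addAll(lister.list(location));
        } catch (FileNotFoundException e) {
          if (!ignoreMissingFiles) {
            // Strict mode: a missing location fails the whole listing.
            throw new UncheckedIOException(e);
          }
          // Lenient mode: skip this partition and continue with the rest.
        }
      }
      return files;
    }
  }
  ```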
- importSparkTable
  
  @Deprecated public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir, boolean checkDuplicateFiles)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
- importSparkTable
  
  public static void importSparkTable(org.apache.spark.sql.SparkSession spark, org.apache.spark.sql.catalyst.TableIdentifier sourceTableIdent, Table targetTable, String stagingDir)
  
  Import files from an existing Spark table to an Iceberg table.
  The import uses the Spark session to get table metadata. It assumes that no other operation is running on the source or target table, and is therefore not thread-safe.
  
  Parameters:
  
  spark - a Spark session
  
  sourceTableIdent - an identifier of the source Spark table
  
  targetTable - the Iceberg table to import the data into
  
  stagingDir - a staging directory to store temporary manifest files
- importSparkPartitions
  
  @Deprecated public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Import files from given partitions to an Iceberg table.
  
  Parameters:
  
  spark - a Spark session
  
  partitions - partitions to import
  
  targetTable - the Iceberg table to import the data into
  
  spec - a partition spec
  
  stagingDir - a staging directory to store temporary manifest files
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
- importSparkPartitions
  
  public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, int parallelism)
  
  Import files from given partitions to an Iceberg table.
  
  Parameters:
  
  spark - a Spark session
  
  partitions - partitions to import
  
  targetTable - the Iceberg table to import the data into
  
  spec - a partition spec
  
  stagingDir - a staging directory to store temporary manifest files
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  parallelism - number of threads to use for file reading
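  
  The checkDuplicateFiles behaviour amounts to failing the import when a file to be added is already present. A minimal sketch of such a check follows, assuming files are identified by their path strings; DuplicateFileCheckSketch is a hypothetical name, and how Iceberg actually detects duplicates is not described in this page.
  
  ```java
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;
  
  // Hypothetical illustration of a duplicate data file check keyed on file path.
  final class DuplicateFileCheckSketch {
    private DuplicateFileCheckSketch() {}
  
    // Throws if any imported path is already in the table or appears twice in the import.
    static void checkNoDuplicates(Set<String> existingPaths, List<String> importedPaths) {
      Set<String> seen = new HashSet<>(existingPaths);
      for (String path : importedPaths) {
        if (!seen.add(path)) {
          throw new IllegalStateException("Duplicate data file: " + path);
        }
      }
    }
  }
  ```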
- importSparkPartitions
  
  public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, ExecutorService service)
  
  Import files from given partitions to an Iceberg table.
  
  Parameters:
  
  spark - a Spark session
  
  partitions - partitions to import
  
  targetTable - the Iceberg table to import the data into
  
  spec - a partition spec
  
  stagingDir - a staging directory to store temporary manifest files
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
- importSparkPartitions
  
  public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir, boolean checkDuplicateFiles, boolean ignoreMissingFiles, ExecutorService service)
  
  Import files from given partitions to an Iceberg table.
  
  Parameters:
  
  spark - a Spark session
  
  partitions - partitions to import
  
  targetTable - the Iceberg table to import the data into
  
  spec - a partition spec
  
  stagingDir - a staging directory to store temporary manifest files
  
  checkDuplicateFiles - if true, throw an exception if the import results in a duplicate data file
  
  ignoreMissingFiles - if true, ignore FileNotFoundException when running listPartition(org.apache.iceberg.spark.SparkTableUtil.SparkPartition, org.apache.iceberg.PartitionSpec, org.apache.iceberg.hadoop.SerializableConfiguration, org.apache.iceberg.MetricsConfig, org.apache.iceberg.mapping.NameMapping, boolean, java.util.concurrent.ExecutorService) for the Spark partitions
  
  service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
- importSparkPartitions
  
  @Deprecated public static void importSparkPartitions(org.apache.spark.sql.SparkSession spark, List<SparkTableUtil.SparkPartition> partitions, Table targetTable, PartitionSpec spec, String stagingDir)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
  
  Import files from given partitions to an Iceberg table.
  
  Parameters:
  
  spark - a Spark session
  
  partitions - partitions to import
  
  targetTable - the Iceberg table to import the data into
  
  spec - a partition spec
  
  stagingDir - a staging directory to store temporary manifest files
- filterPartitions
  
  @Deprecated public static List<SparkTableUtil.SparkPartition> filterPartitions(List<SparkTableUtil.SparkPartition> partitions, Map<String,String> partitionFilter)
  
  Deprecated.
  since 1.11.0, will be removed in 1.12.0
- loadTable
  
  public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadTable(org.apache.spark.sql.SparkSession spark, Table table, long snapshotId)
- loadMetadataTable
  
  public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type)
- loadMetadataTable
  
  public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark, Table table, MetadataTableType type, Map<String,String> extraOptions)
- validateWriteBranch
  
  public static void validateWriteBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
- validateReadBranch
  
  public static void validateReadBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
- determineWriteBranch
  
  public static String determineWriteBranch(org.apache.spark.sql.SparkSession spark, SparkTable sparkTable, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
- determineWriteBranch
  
  public static String determineWriteBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
  
  Determine the write branch.
  The target branch can be specified via table identifier, write option, or in SQL:
  
  - The identifier and option branches can't conflict. If both are set, they must match.
  - Identifier and option branches take priority over the session WAP branch.
  - If neither the option nor the identifier branch is set and WAP is enabled for this table, use the WAP branch from the session SQL config.
  
  Note: WAP ID and WAP branch cannot be set at the same time.
  Note: The target branch may be created during the write operation if it does not exist.
  
  Parameters:
  
  spark - a Spark Session
  
  table - the table being written to
  
  branch - write branch configured via table identifier, or null
  
  options - write options
  
  Returns:
  
  branch for write operation, or null for main branch
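  
  The precedence rules for the write branch can be summarized in a few lines of plain Java. This is a sketch of the documented rules only, not the Iceberg implementation: WriteBranchSketch and its parameters are hypothetical, the session WAP settings are passed in as plain values, and the order in which the checks run is an assumption.
  
  ```java
  // Hypothetical illustration of the documented write-branch precedence.
  final class WriteBranchSketch {
    private WriteBranchSketch() {}
  
    // Returns the branch to write to, or null for the main branch.
    static String resolve(
        String identifierBranch,
        String optionBranch,
        boolean wapEnabled,
        String sessionWapBranch,
        String sessionWapId) {
      // WAP ID and WAP branch cannot be set at the same time.
      if (sessionWapId != null && sessionWapBranch != null) {
        throw new IllegalArgumentException("WAP ID and WAP branch cannot both be set");
      }
      // Identifier and option branches must agree when both are present.
      if (identifierBranch != null && optionBranch != null && !identifierBranch.equals(optionBranch)) {
        throw new IllegalArgumentException(
            "Conflicting branches: " + identifierBranch + " vs " + optionBranch);
      }
      // An explicit branch takes priority over the session WAP branch.
      if (identifierBranch != null) {
        return identifierBranch;
      }
      if (optionBranch != null) {
        return optionBranch;
      }
      // Fall back to the session WAP branch only when WAP is enabled for the table.
      if (wapEnabled && sessionWapBranch != null) {
        return sessionWapBranch;
      }
      return null;
    }
  }
  ```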
- determineReadBranch
  
  public static String determineReadBranch(org.apache.spark.sql.SparkSession spark, SparkTable sparkTable, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
- determineReadBranch
  
  public static String determineReadBranch(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
  
  Determine the read branch.
  The target branch can be specified via table identifier, read option, or in SQL:
  
  - The identifier and option branches can't conflict. If both are set, they must match.
  - Identifier and option branches take priority over the session WAP branch.
  - If neither the option nor the identifier branch is set and WAP is enabled for this table, use the WAP branch from the session SQL config (only if the branch already exists).
  
  Note: WAP ID and WAP branch cannot be set at the same time.
  
  Parameters:
  
  spark - a Spark Session
  
  table - the table being read from
  
  branch - read branch configured via table identifier, or null
  
  options - read options
  
  Returns:
  
  branch for read operation, or null for main branch
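  
  For reads, the session WAP branch is used only if it already exists on the table. The sketch below illustrates that extra condition with a set of branch names standing in for the table's refs; ReadBranchSketch and its parameters are hypothetical and the real method may validate more than is shown here.
  
  ```java
  import java.util.Set;
  
  // Hypothetical illustration of the documented read-branch precedence.
  final class ReadBranchSketch {
    private ReadBranchSketch() {}
  
    // Returns the branch to read from, or null for the main branch.
    static String resolve(
        String identifierBranch,
        String optionBranch,
        boolean wapEnabled,
        String sessionWapBranch,
        Set<String> existingBranches) {
      if (identifierBranch != null && optionBranch != null && !identifierBranch.equals(optionBranch)) {
        throw new IllegalArgumentException(
            "Conflicting branches: " + identifierBranch + " vs " + optionBranch);
      }
      String explicit = identifierBranch != null ? identifierBranch : optionBranch;
      if (explicit != null) {
        return explicit;
      }
      // The WAP branch is a fallback for reads only when it has already been created.
      if (wapEnabled && sessionWapBranch != null && existingBranches.contains(sessionWapBranch)) {
        return sessionWapBranch;
      }
      return null;
    }
  }
  ```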
- migrationService
  
  @Nullable public static ExecutorService migrationService(int parallelism)
- findCompatibleSpec
  
  public static PartitionSpec findCompatibleSpec(List<String> partitionNames, Table icebergTable)
  
  Returns the first partition spec in an Iceberg table that shares the same names and ordering as the partition columns provided. Throws an error if no such spec is found.
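  
  "Same names and ordering" means the match is positional, so [a, b] does not match a spec whose fields are [b, a]. A minimal sketch of that lookup follows, with each spec reduced to an ordered list of field names; CompatibleSpecSketch is a hypothetical stand-in, and the real method works on PartitionSpec objects and may apply further conditions.
  
  ```java
  import java.util.List;
  
  // Hypothetical illustration of finding the first spec with matching names and order.
  final class CompatibleSpecSketch {
    private CompatibleSpecSketch() {}
  
    // Each inner list holds one spec's partition field names, in spec order.
    static List<String> findCompatible(List<String> partitionNames, List<List<String>> specs) {
      for (List<String> spec : specs) {
        // List.equals compares element by element, so order matters.
        if (spec.equals(partitionNames)) {
          return spec;
        }
      }
      throw new IllegalArgumentException("No compatible spec for partition columns " + partitionNames);
    }
  }
  ```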
- validatePartitionFilter
  
  public static void validatePartitionFilter(PartitionSpec spec, Map<String,String> partitionFilter, String tableName)
