Class Spark3Util

java.lang.Object
org.apache.iceberg.spark.Spark3Util

public class Spark3Util extends Object
  • Method Details

    • setOption

      public static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(String key, String value, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
    • rebuildCreateProperties

      public static Map<String,String> rebuildCreateProperties(Map<String,String> createProperties)
    • applyPropertyChanges

      public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
      Applies a list of Spark table changes to an UpdateProperties operation.
      Parameters:
      pendingUpdate - an uncommitted UpdateProperties operation to configure
      changes - a list of Spark table changes
      Returns:
      the UpdateProperties operation configured with the changes
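A minimal sketch of how this might be used to translate Spark property changes (e.g. from an `ALTER TABLE ... SET TBLPROPERTIES`) into an Iceberg commit. The table handle and the property keys here are illustrative:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateProperties;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;

public class PropertyChangesExample {
  // Applies illustrative property changes to an already-loaded Iceberg table.
  static void applyChanges(Table table) {
    List<TableChange> changes = Arrays.asList(
        TableChange.setProperty("write.format.default", "parquet"),
        TableChange.removeProperty("obsolete.key")); // hypothetical key

    UpdateProperties pending = table.updateProperties();
    // Configure the pending update with the Spark changes, then commit.
    Spark3Util.applyPropertyChanges(pending, changes).commit();
  }
}
```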
    • applySchemaChanges

      public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
      Applies a list of Spark table changes to an UpdateSchema operation.
      Parameters:
      pendingUpdate - an uncommitted UpdateSchema operation to configure
      changes - a list of Spark table changes
      Returns:
      the UpdateSchema operation configured with the changes
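A similar sketch for schema changes, assuming a loaded table; the column names are illustrative:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateSchema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

public class SchemaChangesExample {
  // Applies illustrative schema changes (add a column, rename a column).
  static void applyChanges(Table table) {
    List<TableChange> changes = Arrays.asList(
        TableChange.addColumn(new String[] {"event_ts"}, DataTypes.TimestampType),
        TableChange.renameColumn(new String[] {"data"}, "payload"));

    UpdateSchema pending = table.updateSchema();
    // Configure the pending update with the Spark changes, then commit.
    Spark3Util.applySchemaChanges(pending, changes).commit();
  }
}
```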
    • toIcebergTable

      public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
    • toOrdering

      public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
    • toTransforms

      public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, List<PartitionField> fields)
    • toTransforms

      public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
      Converts a PartitionSpec to Spark transforms.
      Parameters:
      spec - a PartitionSpec
      Returns:
      an array of Transforms
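A sketch of the conversion, using an illustrative schema and spec:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Transform;

public class ToTransformsExample {
  static Transform[] sparkPartitioning() {
    // Illustrative schema and partition spec.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .day("ts")
        .bucket("id", 16)
        .build();
    // Yields Spark transforms corresponding to days(ts) and bucket(16, id).
    return Spark3Util.toTransforms(spec);
  }
}
```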
    • toNamedReference

      public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(String name)
    • toIcebergTerm

      public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
    • toPartitionSpec

      public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
      Converts Spark transforms into a PartitionSpec.
      Parameters:
      schema - the table schema
      partitioning - Spark Transforms
      Returns:
      a PartitionSpec
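This is the inverse direction of toTransforms. A sketch, assuming Spark's Expressions factory for building the transforms and the same illustrative schema:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

public class ToPartitionSpecExample {
  static PartitionSpec fromSparkTransforms() {
    // Illustrative schema; field names must match the transform references.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")};
    // Builds an Iceberg spec equivalent to day(ts) + bucket(id, 16).
    return Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```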
    • describe

      public static String describe(List<Expression> exprs)
    • describe

      public static String describe(Expression expr)
    • describe

      public static String describe(Schema schema)
    • describe

      public static String describe(Type type)
    • describe

      public static String describe(SortOrder order)
    • extensionsEnabled

      public static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
    • loadIcebergTable

      public static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
      Returns an Iceberg Table by its name from a Spark V2 Catalog. If caching is enabled in the SparkCatalog, the table's TableOperations may be stale; refresh the table to get the latest state.
      Parameters:
      spark - SparkSession used for looking up catalog references and tables
      name - The multipart identifier of the Iceberg table
      Returns:
      an Iceberg table
      Throws:
      org.apache.spark.sql.catalyst.parser.ParseException
      org.apache.spark.sql.catalyst.analysis.NoSuchTableException
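A sketch of loading a table by its multipart name; the catalog and table names are illustrative, and the session must already have the catalog configured:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadTableExample {
  static Table load(SparkSession spark) throws Exception {
    // "my_catalog.db.events" is an illustrative multipart identifier.
    Table table = Spark3Util.loadIcebergTable(spark, "my_catalog.db.events");
    // If catalog caching is enabled, refresh to pick up the latest metadata.
    table.refresh();
    return table;
  }
}
```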
    • loadIcebergCatalog

      public static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, String catalogName)
      Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
      Parameters:
      spark - SparkSession used for looking up catalog reference
      catalogName - The name of the Spark Catalog being referenced
      Returns:
      the Iceberg catalog class being wrapped by the Spark Catalog
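A sketch of unwrapping the Iceberg catalog and using it directly; the catalog and namespace names are illustrative:

```java
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadCatalogExample {
  static void listTables(SparkSession spark) {
    // "my_catalog" must be a Spark catalog configured to wrap an Iceberg catalog.
    Catalog catalog = Spark3Util.loadIcebergCatalog(spark, "my_catalog");
    // Use the native Iceberg API directly, bypassing the Spark wrapper.
    catalog.listTables(Namespace.of("db")).forEach(System.out::println);
  }
}
```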
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException
      Throws:
      org.apache.spark.sql.catalyst.parser.ParseException
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
      Throws:
      org.apache.spark.sql.catalyst.parser.ParseException
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name)
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts)
    • catalogAndIdentifier

      public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
      A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
      Parameters:
      spark - Spark session to use for resolution
      nameParts - Multipart identifier representing a table
      defaultCatalog - Catalog to use if none is specified
      Returns:
      The CatalogPlugin and Identifier for the table
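A sketch of resolving a multipart name; the name parts are illustrative, and the session's current catalog is assumed as the default (obtained here via the session's catalog manager):

```java
import java.util.Arrays;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

public class CatalogAndIdentifierExample {
  static void resolve(SparkSession spark) {
    // Fall back to the session's current catalog when no catalog is named.
    CatalogPlugin defaultCatalog = spark.sessionState().catalogManager().currentCatalog();
    Spark3Util.CatalogAndIdentifier resolved = Spark3Util.catalogAndIdentifier(
        spark, Arrays.asList("my_catalog", "db", "events"), defaultCatalog);
    CatalogPlugin catalog = resolved.catalog();
    Identifier ident = resolved.identifier();
  }
}
```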
    • identifierToTableIdentifier

      public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
    • quotedFullIdentifier

      public static String quotedFullIdentifier(String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
    • getPartitions

      public static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, String format, Map<String,String> partitionFilter, PartitionSpec partitionSpec)
      Use Spark to list all partitions in the table.
      Parameters:
      spark - a Spark session
      rootPath - the root path of the table
      format - format of the data files
      partitionFilter - a partition filter to select which partitions are listed
      partitionSpec - the PartitionSpec of the table
      Returns:
      all of the table's partitions matching the filter
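A sketch of listing partitions under a table root; the path, format, and filter values are illustrative:

```java
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

public class GetPartitionsExample {
  static List<SparkTableUtil.SparkPartition> list(SparkSession spark, PartitionSpec spec) {
    return Spark3Util.getPartitions(
        spark,
        new Path("/warehouse/db/events"),              // illustrative table root path
        "parquet",                                     // format of the data files
        Collections.singletonMap("dt", "2023-01-01"),  // illustrative partition filter
        spec);
  }
}
```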
    • toV1TableIdentifier

      public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)