Class Spark3Util


  • public class Spark3Util
    extends java.lang.Object
    • Method Detail

      • setOption

        public static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption​(java.lang.String key,
                                                                                   java.lang.String value,
                                                                                   org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      • rebuildCreateProperties

        public static java.util.Map<java.lang.String,​java.lang.String> rebuildCreateProperties​(java.util.Map<java.lang.String,​java.lang.String> createProperties)
      • applyPropertyChanges

        public static UpdateProperties applyPropertyChanges​(UpdateProperties pendingUpdate,
                                                            java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
        Applies a list of Spark table changes to an UpdateProperties operation.
        Parameters:
        pendingUpdate - an uncommitted UpdateProperties operation to configure
        changes - a list of Spark table changes
        Returns:
        the UpdateProperties operation configured with the changes
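For illustration, a minimal sketch of driving this helper from Spark table changes; the table handle is assumed to be an already-loaded Iceberg Table, and the property keys/values are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;

class PropertyChangeExample {
  // Applies SET/UNSET property changes to an Iceberg table in a single commit.
  static void alterProperties(Table table) {
    List<TableChange> changes = List.of(
        TableChange.setProperty("write.format.default", "orc"), // hypothetical key/value
        TableChange.removeProperty("obsolete-key"));            // hypothetical key
    Spark3Util.applyPropertyChanges(table.updateProperties(), changes).commit();
  }
}
```

Note that applyPropertyChanges only configures the pending update; nothing takes effect until commit() is called.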
      • applySchemaChanges

        public static UpdateSchema applySchemaChanges​(UpdateSchema pendingUpdate,
                                                      java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
        Applies a list of Spark table changes to an UpdateSchema operation.
        Parameters:
        pendingUpdate - an uncommitted UpdateSchema operation to configure
        changes - a list of Spark table changes
        Returns:
        the UpdateSchema operation configured with the changes
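A sketch of the schema counterpart, assuming an already-loaded Iceberg Table; the column names are hypothetical:

```java
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

class SchemaChangeExample {
  // Adds one column and renames another, committed as a single schema update.
  static void alterSchema(Table table) {
    List<TableChange> changes = List.of(
        TableChange.addColumn(new String[] {"comment"}, DataTypes.StringType), // hypothetical column
        TableChange.renameColumn(new String[] {"ts"}, "event_ts"));            // hypothetical rename
    Spark3Util.applySchemaChanges(table.updateSchema(), changes).commit();
  }
}
```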
      • toIcebergTable

        public static Table toIcebergTable​(org.apache.spark.sql.connector.catalog.Table table)
      • toOrdering

        public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering​(SortOrder sortOrder)
      • toTransforms

        public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms​(Schema schema,
                                                                                          java.util.List<PartitionField> fields)
      • toTransforms

        public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms​(PartitionSpec spec)
        Converts a PartitionSpec to Spark transforms.
        Parameters:
        spec - a PartitionSpec
        Returns:
        an array of Transforms
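One common use, sketched below, is mirroring an existing Iceberg table's partitioning as Spark v2 transforms (for example, to pass to a catalog's createTable call); the helper method is illustrative:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.expressions.Transform;

class ToTransformsExample {
  // Converts the table's current PartitionSpec into Spark transform expressions.
  static Transform[] sparkPartitioning(Table table) {
    return Spark3Util.toTransforms(table.spec());
  }
}
```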
      • toNamedReference

        public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference​(java.lang.String name)
      • toIcebergTerm

        public static Term toIcebergTerm​(org.apache.spark.sql.connector.expressions.Expression expr)
      • toPartitionSpec

        public static PartitionSpec toPartitionSpec​(Schema schema,
                                                    org.apache.spark.sql.connector.expressions.Transform[] partitioning)
        Converts Spark transforms into a PartitionSpec.
        Parameters:
        schema - the table schema
        partitioning - Spark Transforms
        Returns:
        a PartitionSpec
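A sketch of the reverse direction, building an Iceberg PartitionSpec from Spark transform expressions; the column names ("ts", "id") are hypothetical and must exist in the schema:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

class ToPartitionSpecExample {
  // Derives a PartitionSpec with a daily partition on "ts" and 16 buckets on "id".
  static PartitionSpec specFor(Schema schema) {
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")
    };
    return Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```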
      • describe

        public static java.lang.String describe​(java.util.List<Expression> exprs)
      • describe

        public static java.lang.String describe​(Expression expr)
      • describe

        public static java.lang.String describe​(Schema schema)
      • describe

        public static java.lang.String describe​(Type type)
      • describe

        public static java.lang.String describe​(SortOrder order)
      • extensionsEnabled

        public static boolean extensionsEnabled​(org.apache.spark.sql.SparkSession spark)
      • loadIcebergTable

        public static Table loadIcebergTable​(org.apache.spark.sql.SparkSession spark,
                                             java.lang.String name)
                                      throws org.apache.spark.sql.catalyst.parser.ParseException,
                                             org.apache.spark.sql.catalyst.analysis.NoSuchTableException
        Returns an Iceberg Table by its name from a Spark V2 Catalog. If caching is enabled in the SparkCatalog, the table's TableOperations may be stale; refresh the table to get the latest state.
        Parameters:
        spark - SparkSession used for looking up catalog references and tables
        name - The multipart identifier of the Iceberg table
        Returns:
        an Iceberg table
        Throws:
        org.apache.spark.sql.catalyst.parser.ParseException
        org.apache.spark.sql.catalyst.analysis.NoSuchTableException
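A minimal usage sketch, assuming a configured Spark session with an Iceberg catalog; the multipart name "my_catalog.db.events" is hypothetical:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

class LoadTableExample {
  // Resolves the name against the session's V2 catalogs and returns the Iceberg table.
  static Table load(SparkSession spark) throws Exception {
    Table table = Spark3Util.loadIcebergTable(spark, "my_catalog.db.events");
    table.refresh(); // guard against stale metadata when catalog caching is enabled
    return table;
  }
}
```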
      • loadIcebergCatalog

        public static Catalog loadIcebergCatalog​(org.apache.spark.sql.SparkSession spark,
                                                 java.lang.String catalogName)
        Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
        Parameters:
        spark - SparkSession used for looking up catalog reference
        catalogName - The name of the Spark Catalog being referenced
        Returns:
        the Iceberg catalog class being wrapped by the Spark Catalog
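A sketch of unwrapping the Iceberg catalog behind a configured Spark catalog in order to use the native Iceberg API directly; the catalog and namespace names are hypothetical:

```java
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

class LoadCatalogExample {
  // Lists tables in namespace "db" via the unwrapped Iceberg Catalog.
  static void listDb(SparkSession spark) {
    Catalog catalog = Spark3Util.loadIcebergCatalog(spark, "my_catalog");
    for (TableIdentifier ident : catalog.listTables(Namespace.of("db"))) {
      System.out.println(ident);
    }
  }
}
```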
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name)
                                                                    throws org.apache.spark.sql.catalyst.parser.ParseException
        Throws:
        org.apache.spark.sql.catalyst.parser.ParseException
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
                                                                    throws org.apache.spark.sql.catalyst.parser.ParseException
        Throws:
        org.apache.spark.sql.catalyst.parser.ParseException
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(java.lang.String description,
                                                                           org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(java.lang.String description,
                                                                           org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(org.apache.spark.sql.SparkSession spark,
                                                                           java.util.List<java.lang.String> nameParts)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier​(org.apache.spark.sql.SparkSession spark,
                                                                           java.util.List<java.lang.String> nameParts,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
        Parameters:
        spark - Spark session to use for resolution
        nameParts - Multipart identifier representing a table
        defaultCatalog - Catalog to use if none is specified
        Returns:
        The CatalogPlugin and Identifier for the table
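A sketch of resolving a possibly-qualified name with the String-based overload (which parses the name and may throw ParseException); "my_catalog.db.events" is a hypothetical multipart identifier:

```java
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.parser.ParseException;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

class ResolveExample {
  // Splits a multipart name into the owning catalog plugin and the table identifier.
  static void resolve(SparkSession spark) throws ParseException {
    Spark3Util.CatalogAndIdentifier resolved =
        Spark3Util.catalogAndIdentifier(spark, "my_catalog.db.events");
    CatalogPlugin catalog = resolved.catalog();
    Identifier ident = resolved.identifier();
    System.out.println(catalog.name() + " -> " + ident);
  }
}
```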
      • identifierToTableIdentifier

        public static TableIdentifier identifierToTableIdentifier​(org.apache.spark.sql.connector.catalog.Identifier identifier)
      • quotedFullIdentifier

        public static java.lang.String quotedFullIdentifier​(java.lang.String catalogName,
                                                            org.apache.spark.sql.connector.catalog.Identifier identifier)
      • getPartitions

        public static java.util.List<SparkTableUtil.SparkPartition> getPartitions​(org.apache.spark.sql.SparkSession spark,
                                                                                  org.apache.hadoop.fs.Path rootPath,
                                                                                  java.lang.String format,
                                                                                  java.util.Map<java.lang.String,​java.lang.String> partitionFilter,
                                                                                  PartitionSpec partitionSpec)
        Use Spark to list all partitions in the table.
        Parameters:
        spark - a Spark session
        rootPath - the root path of the table
        format - the format of the table's data files
        partitionFilter - a map of partition column names to values used to filter the listed partitions
        partitionSpec - partitionSpec of the table
        Returns:
        all of the table's partitions
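A usage sketch for a path-based table; the path, format, and filter values are all hypothetical:

```java
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

class GetPartitionsExample {
  // Lists the partitions under the table root that match the given filter.
  static List<SparkTableUtil.SparkPartition> partitions(
      SparkSession spark, PartitionSpec spec) {
    return Spark3Util.getPartitions(
        spark,
        new Path("hdfs://namenode/warehouse/db/events"), // hypothetical table root
        "parquet",
        Map.of("dt", "2021-01-01"),                      // hypothetical partition filter
        spec);
  }
}
```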
      • toV1TableIdentifier

        public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier​(org.apache.spark.sql.connector.catalog.Identifier identifier)