Class Spark3Util


  • public class Spark3Util
    extends java.lang.Object
    • Method Detail

      • rebuildCreateProperties

        public static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
      • applyPropertyChanges

        public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate,
                                                            java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
        Applies a list of Spark table changes to an UpdateProperties operation.
        Parameters:
        pendingUpdate - an uncommitted UpdateProperties operation to configure
        changes - a list of Spark table changes
        Returns:
        the UpdateProperties operation configured with the changes
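        A hedged usage sketch (not from the source): handling an ALTER TABLE ... SET TBLPROPERTIES request by translating the Spark changes and committing them. The `icebergTable` variable is an assumed, already-loaded `org.apache.iceberg.Table`.

```java
// Sketch only: map Spark property changes onto an uncommitted Iceberg
// UpdateProperties, then commit the result.
List<TableChange> changes = Arrays.asList(
    TableChange.setProperty("write.format.default", "orc"),
    TableChange.removeProperty("comment"));

UpdateProperties pending = icebergTable.updateProperties();  // icebergTable is assumed
Spark3Util.applyPropertyChanges(pending, changes).commit();
```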
      • applySchemaChanges

        public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate,
                                                      java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
        Applies a list of Spark table changes to an UpdateSchema operation.
        Parameters:
        pendingUpdate - an uncommitted UpdateSchema operation to configure
        changes - a list of Spark table changes
        Returns:
        the UpdateSchema operation configured with the changes
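        A hedged usage sketch (not from the source): translating Spark ALTER TABLE column changes into an Iceberg schema update. The `icebergTable` variable is an assumed, already-loaded `org.apache.iceberg.Table`.

```java
// Sketch only: translate Spark schema changes into Iceberg schema updates.
List<TableChange> changes = Arrays.asList(
    TableChange.addColumn(new String[] {"event_ts"}, DataTypes.TimestampType),
    TableChange.renameColumn(new String[] {"id"}, "event_id"));

UpdateSchema pending = icebergTable.updateSchema();  // icebergTable is assumed
Spark3Util.applySchemaChanges(pending, changes).commit();
```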
      • toIcebergTable

        public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
      • toTransforms

        public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
        Converts a PartitionSpec to Spark transforms.
        Parameters:
        spec - a PartitionSpec
        Returns:
        an array of Transforms
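        A hedged usage sketch (not from the source), e.g. for implementing `Table.partitioning()` in a connector. The `schema` variable is an assumed `org.apache.iceberg.Schema`.

```java
// Sketch only: expose an Iceberg partition spec as Spark Transforms.
PartitionSpec spec = PartitionSpec.builderFor(schema)  // schema is assumed
    .identity("region")
    .day("event_ts")
    .build();
Transform[] transforms = Spark3Util.toTransforms(spec);
```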
      • buildRequiredDistribution

        public static Distribution buildRequiredDistribution(Table table)
      • toIcebergTerm

        public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Transform transform)
      • toPartitionSpec

        public static PartitionSpec toPartitionSpec(Schema schema,
                                                    org.apache.spark.sql.connector.expressions.Transform[] partitioning)
        Converts Spark transforms into a PartitionSpec.
        Parameters:
        schema - the table schema
        partitioning - Spark Transforms
        Returns:
        a PartitionSpec
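        A hedged usage sketch (not from the source): the inverse of toTransforms, building an Iceberg spec from the transforms of a CREATE TABLE ... PARTITIONED BY clause. `Expressions` here is Spark's `org.apache.spark.sql.connector.expressions.Expressions`, and `schema` is an assumed `org.apache.iceberg.Schema`.

```java
// Sketch only: build an Iceberg PartitionSpec from Spark DDL transforms.
Transform[] partitioning = new Transform[] {
    Expressions.identity("region"),
    Expressions.days("event_ts")};
PartitionSpec spec = Spark3Util.toPartitionSpec(schema, partitioning);  // schema is assumed
```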
      • describe

        public static java.lang.String describe(Expression expr)
      • describe

        public static java.lang.String describe(Schema schema)
      • describe

        public static java.lang.String describe(Type type)
      • describe

        public static java.lang.String describe(SortOrder order)
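        The four describe overloads render Iceberg objects as human-readable strings, useful for logging and SHOW-style SQL output. A hedged sketch; `table` is an assumed, already-loaded `org.apache.iceberg.Table`, and `Expressions` here is Iceberg's `org.apache.iceberg.expressions.Expressions`.

```java
// Sketch only: string renderings of an expression, a schema, and a sort order.
Expression expr = Expressions.equal("region", "us");
String exprText = Spark3Util.describe(expr);
String schemaText = Spark3Util.describe(table.schema());     // table is assumed
String sortText = Spark3Util.describe(table.sortOrder());
```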
      • isLocalityEnabled

        public static boolean isLocalityEnabled(FileIO io,
                                                java.lang.String location,
                                                org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
      • isVectorizationEnabled

        public static boolean isVectorizationEnabled(FileFormat fileFormat,
                                                     java.util.Map<java.lang.String,java.lang.String> properties,
                                                     org.apache.spark.sql.RuntimeConfig sessionConf,
                                                     org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
      • batchSize

        public static int batchSize(java.util.Map<java.lang.String,java.lang.String> properties,
                                    org.apache.spark.sql.util.CaseInsensitiveStringMap readOptions)
      • propertyAsLong

        public static java.lang.Long propertyAsLong(org.apache.spark.sql.util.CaseInsensitiveStringMap options,
                                                    java.lang.String property,
                                                    java.lang.Long defaultValue)
      • propertyAsInt

        public static java.lang.Integer propertyAsInt(org.apache.spark.sql.util.CaseInsensitiveStringMap options,
                                                      java.lang.String property,
                                                      java.lang.Integer defaultValue)
      • propertyAsBoolean

        public static java.lang.Boolean propertyAsBoolean(org.apache.spark.sql.util.CaseInsensitiveStringMap options,
                                                          java.lang.String property,
                                                          java.lang.Boolean defaultValue)
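        A hedged sketch of the propertyAs* helpers, which read typed values from read options and fall back to the given default when the key is absent. The `split-size` and `locality` keys are illustrative.

```java
// Sketch only: typed lookups against case-insensitive read options.
CaseInsensitiveStringMap options = new CaseInsensitiveStringMap(
    java.util.Collections.singletonMap("split-size", "134217728"));
Long splitSize = Spark3Util.propertyAsLong(options, "split-size", null);
Boolean locality = Spark3Util.propertyAsBoolean(options, "locality", false);
```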
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name)
                                                                    throws org.apache.spark.sql.catalyst.parser.ParseException
        Throws:
        org.apache.spark.sql.catalyst.parser.ParseException
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
                                                                    throws org.apache.spark.sql.catalyst.parser.ParseException
        Throws:
        org.apache.spark.sql.catalyst.parser.ParseException
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description,
                                                                           org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description,
                                                                           org.apache.spark.sql.SparkSession spark,
                                                                           java.lang.String name,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark,
                                                                           java.util.List<java.lang.String> nameParts)
      • catalogAndIdentifier

        public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark,
                                                                           java.util.List<java.lang.String> nameParts,
                                                                           org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
        Parameters:
        spark - Spark session to use for resolution
        nameParts - Multipart identifier representing a table
        defaultCatalog - Catalog to use if none is specified
        Returns:
        The CatalogPlugin and Identifier for the table
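        A hedged usage sketch of the string-based overload (not from the source); `spark` is an assumed active SparkSession and `prod.db.events` an illustrative table name.

```java
// Sketch only: resolve a multipart name against the session's configured
// catalogs; the String overloads throw ParseException on unparseable names.
Spark3Util.CatalogAndIdentifier catalogAndIdent =
    Spark3Util.catalogAndIdentifier(spark, "prod.db.events");  // spark is assumed
CatalogPlugin catalog = catalogAndIdent.catalog();
Identifier ident = catalogAndIdent.identifier();
```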
      • identifierToTableIdentifier

        public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
      • getPartitions

        public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark,
                                                                                  org.apache.hadoop.fs.Path rootPath,
                                                                                  java.lang.String format)
        Use Spark to list all partitions in the table.
        Parameters:
        spark - a Spark session
        rootPath - the table's root path
        format - the format of the files
        Returns:
        all of the table's partitions
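        A hedged usage sketch (not from the source): listing Hive-style partitions under a path, e.g. as a precursor to importing them into an Iceberg table. The bucket path is illustrative, and `spark` is an assumed active SparkSession.

```java
// Sketch only: list partitions of a path-based table via Spark.
List<SparkTableUtil.SparkPartition> partitions =
    Spark3Util.getPartitions(spark,  // spark is assumed
        new Path("s3://bucket/db/events"), "parquet");
```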
      • toV1TableIdentifier

        public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)