Class SparkUtil


  • public class SparkUtil
    extends java.lang.Object
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static boolean caseSensitive​(org.apache.spark.sql.SparkSession spark)  
      static <C,​T>
      Pair<C,​T>
      catalogAndIdentifier​(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,​C> catalogProvider, java.util.function.BiFunction<java.lang.String[],​java.lang.String,​T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
      A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply Attempts to find the catalog and identifier a multipart identifier represents
      static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides​(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
      Pulls any Catalog specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`
      static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression​(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,​java.lang.String> filters)
      Get a List of Spark filter Expression.
      static java.lang.String toColumnName​(org.apache.spark.sql.connector.expressions.NamedReference ref)  
      static void validatePartitionTransforms​(PartitionSpec spec)
      Check whether the partition transforms in a spec can be used to write data.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • validatePartitionTransforms

        public static void validatePartitionTransforms​(PartitionSpec spec)
        Check whether the partition transforms in a spec can be used to write data.
        Parameters:
        spec - a PartitionSpec
        Throws:
        java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
      • catalogAndIdentifier

        public static <C,​T> Pair<C,​T> catalogAndIdentifier​(java.util.List<java.lang.String> nameParts,
                                                                       java.util.function.Function<java.lang.String,​C> catalogProvider,
                                                                       java.util.function.BiFunction<java.lang.String[],​java.lang.String,​T> identiferProvider,
                                                                       C currentCatalog,
                                                                       java.lang.String[] currentNamespace)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply Attempts to find the catalog and identifier a multipart identifier represents
        Parameters:
        nameParts - Multipart identifier representing a table
        Returns:
        The CatalogPlugin and Identifier for the table
      • hadoopConfCatalogOverrides

        public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides​(org.apache.spark.sql.SparkSession spark,
                                                                                      java.lang.String catalogName)
        Pulls any Catalog specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`

        Mirrors the override of hadoop configurations for a given spark session using `spark.hadoop.*`.

        The SparkCatalog allows for hadoop configurations to be overridden per catalog, by setting them on the SQLConf, where the following will add the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's hadoop configuration. SparkSession.builder() .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020") .getOrCreate()

        Parameters:
        spark - The current Spark session
        catalogName - Name of the catalog to find overrides for.
        Returns:
        the Hadoop Configuration that should be used for this catalog, with catalog specific overrides applied.
      • partitionMapToExpression

        public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression​(org.apache.spark.sql.types.StructType schema,
                                                                                                                    java.util.Map<java.lang.String,​java.lang.String> filters)
        Get a List of Spark filter Expression.
        Parameters:
        schema - table schema
        filters - filters in the format of a Map, where key is one of the table column name, and value is the specific value to be filtered on the column.
        Returns:
        a List of filters in the format of Spark Expression.
      • toColumnName

        public static java.lang.String toColumnName​(org.apache.spark.sql.connector.expressions.NamedReference ref)
      • caseSensitive

        public static boolean caseSensitive​(org.apache.spark.sql.SparkSession spark)