Class SparkUtil


  • public class SparkUtil
    extends java.lang.Object
    • Method Summary

      static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
      static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
        Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.
      static boolean hasTimestampWithoutZone(Schema schema)
        Checks whether the table schema contains a timestamp without timezone column.
      static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)
        Gets a list of Spark filter Expressions built from a map of partition filters.
      static FileIO serializableFileIO(Table table)
      static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
        Checks whether timestamp types for new tables should be stored without timezone info.
      static void validatePartitionTransforms(PartitionSpec spec)
        Check whether the partition transforms in a spec can be used to write data.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • TIMESTAMP_WITHOUT_TIMEZONE_ERROR

        public static final java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR
    • Method Detail

      • serializableFileIO

        public static FileIO serializableFileIO​(Table table)
      • validatePartitionTransforms

        public static void validatePartitionTransforms​(PartitionSpec spec)
        Check whether the partition transforms in a spec can be used to write data.
        Parameters:
        spec - a PartitionSpec
        Throws:
        java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
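The validation above can be sketched as a standalone check. This is a simplified illustration, not Iceberg's implementation: a PartitionSpec is reduced to a list of transform names, and the set of writable transforms below is an assumption for the sketch, not the authoritative list.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of validatePartitionTransforms. A spec is reduced to
// a list of transform names; the KNOWN set is an assumption for illustration.
public class PartitionTransformCheck {
    private static final Set<String> KNOWN = Set.of(
            "identity", "bucket", "truncate", "year", "month", "day", "hour");

    public static void validatePartitionTransforms(List<String> transforms) {
        for (String t : transforms) {
            if (!KNOWN.contains(t)) {
                // Unknown transforms cannot be used to write data.
                throw new UnsupportedOperationException(
                        "Cannot write using unsupported transform: " + t);
            }
        }
    }
}
```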
      • catalogAndIdentifier

        public static <C,​T> Pair<C,​T> catalogAndIdentifier​(java.util.List<java.lang.String> nameParts,
                                                                       java.util.function.Function<java.lang.String,​C> catalogProvider,
                                                                       java.util.function.BiFunction<java.lang.String[],​java.lang.String,​T> identiferProvider,
                                                                       C currentCatalog,
                                                                       java.lang.String[] currentNamespace)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
        Parameters:
        nameParts - Multipart identifier representing a table
        Returns:
        The CatalogPlugin and Identifier for the table
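The resolution logic can be sketched as follows. This is a simplified stand-in, not Spark's or Iceberg's code: Pair is a local record, the catalog provider returns null for unknown catalog names, and the identifier is just a joined string for illustration.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Simplified sketch of multipart-identifier resolution. The real method
// returns Spark's CatalogPlugin and Identifier; here both are generic
// stand-ins supplied by the caller's provider functions.
public class CatalogLookup {
    public record Pair<C, T>(C first, T second) {}

    public static <C, T> Pair<C, T> catalogAndIdentifier(
            List<String> nameParts,
            Function<String, C> catalogProvider,   // null if no such catalog
            BiFunction<String[], String, T> identifierProvider,
            C currentCatalog,
            String[] currentNamespace) {
        String last = nameParts.get(nameParts.size() - 1);
        if (nameParts.size() == 1) {
            // Single-part name: resolve in the current catalog and namespace.
            return new Pair<>(currentCatalog,
                    identifierProvider.apply(currentNamespace, last));
        }
        // Try to treat the first part as a catalog name.
        C catalog = catalogProvider.apply(nameParts.get(0));
        if (catalog == null) {
            // Not a catalog: the whole prefix is a namespace in the current catalog.
            String[] ns = nameParts.subList(0, nameParts.size() - 1).toArray(new String[0]);
            return new Pair<>(currentCatalog, identifierProvider.apply(ns, last));
        }
        // First part named a catalog: the rest of the prefix is the namespace.
        String[] ns = nameParts.subList(1, nameParts.size() - 1).toArray(new String[0]);
        return new Pair<>(catalog, identifierProvider.apply(ns, last));
    }
}
```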
      • hasTimestampWithoutZone

        public static boolean hasTimestampWithoutZone​(Schema schema)
        Checks whether the table schema contains a timestamp without timezone column.
        Parameters:
        schema - the table schema to check for a timestamp without timezone column
        Returns:
        boolean indicating if the schema passed in has a timestamp field without a timezone
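A minimal sketch of this check, with column types reduced to strings: "timestamp" stands for Iceberg's timestamp without zone and "timestamptz" for timestamp with zone. The real method walks an Iceberg Schema's typed fields.

```java
import java.util.Map;

// Sketch of hasTimestampWithoutZone over a flattened schema. Column types
// are plain strings here for illustration, not Iceberg Type objects.
public class TimestampCheck {
    public static boolean hasTimestampWithoutZone(Map<String, String> schema) {
        // True if any column's type is a timestamp without timezone.
        return schema.values().stream().anyMatch("timestamp"::equals);
    }
}
```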
      • useTimestampWithoutZoneInNewTables

        public static boolean useTimestampWithoutZoneInNewTables​(org.apache.spark.sql.RuntimeConfig sessionConf)
        Checks whether timestamp types for new tables should be stored without timezone info.

        The default value is false and all timestamp fields are stored as Types.TimestampType.withZone(). If enabled, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone().

        Parameters:
        sessionConf - a Spark runtime config
        Returns:
        true if timestamp fields in new tables should be stored without timezone info
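The check reduces to reading one boolean session property. A sketch, with the session conf stood in for by a map; the property key below is an assumption for illustration, not necessarily the exact key defined in SparkUtil.

```java
import java.util.Map;

// Sketch of useTimestampWithoutZoneInNewTables over a plain map of
// session properties instead of a Spark RuntimeConfig.
public class TimestampConfig {
    // Hypothetical property key for illustration; the real key is a
    // constant in Iceberg's SparkUtil.
    static final String KEY =
            "spark.sql.iceberg.use-timestamp-without-timezone-in-new-tables";

    public static boolean useTimestampWithoutZoneInNewTables(Map<String, String> sessionConf) {
        // Defaults to false: new tables store timestamps with timezone.
        return Boolean.parseBoolean(sessionConf.getOrDefault(KEY, "false"));
    }
}
```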
      • hadoopConfCatalogOverrides

        public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides​(org.apache.spark.sql.SparkSession spark,
                                                                                      java.lang.String catalogName)
        Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

        The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

        SparkSession.builder()
            .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
            .getOrCreate()
        Parameters:
        spark - The current Spark session
        catalogName - Name of the catalog to find overrides for.
        Returns:
        the Hadoop Configuration that should be used for this catalog, with catalog specific overrides applied.
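The override extraction described above amounts to collecting every session property under the catalog's `hadoop.` prefix and stripping that prefix. A sketch, with the Spark session conf and the Hadoop Configuration both stood in for by plain maps:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of hadoopConfCatalogOverrides. The real method reads the
// SparkSession conf and returns a Hadoop Configuration; plain maps are
// used here so the sketch is self-contained.
public class CatalogHadoopOverrides {
    public static Map<String, String> hadoopConfCatalogOverrides(
            Map<String, String> sessionConf, String catalogName) {
        String prefix = "spark.sql.catalog." + catalogName + ".hadoop.";
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                // Strip the prefix so "...hadoop.fs.default.name"
                // becomes "fs.default.name".
                hadoopConf.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return hadoopConf;
    }
}
```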
      • partitionMapToExpression

        public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression​(org.apache.spark.sql.types.StructType schema,
                                                                                                                    java.util.Map<java.lang.String,​java.lang.String> filters)
        Gets a list of Spark filter Expressions built from a map of partition filters.
        Parameters:
        schema - table schema
        filters - filters in the form of a Map, where each key is a table column name and each value is the value to filter on for that column.
        Returns:
        a List of filters as Spark Expressions.
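The conversion can be sketched as follows. Spark catalyst Expressions are stood in for by "column = value" strings; the real method builds typed equality expressions against the table's StructType and rejects filter keys that are not schema columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of partitionMapToExpression. The schema is reduced to a set of
// column names, and each Expression is a predicate string for illustration.
public class PartitionFilters {
    public static List<String> partitionMapToExpression(
            Set<String> columns, Map<String, String> filters) {
        List<String> exprs = new ArrayList<>();
        for (Map.Entry<String, String> f : filters.entrySet()) {
            if (!columns.contains(f.getKey())) {
                // Filter keys must name columns in the table schema.
                throw new IllegalArgumentException("Unknown column: " + f.getKey());
            }
            // One equality predicate per map entry.
            exprs.add(f.getKey() + " = '" + f.getValue() + "'");
        }
        return exprs;
    }
}
```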