Class SparkUtil


  • public class SparkUtil
    extends java.lang.Object
    • Method Summary

      static boolean canHandleTimestampWithoutZone(java.util.Map<java.lang.String,java.lang.String> readerConfig, org.apache.spark.sql.RuntimeConfig sessionConf)
        Allow reading/writing timestamp without time zone as timestamp with time zone.
      static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply; attempts to find the catalog and identifier that a multipart identifier represents.
      static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
        Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession; these can be set via `spark.sql.catalog.$catalogName.hadoop.*` and mirror the `spark.hadoop.*` overrides for a given Spark session.
      static boolean hasTimestampWithoutZone(Schema schema)
        Checks whether the table schema has a timestamp without time zone column.
      static FileIO serializableFileIO(Table table)
      static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
        Check whether the Spark session config contains a USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES property.
      static void validatePartitionTransforms(PartitionSpec spec)
        Check whether the partition transforms in a spec can be used to write data.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • HANDLE_TIMESTAMP_WITHOUT_TIMEZONE

        public static final java.lang.String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE
        See Also:
        Constant Field Values
      • TIMESTAMP_WITHOUT_TIMEZONE_ERROR

        public static final java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR
      • USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES

        public static final java.lang.String USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES
        See Also:
        Constant Field Values
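      For orientation, a minimal sketch (not from the original Javadoc) of how these keys might be supplied. Whether each constant is read from the session config or from table read options is an assumption here; the literal constant values are listed under Constant Field Values.

        import java.util.HashMap;
        import java.util.Map;

        import org.apache.iceberg.spark.SparkUtil;
        import org.apache.spark.sql.SparkSession;

        public class TimestampConfigExample {
          public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("timestamp-config")
                .getOrCreate();

            // Assumed: opt in for newly created tables via the session config,
            // keyed directly by the constant.
            spark.conf().set(SparkUtil.USE_TIMESTAMP_WITHOUT_TIME_ZONE_IN_NEW_TABLES, "true");

            // Assumed: opt in per read via table read options,
            // keyed directly by the constant.
            Map<String, String> readerConfig = new HashMap<>();
            readerConfig.put(SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE, "true");

            spark.stop();
          }
        }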
    • Method Detail

      • serializableFileIO

        public static FileIO serializableFileIO​(Table table)
      • validatePartitionTransforms

        public static void validatePartitionTransforms​(PartitionSpec spec)
        Check whether the partition transforms in a spec can be used to write data.
        Parameters:
        spec - a PartitionSpec
        Throws:
        java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
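        A minimal usage sketch (not from the original Javadoc); the schema, field ids, and partition transform are illustrative:

          import org.apache.iceberg.PartitionSpec;
          import org.apache.iceberg.Schema;
          import org.apache.iceberg.spark.SparkUtil;
          import org.apache.iceberg.types.Types;

          public class ValidateSpecExample {
            public static void main(String[] args) {
              Schema schema = new Schema(
                  Types.NestedField.required(1, "id", Types.LongType.get()),
                  Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

              PartitionSpec spec = PartitionSpec.builderFor(schema)
                  .day("ts")
                  .build();

              // Throws UnsupportedOperationException if the spec contains
              // partition transforms that cannot be used to write data.
              SparkUtil.validatePartitionTransforms(spec);
            }
          }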
      • catalogAndIdentifier

        public static <C,​T> Pair<C,​T> catalogAndIdentifier​(java.util.List<java.lang.String> nameParts,
                                                                       java.util.function.Function<java.lang.String,​C> catalogProvider,
                                                                       java.util.function.BiFunction<java.lang.String[],​java.lang.String,​T> identiferProvider,
                                                                       C currentCatalog,
                                                                       java.lang.String[] currentNamespace)
        A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
        Parameters:
        nameParts - Multipart identifier representing a table
        catalogProvider - Function that loads a catalog by name
        identiferProvider - Function that builds an identifier from a namespace array and a name
        currentCatalog - Catalog to fall back to when the identifier does not name one
        currentNamespace - Namespace to fall back to when the identifier does not include one
        Returns:
        The CatalogPlugin and Identifier for the table
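        A sketch (not from the original Javadoc) of how the provider functions might be wired up from a Spark session; note that CatalogManager is Spark-internal API, so this is illustrative rather than a stable recipe:

          import java.util.Arrays;
          import java.util.List;

          import org.apache.iceberg.spark.SparkUtil;
          import org.apache.iceberg.util.Pair;
          import org.apache.spark.sql.SparkSession;
          import org.apache.spark.sql.connector.catalog.CatalogPlugin;
          import org.apache.spark.sql.connector.catalog.Identifier;

          public class CatalogLookupExample {
            // Resolves a multipart name such as ["my_catalog", "db", "tbl"].
            static Pair<CatalogPlugin, Identifier> resolve(SparkSession spark, String... parts) {
              List<String> nameParts = Arrays.asList(parts);
              return SparkUtil.catalogAndIdentifier(
                  nameParts,
                  // Load a catalog by name from the session's catalog manager.
                  name -> spark.sessionState().catalogManager().catalog(name),
                  // Build an Identifier from a namespace array and a name.
                  Identifier::of,
                  spark.sessionState().catalogManager().currentCatalog(),
                  spark.sessionState().catalogManager().currentNamespace());
            }
          }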
      • hasTimestampWithoutZone

        public static boolean hasTimestampWithoutZone​(Schema schema)
        Checks whether the table schema has a timestamp without time zone column.
        Parameters:
        schema - table schema to check if it contains a timestamp without timezone column
        Returns:
        boolean indicating if the schema passed in has a timestamp field without a timezone
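        A minimal sketch (not from the original Javadoc), with illustrative field ids and names:

          import org.apache.iceberg.Schema;
          import org.apache.iceberg.spark.SparkUtil;
          import org.apache.iceberg.types.Types;

          public class TimestampCheckExample {
            public static void main(String[] args) {
              Schema schema = new Schema(
                  Types.NestedField.required(1, "id", Types.LongType.get()),
                  Types.NestedField.optional(2, "created_at", Types.TimestampType.withoutZone()));

              // Prints true: "created_at" is a timestamp without time zone.
              System.out.println(SparkUtil.hasTimestampWithoutZone(schema));
            }
          }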
      • canHandleTimestampWithoutZone

        public static boolean canHandleTimestampWithoutZone​(java.util.Map<java.lang.String,​java.lang.String> readerConfig,
                                                            org.apache.spark.sql.RuntimeConfig sessionConf)
        Allow reading/writing timestamp without time zone as timestamp with time zone. Generally this is not safe: timestamp without time zone is supposed to carry wall-clock semantics (no matter the reader/writer time zone, 3PM should always be read as 3PM), whereas timestamp with time zone carries instant semantics (the timestamp is adjusted so that the corresponding time in the reader's time zone is displayed). When set to false (the default), a runtime exception, "Spark does not support timestamp without time zone fields", is thrown when reading timestamp without time zone fields.
        Parameters:
        readerConfig - table read options
        sessionConf - spark session configurations
        Returns:
        boolean indicating if reading timestamps without timezone is allowed
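        A hedged sketch of the check (not from the original Javadoc); it assumes the read option is keyed directly by the HANDLE_TIMESTAMP_WITHOUT_TIMEZONE constant above:

          import java.util.Collections;
          import java.util.Map;

          import org.apache.iceberg.spark.SparkUtil;
          import org.apache.spark.sql.SparkSession;

          public class HandleTimestampExample {
            public static void main(String[] args) {
              SparkSession spark = SparkSession.builder()
                  .master("local[1]")
                  .appName("handle-ts")
                  .getOrCreate();

              // Per-read opt-in via table read options (key assumed to be the constant).
              Map<String, String> readerConfig =
                  Collections.singletonMap(SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE, "true");

              boolean allowed = SparkUtil.canHandleTimestampWithoutZone(readerConfig, spark.conf());
              System.out.println(allowed);

              spark.stop();
            }
          }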
      • hadoopConfCatalogOverrides

        public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides​(org.apache.spark.sql.SparkSession spark,
                                                                                      java.lang.String catalogName)
        Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession; these can be set via `spark.sql.catalog.$catalogName.hadoop.*` and mirror the `spark.hadoop.*` overrides for a given Spark session. The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

          SparkSession.builder()
              .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
              .getOrCreate()
        Parameters:
        spark - The current Spark session
        catalogName - Name of the catalog to find overrides for.
        Returns:
        the Hadoop Configuration that should be used for this catalog, with catalog specific overrides applied.
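        A runnable Java variant of the snippet above (the catalog name "hank" is illustrative):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.iceberg.spark.SparkUtil;
          import org.apache.spark.sql.SparkSession;

          public class CatalogHadoopConfExample {
            public static void main(String[] args) {
              SparkSession spark = SparkSession.builder()
                  .master("local[1]")
                  .appName("catalog-hadoop-conf")
                  // Override "fs.default.name" only for the catalog named "hank".
                  .config("spark.sql.catalog.hank.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
                  .getOrCreate();

              Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "hank");
              System.out.println(conf.get("fs.default.name")); // hdfs://hanksnamenode:8020

              spark.stop();
            }
          }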