Package org.apache.iceberg.spark

Class SparkUtil

java.lang.Object
    org.apache.iceberg.spark.SparkUtil

public class SparkUtil extends java.lang.Object
Field Summary

Fields

static java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR
Method Summary

static boolean caseSensitive(org.apache.spark.sql.SparkSession spark)

static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`.

static boolean hasTimestampWithoutZone(Schema schema)
    Checks whether the table schema has a timestamp without timezone column.

static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)
    Gets a list of Spark filter Expressions.

static FileIO serializableFileIO(Table table)

static java.lang.String toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)

static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
    Checks whether timestamp types for new tables should be stored with timezone info.

static void validatePartitionTransforms(PartitionSpec spec)
    Check whether the partition transforms in a spec can be used to write data.
Method Detail
validatePartitionTransforms

public static void validatePartitionTransforms(PartitionSpec spec)

Check whether the partition transforms in a spec can be used to write data.

Parameters:
spec - a PartitionSpec
Throws:
java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
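For illustration, a minimal sketch that validates a spec built with a standard transform; the schema and field names below are hypothetical and not part of SparkUtil:

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.types.Types;

    // Hypothetical schema; "day" is a standard transform, so validation passes.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));
    PartitionSpec spec = PartitionSpec.builderFor(schema).day("ts").build();

    // Throws java.lang.UnsupportedOperationException if the spec contained unknown transforms.
    SparkUtil.validatePartitionTransforms(spec);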
catalogAndIdentifier

public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)

A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

Parameters:
nameParts - multipart identifier representing a table
Returns:
The CatalogPlugin and Identifier for the table
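As an illustration only, a hedged sketch of resolving a multipart name against Spark's catalog manager; the session, the table name, and the provider lambdas below are assumptions for this example, not part of SparkUtil:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.iceberg.util.Pair;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.connector.catalog.CatalogPlugin;
    import org.apache.spark.sql.connector.catalog.Identifier;

    SparkSession spark = SparkSession.active();  // assumes a running session
    List<String> nameParts = Arrays.asList("my_catalog", "db", "tbl");  // hypothetical name

    Pair<CatalogPlugin, Identifier> resolved = SparkUtil.catalogAndIdentifier(
        nameParts,
        name -> spark.sessionState().catalogManager().catalog(name),
        Identifier::of,  // builds an Identifier from (namespace, name)
        spark.sessionState().catalogManager().currentCatalog(),
        spark.sessionState().catalogManager().currentNamespace());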
hasTimestampWithoutZone

public static boolean hasTimestampWithoutZone(Schema schema)

Checks whether the table schema has a timestamp without timezone column.

Parameters:
schema - table schema to check for a timestamp without timezone column
Returns:
boolean indicating whether the schema passed in has a timestamp field without a timezone
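A minimal sketch, using a hypothetical schema with one timestamp-without-zone column:

    import org.apache.iceberg.Schema;
    import org.apache.iceberg.types.Types;

    // Hypothetical schema; "event_time" is a timestamp without timezone.
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "event_time", Types.TimestampType.withoutZone()));

    boolean hasTsWithoutZone = SparkUtil.hasTimestampWithoutZone(schema);  // true for this schema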
useTimestampWithoutZoneInNewTables

public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)

Checks whether timestamp types for new tables should be stored with timezone info.

The default value is false, and all timestamp fields are stored as Types.TimestampType.withZone(). If enabled, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone().

Parameters:
sessionConf - a Spark runtime config
Returns:
true if timestamp types for new tables should be stored with timezone info
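A minimal sketch, assuming an active SparkSession whose runtime config carries the relevant Iceberg session property:

    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.active();  // assumes a running session

    // Reads the session's runtime config to decide how timestamp columns
    // in new tables should be stored (default: with timezone info).
    boolean useWithoutZone = SparkUtil.useTimestampWithoutZoneInNewTables(spark.conf());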
hadoopConfCatalogOverrides

public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
spark - the current Spark session
catalogName - name of the catalog to find overrides for
Returns:
the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
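For illustration, a hedged sketch of retrieving the per-catalog configuration; the catalog name "my_catalog" and the session are hypothetical, and the property value mirrors the snippet above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.active();  // assumes a session built with the override above

    // Returns the session's Hadoop conf with the catalog-specific overrides applied.
    Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "my_catalog");
    String fsDefaultName = conf.get("fs.default.name");  // "hdfs://hanksnamenode:8020" in this example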
partitionMapToExpression

public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)

Gets a list of Spark filter Expressions.

Parameters:
schema - table schema
filters - filters in the format of a Map, where each key is a table column name and the value is the specific value to filter on for that column
Returns:
a List of filters in the format of Spark Expressions
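A minimal sketch, assuming a hypothetical table with a string column `dt` and an int column `hour`; the column names and values are illustrative only:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical table schema containing the filtered columns.
    StructType schema = new StructType()
        .add("dt", DataTypes.StringType)
        .add("hour", DataTypes.IntegerType);

    Map<String, String> filters = new HashMap<>();
    filters.put("dt", "2023-01-01");
    filters.put("hour", "12");

    // Produces Spark filter expressions for the given column/value pairs.
    List<Expression> expressions = SparkUtil.partitionMapToExpression(schema, filters);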
toColumnName

public static java.lang.String toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)
caseSensitive

public static boolean caseSensitive(org.apache.spark.sql.SparkSession spark)