Package org.apache.iceberg.spark
Class SparkUtil

java.lang.Object
    org.apache.iceberg.spark.SparkUtil

public class SparkUtil extends java.lang.Object
Field Summary

Modifier and Type          Field
static java.lang.String    TIMESTAMP_WITHOUT_TIMEZONE_ERROR
Method Summary
All Methods  Static Methods  Concrete Methods

static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier a multipart identifier represents.

static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

static boolean hasTimestampWithoutZone(Schema schema)
    Checks whether the table schema has a timestamp without timezone column.

static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)
    Gets a List of Spark filter Expressions.

static FileIO serializableFileIO(Table table)

static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
    Checks whether timestamp types for new tables should be stored with timezone info.

static void validatePartitionTransforms(PartitionSpec spec)
    Check whether the partition transforms in a spec can be used to write data.
Method Detail
-
validatePartitionTransforms
public static void validatePartitionTransforms(PartitionSpec spec)
Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
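
A minimal usage sketch; the checkWritable helper and the way the Table is obtained are illustrative assumptions, not part of this class. The spec is validated up front so that an unsupported transform fails before any write is attempted.

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.spark.SparkUtil;

    public class ValidateSpecExample {
        // Hypothetical helper: fail fast if the table's partition spec cannot be written by Spark.
        static void checkWritable(Table table) {
            PartitionSpec spec = table.spec();
            // Throws java.lang.UnsupportedOperationException if the spec contains unknown transforms.
            SparkUtil.validatePartitionTransforms(spec);
        }
    }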
-
catalogAndIdentifier
public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier a multipart identifier represents.

Parameters:
    nameParts - Multipart identifier representing a table
Returns:
    The CatalogPlugin and Identifier for the table
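
A hedged sketch of how the provider functions plug in, using plain Strings for both the catalog type C and the identifier type T (in Spark these would typically be CatalogPlugin and Identifier instances). The catalog name, namespace, and Pair import (assumed to be org.apache.iceberg.util.Pair) are assumptions for illustration.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.util.Pair;

    public class CatalogAndIdentifierExample {
        static Pair<String, String> resolve(List<String> nameParts) {
            String currentCatalog = "spark_catalog";               // assumed current catalog name
            String[] currentNamespace = new String[] {"default"};  // assumed current namespace
            return SparkUtil.catalogAndIdentifier(
                nameParts,
                catalogName -> catalogName,                        // catalogProvider: resolve a catalog by name
                (namespace, name) -> String.join(".", namespace) + "." + name,  // identiferProvider: build an identifier
                currentCatalog,
                currentNamespace);
        }

        public static void main(String[] args) {
            // For a multipart name like "db.table", the first part is tried as a catalog name.
            System.out.println(resolve(Arrays.asList("db", "table")));
        }
    }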
-
hasTimestampWithoutZone
public static boolean hasTimestampWithoutZone(Schema schema)
Checks whether the table schema has a timestamp without timezone column.

Parameters:
    schema - table schema to check for a timestamp without timezone column
Returns:
    boolean indicating whether the schema has a timestamp field without a timezone
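
A short sketch with an assumed two-column schema; `event_time` is declared as a timestamp without timezone, so the check returns true.

    import org.apache.iceberg.Schema;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.types.Types;

    public class TimestampCheckExample {
        public static void main(String[] args) {
            // Illustrative schema with one timestamp-without-zone field.
            Schema schema = new Schema(
                Types.NestedField.required(1, "id", Types.LongType.get()),
                Types.NestedField.optional(2, "event_time", Types.TimestampType.withoutZone()));

            // Prints true, because "event_time" is a timestamp without timezone.
            System.out.println(SparkUtil.hasTimestampWithoutZone(schema));
        }
    }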
-
useTimestampWithoutZoneInNewTables
public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
Checks whether timestamp types for new tables should be stored with timezone info.

The default value is false and all timestamp fields are stored as Types.TimestampType.withZone(). If enabled, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone().

Parameters:
    sessionConf - a Spark runtime config
Returns:
    true if timestamp fields in new tables should be stored without timezone info
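
A sketch of reading the flag from a session's runtime config; the local-master session setup is an assumption for illustration, and the relevant Iceberg session property is simply left at its default here.

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    public class NewTableTimestampExample {
        public static void main(String[] args) {
            // Illustrative local session; in practice the active session would be used.
            SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("timestamp-config-check")
                .getOrCreate();

            // Reads the session-level setting from the runtime config (false by default).
            boolean useWithoutZone = SparkUtil.useTimestampWithoutZoneInNewTables(spark.conf());
            System.out.println("Store new-table timestamps without zone: " + useWithoutZone);

            spark.stop();
        }
    }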
-
hadoopConfCatalogOverrides
public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - The current Spark session
    catalogName - Name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
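
A sketch of retrieving the per-catalog Hadoop configuration; the catalog name "my_catalog", the local-master session, and the override value are assumptions carried over from the example above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    public class CatalogHadoopConfExample {
        public static void main(String[] args) {
            // Illustrative session with a per-catalog Hadoop override set on the SQLConf.
            SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .config("spark.sql.catalog.my_catalog.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
                .getOrCreate();

            // The returned Configuration carries the catalog-specific override.
            Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "my_catalog");
            System.out.println(conf.get("fs.default.name")); // hdfs://hanksnamenode:8020

            spark.stop();
        }
    }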
-
partitionMapToExpression
public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)

Gets a List of Spark filter Expressions.

Parameters:
    schema - table schema
    filters - filters in the format of a Map, where the key is one of the table column names and the value is the specific value to be filtered on that column
Returns:
    a List of filters in the format of Spark Expressions
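
A sketch with an assumed two-column partition schema; each map entry is turned into a filter Expression on the corresponding column (an equality comparison in the common case).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class PartitionFilterExample {
        public static void main(String[] args) {
            // Assumed partition schema: a string "dt" column and an integer "hour" column.
            StructType schema = new StructType()
                .add("dt", DataTypes.StringType)
                .add("hour", DataTypes.IntegerType);

            // Partition column name -> value to filter on.
            Map<String, String> filters = new HashMap<>();
            filters.put("dt", "2023-01-01");
            filters.put("hour", "12");

            // One filter Expression per map entry, e.g. dt = '2023-01-01'.
            List<Expression> expressions = SparkUtil.partitionMapToExpression(schema, filters);
            expressions.forEach(System.out::println);
        }
    }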