Package org.apache.iceberg.spark
Class SparkUtil
java.lang.Object
    org.apache.iceberg.spark.SparkUtil
Method Summary
static boolean
    caseSensitive(org.apache.spark.sql.SparkSession spark)

static <C,T> Pair<C,T>
    catalogAndIdentifier(List<String> nameParts, Function<String,C> catalogProvider, BiFunction<String[],String,T> identiferProvider, C currentCatalog, String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

static List<String>
    executorLocations()

static org.apache.hadoop.conf.Configuration
    hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`.

static List<org.apache.spark.sql.catalyst.expressions.Expression>
    partitionMapToExpression(org.apache.spark.sql.types.StructType schema, Map<String, String> filters)
    Get a List of Spark filter Expressions.

static String
    toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)

static void
    validatePartitionTransforms(PartitionSpec spec)
    Check whether the partition transforms in a spec can be used to write data.
Method Details
validatePartitionTransforms

public static void validatePartitionTransforms(PartitionSpec spec)

Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    UnsupportedOperationException - if the spec contains unknown partition transforms
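For illustration, a minimal sketch of a call site (the schema and field names are hypothetical, not from this Javadoc):

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.types.Types;

    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // day(ts) is a known transform, so this spec passes validation;
    // a spec with an unknown transform would throw UnsupportedOperationException.
    PartitionSpec spec = PartitionSpec.builderFor(schema).day("ts").build();
    SparkUtil.validatePartitionTransforms(spec);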
catalogAndIdentifier

public static <C,T> Pair<C,T> catalogAndIdentifier(List<String> nameParts, Function<String,C> catalogProvider, BiFunction<String[],String,T> identiferProvider, C currentCatalog, String[] currentNamespace)

A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

Parameters:
    nameParts - Multipart identifier representing a table
Returns:
    The CatalogPlugin and Identifier for the table
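A minimal sketch of one way to wire this up, assuming Spark 3's CatalogManager as the catalog provider and Identifier.of as the identifier provider (the table name parts are hypothetical):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.util.Pair;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.connector.catalog.CatalogPlugin;
    import org.apache.spark.sql.connector.catalog.Identifier;

    SparkSession spark = SparkSession.active();
    List<String> nameParts = Arrays.asList("my_catalog", "db", "tbl");

    Pair<CatalogPlugin, Identifier> catalogAndIdent = SparkUtil.catalogAndIdentifier(
        nameParts,
        name -> spark.sessionState().catalogManager().catalog(name),
        Identifier::of, // (String[] namespace, String name) -> Identifier
        spark.sessionState().catalogManager().currentCatalog(),
        spark.sessionState().catalogManager().currentNamespace());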
hadoopConfCatalogOverrides

public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - The current Spark session
    catalogName - Name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
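A minimal sketch of retrieving the overridden conf in Java (the catalog name and property value are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.sql.catalog.my_catalog.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate();

    // The returned Configuration carries the per-catalog override.
    Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "my_catalog");
    String fsDefaultName = conf.get("fs.default.name"); // "hdfs://hanksnamenode:8020"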
partitionMapToExpression

public static List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, Map<String, String> filters)

Get a List of Spark filter Expressions.

Parameters:
    schema - table schema
    filters - filters in the format of a Map, where each key is one of the table's column names and the value is the specific value to be filtered on that column
Returns:
    a List of filters in the format of Spark Expressions
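A minimal sketch of building the inputs (the schema and filter values are hypothetical):

    import java.util.List;
    import java.util.Map;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    StructType schema = new StructType()
        .add("dt", DataTypes.StringType)
        .add("region", DataTypes.StringType);

    // One entry per column to filter, e.g. dt = '2024-01-01' and region = 'us'.
    Map<String, String> filters = Map.of("dt", "2024-01-01", "region", "us");

    List<Expression> expressions = SparkUtil.partitionMapToExpression(schema, filters);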
toColumnName

public static String toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)
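A minimal sketch, assuming the NamedReference is built with Spark's Expressions.column helper (the field name is hypothetical):

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.connector.expressions.Expressions;
    import org.apache.spark.sql.connector.expressions.NamedReference;

    NamedReference ref = Expressions.column("location.lat");
    String columnName = SparkUtil.toColumnName(ref);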
caseSensitive

public static boolean caseSensitive(org.apache.spark.sql.SparkSession spark)
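A minimal sketch, assuming the result reflects the session's spark.sql.caseSensitive SQL conf:

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.sql.caseSensitive", "true")
        .getOrCreate();

    boolean caseSensitive = SparkUtil.caseSensitive(spark); // true for this session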
executorLocations

public static List<String> executorLocations()