Package org.apache.iceberg.spark
Class SparkUtil
java.lang.Object
    org.apache.iceberg.spark.SparkUtil
Method Summary
static boolean
    caseSensitive(org.apache.spark.sql.SparkSession spark)

static <C,T> Pair<C,T>
    catalogAndIdentifier(List<String> nameParts, Function<String,C> catalogProvider, BiFunction<String[],String,T> identiferProvider, C currentCatalog, String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

static List<String>
    executorLocations()

static org.apache.hadoop.conf.Configuration
    hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`.

static List<org.apache.spark.sql.catalyst.expressions.Expression>
    partitionMapToExpression(org.apache.spark.sql.types.StructType schema, Map<String, String> filters)
    Get a List of Spark filter Expressions.

static String
    toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)

static void
    validatePartitionTransforms(PartitionSpec spec)
    Check whether the partition transforms in a spec can be used to write data.
Method Details
validatePartitionTransforms

public static void validatePartitionTransforms(PartitionSpec spec)

Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    UnsupportedOperationException - if the spec contains unknown partition transforms
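For illustration, a minimal sketch of a call site (the schema and field names are hypothetical, not from this Javadoc):

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.types.Types;

    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // day(ts) is a known transform, so this spec passes validation;
    // a spec with an unknown transform would throw UnsupportedOperationException.
    PartitionSpec spec = PartitionSpec.builderFor(schema).day("ts").build();
    SparkUtil.validatePartitionTransforms(spec);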
catalogAndIdentifier

public static <C,T> Pair<C,T> catalogAndIdentifier(List<String> nameParts, Function<String,C> catalogProvider, BiFunction<String[],String,T> identiferProvider, C currentCatalog, String[] currentNamespace)

A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.

Parameters:
    nameParts - Multipart identifier representing a table
Returns:
    The CatalogPlugin and Identifier for the table
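A minimal sketch of one way to wire this up, assuming Spark 3's CatalogManager as the catalog provider and Identifier.of as the identifier provider (the table name parts are hypothetical):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.util.Pair;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.connector.catalog.CatalogPlugin;
    import org.apache.spark.sql.connector.catalog.Identifier;

    SparkSession spark = SparkSession.active();
    List<String> nameParts = Arrays.asList("my_catalog", "db", "tbl");

    Pair<CatalogPlugin, Identifier> catalogAndIdent = SparkUtil.catalogAndIdentifier(
        nameParts,
        name -> spark.sessionState().catalogManager().catalog(name),
        Identifier::of, // (String[] namespace, String name) -> Identifier
        spark.sessionState().catalogManager().currentCatalog(),
        spark.sessionState().catalogManager().currentNamespace());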
hadoopConfCatalogOverrides

public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - The current Spark session
    catalogName - Name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
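A minimal sketch of retrieving the overridden conf in Java (the catalog name and property value are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.sql.catalog.my_catalog.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate();

    // The returned Configuration carries the per-catalog override.
    Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "my_catalog");
    String fsDefaultName = conf.get("fs.default.name"); // "hdfs://hanksnamenode:8020"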
partitionMapToExpression

public static List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, Map<String, String> filters)

Get a List of Spark filter Expressions.

Parameters:
    schema - table schema
    filters - filters in the format of a Map, where each key is one of the table's column names and the value is the specific value to be filtered on that column
Returns:
    a List of filters in the format of Spark Expressions
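A minimal sketch of building the inputs (the schema and filter values are hypothetical):

    import java.util.List;
    import java.util.Map;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    StructType schema = new StructType()
        .add("dt", DataTypes.StringType)
        .add("region", DataTypes.StringType);

    // One entry per column to filter, e.g. dt = '2024-01-01' and region = 'us'.
    Map<String, String> filters = Map.of("dt", "2024-01-01", "region", "us");

    List<Expression> expressions = SparkUtil.partitionMapToExpression(schema, filters);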
toColumnName

public static String toColumnName(org.apache.spark.sql.connector.expressions.NamedReference ref)
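A minimal sketch, assuming the NamedReference is built with Spark's Expressions.column helper (the field name is hypothetical):

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.connector.expressions.Expressions;
    import org.apache.spark.sql.connector.expressions.NamedReference;

    NamedReference ref = Expressions.column("location.lat");
    String columnName = SparkUtil.toColumnName(ref);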
caseSensitive

public static boolean caseSensitive(org.apache.spark.sql.SparkSession spark)
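A minimal sketch, assuming the result reflects the session's spark.sql.caseSensitive SQL conf:

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .config("spark.sql.caseSensitive", "true")
        .getOrCreate();

    boolean caseSensitive = SparkUtil.caseSensitive(spark); // true for this session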
executorLocations

public static List<String> executorLocations()