Package org.apache.iceberg.spark
Class SparkUtil

java.lang.Object
    org.apache.iceberg.spark.SparkUtil

public class SparkUtil extends java.lang.Object
Field Summary

Modifier and Type          Field
static java.lang.String    TIMESTAMP_WITHOUT_TIMEZONE_ERROR
Method Summary
All Methods  Static Methods  Concrete Methods

static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier a multipart identifier represents.

static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

static boolean hasTimestampWithoutZone(Schema schema)
    Checks whether the table schema has a timestamp without timezone column.

static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)
    Gets a List of Spark filter Expressions.

static FileIO serializableFileIO(Table table)

static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
    Checks whether timestamp types for new tables should be stored with timezone info.

static void validatePartitionTransforms(PartitionSpec spec)
    Check whether the partition transforms in a spec can be used to write data.
Method Detail
-
validatePartitionTransforms
public static void validatePartitionTransforms(PartitionSpec spec)
Check whether the partition transforms in a spec can be used to write data.

Parameters:
    spec - a PartitionSpec
Throws:
    java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
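
A minimal usage sketch; the checkWritable helper and the way the Table is obtained are illustrative assumptions, not part of this class. The spec is validated up front so that an unsupported transform fails before any write is attempted.

    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.spark.SparkUtil;

    public class ValidateSpecExample {
        // Hypothetical helper: fail fast if the table's partition spec cannot be written by Spark.
        static void checkWritable(Table table) {
            PartitionSpec spec = table.spec();
            // Throws java.lang.UnsupportedOperationException if the spec contains unknown transforms.
            SparkUtil.validatePartitionTransforms(spec);
        }
    }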
-
catalogAndIdentifier
public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier a multipart identifier represents.

Parameters:
    nameParts - Multipart identifier representing a table
Returns:
    The CatalogPlugin and Identifier for the table
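
A hedged sketch of how the provider functions plug in, using plain Strings for both the catalog type C and the identifier type T (in Spark these would typically be CatalogPlugin and Identifier instances). The catalog name, namespace, and Pair import (assumed to be org.apache.iceberg.util.Pair) are assumptions for illustration.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.util.Pair;

    public class CatalogAndIdentifierExample {
        static Pair<String, String> resolve(List<String> nameParts) {
            String currentCatalog = "spark_catalog";               // assumed current catalog name
            String[] currentNamespace = new String[] {"default"};  // assumed current namespace
            return SparkUtil.catalogAndIdentifier(
                nameParts,
                catalogName -> catalogName,                        // catalogProvider: resolve a catalog by name
                (namespace, name) -> String.join(".", namespace) + "." + name,  // identiferProvider: build an identifier
                currentCatalog,
                currentNamespace);
        }

        public static void main(String[] args) {
            // For a multipart name like "db.table", the first part is tried as a catalog name.
            System.out.println(resolve(Arrays.asList("db", "table")));
        }
    }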
-
hasTimestampWithoutZone
public static boolean hasTimestampWithoutZone(Schema schema)
Checks whether the table schema has a timestamp without timezone column.

Parameters:
    schema - table schema to check for a timestamp without timezone column
Returns:
    boolean indicating whether the schema has a timestamp field without a timezone
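
A short sketch with an assumed two-column schema; `event_time` is declared as a timestamp without timezone, so the check returns true.

    import org.apache.iceberg.Schema;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.iceberg.types.Types;

    public class TimestampCheckExample {
        public static void main(String[] args) {
            // Illustrative schema with one timestamp-without-zone field.
            Schema schema = new Schema(
                Types.NestedField.required(1, "id", Types.LongType.get()),
                Types.NestedField.optional(2, "event_time", Types.TimestampType.withoutZone()));

            // Prints true, because "event_time" is a timestamp without timezone.
            System.out.println(SparkUtil.hasTimestampWithoutZone(schema));
        }
    }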
-
useTimestampWithoutZoneInNewTables
public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
Checks whether timestamp types for new tables should be stored with timezone info.

The default value is false and all timestamp fields are stored as Types.TimestampType.withZone(). If enabled, all timestamp fields in new tables will be stored as Types.TimestampType.withoutZone().

Parameters:
    sessionConf - a Spark runtime config
Returns:
    true if timestamp fields in new tables should be stored without timezone info
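
A sketch of reading the flag from a session's runtime config; the local-master session setup is an assumption for illustration, and the relevant Iceberg session property is simply left at its default here.

    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    public class NewTableTimestampExample {
        public static void main(String[] args) {
            // Illustrative local session; in practice the active session would be used.
            SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("timestamp-config-check")
                .getOrCreate();

            // Reads the session-level setting from the runtime config (false by default).
            boolean useWithoutZone = SparkUtil.useTimestampWithoutZoneInNewTables(spark.conf());
            System.out.println("Store new-table timestamps without zone: " + useWithoutZone);

            spark.stop();
        }
    }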
-
hadoopConfCatalogOverrides
public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)

Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.

The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:

    SparkSession.builder()
        .config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
        .getOrCreate()

Parameters:
    spark - The current Spark session
    catalogName - Name of the catalog to find overrides for
Returns:
    the Hadoop Configuration that should be used for this catalog, with catalog-specific overrides applied
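
A sketch of retrieving the per-catalog Hadoop configuration; the catalog name "my_catalog", the local-master session, and the override value are assumptions carried over from the example above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.SparkSession;

    public class CatalogHadoopConfExample {
        public static void main(String[] args) {
            // Illustrative session with a per-catalog Hadoop override set on the SQLConf.
            SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .config("spark.sql.catalog.my_catalog.hadoop.fs.default.name", "hdfs://hanksnamenode:8020")
                .getOrCreate();

            // The returned Configuration carries the catalog-specific override.
            Configuration conf = SparkUtil.hadoopConfCatalogOverrides(spark, "my_catalog");
            System.out.println(conf.get("fs.default.name")); // hdfs://hanksnamenode:8020

            spark.stop();
        }
    }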
-
partitionMapToExpression
public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)

Gets a List of Spark filter Expressions.

Parameters:
    schema - table schema
    filters - filters in the format of a Map, where the key is one of the table column names and the value is the specific value to be filtered on that column
Returns:
    a List of filters in the format of Spark Expressions
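
A sketch with an assumed two-column partition schema; each map entry is turned into a filter Expression on the corresponding column (an equality comparison in the common case).

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.iceberg.spark.SparkUtil;
    import org.apache.spark.sql.catalyst.expressions.Expression;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class PartitionFilterExample {
        public static void main(String[] args) {
            // Assumed partition schema: a string "dt" column and an integer "hour" column.
            StructType schema = new StructType()
                .add("dt", DataTypes.StringType)
                .add("hour", DataTypes.IntegerType);

            // Partition column name -> value to filter on.
            Map<String, String> filters = new HashMap<>();
            filters.put("dt", "2023-01-01");
            filters.put("hour", "12");

            // One filter Expression per map entry, e.g. dt = '2023-01-01'.
            List<Expression> expressions = SparkUtil.partitionMapToExpression(schema, filters);
            expressions.forEach(System.out::println);
        }
    }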