SparkUtil

java.lang.Object
- org.apache.iceberg.spark.SparkUtil

public class SparkUtil
extends java.lang.Object

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`TIMESTAMP_WITHOUT_TIMEZONE_ERROR`

Method Summary

Modifier and Type	Method and Description
`static <C,T> Pair<C,T>`	`catalogAndIdentifier(java.util.List<java.lang.String> nameParts, java.util.function.Function<java.lang.String,C> catalogProvider, java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider, C currentCatalog, java.lang.String[] currentNamespace)` A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
`static org.apache.hadoop.conf.Configuration`	`hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)` Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`. Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.
`static boolean`	`hasTimestampWithoutZone(Schema schema)` Checks whether the table schema has a timestamp without timezone column.
`static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression>`	`partitionMapToExpression(org.apache.spark.sql.types.StructType schema, java.util.Map<java.lang.String,java.lang.String> filters)` Gets a List of Spark filter Expressions.
`static FileIO`	`serializableFileIO(Table table)`
`static boolean`	`useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)` Checks whether timestamp types for new tables should be stored without timezone info.
`static void`	`validatePartitionTransforms(PartitionSpec spec)` Check whether the partition transforms in a spec can be used to write data.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - TIMESTAMP_WITHOUT_TIMEZONE_ERROR
```
public static final java.lang.String TIMESTAMP_WITHOUT_TIMEZONE_ERROR
```
- Method Detail
  - serializableFileIO
```
public static FileIO serializableFileIO(Table table)
```
  - validatePartitionTransforms
```
public static void validatePartitionTransforms(PartitionSpec spec)
```
    Check whether the partition transforms in a spec can be used to write data.
    
    Parameters:
    
    spec - a PartitionSpec
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the spec contains unknown partition transforms
  - catalogAndIdentifier
```
public static <C,T> Pair<C,T> catalogAndIdentifier(java.util.List<java.lang.String> nameParts,
                                                   java.util.function.Function<java.lang.String,C> catalogProvider,
                                                   java.util.function.BiFunction<java.lang.String[],java.lang.String,T> identiferProvider,
                                                   C currentCatalog,
                                                   java.lang.String[] currentNamespace)
```
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
    
    Parameters:
    
    nameParts - Multipart identifier representing a table
    
    Returns:
    
    The CatalogPlugin and Identifier for the table
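    
    The resolution rules suggested by this signature can be sketched with plain JDK types. This is an illustrative sketch, not the Iceberg source: `Map.Entry` stands in for `Pair`, the class name `CatalogResolutionSketch` and method name `resolve` are hypothetical, and it assumes `catalogProvider` returns null when the first name part is not a known catalog.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Illustrative sketch only (hypothetical class); Map.Entry stands in for Iceberg's Pair.
final class CatalogResolutionSketch {

  static <C, T> Map.Entry<C, T> resolve(
      List<String> nameParts,
      Function<String, C> catalogProvider, // assumption: returns null for an unknown catalog name
      BiFunction<String[], String, T> identifierProvider,
      C currentCatalog,
      String[] currentNamespace) {
    if (nameParts.isEmpty()) {
      throw new IllegalArgumentException("Cannot determine catalog and identifier from empty name");
    }

    int last = nameParts.size() - 1;
    String name = nameParts.get(last);

    if (nameParts.size() == 1) {
      // A bare table name resolves against the current catalog and namespace.
      return new SimpleImmutableEntry<>(currentCatalog, identifierProvider.apply(currentNamespace, name));
    }

    C catalog = catalogProvider.apply(nameParts.get(0));
    if (catalog == null) {
      // The first part is not a catalog: everything before the name is the namespace.
      String[] namespace = nameParts.subList(0, last).toArray(new String[0]);
      return new SimpleImmutableEntry<>(currentCatalog, identifierProvider.apply(namespace, name));
    }

    // The first part names a catalog: the namespace lies between it and the name.
    String[] namespace = nameParts.subList(1, last).toArray(new String[0]);
    return new SimpleImmutableEntry<>(catalog, identifierProvider.apply(namespace, name));
  }

  public static void main(String[] args) {
    // Hypothetical catalog registry: only "prod" is a known catalog name.
    Map<String, String> catalogs = Map.of("prod", "prod-catalog");
    BiFunction<String[], String, String> toId = (ns, n) -> Arrays.toString(ns) + " " + n;
    String[] currentNs = new String[] {"default"};

    System.out.println(resolve(List.of("prod", "db", "t"), catalogs::get, toId, "current", currentNs));
    System.out.println(resolve(List.of("db", "t"), catalogs::get, toId, "current", currentNs));
  }
}
```

    In this sketch, a single-part name keeps the current catalog and namespace, while a multi-part name is split according to whether its first part resolves to a catalog.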
  - hasTimestampWithoutZone
```
public static boolean hasTimestampWithoutZone(Schema schema)
```
    Checks whether the table schema has a timestamp without timezone column.
    
    Parameters:
    
    schema - table schema to check if it contains a timestamp without timezone column
    
    Returns:
    
    boolean indicating if the schema passed in has a timestamp field without a timezone
  - useTimestampWithoutZoneInNewTables
```
public static boolean useTimestampWithoutZoneInNewTables(org.apache.spark.sql.RuntimeConfig sessionConf)
```
    Checks whether timestamp types for new tables should be stored without timezone info.
    The default value is false, in which case all timestamp fields are stored as Types.TimestampType#withZone(). If enabled, all timestamp fields in new tables are stored as Types.TimestampType#withoutZone().
    
    Parameters:
    
    sessionConf - a Spark runtime config
    
    Returns:
    
    true if timestamp types for new tables should be stored without timezone info
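    
    The default-false behavior described above can be sketched as a plain config lookup. This is a minimal sketch under assumptions: a `Map` stands in for `RuntimeConfig`, the class name `TimestampFlagSketch` is hypothetical, and the property key shown is assumed rather than taken from this page, so verify it against your Iceberg version.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (hypothetical class); a Map stands in for Spark's RuntimeConfig.
final class TimestampFlagSketch {

  // Assumed property name; not stated on this page, verify against your Iceberg version.
  static final String KEY = "spark.sql.iceberg.use-timestamp-without-timezone-in-new-tables";

  // Returns true only when the flag is explicitly set to "true"; absent means false.
  static boolean useTimestampWithoutZoneInNewTables(Map<String, String> sessionConf) {
    return Boolean.parseBoolean(sessionConf.getOrDefault(KEY, "false"));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(useTimestampWithoutZoneInNewTables(conf)); // flag unset

    conf.put(KEY, "true");
    System.out.println(useTimestampWithoutZoneInNewTables(conf)); // flag enabled
  }
}
```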
  - hadoopConfCatalogOverrides
```
public static org.apache.hadoop.conf.Configuration hadoopConfCatalogOverrides(org.apache.spark.sql.SparkSession spark,
                                                                              java.lang.String catalogName)
```
    Pulls any catalog-specific overrides for the Hadoop conf from the current SparkSession, which can be set via `spark.sql.catalog.$catalogName.hadoop.*`.
    
    Mirrors the override of Hadoop configurations for a given Spark session using `spark.hadoop.*`.
    
    The SparkCatalog allows Hadoop configurations to be overridden per catalog by setting them on the SQLConf. For example, the following adds the property "fs.default.name" with value "hdfs://hanksnamenode:8020" to the catalog's Hadoop configuration:
    
    `SparkSession.builder().config(s"spark.sql.catalog.$catalogName.hadoop.fs.default.name", "hdfs://hanksnamenode:8020").getOrCreate()`
    
    Parameters:
    
    spark - The current Spark session
    
    catalogName - Name of the catalog to find overrides for.
    
    Returns:
    
    the Hadoop Configuration that should be used for this catalog, with catalog specific overrides applied.
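    
    The prefix-based override described above can be sketched with plain maps. This is a sketch under assumptions, not the Iceberg implementation: `Map` stands in for both the Spark session conf and the Hadoop `Configuration`, and the class name `HadoopOverridesSketch` and method name `withCatalogOverrides` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only (hypothetical class); Maps stand in for the Spark conf and Hadoop Configuration.
final class HadoopOverridesSketch {

  // Copies the base Hadoop settings, then applies every session entry whose key starts
  // with "spark.sql.catalog.<catalogName>.hadoop.", with that prefix stripped.
  static Map<String, String> withCatalogOverrides(
      Map<String, String> baseHadoopConf, Map<String, String> sessionConf, String catalogName) {
    String prefix = "spark.sql.catalog." + catalogName + ".hadoop.";
    Map<String, String> result = new HashMap<>(baseHadoopConf);
    for (Map.Entry<String, String> entry : sessionConf.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> base = new HashMap<>();
    base.put("fs.default.name", "hdfs://base:8020"); // hypothetical session-wide value

    Map<String, String> session = new HashMap<>();
    session.put("spark.sql.catalog.hank.hadoop.fs.default.name", "hdfs://hanksnamenode:8020");

    // Only the catalog named "hank" sees the overridden value; the base map is left untouched.
    System.out.println(withCatalogOverrides(base, session, "hank"));
  }
}
```

    Copying the base map rather than mutating it keeps one catalog's overrides from leaking into another catalog's configuration, which is the point of scoping overrides by catalog name.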
  - partitionMapToExpression
```
public static java.util.List<org.apache.spark.sql.catalyst.expressions.Expression> partitionMapToExpression(org.apache.spark.sql.types.StructType schema,
                                                                                                            java.util.Map<java.lang.String,java.lang.String> filters)
```
    Gets a List of Spark filter Expressions.
    
    Parameters:
    
    schema - table schema
    
    filters - filters in the form of a Map, where each key is a table column name and each value is the specific value to filter that column on.
    
    Returns:
    
    a List of filters as Spark Expressions.
