public class SparkSchemaUtil
extends java.lang.Object
Modifier and Type | Method and Description
---|---
`static Type` | `convert(org.apache.spark.sql.types.DataType sparkType)`: Convert a Spark struct to a Type with new field ids.
`static org.apache.spark.sql.types.StructType` | `convert(Schema schema)`: Convert a Schema to a Spark type.
`static Schema` | `convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)`: Convert a Spark struct to a Schema based on the given schema.
`static Schema` | `convert(org.apache.spark.sql.types.StructType sparkType)`: Convert a Spark struct to a Schema with new field ids.
`static org.apache.spark.sql.types.DataType` | `convert(Type type)`: Convert a Type to a Spark type.
`static long` | `estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)`: Estimate approximate table size based on Spark schema and total records.
`static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)`: Prune columns from a Schema using a Spark type projection.
`static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)`: Prune columns from a Schema using a Spark type projection.
`static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)`: Prune columns from a Schema using a Spark type projection.
`static Schema` | `schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)`: Returns a Schema for the given table with fresh field ids.
`static PartitionSpec` | `specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)`: Returns a PartitionSpec for the given table.
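Before the per-method details, here is a sketch of typical end-to-end use of this class. It assumes a Spark session with Hive support and the Iceberg runtime on the classpath; the table name `db.sample` is hypothetical.

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class SchemaUtilOverview {
    public static void main(String[] args) throws org.apache.spark.sql.AnalysisException {
        // Assumes a running Spark environment; enableHiveSupport() lets the
        // catalog resolve "db.sample"-style table names.
        SparkSession spark = SparkSession.builder()
            .appName("schema-util-demo")
            .enableHiveSupport()
            .getOrCreate();

        // Look up the table's Spark schema and convert it, assigning fresh field ids.
        Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.sample");

        // Build an identity-partition spec from the table's partition columns.
        PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.sample");

        System.out.println(schema);
        System.out.println(spec);
    }
}
```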
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)

Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.

Parameters:
spark - a Spark session
name - a table name and (optional) database

public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.

Parameters:
spark - a Spark session
name - a table name and (optional) database

Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog

public static org.apache.spark.sql.types.StructType convert(Schema schema)
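As a sketch of the Schema-to-StructType direction (assuming the Iceberg and Spark classes below are on the classpath):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

// Build a small Iceberg schema, then convert it to Spark's StructType.
Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

// Required Iceberg fields become non-nullable Spark fields.
StructType sparkType = SparkSchemaUtil.convert(schema);
```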
Convert a Schema to a Spark type.

Parameters:
schema - a Schema

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.

Parameters:
type - a Type

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
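A sketch of the StructType-to-Schema direction, where fresh field ids are assigned (assuming Spark and Iceberg on the classpath):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// A Spark struct carries no Iceberg field ids; this overload assigns fresh ones.
StructType sparkType = new StructType()
    .add("id", DataTypes.LongType, false)
    .add("data", DataTypes.StringType, true);

Schema schema = SparkSchemaUtil.convert(sparkType);
// Ambiguous Spark types are mapped to a default Iceberg type, per the notes below.
```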
Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark DataType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
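A sketch of a round-trip using a base schema: Spark types drop Iceberg field ids, so the base schema is used to restore them (assumes Iceberg and Spark on the classpath):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

Schema base = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

// Converting to Spark loses the ids; converting back with the base schema reuses them.
StructType sparkType = SparkSchemaUtil.convert(base);
Schema restored = SparkSchemaUtil.convert(base, sparkType);
// restored uses ids 1 and 2 from the base schema rather than fresh ids.
```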
Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
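A sketch of column pruning with a projection (assumes Iceberg and Spark on the classpath; the requested type must be a projection of the schema, with matching types and nullability):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

// Request only "id"; type (long) and nullability (required) match the schema.
StructType requested = new StructType().add("id", DataTypes.LongType, false);

Schema pruned = SparkSchemaUtil.prune(schema, requested);
// pruned keeps only the "id" field, retaining its original field id (1).
```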
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
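A sketch of filter-aware pruning (assumes Iceberg and Spark on the classpath): even when the projection omits a column, a filter that references it forces the column into the pruned schema.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.optional(2, "data", Types.StringType.get()));

// The projection asks only for "id", but the filter references "data".
StructType requested = new StructType().add("id", DataTypes.LongType, false);
Expression filter = Expressions.equal("data", "x");

// Pruned schema keeps "data" so the filter can be evaluated; names are
// matched case-sensitively here.
Schema pruned = SparkSchemaUtil.prune(schema, requested, filter, true);
```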
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter expression is used to ensure that columns referenced by the filter are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.

Parameters:
tableSchema - a Spark schema
totalRecords - total records in the table
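A sketch of the size estimate (assumes the Spark and Iceberg jars on the classpath; the row count is hypothetical):

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// A rough estimate: a per-row size derived from the Spark schema,
// scaled by the expected number of records.
StructType tableSchema = new StructType()
    .add("id", DataTypes.LongType, false)
    .add("data", DataTypes.StringType, true);

long estimatedBytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
```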