Class SparkSchemaUtil

java.lang.Object
org.apache.iceberg.spark.SparkSchemaUtil

public class SparkSchemaUtil extends Object
Helper methods for working with Spark/Hive metadata.
  • Method Details

    • schemaForTable

      public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, String name)
      Returns a Schema for the given table with fresh field ids.

      This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.

      Parameters:
      spark - a Spark session
      name - a table name and (optional) database
      Returns:
      a Schema for the table, if found
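      A minimal usage sketch; the table name db.sample_table and the local Hive-enabled session are illustrative assumptions, and the session must be able to resolve the table:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.spark.sql.SparkSession;

      public class SchemaForTableExample {
        public static void main(String[] args) {
          SparkSession spark = SparkSession.builder()
              .master("local[1]")
              .appName("schema-for-table")
              .enableHiveSupport() // needed when the table lives in a Hive metastore
              .getOrCreate();

          // Looks up the table with Spark and converts its schema, assigning fresh field ids.
          Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.sample_table");
          System.out.println(schema); // includes Spark/Hive partition columns

          spark.stop();
        }
      }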
    • specForTable

      public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.AnalysisException
      Returns a PartitionSpec for the given table.

      This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.

      Parameters:
      spark - a Spark session
      name - a table name and (optional) database
      Returns:
      a PartitionSpec for the table
      Throws:
      org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
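      A sketch of building a spec from an existing table; the table name and its partition columns dt and hour are hypothetical:

      import org.apache.iceberg.PartitionSpec;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.spark.sql.SparkSession;

      public class SpecForTableExample {
        public static void main(String[] args) throws org.apache.spark.sql.AnalysisException {
          SparkSession spark = SparkSession.builder()
              .master("local[1]")
              .appName("spec-for-table")
              .enableHiveSupport()
              .getOrCreate();

          // For a table partitioned by (dt, hour), this produces identity(dt), identity(hour).
          PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.sample_table");
          System.out.println(spec);

          spark.stop();
        }
      }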
    • convert

      public static org.apache.spark.sql.types.StructType convert(Schema schema)
      Convert a Schema to a Spark type.
      Parameters:
      schema - a Schema
      Returns:
      the equivalent Spark type
      Throws:
      IllegalArgumentException - if the type cannot be converted to Spark
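      A minimal sketch; the two-column schema is illustrative:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.StructType;

      public class ConvertSchemaExample {
        public static void main(String[] args) {
          Schema schema = new Schema(
              Types.NestedField.required(1, "id", Types.LongType.get()),
              Types.NestedField.optional(2, "data", Types.StringType.get()));

          // Required fields become non-nullable Spark fields.
          StructType sparkType = SparkSchemaUtil.convert(schema);
          System.out.println(sparkType.catalogString()); // struct<id:bigint,data:string>
        }
      }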
    • convert

      public static org.apache.spark.sql.types.DataType convert(Type type)
      Convert a Type to a Spark type.
      Parameters:
      type - a Type
      Returns:
      the equivalent Spark type
      Throws:
      IllegalArgumentException - if the type cannot be converted to Spark
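      For example, converting individual types (a sketch; the element id 1 is arbitrary):

      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.DataType;

      public class ConvertTypeExample {
        public static void main(String[] args) {
          DataType dec = SparkSchemaUtil.convert(Types.DecimalType.of(10, 2));
          DataType list = SparkSchemaUtil.convert(Types.ListType.ofOptional(1, Types.StringType.get()));
          System.out.println(dec.catalogString());  // decimal(10,2)
          System.out.println(list.catalogString()); // array<string>
        }
      }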
    • convert

      public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
      Convert a Spark struct to a Schema with new field ids.

      This conversion assigns fresh ids.

      Some Iceberg data types are represented by the same Spark type (for example, both uuid and string map to Spark's StringType). These are converted to a default Iceberg type.

      To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

      Parameters:
      sparkType - a Spark StructType
      Returns:
      the equivalent Schema
      Throws:
      IllegalArgumentException - if the type cannot be converted
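      A sketch of the fresh-id conversion; the field names and types are illustrative:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class ConvertStructExample {
        public static void main(String[] args) {
          StructType sparkType = new StructType()
              .add("id", DataTypes.LongType, false)     // non-nullable -> required
              .add("data", DataTypes.StringType, true); // nullable -> optional

          Schema schema = SparkSchemaUtil.convert(sparkType);
          System.out.println(schema); // field ids are assigned fresh
        }
      }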
    • convert

      public static Type convert(org.apache.spark.sql.types.DataType sparkType)
      Convert a Spark type to a Type with new field ids.

      This conversion assigns fresh ids.

      Some Iceberg data types are represented by the same Spark type; these are converted to a default Iceberg type.

      To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

      Parameters:
      sparkType - a Spark DataType
      Returns:
      the equivalent Type
      Throws:
      IllegalArgumentException - if the type cannot be converted
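      A sketch of the default choices for ambiguous Spark types (the defaults shown are assumptions about the conversion, not part of this Javadoc):

      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Type;
      import org.apache.spark.sql.types.DataTypes;

      public class ConvertDataTypeExample {
        public static void main(String[] args) {
          // StringType could correspond to string or uuid; the default is string.
          Type s = SparkSchemaUtil.convert(DataTypes.StringType);
          // BinaryType could correspond to binary or fixed; the default is binary.
          Type b = SparkSchemaUtil.convert(DataTypes.BinaryType);
          System.out.println(s + ", " + b); // string, binary
        }
      }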
    • convert

      public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
      Convert a Spark struct to a Schema based on the given schema.

      This conversion does not assign new ids; it uses ids from the base schema.

      Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

      Parameters:
      baseSchema - a Schema on which conversion is based
      sparkType - a Spark StructType
      Returns:
      the equivalent Schema
      Throws:
      IllegalArgumentException - if the type cannot be converted or there are missing ids
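      A sketch, assuming fields are matched to the base schema by name, so ids follow the base schema while order and nullability follow the Spark type:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class ConvertWithBaseExample {
        public static void main(String[] args) {
          Schema base = new Schema(
              Types.NestedField.required(1, "id", Types.LongType.get()),
              Types.NestedField.optional(2, "data", Types.StringType.get()));

          // A Spark type that reorders the fields of the base schema.
          StructType sparkType = new StructType()
              .add("data", DataTypes.StringType, true)
              .add("id", DataTypes.LongType, false);

          Schema converted = SparkSchemaUtil.convert(base, sparkType);
          System.out.println(converted); // data keeps id 2, id keeps id 1
        }
      }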
    • convert

      public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
      Convert a Spark struct to a Schema based on the given schema.

      This conversion does not assign new ids; it uses ids from the base schema.

      Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

      Parameters:
      baseSchema - a Schema on which conversion is based
      sparkType - a Spark StructType
      caseSensitive - when false, the case of field names in the schema is ignored
      Returns:
      the equivalent Schema
      Throws:
      IllegalArgumentException - if the type cannot be converted or there are missing ids
    • convertWithFreshIds

      public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
      Convert a Spark struct to a Schema based on the given schema.

      This conversion will assign new ids for fields that are not found in the base schema.

      Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

      Parameters:
      baseSchema - a Schema on which conversion is based
      sparkType - a Spark StructType
      Returns:
      the equivalent Schema
      Throws:
      IllegalArgumentException - if the type cannot be converted or there are missing ids
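      A sketch; the added column "extra" is hypothetical. Fields found in the base schema keep their ids, while new fields receive fresh ones:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class ConvertWithFreshIdsExample {
        public static void main(String[] args) {
          Schema base = new Schema(
              Types.NestedField.required(1, "id", Types.LongType.get()),
              Types.NestedField.optional(2, "data", Types.StringType.get()));

          StructType sparkType = new StructType()
              .add("id", DataTypes.LongType, false)
              .add("data", DataTypes.StringType, true)
              .add("extra", DataTypes.StringType, true); // not in base: gets a fresh id

          Schema converted = SparkSchemaUtil.convertWithFreshIds(base, sparkType);
          System.out.println(converted); // id -> 1, data -> 2, extra -> a new id
        }
      }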
    • convertWithFreshIds

      public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
      Convert a Spark struct to a Schema based on the given schema.

      This conversion will assign new ids for fields that are not found in the base schema.

      Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

      Parameters:
      baseSchema - a Schema on which conversion is based
      sparkType - a Spark StructType
      caseSensitive - when false, the case of field names in the schema is ignored
      Returns:
      the equivalent Schema
      Throws:
      IllegalArgumentException - if the type cannot be converted or there are missing ids
    • prune

      public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
      Prune columns from a Schema using a Spark type projection.

      This requires that the Spark type is a projection of the Schema. Nullability and types must match.

      Parameters:
      schema - a Schema
      requestedType - a projection of the Spark representation of the Schema
      Returns:
      a Schema corresponding to the Spark projection
      Throws:
      IllegalArgumentException - if the Spark type does not match the Schema
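      A sketch of projecting two of three columns; the schema is illustrative, and the requested struct must match the schema's types and nullability:

      import org.apache.iceberg.Schema;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class PruneExample {
        public static void main(String[] args) {
          Schema schema = new Schema(
              Types.NestedField.required(1, "id", Types.LongType.get()),
              Types.NestedField.optional(2, "data", Types.StringType.get()),
              Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

          StructType requested = new StructType()
              .add("id", DataTypes.LongType, false)
              .add("data", DataTypes.StringType, true);

          Schema pruned = SparkSchemaUtil.prune(schema, requested);
          System.out.println(pruned); // id (1) and data (2), original ids preserved
        }
      }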
    • prune

      public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, List<Expression> filters)
      Prune columns from a Schema using a Spark type projection.

      This requires that the Spark type is a projection of the Schema. Nullability and types must match.

      The list of filter Expressions is used to ensure that columns referenced by the filters are projected.

      Parameters:
      schema - a Schema
      requestedType - a projection of the Spark representation of the Schema
      filters - a list of filters
      Returns:
      a Schema corresponding to the Spark projection
      Throws:
      IllegalArgumentException - if the Spark type does not match the Schema
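      A sketch showing a filter column being kept even though the projection omits it; the schema and filter are illustrative:

      import java.util.Collections;
      import java.util.List;
      import org.apache.iceberg.Schema;
      import org.apache.iceberg.expressions.Expression;
      import org.apache.iceberg.expressions.Expressions;
      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.iceberg.types.Types;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class PruneWithFiltersExample {
        public static void main(String[] args) {
          Schema schema = new Schema(
              Types.NestedField.required(1, "id", Types.LongType.get()),
              Types.NestedField.optional(2, "data", Types.StringType.get()));

          // The projection requests only "data"...
          StructType requested = new StructType()
              .add("data", DataTypes.StringType, true);

          // ...but the filter references "id", so "id" is projected as well.
          List<Expression> filters = Collections.singletonList(Expressions.greaterThan("id", 100L));

          Schema pruned = SparkSchemaUtil.prune(schema, requested, filters);
          System.out.println(pruned); // contains both data and id
        }
      }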
    • prune

      public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
      Prune columns from a Schema using a Spark type projection.

      This requires that the Spark type is a projection of the Schema. Nullability and types must match.

      The filter Expression is used to ensure that columns it references are projected.

      Parameters:
      schema - a Schema
      requestedType - a projection of the Spark representation of the Schema
      filter - a filter expression
      caseSensitive - when false, the case of field names is ignored
      Returns:
      a Schema corresponding to the Spark projection
      Throws:
      IllegalArgumentException - if the Spark type does not match the Schema
    • estimateSize

      public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
      Estimate the approximate table size based on the Spark schema and the total record count.
      Parameters:
      tableSchema - Spark schema
      totalRecords - total records in the table
      Returns:
      approximate size based on table schema
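      A sketch; the schema and row count are illustrative, and it assumes the estimate multiplies a per-row size derived from the Spark column types by the record count, so it is a rough in-memory approximation rather than an on-disk measurement:

      import org.apache.iceberg.spark.SparkSchemaUtil;
      import org.apache.spark.sql.types.DataTypes;
      import org.apache.spark.sql.types.StructType;

      public class EstimateSizeExample {
        public static void main(String[] args) {
          StructType tableSchema = new StructType()
              .add("id", DataTypes.LongType, false)
              .add("data", DataTypes.StringType, true);

          long bytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
          System.out.println(bytes); // approximate size for one million rows
        }
      }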
    • validateMetadataColumnReferences

      public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
    • indexQuotedNameById

      public static Map<Integer,String> indexQuotedNameById(Schema schema)