Class SparkSchemaUtil
- java.lang.Object
-
- org.apache.iceberg.spark.SparkSchemaUtil
-
public class SparkSchemaUtil extends java.lang.Object
Helper methods for working with Spark/Hive metadata.
-
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static org.apache.spark.sql.types.StructType
convert(Schema schema)
Convert aSchema
to aSpark type
.static Schema
convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
based on the given schema.static Schema
convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Sparkstruct
to aSchema
based on the given schema.static org.apache.spark.sql.types.DataType
convert(Type type)
Convert aType
to aSpark type
.static Type
convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Sparkstruct
to aType
with new field ids.static Schema
convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
with new field ids.static Schema
convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
Convert a Sparkstruct
to aSchema
with new field ids.static Schema
convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
based on the given schema.static Schema
convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Sparkstruct
to aSchema
based on the given schema.static long
estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.static java.util.Map<java.lang.Integer,java.lang.String>
indexQuotedNameById(Schema schema)
static Schema
prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from aSchema
using aSpark type
projection.static Schema
prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from aSchema
using aSpark type
projection.static Schema
prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from aSchema
using aSpark type
projection.static Schema
schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Returns aSchema
for the given table with fresh field ids.static PartitionSpec
specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Returns aPartitionSpec
for the given table.static void
validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
-
-
Method Detail
-
schemaForTable
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Returns aSchema
for the given table with fresh field ids.This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
- Parameters:
spark
- a Spark sessionname
- a table name and (optional) database- Returns:
- a Schema for the table, if found
-
specForTable
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
Returns aPartitionSpec
for the given table.This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
- Parameters:
spark
- a Spark sessionname
- a table name and (optional) database- Returns:
- a PartitionSpec for the table
- Throws:
org.apache.spark.sql.AnalysisException
- if thrown by the Spark catalog
-
convert
public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert aSchema
to aSpark type
.- Parameters:
schema
- a Schema- Returns:
- the equivalent Spark type
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted to Spark
-
convert
public static org.apache.spark.sql.types.DataType convert(Type type)
Convert aType
to aSpark type
.- Parameters:
type
- a Type- Returns:
- the equivalent Spark type
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted to Spark
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
with new field ids.This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use
convert(Schema, StructType)
.- Parameters:
sparkType
- a Spark StructType- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
Convert a Sparkstruct
to aSchema
with new field ids.This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use
convert(Schema, StructType)
.- Parameters:
sparkType
- a Spark StructTypeuseTimestampWithoutZone
- boolean flag indicates that timestamp should be stored without timezone- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted
-
convert
public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Sparkstruct
to aType
with new field ids.This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use
convert(Schema, StructType)
.- Parameters:
sparkType
- a Spark DataType- Returns:
- the equivalent Type
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
based on the given schema.This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
- Parameters:
baseSchema
- a Schema on which conversion is basedsparkType
- a Spark StructType- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted or there are missing ids
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Sparkstruct
to aSchema
based on the given schema.This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
- Parameters:
baseSchema
- a Schema on which conversion is basedsparkType
- a Spark StructTypecaseSensitive
- when false, the case of schema fields is ignored- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted or there are missing ids
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Sparkstruct
to aSchema
based on the given schema.This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
- Parameters:
baseSchema
- a Schema on which conversion is basedsparkType
- a Spark StructType- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted or there are missing ids
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Sparkstruct
to aSchema
based on the given schema.This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
- Parameters:
baseSchema
- a Schema on which conversion is basedsparkType
- a Spark StructTypecaseSensitive
- when false, case of field names in schema is ignored- Returns:
- the equivalent Schema
- Throws:
java.lang.IllegalArgumentException
- if the type cannot be converted or there are missing ids
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from aSchema
using aSpark type
projection.This requires that the Spark type is a projection of the Schema. Nullability and types must match.
- Parameters:
schema
- a SchemarequestedType
- a projection of the Spark representation of the Schema- Returns:
- a Schema corresponding to the Spark projection
- Throws:
java.lang.IllegalArgumentException
- if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from aSchema
using aSpark type
projection.This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of
Expression
is used to ensure that columns referenced by filters are projected.- Parameters:
schema
- a SchemarequestedType
- a projection of the Spark representation of the Schemafilters
- a list of filters- Returns:
- a Schema corresponding to the Spark projection
- Throws:
java.lang.IllegalArgumentException
- if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from aSchema
using aSpark type
projection.This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of
Expression
is used to ensure that columns referenced by filters are projected.- Parameters:
schema
- a SchemarequestedType
- a projection of the Spark representation of the Schemafilter
- a filters- Returns:
- a Schema corresponding to the Spark projection
- Throws:
java.lang.IllegalArgumentException
- if the Spark type does not match the Schema
-
estimateSize
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.- Parameters:
tableSchema
- Spark schematotalRecords
- total records in the table- Returns:
- approximate size based on table schema
-
validateMetadataColumnReferences
public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
indexQuotedNameById
public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)
-
-