Class SparkSchemaUtil
- java.lang.Object
-
- org.apache.iceberg.spark.SparkSchemaUtil
-
public class SparkSchemaUtil extends java.lang.Object

Helper methods for working with Spark/Hive metadata.
-
-
Method Summary
All methods are static and concrete.
- static org.apache.spark.sql.types.StructType convert(Schema schema)
  Convert a Schema to a Spark type.
- static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
  Convert a Spark struct to a Schema based on the given schema.
- static org.apache.spark.sql.types.DataType convert(Type type)
  Convert a Type to a Spark type.
- static Type convert(org.apache.spark.sql.types.DataType sparkType)
  Convert a Spark struct to a Type with new field ids.
- static Schema convert(org.apache.spark.sql.types.StructType sparkType)
  Convert a Spark struct to a Schema with new field ids.
- static Schema convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
  Convert a Spark struct to a Schema with new field ids.
- static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
  Estimate the approximate table size from the Spark schema and total record count.
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
  Prune columns from a Schema using a Spark type projection.
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
  Prune columns from a Schema using a Spark type projection.
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
  Prune columns from a Schema using a Spark type projection.
- static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
  Returns a Schema for the given table with fresh field ids.
- static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
  Returns a PartitionSpec for the given table.
-
-
-
Method Detail
-
schemaForTable
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
- Parameters:
  spark - a Spark session
  name - a table name and (optional) database
- Returns:
  - a Schema for the table, if found
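A minimal sketch of looking up a table's schema. The table name "db.events" and the local master are illustrative, not part of this API:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class SchemaForTableExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("schema-example")
        .master("local[*]")
        .getOrCreate();

    // Looks up "db.events" in the Spark catalog and converts its schema,
    // assigning fresh Iceberg field ids; partition columns are included.
    Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.events");
    System.out.println(schema.asStruct());

    spark.stop();
  }
}
```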
-
specForTable
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
- Parameters:
  spark - a Spark session
  name - a table name and (optional) database
- Returns:
  - a PartitionSpec for the table
- Throws:
  org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
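A sketch of deriving a spec for the same hypothetical "db.events" table; note that AnalysisException is checked and must be handled or declared:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.SparkSession;

public class SpecForTableExample {
  public static void main(String[] args) throws AnalysisException {
    SparkSession spark = SparkSession.builder()
        .appName("spec-example")
        .master("local[*]")
        .getOrCreate();

    // Builds a spec with an identity partition for each of the
    // table's Spark/Hive partition columns.
    PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.events");
    System.out.println(spec);

    spark.stop();
  }
}
```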
-
convert
public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert a Schema to a Spark type.
- Parameters:
  schema - a Schema
- Returns:
  - the equivalent Spark type
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted to Spark
-
convert
public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.
- Parameters:
  type - a Type
- Returns:
  - the equivalent Spark type
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted to Spark
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
  sparkType - a Spark StructType
- Returns:
  - the equivalent Schema
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted
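A sketch of converting a hand-built Spark StructType to a Schema and back; the field names are illustrative:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertExample {
  public static void main(String[] args) {
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)    // non-nullable column
        .add("data", DataTypes.StringType, true); // nullable column

    // Fresh field ids are assigned; Spark types that represent more
    // than one Iceberg type are mapped to a default.
    Schema schema = SparkSchemaUtil.convert(sparkType);
    System.out.println(schema);

    // Round-trip back to a Spark StructType via convert(Schema).
    StructType roundTripped = SparkSchemaUtil.convert(schema);
    System.out.println(roundTripped);
  }
}
```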
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
  sparkType - a Spark StructType
  useTimestampWithoutZone - a flag indicating that timestamps should be stored without time zone
- Returns:
  - the equivalent Schema
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted
-
convert
public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
  sparkType - a Spark DataType
- Returns:
  - the equivalent Type
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
  baseSchema - a Schema on which conversion is based
  sparkType - a Spark StructType
- Returns:
  - the equivalent Schema
- Throws:
  java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids
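A sketch of the id-preserving conversion. The table and DataFrame are assumed to exist elsewhere in the application; only the convert call itself is from this API:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class ConvertWithBaseExample {
  // table and df are hypothetical inputs supplied by the caller.
  static Schema withBaseIds(Table table, Dataset<Row> df) {
    // Field ids come from table.schema(); data types, field order, and
    // nullability follow df.schema(), so the result may not be
    // compatible with the base schema.
    return SparkSchemaUtil.convert(table.schema(), df.schema());
  }
}
```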
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.
- Parameters:
  schema - a Schema
  requestedType - a projection of the Spark representation of the Schema
- Returns:
  - a Schema corresponding to the Spark projection
- Throws:
  java.lang.IllegalArgumentException - if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.
- Parameters:
  schema - a Schema
  requestedType - a projection of the Spark representation of the Schema
  filters - a list of filters
- Returns:
  - a Schema corresponding to the Spark projection
- Throws:
  java.lang.IllegalArgumentException - if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter Expression is used to ensure that columns referenced by the filter are projected.
- Parameters:
  schema - a Schema
  requestedType - a projection of the Spark representation of the Schema
  filter - a filter
  caseSensitive - whether column name matching is case sensitive
- Returns:
  - a Schema corresponding to the Spark projection
- Throws:
  java.lang.IllegalArgumentException - if the Spark type does not match the Schema
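A sketch of pruning with a filter; the schema, field names, and filter value are illustrative:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PruneExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()),
        Types.NestedField.optional(3, "category", Types.StringType.get()));

    // The projection requests only "id"; "category" should also be
    // retained because the filter references it.
    StructType requested = new StructType()
        .add("id", DataTypes.LongType, false);
    Expression filter = Expressions.equal("category", "news");

    Schema pruned = SparkSchemaUtil.prune(schema, requested, filter, true);
    System.out.println(pruned);
  }
}
```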
-
estimateSize
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)

Estimate the approximate table size based on the Spark schema and the total record count.
- Parameters:
  tableSchema - a Spark schema
  totalRecords - the total number of records in the table
- Returns:
  - the approximate size based on the table schema
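A sketch of the size estimate; the schema and record count are illustrative, and the result is a rough schema-based approximation, not the actual on-disk size:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
  public static void main(String[] args) {
    StructType schema = new StructType()
        .add("id", DataTypes.LongType)
        .add("data", DataTypes.StringType);

    // Approximate size in bytes for one million records of this shape.
    long bytes = SparkSchemaUtil.estimateSize(schema, 1_000_000L);
    System.out.println(bytes);
  }
}
```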
-
-