Class SparkSchemaUtil
-
Method Summary
- static org.apache.spark.sql.types.StructType convert(Schema schema): Convert a Schema to a Spark type.
- static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType): Convert a Spark struct to a Schema based on the given schema.
- static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive): Convert a Spark struct to a Schema based on the given schema.
- static org.apache.spark.sql.types.DataType convert(Type type): Convert a Type to a Spark type.
- static Type convert(org.apache.spark.sql.types.DataType sparkType): Convert a Spark struct to a Type with new field ids.
- static Schema convert(org.apache.spark.sql.types.StructType sparkType): Convert a Spark struct to a Schema with new field ids.
- static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType): Convert a Spark struct to a Schema based on the given schema.
- static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive): Convert a Spark struct to a Schema based on the given schema.
- static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords): Estimate approximate table size based on Spark schema and total records.
- indexQuotedNameById(Schema schema)
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType): Prune columns from a Schema using a Spark type projection.
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, List<Expression> filters): Prune columns from a Schema using a Spark type projection.
- static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive): Prune columns from a Schema using a Spark type projection.
- static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, String name): Returns a Schema for the given table with fresh field ids.
- static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, String name): Returns a PartitionSpec for the given table.
- static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
Method Details
-
schemaForTable
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, String name)
Returns a Schema for the given table with fresh field ids. This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
- Parameters:
spark - a Spark session
name - a table name and (optional) database
- Returns:
- a Schema for the table, if found
-
specForTable
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table. This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
- Parameters:
spark - a Spark session
name - a table name and (optional) database
- Returns:
- a PartitionSpec for the table
- Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
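As a minimal sketch, the two lookup helpers above take a live SparkSession and a table name. The session configuration and the table name `db.events` below are assumptions for illustration; they are not part of the API:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class TableLookupExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical local session; a real deployment would point at the
    // catalog that actually holds the table.
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("schema-lookup")
        .getOrCreate();

    // Assumed table name "db.events"; Spark/Hive partition columns are
    // folded into the returned Schema.
    Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.events");

    // Identity partitions are created for each partition column; this
    // throws AnalysisException if the table cannot be resolved.
    PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.events");

    System.out.println(schema);
    System.out.println(spec);
    spark.stop();
  }
}
```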
-
convert
public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert a Schema to a Spark type.
- Parameters:
schema - a Schema
- Returns:
- the equivalent Spark type
- Throws:
IllegalArgumentException - if the type cannot be converted to Spark
-
convert
public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.
- Parameters:
type - a Type
- Returns:
- the equivalent Spark type
- Throws:
IllegalArgumentException - if the type cannot be converted to Spark
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema with new field ids. This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
sparkType - a Spark StructType
- Returns:
- the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted
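A minimal sketch of the fresh-id conversion; the field names are illustrative. Spark nullability carries over, so a non-nullable Spark field becomes a required field in the resulting Schema:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FreshIdConvertExample {
  public static void main(String[] args) {
    // A Spark struct carries no field ids of its own
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)      // non-nullable -> required
        .add("data", DataTypes.StringType, true);  // nullable -> optional

    // Fresh ids are assigned because there is no reference schema
    Schema schema = SparkSchemaUtil.convert(sparkType);
    System.out.println(schema);
  }
}
```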
-
convert
public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids. This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
sparkType - a Spark DataType
- Returns:
- the equivalent Type
- Throws:
IllegalArgumentException - if the type cannot be converted
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema. This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
- Returns:
- the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
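A short sketch of the id-preserving conversion, using an assumed two-field base schema. The Spark struct names a subset of the base fields, and each converted field reuses its id from the base schema:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class BaseSchemaConvertExample {
  public static void main(String[] args) {
    // Base schema supplying the field ids (illustrative fields)
    Schema base = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));

    // Spark struct naming a subset of the base fields
    StructType sparkType = new StructType()
        .add("data", DataTypes.StringType, true);

    // "data" keeps id 2 from the base schema; a field absent from the
    // base schema would raise IllegalArgumentException for a missing id
    Schema converted = SparkSchemaUtil.convert(base, sparkType);
    System.out.println(converted);
  }
}
```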
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Spark struct to a Schema based on the given schema. This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of schema fields is ignored
- Returns:
- the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema. This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
- Returns:
- the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
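A sketch of the fresh-id variant, with an assumed one-field base schema. A field present in the base schema keeps its id, while a field the base schema does not contain receives a newly assigned id instead of triggering a missing-id error:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FreshIdsForNewFieldsExample {
  public static void main(String[] args) {
    Schema base = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()));

    // "extra" does not exist in the base schema
    StructType sparkType = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("extra", DataTypes.StringType, true);

    // "id" keeps its base id; "extra" is assigned a fresh id
    Schema result = SparkSchemaUtil.convertWithFreshIds(base, sparkType);
    System.out.println(result);
  }
}
```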
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Spark struct to a Schema based on the given schema. This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of field names in the schema is ignored
- Returns:
- the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from a Schema using a Spark type projection. This requires that the Spark type is a projection of the Schema. Nullability and types must match.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
- Returns:
- a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, List<Expression> filters)
Prune columns from a Schema using a Spark type projection. This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of Expression is used to ensure that columns referenced by filters are projected.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters
- Returns:
- a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
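A sketch of filter-aware pruning, with an assumed three-field schema. The requested Spark projection asks only for "id", but because the filter references "ts", that column is kept in the pruned schema as well:

```java
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PruneWithFiltersExample {
  public static void main(String[] args) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()),
        Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

    // The projection asks only for "id" ...
    StructType requested = new StructType()
        .add("id", DataTypes.LongType, false);

    // ... but the filter references "ts", so "ts" is projected too
    Schema pruned = SparkSchemaUtil.prune(
        schema, requested, List.of(Expressions.notNull("ts")));
    System.out.println(pruned);
  }
}
```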
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from a Schema using a Spark type projection. This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filter Expression is used to ensure that columns referenced by the filter are projected.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
- Returns:
- a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
-
estimateSize
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.
- Parameters:
tableSchema - Spark schema
totalRecords - total records in the table
- Returns:
- approximate size based on table schema
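A sketch of the size estimate; the schema and record count are illustrative. The result is a rough, schema-derived approximation, not a measurement of actual file sizes:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
  public static void main(String[] args) {
    StructType tableSchema = new StructType()
        .add("id", DataTypes.LongType, false)
        .add("data", DataTypes.StringType, true);

    // Per-row width derived from the Spark types, scaled by record count
    long approxBytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
    System.out.println(approxBytes);
  }
}
```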
-
validateMetadataColumnReferences
public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
indexQuotedNameById
-