Class SparkSchemaUtil
-
Method Summary
static org.apache.spark.sql.types.StructType convert(Schema schema)
    Convert a Schema to a Spark type.
static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
    Convert a Spark struct to a Schema based on the given schema.
static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
    Convert a Spark struct to a Schema based on the given schema.
static org.apache.spark.sql.types.DataType convert(Type type)
    Convert a Type to a Spark type.
static Type convert(org.apache.spark.sql.types.DataType sparkType)
    Convert a Spark struct to a Type with new field ids.
static Schema convert(org.apache.spark.sql.types.StructType sparkType)
    Convert a Spark struct to a Schema with new field ids.
static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
    Convert a Spark struct to a Schema based on the given schema.
static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
    Convert a Spark struct to a Schema based on the given schema.
static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
    Estimate approximate table size based on Spark schema and total records.
indexQuotedNameById(Schema schema)
static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
    Prune columns from a Schema using a Spark type projection.
static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, List<Expression> filters)
    Prune columns from a Schema using a Spark type projection.
static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
    Prune columns from a Schema using a Spark type projection.
static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, String name)
    Returns a Schema for the given table with fresh field ids.
static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, String name)
    Returns a PartitionSpec for the given table.
static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
Method Details
-
schemaForTable
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, String name)
Returns a Schema for the given table with fresh field ids.
This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
- Parameters:
spark - a Spark session
name - a table name and (optional) database
- Returns:
a Schema for the table, if found
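As a sketch of typical usage (the table name `db.logs` is invented for illustration; any table visible to the Spark catalog works):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class SchemaForTableExample {
    public static void main(String[] args) {
        // Assumes a Spark session with access to the catalog holding the table.
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("schema-for-table")
            .getOrCreate();

        // "db.logs" is a hypothetical table name.
        Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.logs");
        // Fresh field ids are assigned; Spark/Hive partition columns appear
        // in the result alongside the data columns.
        System.out.println(schema);
    }
}
```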
-
specForTable
public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table.
This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
- Parameters:
spark - a Spark session
name - a table name and (optional) database
- Returns:
a PartitionSpec for the table
- Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog
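A minimal sketch, again against a hypothetical table name; for a table partitioned by, say, `dt` and `region`, the resulting spec would contain identity partitions on those columns:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.SparkSession;

public class SpecForTableExample {
    public static void main(String[] args) throws AnalysisException {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("spec-for-table")
            .getOrCreate();

        // "db.logs" is a hypothetical table name; AnalysisException is
        // propagated if the Spark catalog cannot resolve it.
        PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.logs");
        System.out.println(spec);
    }
}
```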
-
convert
public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert a Schema to a Spark type.
- Parameters:
schema - a Schema
- Returns:
the equivalent Spark type
- Throws:
IllegalArgumentException - if the type cannot be converted to Spark
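A small sketch of the Schema-to-Spark direction; the field names and ids below are invented for illustration:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class ConvertToSparkExample {
    public static void main(String[] args) {
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));

        StructType sparkType = SparkSchemaUtil.convert(schema);
        // Required fields convert to non-nullable Spark fields,
        // optional fields to nullable ones.
        System.out.println(sparkType.treeString());
    }
}
```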
-
convert
public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.
- Parameters:
type - a Type
- Returns:
the equivalent Spark type
- Throws:
IllegalArgumentException - if the type cannot be converted to Spark
-
convert
public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema with new field ids.
This conversion assigns fresh ids. Some data types are represented as the same Spark type; these are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
sparkType - a Spark StructType
- Returns:
the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted
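A sketch of the Spark-to-Schema direction with fresh ids; the column names are invented for illustration:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertToSchemaExample {
    public static void main(String[] args) {
        StructType sparkType = new StructType()
            .add("id", DataTypes.LongType, false)     // non-nullable -> required
            .add("data", DataTypes.StringType, true); // nullable -> optional

        Schema schema = SparkSchemaUtil.convert(sparkType);
        // New field ids are assigned; Spark types that represent more than
        // one Iceberg type are converted to a default.
        System.out.println(schema);
    }
}
```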
-
convert
public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids.
This conversion assigns fresh ids. Some data types are represented as the same Spark type; these are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
- Parameters:
sparkType - a Spark DataType
- Returns:
the equivalent Type
- Throws:
IllegalArgumentException - if the type cannot be converted
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema.
This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
- Returns:
the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
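A sketch of id reuse, assuming (as an invented example) a base schema with fields `id` and `data` and a Spark type that selects only `data`:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithBaseSchemaExample {
    public static void main(String[] args) {
        // Base schema whose field ids should be preserved.
        Schema base = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));

        // A Spark type covering only "data"; ids come from the base schema.
        StructType sparkType = new StructType()
            .add("data", DataTypes.StringType, true);

        Schema converted = SparkSchemaUtil.convert(base, sparkType);
        // "data" keeps id 2 from the base schema; no fresh ids are assigned.
        System.out.println(converted);
    }
}
```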
-
convert
public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Spark struct to a Schema based on the given schema.
This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of schema fields is ignored
- Returns:
the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema.
This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
- Returns:
the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
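A sketch of the fresh-id behavior, assuming (as an invented example) a Spark type that adds a column the base schema does not have:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ConvertWithFreshIdsExample {
    public static void main(String[] args) {
        Schema base = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()));

        // "extra" is an invented column name that is absent from the base
        // schema, so it receives a newly assigned id.
        StructType sparkType = new StructType()
            .add("id", DataTypes.LongType, false)
            .add("extra", DataTypes.StringType, true);

        Schema converted = SparkSchemaUtil.convertWithFreshIds(base, sparkType);
        // "id" keeps its id from the base schema; "extra" gets a fresh one.
        System.out.println(converted);
    }
}
```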
-
convertWithFreshIds
public static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
Convert a Spark struct to a Schema based on the given schema.
This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
- Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
caseSensitive - when false, the case of field names in the schema is ignored
- Returns:
the equivalent Schema
- Throws:
IllegalArgumentException - if the type cannot be converted or there are missing ids
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
- Returns:
a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, List<Expression> filters)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of Expression is used to ensure that columns referenced by filters are projected.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters
- Returns:
a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
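A sketch of filter-aware pruning, with invented field names: the projection requests only `id`, but the filter references `category`, so that column is retained as well:

```java
import java.util.Collections;
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class PruneExample {
    public static void main(String[] args) {
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()),
            Types.NestedField.optional(3, "category", Types.StringType.get()));

        // Projection requests only "id"; types and nullability must match
        // the Spark representation of the schema.
        StructType requested = new StructType()
            .add("id", DataTypes.LongType, false);

        // The filter references "category", so that column is projected too.
        List<Expression> filters =
            Collections.singletonList(Expressions.equal("category", "news"));

        Schema pruned = SparkSchemaUtil.prune(schema, requested, filters);
        System.out.println(pruned);
    }
}
```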
-
prune
public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filter Expression is used to ensure that columns referenced by the filter are projected.
- Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
- Returns:
a Schema corresponding to the Spark projection
- Throws:
IllegalArgumentException - if the Spark type does not match the Schema
-
estimateSize
public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.
- Parameters:
tableSchema - Spark schema
totalRecords - total records in the table
- Returns:
approximate size based on table schema
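A minimal sketch, with an invented schema and row count; the result is a rough byte estimate derived from the Spark schema, not a measured size:

```java
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
    public static void main(String[] args) {
        StructType tableSchema = new StructType()
            .add("id", DataTypes.LongType)
            .add("data", DataTypes.StringType);

        // Approximate size for one million rows of this shape.
        long bytes = SparkSchemaUtil.estimateSize(tableSchema, 1_000_000L);
        System.out.println(bytes);
    }
}
```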
-
validateMetadataColumnReferences
public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
-
indexQuotedNameById
indexQuotedNameById(Schema schema)
-