public class SparkSchemaUtil
extends java.lang.Object
| Modifier and Type | Method and Description |
|---|---|
| static Type | convert(org.apache.spark.sql.types.DataType sparkType) — Convert a Spark struct to a Type with new field ids. |
| static org.apache.spark.sql.types.StructType | convert(Schema schema) — Convert a Schema to a Spark type. |
| static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType) — Convert a Spark struct to a Schema based on the given schema. |
| static Schema | convert(org.apache.spark.sql.types.StructType sparkType) — Convert a Spark struct to a Schema with new field ids. |
| static org.apache.spark.sql.types.DataType | convert(Type type) — Convert a Type to a Spark type. |
| static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType) — Prune columns from a Schema using a Spark type projection. |
| static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive) — Prune columns from a Schema using a Spark type projection. |
| static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List&lt;Expression&gt; filters) — Prune columns from a Schema using a Spark type projection. |
| static Schema | schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) — Returns a Schema for the given table with fresh field ids. |
| static PartitionSpec | specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) — Returns a PartitionSpec for the given table. |
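As a sketch of how the conversion and pruning methods above fit together, the following example builds a Spark `StructType` (which needs no SparkSession), converts it to an Iceberg `Schema` with fresh field ids, and prunes it with a projection. This assumes `iceberg-spark` and `spark-sql` are on the classpath; the class name and field names are illustrative, not from the source.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ConvertExample {
  public static void main(String[] args) {
    // Build a Spark struct type; this does not require a SparkSession.
    StructType sparkType = new StructType(new StructField[] {
        new StructField("id", DataTypes.LongType, false, Metadata.empty()),
        new StructField("data", DataTypes.StringType, true, Metadata.empty())
    });

    // convert(StructType) assigns fresh field ids to the resulting Schema.
    Schema schema = SparkSchemaUtil.convert(sparkType);

    // Prune to a projection: the requested type's names, types, and
    // nullability must match the Schema, or IllegalArgumentException is thrown.
    StructType projection = new StructType(new StructField[] {
        new StructField("id", DataTypes.LongType, false, Metadata.empty())
    });
    Schema pruned = SparkSchemaUtil.prune(schema, projection);

    // Convert back to a Spark type for comparison.
    System.out.println(SparkSchemaUtil.convert(pruned).treeString());
  }
}
```

Because `convert(StructType)` maps some distinct Iceberg types onto the same Spark type, a round trip through Spark may come back as a default Iceberg type; to preserve ids and resolve ambiguous types, use `convert(Schema, StructType)` with a reference schema instead.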
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)

Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.

Parameters:
spark - a Spark session
name - a table name and (optional) database

public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException

Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.

Parameters:
spark - a Spark session
name - a table name and (optional) database

Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog

public static org.apache.spark.sql.types.StructType convert(Schema schema)

Convert a Schema to a Spark type.

Parameters:
schema - a Schema

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static org.apache.spark.sql.types.DataType convert(Type type)

Convert a Type to a Spark type.

Parameters:
type - a Type

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static Schema convert(org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Type convert(org.apache.spark.sql.types.DataType sparkType)

Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).

Parameters:
sparkType - a Spark DataType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)

Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.

Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType

Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List&lt;Expression&gt; filters)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)

Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter expression is used to ensure that columns referenced by the filter are projected.

Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter expression
caseSensitive - whether column references in the filter are resolved case sensitively

Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema
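The table-lookup helpers, schemaForTable and specForTable, can be sketched as below. The session configuration and the table name `db.events` are hypothetical; looking up a Spark/Hive table this way typically requires a catalog-enabled session, so this is an illustrative sketch rather than a runnable standalone program.

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class TableSchemaExample {
  public static void main(String[] args) throws org.apache.spark.sql.AnalysisException {
    // Hypothetical local session; a real deployment would configure its catalog.
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("schema-for-table")
        .enableHiveSupport()
        .getOrCreate();

    // Look up an existing table's schema and convert it, assigning fresh
    // field ids. Spark/Hive partition columns are included in the result.
    Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.events");

    // Build a spec with an identity partition for each partition column.
    // Throws AnalysisException if the Spark catalog cannot resolve the table.
    PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.events");

    System.out.println(schema);
    System.out.println(spec);
  }
}
```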