public class SparkSchemaUtil
extends java.lang.Object
Modifier and Type | Method and Description |
---|---|
static Type | convert(org.apache.spark.sql.types.DataType sparkType): Convert a Spark struct to a Type with new field ids. |
static org.apache.spark.sql.types.StructType | convert(Schema schema): Convert a Schema to a Spark type. |
static Schema | convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType): Convert a Spark struct to a Schema based on the given schema. |
static Schema | convert(org.apache.spark.sql.types.StructType sparkType): Convert a Spark struct to a Schema with new field ids. |
static Schema | convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone): Convert a Spark struct to a Schema with new field ids. |
static org.apache.spark.sql.types.DataType | convert(Type type): Convert a Type to a Spark type. |
static long | estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords): Estimate approximate table size based on Spark schema and total records. |
static java.util.Map<java.lang.Integer,java.lang.String> | indexQuotedNameById(Schema schema) |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType): Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive): Prune columns from a Schema using a Spark type projection. |
static Schema | prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters): Prune columns from a Schema using a Spark type projection. |
static Schema | schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name): Returns a Schema for the given table with fresh field ids. |
static PartitionSpec | specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name): Returns a PartitionSpec for the given table. |
static void | validateMetadataColumnReferences(Schema tableSchema, Schema readSchema) |

public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Returns a Schema for the given table with fresh field ids.
This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
Parameters:
spark - a Spark session
name - a table name and (optional) database

public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table.
This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
Parameters:
spark - a Spark session
name - a table name and (optional) database
Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog

public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert a Schema to a Spark type.
Parameters:
schema - a Schema
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.
Parameters:
type - a Type
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema with new field ids.
This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
Parameters:
sparkType - a Spark StructType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
Convert a Spark struct to a Schema with new field ids.
This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
Parameters:
sparkType - a Spark StructType
useTimestampWithoutZone - a flag indicating that timestamps should be stored without a time zone
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids.
This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
Parameters:
sparkType - a Spark DataType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema.
This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the Spark type. As a result, this conversion may return a schema that is not compatible with the base schema.
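The id-reuse behavior described above can be illustrated with a small, self-contained model. This is a minimal sketch using only the Java standard library, not the SparkSchemaUtil implementation: the class `BaseSchemaIdSketch`, the method `reuseIds`, and the representation of a schema as a name-to-id map are all hypothetical, and matching fields by name is an assumption. It shows only two points from the description: the result follows the Spark field order, and a field with no id in the base schema raises `IllegalArgumentException`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types: a "schema" is modeled as field name -> field id.
final class BaseSchemaIdSketch {

  // Returns the Spark fields, in Spark order, paired with ids taken from the base schema.
  // No new ids are assigned; a field that has no id in the base schema is an error.
  static Map<String, Integer> reuseIds(Map<String, Integer> baseIds, List<String> sparkFieldNames) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String name : sparkFieldNames) {
      Integer id = baseIds.get(name);
      if (id == null) {
        throw new IllegalArgumentException("Missing id for field: " + name);
      }
      result.put(name, id);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> baseIds = new HashMap<>();
    baseIds.put("id", 1);
    baseIds.put("data", 2);
    baseIds.put("ts", 3);

    // The Spark type lists "ts" before "id"; the result keeps that order and the base ids.
    System.out.println(reuseIds(baseIds, Arrays.asList("ts", "id")));
  }
}
```

Because the result follows the Spark field order rather than the base schema's, the sketch also hints at why the converted schema is not guaranteed to be compatible with the base schema.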
Parameters:
baseSchema - a Schema on which conversion is based
sparkType - a Spark StructType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of Expression is used to ensure that columns referenced by filters are projected.
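The effect of the filters on the projection can be sketched with a simplified model. This is an illustration under stated assumptions, not the library's code: the class `PruneWithFiltersSketch` and its `prune` method are hypothetical, columns are modeled as plain names, filters are reduced to the column names they reference, and returning the kept columns in schema order is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a schema is an ordered list of column names.
final class PruneWithFiltersSketch {

  // Keeps every requested column plus every column referenced by a filter.
  // A requested column that is not in the schema is rejected, mirroring the
  // requirement that the requested type be a projection of the schema.
  static List<String> prune(List<String> schemaColumns, List<String> requested, List<String> filterColumns) {
    for (String name : requested) {
      if (!schemaColumns.contains(name)) {
        throw new IllegalArgumentException("Requested column is not in the schema: " + name);
      }
    }
    Set<String> keep = new HashSet<>(requested);
    keep.addAll(filterColumns);

    List<String> result = new ArrayList<>();
    for (String column : schemaColumns) {
      if (keep.contains(column)) {
        result.add(column); // schema order is preserved (an assumption of this sketch)
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("id", "data", "ts", "category");
    // Only "data" is requested, but a filter references "ts", so both are kept.
    System.out.println(prune(schema, Arrays.asList("data"), Arrays.asList("ts")));
  }
}
```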
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from a Schema using a Spark type projection.
This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filter Expression is used to ensure that columns referenced by the filter are projected.
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.
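The class summary describes estimateSize as estimating approximate table size from the Spark schema and the total record count, but this page does not say how the estimate is computed. The following is a minimal sketch of one plausible approach, assuming the estimate is a per-row size multiplied by the record count and capped at `Long.MAX_VALUE` on overflow. The class `EstimateSizeSketch` and the parameter `rowSizeBytes` (standing in for a per-row size derived from the schema) are hypothetical.

```java
// Hypothetical model of a size estimate: bytes per row times number of rows.
final class EstimateSizeSketch {

  // Multiplies the per-row size by the record count. If the product does not
  // fit in a long, the estimate saturates at Long.MAX_VALUE instead of wrapping.
  static long estimateSize(long rowSizeBytes, long totalRecords) {
    try {
      return Math.multiplyExact(rowSizeBytes, totalRecords);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  public static void main(String[] args) {
    System.out.println(estimateSize(24L, 1_000L));          // small table
    System.out.println(estimateSize(24L, Long.MAX_VALUE));  // overflow case
  }
}
```

Saturating rather than wrapping keeps an unknown or very large record count from producing a negative or misleadingly small estimate.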
Parameters:
tableSchema - Spark schema
totalRecords - total records in the table

public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)

public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)