public class SparkSchemaUtil extends java.lang.Object
| Modifier and Type | Method and Description |
|---|---|
| `static Type` | `convert(org.apache.spark.sql.types.DataType sparkType)`<br>Convert a Spark struct to a Type with new field ids. |
| `static org.apache.spark.sql.types.StructType` | `convert(Schema schema)`<br>Convert a Schema to a Spark type. |
| `static Schema` | `convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)`<br>Convert a Spark struct to a Schema based on the given schema. |
| `static Schema` | `convert(org.apache.spark.sql.types.StructType sparkType)`<br>Convert a Spark struct to a Schema with new field ids. |
| `static Schema` | `convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)`<br>Convert a Spark struct to a Schema with new field ids. |
| `static org.apache.spark.sql.types.DataType` | `convert(Type type)`<br>Convert a Type to a Spark type. |
| `static long` | `estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)`<br>Estimate approximate table size based on Spark schema and total records. |
| `static java.util.Map<java.lang.Integer,java.lang.String>` | `indexQuotedNameById(Schema schema)` |
| `static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)`<br>Prune columns from a Schema using a Spark type projection. |
| `static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)`<br>Prune columns from a Schema using a Spark type projection. |
| `static Schema` | `prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)`<br>Prune columns from a Schema using a Spark type projection. |
| `static Schema` | `schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)`<br>Returns a Schema for the given table with fresh field ids. |
| `static PartitionSpec` | `specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)`<br>Returns a PartitionSpec for the given table. |
| `static void` | `validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)` |
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)

Returns a Schema for the given table with fresh field ids.

This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
Parameters:
spark - a Spark session
name - a table name and (optional) database

public static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
Returns a PartitionSpec for the given table.

This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
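As a sketch of how `schemaForTable` and `specForTable` fit together (this assumes a running SparkSession with Hive support, iceberg-spark on the classpath, and an existing table; the table name `db.logs` is hypothetical):

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.SparkSession;

public class TableMetadataExample {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("schema-lookup")
            .enableHiveSupport()
            .getOrCreate();

        // Look up the table's Spark schema and convert it, assigning fresh ids.
        // Spark/Hive partition columns are included in the result.
        Schema schema = SparkSchemaUtil.schemaForTable(spark, "db.logs");

        // Build a spec with an identity partition for each partition column.
        PartitionSpec spec = SparkSchemaUtil.specForTable(spark, "db.logs");

        System.out.println(schema.asStruct());
        System.out.println(spec);
    }
}
```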
Parameters:
spark - a Spark session
name - a table name and (optional) database
Throws:
org.apache.spark.sql.AnalysisException - if thrown by the Spark catalog

public static org.apache.spark.sql.types.StructType convert(Schema schema)
Convert a Schema to a Spark type.
Parameters:
schema - a Schema
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static org.apache.spark.sql.types.DataType convert(Type type)
Convert a Type to a Spark type.
Parameters:
type - a Type
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted to Spark

public static Schema convert(org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
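A minimal sketch of a fresh-id conversion (assuming iceberg-spark and spark-sql are on the classpath; the field names are illustrative):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class FreshIdConversion {
    public static void main(String[] args) {
        StructType sparkType = new StructType(new StructField[] {
            new StructField("id", DataTypes.LongType, false, Metadata.empty()),
            new StructField("data", DataTypes.StringType, true, Metadata.empty())
        });

        // Assigns fresh field ids; Spark types that map to more than one
        // Iceberg type are converted to a default type.
        Schema schema = SparkSchemaUtil.convert(sparkType);
        System.out.println(schema.asStruct());
    }
}
```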
Parameters:
sparkType - a Spark StructType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(org.apache.spark.sql.types.StructType sparkType, boolean useTimestampWithoutZone)
Convert a Spark struct to a Schema with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
Parameters:
sparkType - a Spark StructType
useTimestampWithoutZone - a flag indicating that timestamps should be stored without time zone
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Type convert(org.apache.spark.sql.types.DataType sparkType)
Convert a Spark struct to a Type with new field ids.

This conversion assigns fresh ids.

Some data types are represented as the same Spark type. These are converted to a default type.

To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
Parameters:
sparkType - a Spark DataType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted

public static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
Convert a Spark struct to a Schema based on the given schema.

This conversion does not assign new ids; it uses ids from the base schema.

Data types, field order, and nullability will match the Spark type. This conversion may return a schema that is not compatible with the base schema.
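To preserve existing field ids while following a Spark type, the two-argument overload can be used. A sketch (same classpath assumptions as above; here the Spark type is derived from the base schema itself for brevity, but in practice it would come from Spark planning):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class BaseSchemaConversion {
    public static void main(String[] args) {
        // The base schema supplies the field ids.
        Schema base = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));

        // A Spark type corresponding to the same table.
        StructType sparkType = SparkSchemaUtil.convert(base);

        // Ids come from the base schema; types, field order, and nullability
        // follow the Spark type.
        Schema converted = SparkSchemaUtil.convert(base, sparkType);
        System.out.println(converted.asStruct());
    }
}
```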
Parameters:
baseSchema - a Schema on which the conversion is based
sparkType - a Spark StructType
Throws:
java.lang.IllegalArgumentException - if the type cannot be converted or there are missing ids

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.
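A sketch of pruning with a projection (same classpath assumptions; building the requested type by selecting fields from the converted full schema keeps nullability and types consistent, as required):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class PruneExample {
    public static void main(String[] args) {
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()),
            Types.NestedField.optional(3, "ts", Types.TimestampType.withZone()));

        // A projection that keeps only "id" and "ts".
        StructType full = SparkSchemaUtil.convert(schema);
        StructType requested = new StructType(new StructField[] {
            full.fields()[0], full.fields()[2]
        });

        // The pruned schema keeps the original field ids of projected columns.
        Schema pruned = SparkSchemaUtil.prune(schema, requested);
        System.out.println(pruned.asStruct());
    }
}
```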
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filters list of Expression is used to ensure that columns referenced by filters are projected.
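For example, a projection that omits a filter column still ends up containing it, so the filter can be evaluated (a sketch under the same classpath assumptions):

```java
import java.util.Collections;
import java.util.List;
import org.apache.iceberg.Schema;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class PruneWithFiltersExample {
    public static void main(String[] args) {
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));

        // Project only "data"...
        StructType full = SparkSchemaUtil.convert(schema);
        StructType requested = new StructType(new StructField[] { full.fields()[1] });

        // ...but filter on "id": the referenced column is added to the projection.
        List<Expression> filters =
            Collections.singletonList(Expressions.greaterThan("id", 100L));
        Schema pruned = SparkSchemaUtil.prune(schema, requested, filters);
        System.out.println(pruned.asStruct());
    }
}
```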
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filters - a list of filters
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Prune columns from a Schema using a Spark type projection.

This requires that the Spark type is a projection of the Schema. Nullability and types must match.

The filter expression is used to ensure that columns referenced by the filter are projected.
Parameters:
schema - a Schema
requestedType - a projection of the Spark representation of the Schema
filter - a filter
Throws:
java.lang.IllegalArgumentException - if the Spark type does not match the Schema

public static long estimateSize(org.apache.spark.sql.types.StructType tableSchema, long totalRecords)
Estimate approximate table size based on Spark schema and total records.

Parameters:
tableSchema - a Spark schema
totalRecords - total records in the table

public static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)
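As a closing sketch, the size estimate and the quoted-name index can be exercised without a Spark session (same classpath assumptions; the record count is illustrative):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.SparkSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.types.StructType;

public class EstimateSizeExample {
    public static void main(String[] args) {
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()),
            Types.NestedField.optional(2, "data", Types.StringType.get()));
        StructType sparkType = SparkSchemaUtil.convert(schema);

        // Approximate size derived from the Spark schema and the record count.
        long bytes = SparkSchemaUtil.estimateSize(sparkType, 1_000_000L);
        System.out.println(bytes);

        // Map of field id to quoted column name for this schema.
        System.out.println(SparkSchemaUtil.indexQuotedNameById(schema));
    }
}
```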