public class SparkSchemaUtil
extends java.lang.Object
| Modifier and Type | Method and Description | 
|---|---|
| static Type | convert(org.apache.spark.sql.types.DataType sparkType)Convert a Spark  structto aTypewith new field ids. | 
| static org.apache.spark.sql.types.StructType | convert(Schema schema)Convert a  Schemato aSpark type. | 
| static Schema | convert(Schema baseSchema,
       org.apache.spark.sql.types.StructType sparkType)Convert a Spark  structto aSchemabased on the given schema. | 
| static Schema | convert(Schema baseSchema,
       org.apache.spark.sql.types.StructType sparkType,
       boolean caseSensitive)Convert a Spark  structto aSchemabased on the given schema. | 
| static Schema | convert(org.apache.spark.sql.types.StructType sparkType)Convert a Spark  structto aSchemawith new field ids. | 
| static org.apache.spark.sql.types.DataType | convert(Type type)Convert a  Typeto aSpark type. | 
| static Schema | convertWithFreshIds(Schema baseSchema,
                   org.apache.spark.sql.types.StructType sparkType)Convert a Spark  structto aSchemabased on the given schema. | 
| static Schema | convertWithFreshIds(Schema baseSchema,
                   org.apache.spark.sql.types.StructType sparkType,
                   boolean caseSensitive)Convert a Spark  structto aSchemabased on the given schema. | 
| static long | estimateSize(org.apache.spark.sql.types.StructType tableSchema,
            long totalRecords)Estimate approximate table size based on Spark schema and total records. | 
| static java.util.Map<java.lang.Integer,java.lang.String> | indexQuotedNameById(Schema schema) | 
| static Schema | prune(Schema schema,
     org.apache.spark.sql.types.StructType requestedType)Prune columns from a  Schemausing aSpark typeprojection. | 
| static Schema | prune(Schema schema,
     org.apache.spark.sql.types.StructType requestedType,
     Expression filter,
     boolean caseSensitive)Prune columns from a  Schemausing aSpark typeprojection. | 
| static Schema | prune(Schema schema,
     org.apache.spark.sql.types.StructType requestedType,
     java.util.List<Expression> filters)Prune columns from a  Schemausing aSpark typeprojection. | 
| static Schema | schemaForTable(org.apache.spark.sql.SparkSession spark,
              java.lang.String name)Returns a  Schemafor the given table with fresh field ids. | 
| static PartitionSpec | specForTable(org.apache.spark.sql.SparkSession spark,
            java.lang.String name)Returns a  PartitionSpecfor the given table. | 
| static void | validateMetadataColumnReferences(Schema tableSchema,
                                Schema readSchema) | 
public static Schema schemaForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
Schema for the given table with fresh field ids.
 This creates a Schema for an existing table by looking up the table's schema with Spark and converting that schema. Spark/Hive partition columns are included in the schema.
spark - a Spark sessionname - a table name and (optional) databasepublic static PartitionSpec specForTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.AnalysisException
PartitionSpec for the given table.
 This creates a partition spec for an existing table by looking up the table's schema and creating a spec with identity partitions for each partition column.
spark - a Spark sessionname - a table name and (optional) databaseorg.apache.spark.sql.AnalysisException - if thrown by the Spark catalogpublic static org.apache.spark.sql.types.StructType convert(Schema schema)
Schema to a Spark type.schema - a Schemajava.lang.IllegalArgumentException - if the type cannot be converted to Sparkpublic static org.apache.spark.sql.types.DataType convert(Type type)
Type to a Spark type.type - a Typejava.lang.IllegalArgumentException - if the type cannot be converted to Sparkpublic static Schema convert(org.apache.spark.sql.types.StructType sparkType)
struct to a Schema with new field ids.
 This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
sparkType - a Spark StructTypejava.lang.IllegalArgumentException - if the type cannot be convertedpublic static Type convert(org.apache.spark.sql.types.DataType sparkType)
struct to a Type with new field ids.
 This conversion assigns fresh ids.
Some data types are represented as the same Spark type. These are converted to a default type.
To convert using a reference schema for field ids and ambiguous types, use convert(Schema, StructType).
sparkType - a Spark DataTypejava.lang.IllegalArgumentException - if the type cannot be convertedpublic static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
struct to a Schema based on the given schema.
 This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
baseSchema - a Schema on which conversion is basedsparkType - a Spark StructTypejava.lang.IllegalArgumentException - if the type cannot be converted or there are missing idspublic static Schema convert(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
struct to a Schema based on the given schema.
 This conversion does not assign new ids; it uses ids from the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
baseSchema - a Schema on which conversion is basedsparkType - a Spark StructTypecaseSensitive - when false, the case of schema fields is ignoredjava.lang.IllegalArgumentException - if the type cannot be converted or there are missing idspublic static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType)
struct to a Schema based on the given schema.
 This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
baseSchema - a Schema on which conversion is basedsparkType - a Spark StructTypejava.lang.IllegalArgumentException - if the type cannot be converted or there are missing idspublic static Schema convertWithFreshIds(Schema baseSchema, org.apache.spark.sql.types.StructType sparkType, boolean caseSensitive)
struct to a Schema based on the given schema.
 This conversion will assign new ids for fields that are not found in the base schema.
Data types, field order, and nullability will match the spark type. This conversion may return a schema that is not compatible with base schema.
baseSchema - a Schema on which conversion is basedsparkType - a Spark StructTypecaseSensitive - when false, case of field names in schema is ignoredjava.lang.IllegalArgumentException - if the type cannot be converted or there are missing idspublic static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType)
Schema using a Spark type projection.
 This requires that the Spark type is a projection of the Schema. Nullability and types must match.
schema - a SchemarequestedType - a projection of the Spark representation of the Schemajava.lang.IllegalArgumentException - if the Spark type does not match the Schemapublic static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, java.util.List<Expression> filters)
Schema using a Spark type projection.
 This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of Expression is used to ensure that columns referenced by filters
 are projected.
schema - a SchemarequestedType - a projection of the Spark representation of the Schemafilters - a list of filtersjava.lang.IllegalArgumentException - if the Spark type does not match the Schemapublic static Schema prune(Schema schema, org.apache.spark.sql.types.StructType requestedType, Expression filter, boolean caseSensitive)
Schema using a Spark type projection.
 This requires that the Spark type is a projection of the Schema. Nullability and types must match.
The filters list of Expression is used to ensure that columns referenced by filters
 are projected.
schema - a SchemarequestedType - a projection of the Spark representation of the Schemafilter - a filtersjava.lang.IllegalArgumentException - if the Spark type does not match the Schemapublic static long estimateSize(org.apache.spark.sql.types.StructType tableSchema,
                                long totalRecords)
tableSchema - Spark schematotalRecords - total records in the tablepublic static void validateMetadataColumnReferences(Schema tableSchema, Schema readSchema)
public static java.util.Map<java.lang.Integer,java.lang.String> indexQuotedNameById(Schema schema)