Class ParquetSchemaUtil

java.lang.Object
org.apache.iceberg.parquet.ParquetSchemaUtil

public class ParquetSchemaUtil extends Object
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static org.apache.parquet.schema.MessageType
    addFallbackIds(org.apache.parquet.schema.MessageType fileSchema)
     
    static org.apache.parquet.schema.MessageType
    applyNameMapping(org.apache.parquet.schema.MessageType fileSchema, NameMapping nameMapping)
     
    static org.apache.parquet.schema.MessageType
    convert(Schema schema, String name)
    Convert an Iceberg schema to Parquet.
    static org.apache.parquet.schema.MessageType
    convert(Schema schema, String name, VariantShreddingFunction variantShreddingFunc)
    Convert an Iceberg schema to Parquet.
    static Schema
    convert(org.apache.parquet.schema.MessageType parquetSchema)
    Converts a Parquet schema to an Iceberg schema.
    static Schema
    convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema)
    Converts a Parquet schema to an Iceberg schema and prunes fields without IDs.
    static org.apache.parquet.schema.Type
    determineListElementType(org.apache.parquet.schema.GroupType array)
     
    static org.apache.parquet.schema.Type
    fieldType(org.apache.parquet.schema.GroupType group, String name)
    Returns the Type of the named field in the struct/group, or null.
    static boolean
    hasField(org.apache.parquet.schema.GroupType group, String name)
    Returns true if the name identifies a field in the struct/group.
    static boolean
    hasIds(org.apache.parquet.schema.MessageType fileSchema)
     
    static org.apache.parquet.schema.MessageType
    pruneColumns(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
     
    static org.apache.parquet.schema.MessageType
    pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
    Prunes columns from a Parquet file schema that was written without field ids.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • convert

      public static org.apache.parquet.schema.MessageType convert(Schema schema, String name)
      Convert an Iceberg schema to Parquet.
      Parameters:
      schema - an Iceberg Schema
      name - name for the Parquet schema
      Returns:
      the schema converted to a Parquet MessageType
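
      As a minimal sketch of this conversion, assuming the Iceberg and Parquet libraries are on the classpath: the field names, field IDs, and the schema name "table" are illustrative only, and the class name ConvertToParquetExample is hypothetical.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.parquet.schema.MessageType;

class ConvertToParquetExample {
  // Builds a small Iceberg schema; names and IDs are made up for illustration.
  static Schema icebergSchema() {
    return new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));
  }

  // Converts the Iceberg schema to a Parquet MessageType named "table".
  static MessageType toParquet() {
    return ParquetSchemaUtil.convert(icebergSchema(), "table");
  }

  public static void main(String[] args) {
    MessageType parquetSchema = toParquet();
    // Print the resulting Parquet schema definition.
    System.out.println(parquetSchema);
    // Print the Parquet field ID recorded for the "id" column.
    System.out.println(parquetSchema.getType("id").getId());
  }
}
```

      The Iceberg field IDs are expected to appear as Parquet field IDs in the result, which is what lets a reader match columns by ID rather than by name.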
    • convert

      public static org.apache.parquet.schema.MessageType convert(Schema schema, String name, VariantShreddingFunction variantShreddingFunc)
      Convert an Iceberg schema to Parquet.

      Variant fields are converted by calling the VariantShreddingFunction with the variant's field ID and name to produce the shredding type as a typed_value field. This field is added to the variant struct alongside the metadata and value fields.

      Parameters:
      schema - an Iceberg Schema
      name - name for the Parquet schema
      variantShreddingFunc - VariantShreddingFunction that produces a shredded type
      Returns:
      the schema converted to a Parquet MessageType
    • convert

      public static Schema convert(org.apache.parquet.schema.MessageType parquetSchema)
      Converts a Parquet schema to an Iceberg schema. Fields without IDs are kept and assigned fallback IDs.
      Parameters:
      parquetSchema - a Parquet schema
      Returns:
      a matching Iceberg schema for the provided Parquet schema
    • convertAndPrune

      public static Schema convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema)
      Converts a Parquet schema to an Iceberg schema and prunes fields without IDs.
      Parameters:
      parquetSchema - a Parquet schema
      Returns:
      a matching Iceberg schema for the provided Parquet schema
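
      The difference between convert(MessageType) and convertAndPrune can be sketched with a Parquet schema in which only one field carries an ID. This is an illustration under assumptions, not a definitive usage pattern: the schema is built with the Parquet Types builder, the field names are made up, and the class name ConvertFromParquetExample is hypothetical.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

class ConvertFromParquetExample {
  // A Parquet schema where "id" has field ID 1 and "data" has no field ID.
  static MessageType mixedIdSchema() {
    return Types.buildMessage()
        .required(PrimitiveTypeName.INT64).id(1).named("id")
        .optional(PrimitiveTypeName.BINARY).as(LogicalTypeAnnotation.stringType()).named("data")
        .named("table");
  }

  // Keeps fields without IDs and assigns them fallback IDs.
  static Schema kept() {
    return ParquetSchemaUtil.convert(mixedIdSchema());
  }

  // Drops fields that have no ID.
  static Schema pruned() {
    return ParquetSchemaUtil.convertAndPrune(mixedIdSchema());
  }

  public static void main(String[] args) {
    System.out.println(kept());
    System.out.println(pruned());
  }
}
```

      Under these assumptions, the first schema should contain both columns and the second only the column that had an ID in the file.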
    • hasField

      public static boolean hasField(org.apache.parquet.schema.GroupType group, String name)
      Returns true if the name identifies a field in the struct/group.
      Parameters:
      group - a GroupType
      name - a String name
      Returns:
      true if the group contains a field with the given name
    • fieldType

      public static org.apache.parquet.schema.Type fieldType(org.apache.parquet.schema.GroupType group, String name)
      Returns the Type of the named field in the struct/group, or null.
      Parameters:
      group - a GroupType
      name - a String name
      Returns:
      the Type of the field in the group, or null if it is not present.
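
      A short sketch of how hasField and fieldType might be used together, assuming a hand-built Parquet schema; the field names and the class name FieldLookupExample are hypothetical. A MessageType is itself a GroupType, so it can be passed directly.

```java
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Type;
import org.apache.parquet.schema.Types;

class FieldLookupExample {
  // A one-column Parquet schema used only for illustration.
  static MessageType schema() {
    return Types.buildMessage()
        .required(PrimitiveTypeName.INT64).named("id")
        .named("table");
  }

  public static void main(String[] args) {
    MessageType group = schema();

    // Check for the field before asking for its type.
    if (ParquetSchemaUtil.hasField(group, "id")) {
      Type idType = ParquetSchemaUtil.fieldType(group, "id");
      System.out.println(idType);
    }

    // For a name that is not in the group, fieldType returns null instead of throwing.
    Type missing = ParquetSchemaUtil.fieldType(group, "missing");
    System.out.println(missing == null);
  }
}
```

      Returning null for a missing field lets callers avoid the exception that GroupType.getType(String) throws for unknown names.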
    • pruneColumns

      public static org.apache.parquet.schema.MessageType pruneColumns(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
    • pruneColumnsFallback

      public static org.apache.parquet.schema.MessageType pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
      Prunes columns from a Parquet file schema that was written without field ids.

      Files that were written without field ids are read on the assumption that schema evolution preserved column order and that columns were never deleted.

      The order of columns in the resulting Parquet schema matches the Parquet file.

      Parameters:
      fileSchema - schema from a Parquet file that does not have field ids.
      expectedSchema - expected schema
      Returns:
      a Parquet schema pruned using the expected schema
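
      The fallback pruning can be sketched as follows. This example assumes that columns in an ID-less file are matched to the expected schema by 1-based position, which follows from the column-order assumption described above but is not spelled out on this page. The column names and the class name PruneFallbackExample are hypothetical.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;

class PruneFallbackExample {
  // A file schema with three columns and no field IDs.
  static MessageType fileSchemaWithoutIds() {
    return org.apache.parquet.schema.Types.buildMessage()
        .required(PrimitiveTypeName.INT64).named("a")
        .optional(PrimitiveTypeName.INT32).named("b")
        .optional(PrimitiveTypeName.DOUBLE).named("c")
        .named("table");
  }

  // Expected schema whose IDs 1 and 3 are meant to select the first and third
  // file columns (assumption: fallback matching is by 1-based position).
  static Schema expectedSchema() {
    return new Schema(
        Types.NestedField.required(1, "a", Types.LongType.get()),
        Types.NestedField.optional(3, "c", Types.DoubleType.get()));
  }

  static MessageType pruned() {
    return ParquetSchemaUtil.pruneColumnsFallback(fileSchemaWithoutIds(), expectedSchema());
  }

  public static void main(String[] args) {
    // Print the pruned Parquet schema; column order follows the file.
    System.out.println(pruned());
  }
}
```

      If the positional assumption holds, the pruned schema keeps "a" and "c" in file order and omits "b".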
    • hasIds

      public static boolean hasIds(org.apache.parquet.schema.MessageType fileSchema)
    • addFallbackIds

      public static org.apache.parquet.schema.MessageType addFallbackIds(org.apache.parquet.schema.MessageType fileSchema)
    • applyNameMapping

      public static org.apache.parquet.schema.MessageType applyNameMapping(org.apache.parquet.schema.MessageType fileSchema, NameMapping nameMapping)
    • determineListElementType

      public static org.apache.parquet.schema.Type determineListElementType(org.apache.parquet.schema.GroupType array)