Class ParquetSchemaUtil


  • public class ParquetSchemaUtil
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      static org.apache.parquet.schema.MessageType addFallbackIds(org.apache.parquet.schema.MessageType fileSchema)
      static org.apache.parquet.schema.MessageType applyNameMapping(org.apache.parquet.schema.MessageType fileSchema, NameMapping nameMapping)
      static org.apache.parquet.schema.MessageType convert(Schema schema, java.lang.String name)
      static Schema convert(org.apache.parquet.schema.MessageType parquetSchema)
      Converts a Parquet schema to an Iceberg schema.
      static Schema convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema)
      Converts a Parquet schema to an Iceberg schema and prunes fields without IDs.
      static org.apache.parquet.schema.Type determineListElementType(org.apache.parquet.schema.GroupType array)
      static boolean hasIds(org.apache.parquet.schema.MessageType fileSchema)
      static org.apache.parquet.schema.MessageType pruneColumns(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
      static org.apache.parquet.schema.MessageType pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
      Prunes columns from a Parquet file schema that was written without field IDs.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • convert

        public static org.apache.parquet.schema.MessageType convert(Schema schema,
                                                                    java.lang.String name)
      • convert

        public static Schema convert(org.apache.parquet.schema.MessageType parquetSchema)
        Converts a Parquet schema to an Iceberg schema. Fields without IDs are kept and assigned fallback IDs.
        Parameters:
        parquetSchema - a Parquet schema
        Returns:
        a matching Iceberg schema for the provided Parquet schema
      • convertAndPrune

        public static Schema convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema)
        Converts a Parquet schema to an Iceberg schema and prunes fields without IDs.
        Parameters:
        parquetSchema - a Parquet schema
        Returns:
        a matching Iceberg schema for the provided Parquet schema
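        The difference between convert and convertAndPrune can be illustrated with a toy model. The sketch below uses no Iceberg or Parquet classes: Field, keepAndAssign, pruneMissing, and FALLBACK_START are hypothetical names, and the sequential fallback numbering is an assumption made only for illustration, not the library's actual ID scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: no Iceberg or Parquet classes are used here.
final class ConvertVsPruneSketch {

  // Hypothetical stand-in for a schema field; id == null means "written without a field ID".
  static final class Field {
    final String name;
    final Integer id;

    Field(String name, Integer id) {
      this.name = name;
      this.id = id;
    }

    @Override
    public String toString() {
      return id + ":" + name;
    }
  }

  // ASSUMPTION: fallback IDs are handed out sequentially from an arbitrary starting value.
  static final int FALLBACK_START = 1000;

  // Models convert: every field survives; fields lacking an ID receive a fallback ID.
  static List<Field> keepAndAssign(List<Field> fields) {
    List<Field> result = new ArrayList<>();
    int nextFallback = FALLBACK_START;
    for (Field field : fields) {
      if (field.id != null) {
        result.add(field);
      } else {
        result.add(new Field(field.name, nextFallback));
        nextFallback += 1;
      }
    }
    return result;
  }

  // Models convertAndPrune: fields lacking an ID are dropped.
  static List<Field> pruneMissing(List<Field> fields) {
    List<Field> result = new ArrayList<>();
    for (Field field : fields) {
      if (field.id != null) {
        result.add(field);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Field> fileFields = Arrays.asList(new Field("id", 1), new Field("note", null));
    System.out.println(keepAndAssign(fileFields));
    System.out.println(pruneMissing(fileFields));
  }
}
```

        In this model the field without an ID survives the first call with a fallback ID and is absent from the result of the second.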
      • pruneColumns

        public static org.apache.parquet.schema.MessageType pruneColumns(org.apache.parquet.schema.MessageType fileSchema,
                                                                         Schema expectedSchema)
      • pruneColumnsFallback

        public static org.apache.parquet.schema.MessageType pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema,
                                                                                 Schema expectedSchema)
        Prunes columns from a Parquet file schema that was written without field IDs.

        Files written without field IDs are read on the assumption that schema evolution preserved column order and that no columns were deleted.

        The order of columns in the resulting Parquet schema matches the Parquet file.

        Parameters:
        fileSchema - schema from a Parquet file that does not have field IDs
        expectedSchema - the expected Iceberg schema
        Returns:
        a Parquet schema pruned using the expected schema
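        The order-based matching described above can be sketched with plain Java collections. This is a toy model under a stated assumption, not the library's implementation: it assumes each top-level column's ID is its 1-based position in the file schema, and pruneByPosition is a hypothetical name. Column names stand in for Parquet fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: plain strings stand in for Parquet fields.
final class PositionalPruneSketch {

  // ASSUMPTION: a top-level column's fallback ID is its 1-based position in the file schema.
  static List<String> pruneByPosition(List<String> fileColumns, Set<Integer> expectedIds) {
    List<String> kept = new ArrayList<>();
    int ordinal = 1;
    for (String column : fileColumns) {
      // Keep the column only if the expected schema asks for this position's ID.
      if (expectedIds.contains(ordinal)) {
        kept.add(column);
      }
      ordinal += 1;
    }
    // Iterating over the file columns means the result follows file order.
    return kept;
  }

  public static void main(String[] args) {
    List<String> fileColumns = Arrays.asList("id", "name", "ts");
    Set<Integer> expectedIds = new HashSet<>(Arrays.asList(3, 1));
    System.out.println(pruneByPosition(fileColumns, expectedIds)); // prints [id, ts]
  }
}
```

        The expected IDs are given as 3 then 1, yet the result lists id before ts, because the loop walks the file's columns rather than the expected schema's.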
      • hasIds

        public static boolean hasIds(org.apache.parquet.schema.MessageType fileSchema)
      • addFallbackIds

        public static org.apache.parquet.schema.MessageType addFallbackIds(org.apache.parquet.schema.MessageType fileSchema)
      • applyNameMapping

        public static org.apache.parquet.schema.MessageType applyNameMapping(org.apache.parquet.schema.MessageType fileSchema,
                                                                             NameMapping nameMapping)
      • determineListElementType

        public static org.apache.parquet.schema.Type determineListElementType(org.apache.parquet.schema.GroupType array)