Class ORCSchemaUtil


  • public final class ORCSchemaUtil
    extends java.lang.Object
    Utilities for mapping Iceberg to ORC schemas.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String ICEBERG_BINARY_TYPE_ATTRIBUTE
      The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC binary type.
      static java.lang.String ICEBERG_LONG_TYPE_ATTRIBUTE
      The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC long type.
    • Method Summary

      Modifier and Type Method Description
      static org.apache.orc.TypeDescription buildOrcProjection(Schema schema, org.apache.orc.TypeDescription originalOrcSchema)
      Converts an Iceberg schema to a corresponding ORC schema within the context of an existing ORC file schema.
      static org.apache.orc.TypeDescription convert(Schema schema)
      static Schema convert(org.apache.orc.TypeDescription orcSchema)
      Converts an ORC schema to an Iceberg schema.
      static int fieldId(org.apache.orc.TypeDescription orcType)
      static java.util.Map<java.lang.Integer,java.lang.String> idToOrcName(Schema schema)
      Generates a mapping from field IDs to ORC qualified names.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • ICEBERG_BINARY_TYPE_ATTRIBUTE

        public static final java.lang.String ICEBERG_BINARY_TYPE_ATTRIBUTE
        The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC binary type. The allowed values for this attribute are defined in BinaryType.
        See Also:
        Constant Field Values
      • ICEBERG_LONG_TYPE_ATTRIBUTE

        public static final java.lang.String ICEBERG_LONG_TYPE_ATTRIBUTE
        The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC long type. The allowed values for this attribute are defined in LongType.
        See Also:
        Constant Field Values
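The two attributes above let a reader recover an Iceberg type that ORC cannot express directly (for example, a UUID stored as ORC binary, or a time stored as ORC long). The following is a minimal, stdlib-only sketch of that lookup, not the library's implementation. The attribute keys are hypothetical placeholders (the real constant values are on the Constant Field Values page), the enum constants are assumed to mirror BinaryType and LongType, and treating a missing attribute as plain binary/long is an assumption.

```java
import java.util.Map;

// Hypothetical sketch: resolve the Iceberg type of an ORC binary/long column
// from the column's attributes. Keys below are placeholders, not the real constants.
class TypeAttributeSketch {
    static final String BINARY_TYPE_ATTR = "hypothetical.binary-type";
    static final String LONG_TYPE_ATTR = "hypothetical.long-type";

    // Assumed to mirror ORCSchemaUtil.BinaryType and ORCSchemaUtil.LongType.
    enum BinaryType { UUID, FIXED, BINARY }
    enum LongType { TIME, LONG }

    // A missing attribute is assumed to mean a plain Iceberg binary.
    static BinaryType binaryType(Map<String, String> attrs) {
        String v = attrs.get(BINARY_TYPE_ATTR);
        return v == null ? BinaryType.BINARY : BinaryType.valueOf(v);
    }

    // A missing attribute is assumed to mean a plain Iceberg long.
    static LongType longType(Map<String, String> attrs) {
        String v = attrs.get(LONG_TYPE_ATTR);
        return v == null ? LongType.LONG : LongType.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(binaryType(Map.of(BINARY_TYPE_ATTR, "UUID"))); // UUID
        System.out.println(longType(Map.of()));                            // LONG
    }
}
```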
    • Method Detail

      • convert

        public static org.apache.orc.TypeDescription convert(Schema schema)
      • convert

        public static Schema convert(org.apache.orc.TypeDescription orcSchema)
        Converts an ORC schema to an Iceberg schema. This method uses the original Iceberg field IDs stored in the ORC column attributes, when present; ORC columns without Iceberg IDs are ignored and skipped in the conversion.
        Returns:
        the Iceberg schema
        Throws:
        java.lang.IllegalArgumentException - if the ORC schema has no columns with Iceberg ID attributes
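The skip-or-fail behavior described for convert(TypeDescription) can be illustrated with a flat, stdlib-only model. This is a hypothetical sketch, not Iceberg code: OrcColumn and IcebergField are invented stand-ins for TypeDescription and Iceberg's field type, nesting is ignored, and ID_ATTR is a placeholder for whatever attribute actually carries the Iceberg ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an ORC column: name, type string, and attributes.
record OrcColumn(String name, String type, Map<String, String> attributes) {}

// Hypothetical stand-in for an Iceberg field: ID, name, and type string.
record IcebergField(int id, String name, String type) {}

class ConvertSketch {
    // Placeholder key for the attribute holding the Iceberg field ID (assumed).
    static final String ID_ATTR = "hypothetical.id";

    static List<IcebergField> convert(List<OrcColumn> orcColumns) {
        List<IcebergField> fields = new ArrayList<>();
        for (OrcColumn col : orcColumns) {
            String id = col.attributes().get(ID_ATTR);
            if (id == null) {
                continue; // no Iceberg ID: the column is skipped
            }
            fields.add(new IcebergField(Integer.parseInt(id), col.name(), col.type()));
        }
        if (fields.isEmpty()) {
            // Mirrors the documented failure when no column carries an Iceberg ID.
            throw new IllegalArgumentException("ORC schema does not contain Iceberg IDs");
        }
        return fields;
    }

    public static void main(String[] args) {
        List<OrcColumn> cols = List.of(
            new OrcColumn("a", "int", Map.of(ID_ATTR, "1")),
            new OrcColumn("legacy", "string", Map.of()), // skipped
            new OrcColumn("b", "string", Map.of(ID_ATTR, "2")));
        System.out.println(convert(cols));
    }
}
```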
      • buildOrcProjection

        public static org.apache.orc.TypeDescription buildOrcProjection(Schema schema,
                                                                        org.apache.orc.TypeDescription originalOrcSchema)
        Converts an Iceberg schema to a corresponding ORC schema within the context of an existing ORC file schema. This method also handles schema evolution from the original ORC file schema to the given Iceberg schema: it builds the desired reader schema according to the schema evolution rules and passes it down to the ORC reader, which then uses its own schema evolution to map it onto the writer's schema.

        Example:

          Iceberg writer                                         ORC writer
          struct<a (1): int, b (2): string>                      struct<a: int, b: string>
          struct<a (1): struct<b (2): string, c (3): date>>      struct<a: struct<b:string, c:date>>

          Iceberg reader                                         ORC reader
          struct<a (2): string, c (3): date>                     struct<b: string, c: date>
          struct<aa (1): struct<cc (3): date, bb (2): string>>   struct<a: struct<c:date, b:string>>

        Parameters:
        schema - an Iceberg schema
        originalOrcSchema - an existing ORC file schema
        Returns:
        the resulting ORC schema
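The example rows above follow one rule: the reader schema keeps the Iceberg field order and types, but each field takes the name it has in the file when its ID is found there, while new fields keep their Iceberg name. The following is a simplified, stdlib-only sketch of that rule and is not the library's algorithm. The Field record is hypothetical, IDs are matched per struct level only, and type promotion, lists, and maps are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical field model: ID, name, and either a primitive type or child fields.
record Field(int id, String name, String primitive, List<Field> children) {
    static Field prim(int id, String name, String type) { return new Field(id, name, type, List.of()); }
    static Field struct(int id, String name, Field... kids) { return new Field(id, name, null, List.of(kids)); }
}

class ProjectionSketch {
    // Keep Iceberg order and types; rename to the file's name when the ID exists there.
    static List<Field> project(List<Field> icebergFields, List<Field> fileFields) {
        Map<Integer, Field> byId = new HashMap<>();
        for (Field f : fileFields) byId.put(f.id(), f);
        List<Field> out = new ArrayList<>();
        for (Field f : icebergFields) {
            Field inFile = byId.get(f.id());
            String name = inFile != null ? inFile.name() : f.name(); // new fields keep their Iceberg name
            List<Field> kids = f.children().isEmpty()
                ? List.of()
                : project(f.children(), inFile != null ? inFile.children() : List.of());
            out.add(new Field(f.id(), name, f.primitive(), kids));
        }
        return out;
    }

    // Render as an ORC-style struct string, e.g. struct<a:int,b:string>.
    static String render(List<Field> fields) {
        return fields.stream()
            .map(f -> f.name() + ":" + (f.primitive() != null ? f.primitive() : render(f.children())))
            .collect(Collectors.joining(",", "struct<", ">"));
    }

    public static void main(String[] args) {
        List<Field> file = List.of(Field.struct(1, "a", Field.prim(2, "b", "string"), Field.prim(3, "c", "date")));
        List<Field> reader = List.of(Field.struct(1, "aa", Field.prim(3, "cc", "date"), Field.prim(2, "bb", "string")));
        System.out.println(render(project(reader, file))); // struct<a:struct<c:date,b:string>>
    }
}
```

Running main reproduces the second reader row of the example, with the field names adjusted to the compact rendering used here.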
      • fieldId

        public static int fieldId(org.apache.orc.TypeDescription orcType)
      • idToOrcName

        public static java.util.Map<java.lang.Integer,java.lang.String> idToOrcName(Schema schema)
        Generates a mapping from field IDs to ORC qualified names. See IdToOrcName for details.
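To make "qualified names" concrete, the following hypothetical, stdlib-only sketch walks a nested struct and records each field ID against its dot-separated path. Treat the dot-joined path format as an assumption for struct fields only; the sketch does not model how IdToOrcName names list elements or map keys and values, nor any quoting of special characters.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical nested field model: an ID, a name, and child fields for structs.
record Node(int id, String name, List<Node> children) {
    static Node leaf(int id, String name) { return new Node(id, name, List.of()); }
}

class IdToNameSketch {
    static Map<Integer, String> idToName(List<Node> fields) {
        Map<Integer, String> out = new LinkedHashMap<>(); // preserves visit order
        visit(fields, new ArrayDeque<>(), out);
        return out;
    }

    // Depth-first walk; path holds the names from the root to the current field.
    private static void visit(List<Node> fields, Deque<String> path, Map<Integer, String> out) {
        for (Node f : fields) {
            path.addLast(f.name());
            out.put(f.id(), String.join(".", path));
            visit(f.children(), path, out);
            path.removeLast();
        }
    }

    public static void main(String[] args) {
        List<Node> schema = List.of(
            new Node(1, "a", List.of(Node.leaf(2, "b"), Node.leaf(3, "c"))),
            Node.leaf(4, "d"));
        System.out.println(idToName(schema)); // {1=a, 2=a.b, 3=a.c, 4=d}
    }
}
```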