Class ORCSchemaUtil

java.lang.Object
org.apache.iceberg.orc.ORCSchemaUtil

public final class ORCSchemaUtil extends Object
Utilities for mapping Iceberg to ORC schemas.
  • Field Details

    • ICEBERG_BINARY_TYPE_ATTRIBUTE

      public static final String ICEBERG_BINARY_TYPE_ATTRIBUTE
      The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC binary type. The values for this attribute are denoted in BinaryType.
    • ICEBERG_LONG_TYPE_ATTRIBUTE

      public static final String ICEBERG_LONG_TYPE_ATTRIBUTE
      The name of the ORC TypeDescription attribute indicating the Iceberg type corresponding to an ORC long type. The values for this attribute are denoted in LongType.
  • Method Details

    • convert

      public static org.apache.orc.TypeDescription convert(Schema schema)
    • convert

      public static Schema convert(org.apache.orc.TypeDescription orcSchema)
      Converts an ORC schema to an Iceberg schema. The conversion uses the original Iceberg column mapping IDs when they are present in the ORC column attributes; ORC columns without Iceberg IDs are skipped.
      Returns:
      the Iceberg schema
      Throws:
      IllegalArgumentException - if ORC schema has no columns with Iceberg ID attributes
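
      The skip-and-reject behavior described above can be illustrated with a small, JDK-only toy model. This is a sketch under stated assumptions, not the library's implementation: the class OrcToIcebergSketch, its method, and the use of a map from column name to a nullable ID are all hypothetical stand-ins for TypeDescription and Schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: hypothetical stand-in types, not Iceberg or ORC classes.
final class OrcToIcebergSketch {

  // Input: ORC column name -> Iceberg field ID, or null when the column
  // carries no ID attribute. Output: field ID -> column name.
  static Map<Integer, String> convert(Map<String, Integer> orcColumns) {
    Map<Integer, String> icebergFields = new LinkedHashMap<>();
    for (Map.Entry<String, Integer> column : orcColumns.entrySet()) {
      Integer id = column.getValue();
      if (id != null) {
        // Columns with an ID are kept; columns without one are skipped.
        icebergFields.put(id, column.getKey());
      }
    }
    if (icebergFields.isEmpty()) {
      // Mirrors the documented failure when no column has an ID.
      throw new IllegalArgumentException(
          "ORC schema has no columns with Iceberg ID attributes");
    }
    return icebergFields;
  }
}
```

      In this model, a column map of a (ID 1), tmp (no ID), and b (ID 2) yields only the entries for IDs 1 and 2, while an input in which no column has an ID is rejected.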
    • buildOrcProjection

      public static org.apache.orc.TypeDescription buildOrcProjection(Schema schema, org.apache.orc.TypeDescription originalOrcSchema)
      Converts an Iceberg schema to a corresponding ORC schema within the context of an existing ORC file schema. This method also handles schema evolution from the original ORC file schema to the given Iceberg schema: it builds the desired reader schema according to the schema evolution rules and passes it to the ORC reader, which then uses its own schema evolution to map it to the writer's schema.

      Example:

      Iceberg writer                                        ORC writer
      struct<a (1): int, b (2): string>                     struct<a: int, b: string>
      struct<a (1): struct<b (2): string, c (3): date>>     struct<a: struct<b:string, c:date>>

      Iceberg reader                                        ORC reader
      struct<a (2): string, c (3): date>                    struct<b: string, c: date>
      struct<aa (1): struct<cc (3): date, bb (2): string>>  struct<a: struct<c:date, b:string>>

      Parameters:
      schema - an Iceberg schema
      originalOrcSchema - an existing ORC file schema
      Returns:
      the resulting ORC schema
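
      The core idea of the projection, resolving reader columns by field ID rather than by name, can be sketched with plain JDK collections. This is a toy model, not the method's implementation: OrcProjectionSketch and its flat ID-to-name maps are hypothetical, nesting is not modeled, and the fallback for IDs absent from the file is a simplification chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: flat schemas as ID -> name maps, no Iceberg or ORC types.
final class OrcProjectionSketch {

  // readerFields: Iceberg reader schema, field ID -> reader-side name,
  //               iterated in the order the reader wants its columns.
  // fileColumns:  ORC file schema, field ID -> name as written in the file.
  static List<String> project(Map<Integer, String> readerFields,
                              Map<Integer, String> fileColumns) {
    List<String> orcReaderColumns = new ArrayList<>();
    for (Map.Entry<Integer, String> field : readerFields.entrySet()) {
      String fileName = fileColumns.get(field.getKey());
      // A renamed field keeps the name stored in the file, matched by ID.
      // Simplification: an ID missing from the file falls back to the
      // reader's name; the real method's handling is not modeled here.
      orcReaderColumns.add(fileName != null ? fileName : field.getValue());
    }
    return orcReaderColumns;
  }
}
```

      With a file that stores IDs 1, 2, and 3 as a, b, and c, a reader schema that asks for ID 2 under the name a and ID 3 under the name c resolves to the ORC columns b and c, matching the first reader row of the example.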
    • fieldId

      public static int fieldId(org.apache.orc.TypeDescription orcType)
    • idToOrcName

      public static Map<Integer,String> idToOrcName(Schema schema)
      Generates a mapping from field IDs to ORC qualified names. See IdToOrcName for details.
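
      As a rough illustration of what an ID-to-qualified-name mapping looks like, the sketch below walks a nested struct and joins parent and child names with a dot. The dot separator, the class IdToOrcNameSketch, and its Field type are assumptions made for this example; the actual naming rules, including those for lists and maps, are defined by IdToOrcName and are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical types, struct nesting only.
final class IdToOrcNameSketch {

  // A named field with an ID and zero or more nested child fields.
  static final class Field {
    final int id;
    final String name;
    final List<Field> children;

    Field(int id, String name, List<Field> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }
  }

  // Returns field ID -> qualified name, in depth-first order.
  static Map<Integer, String> idToName(List<Field> fields) {
    Map<Integer, String> result = new LinkedHashMap<>();
    collect(fields, "", result);
    return result;
  }

  private static void collect(List<Field> fields, String prefix,
                              Map<Integer, String> result) {
    for (Field field : fields) {
      // Assumption: nested names are joined with '.'.
      String qualified = prefix.isEmpty() ? field.name : prefix + "." + field.name;
      result.put(field.id, qualified);
      collect(field.children, qualified, result);
    }
  }
}
```

      Under these assumptions, the schema struct<a (1): struct<b (2): string, c (3): date>> maps ID 1 to a, ID 2 to a.b, and ID 3 to a.c.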