java.lang.Object

org.apache.iceberg.parquet.ParquetSchemaUtil

public class ParquetSchemaUtil extends Object

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | ParquetSchemaUtil.HasIds | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static org.apache.parquet.schema.MessageType | addFallbackIds(org.apache.parquet.schema.MessageType fileSchema) | |
| static org.apache.parquet.schema.MessageType | applyNameMapping(org.apache.parquet.schema.MessageType fileSchema, NameMapping nameMapping) | |
| static org.apache.parquet.schema.MessageType | convert(Schema schema, String name) | Convert an Iceberg schema to Parquet. |
| static org.apache.parquet.schema.MessageType | convert(Schema schema, String name, VariantShreddingFunction variantShreddingFunc) | Convert an Iceberg schema to Parquet. |
| static Schema | convert(org.apache.parquet.schema.MessageType parquetSchema) | Converts a Parquet schema to an Iceberg schema. |
| static Schema | convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema) | Converts a Parquet schema to an Iceberg schema and prunes fields without IDs. |
| static org.apache.parquet.schema.Type | determineListElementType(org.apache.parquet.schema.GroupType array) | |
| static org.apache.parquet.schema.Type | fieldType(org.apache.parquet.schema.GroupType group, String name) | Returns the Type of the named field in the struct/group, or null. |
| static boolean | hasField(org.apache.parquet.schema.GroupType group, String name) | Returns true if the name identifies a field in the struct/group. |
| static boolean | hasIds(org.apache.parquet.schema.MessageType fileSchema) | |
| static org.apache.parquet.schema.MessageType | pruneColumns(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema) | |
| static org.apache.parquet.schema.MessageType | pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema) | Prunes columns from a Parquet file schema that was written without field ids. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- convert
  
  public static org.apache.parquet.schema.MessageType convert(Schema schema, String name)
  
  Convert an Iceberg schema to Parquet.
  
  Parameters:
  
  schema - an Iceberg Schema
  
  name - name for the Parquet schema
  
  Returns:
  
  the schema converted to a Parquet MessageType
- convert
  
  public static org.apache.parquet.schema.MessageType convert(Schema schema, String name, VariantShreddingFunction variantShreddingFunc)
  
  Convert an Iceberg schema to Parquet.
  Variant fields are converted by calling the VariantShreddingFunction with the variant's field ID and name to produce the shredding type as a typed_value field. This field is added to the variant struct alongside the metadata and value fields.
  
  Parameters:
  
  schema - an Iceberg Schema
  
  name - name for the Parquet schema
  
  variantShreddingFunc - VariantShreddingFunction that produces a shredded type
  
  Returns:
  
  the schema converted to a Parquet MessageType
- convert
  
  public static Schema convert(org.apache.parquet.schema.MessageType parquetSchema)
  
  Converts a Parquet schema to an Iceberg schema. Fields without IDs are kept and assigned fallback IDs.
  
  Parameters:
  
  parquetSchema - a Parquet schema
  
  Returns:
  
  a matching Iceberg schema for the provided Parquet schema
- convertAndPrune
  
  public static Schema convertAndPrune(org.apache.parquet.schema.MessageType parquetSchema)
  
  Converts a Parquet schema to an Iceberg schema and prunes fields without IDs.
  
  Parameters:
  
  parquetSchema - a Parquet schema
  
  Returns:
  
  a matching Iceberg schema for the provided Parquet schema
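The difference between convert(MessageType) and convertAndPrune shows up when a file schema mixes fields with and without IDs. The following is a minimal sketch, assuming the Iceberg and Parquet libraries are on the classpath. The class name ConvertAndPruneSketch, the column names, and the ID value 1 are hypothetical and chosen only for illustration.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

// Hypothetical helper class; not part of Iceberg.
final class ConvertAndPruneSketch {

  // Builds a Parquet schema where "id" carries field ID 1 and "data" has no ID.
  static MessageType mixedIdSchema() {
    return Types.buildMessage()
        .required(PrimitiveTypeName.INT64).id(1).named("id")
        .optional(PrimitiveTypeName.BINARY).named("data")
        .named("table");
  }

  public static void main(String[] args) {
    MessageType parquet = mixedIdSchema();

    // Keeps "data" and assigns it a fallback ID.
    Schema kept = ParquetSchemaUtil.convert(parquet);

    // Drops "data" because it has no ID.
    Schema pruned = ParquetSchemaUtil.convertAndPrune(parquet);

    System.out.println("convert:         " + kept);
    System.out.println("convertAndPrune: " + pruned);
  }
}
```

Going by the descriptions above, the first schema should contain both columns and the second should contain only id. The specific fallback ID given to data is not documented here, so the sketch does not rely on it.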
- hasField
  
  public static boolean hasField(org.apache.parquet.schema.GroupType group, String name)
  
  Returns true if the name identifies a field in the struct/group.
  
  Parameters:
  
  group - a GroupType
  
  name - a String name
  
  Returns:
  
  true if the group contains a field with the given name
- fieldType
  
  public static org.apache.parquet.schema.Type fieldType(org.apache.parquet.schema.GroupType group, String name)
  
  Returns the Type of the named field in the struct/group, or null.
  
  Parameters:
  
  group - a GroupType
  
  name - a String name
  
  Returns:
  
  the Type of the field in the group, or null if it is not present.
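As a usage sketch for convert(Schema, String) together with hasField and fieldType, the example below converts a small Iceberg schema to Parquet and then looks fields up by name. It assumes the Iceberg and Parquet libraries are on the classpath. The class name FieldLookupSketch, the column names, and the schema name "table" are hypothetical.

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.parquet.ParquetSchemaUtil;
import org.apache.iceberg.types.Types;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.Type;

// Hypothetical helper class; not part of Iceberg.
final class FieldLookupSketch {

  // Converts a two-column Iceberg schema to a Parquet MessageType named "table".
  static MessageType toParquet() {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));
    return ParquetSchemaUtil.convert(schema, "table");
  }

  public static void main(String[] args) {
    // MessageType is a GroupType, so it can be passed to hasField and fieldType.
    MessageType parquet = toParquet();

    if (ParquetSchemaUtil.hasField(parquet, "data")) {
      Type dataType = ParquetSchemaUtil.fieldType(parquet, "data");
      System.out.println("data -> " + dataType);
    }

    // fieldType returns null for a name that is not in the group.
    Type missing = ParquetSchemaUtil.fieldType(parquet, "missing");
    System.out.println("missing is null: " + (missing == null));
  }
}
```

Checking hasField first, or testing the result of fieldType for null, avoids dereferencing a missing field.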
- pruneColumns
  
  public static org.apache.parquet.schema.MessageType pruneColumns(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
- pruneColumnsFallback
  
  public static org.apache.parquet.schema.MessageType pruneColumnsFallback(org.apache.parquet.schema.MessageType fileSchema, Schema expectedSchema)
  
  Prunes columns from a Parquet file schema that was written without field ids.
  Files that were written without field ids are read assuming that schema evolution preserved column order. Deleting columns was not allowed.
  The order of columns in the resulting Parquet schema matches the Parquet file.
  
  Parameters:
  
  fileSchema - schema from a Parquet file that does not have field ids.
  
  expectedSchema - expected schema
  
  Returns:
  
  a parquet schema pruned using the expected schema
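The position-based idea behind pruneColumnsFallback can be modeled without any library. The sketch below is a simplified illustration, not Iceberg's implementation: it assumes that a column's fallback ID is its 1-based position in the file, which this page does not state. The class FallbackPruneSketch and the method pruneByPosition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone model; it does not call Iceberg or Parquet.
final class FallbackPruneSketch {

  /**
   * Keeps the file columns whose assumed fallback ID (1-based position)
   * appears in the expected IDs. The result preserves file order.
   */
  static List<String> pruneByPosition(List<String> fileColumns, Set<Integer> expectedIds) {
    List<String> kept = new ArrayList<>();
    int ordinal = 1; // assumption: fallback IDs start at 1 and follow column order
    for (String name : fileColumns) {
      if (expectedIds.contains(ordinal)) {
        kept.add(name);
      }
      ordinal++;
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> fileColumns = List.of("id", "data", "ts");
    // The expected schema asks for IDs 3 and 1; the output still follows file order.
    System.out.println(pruneByPosition(fileColumns, Set.of(3, 1))); // prints [id, ts]
  }
}
```

Because columns are matched by position, this scheme only works if columns were never deleted or reordered, which is the restriction described above.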
- hasIds
  
  public static boolean hasIds(org.apache.parquet.schema.MessageType fileSchema)
- addFallbackIds
  
  public static org.apache.parquet.schema.MessageType addFallbackIds(org.apache.parquet.schema.MessageType fileSchema)
- applyNameMapping
  
  public static org.apache.parquet.schema.MessageType applyNameMapping(org.apache.parquet.schema.MessageType fileSchema, NameMapping nameMapping)
- determineListElementType
  
  public static org.apache.parquet.schema.Type determineListElementType(org.apache.parquet.schema.GroupType array)
