Class Partitioning

java.lang.Object
org.apache.iceberg.Partitioning

public class Partitioning extends Object
  • Method Details

    • hasBucketField

      public static boolean hasBucketField(PartitionSpec spec)
      Check whether the spec contains a bucketed partition field.
      Parameters:
      spec - a partition spec
      Returns:
      true if the spec has a field with a bucket transform
    • sortOrderFor

      public static SortOrder sortOrderFor(PartitionSpec spec)
      Create a sort order that will group data for a partition spec.

      If the partition spec contains bucketed columns, the sort order also includes a field that sorts by one of them. When several columns are bucketed, the column whose transform has the highest number of buckets is selected.

      Parameters:
      spec - a partition spec
      Returns:
      a sort order that will cluster data for the spec
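
The bucket-column selection rule described for sortOrderFor can be illustrated with a small plain-Java model. This is only a sketch of the rule as stated above, not Iceberg's implementation: the class name, the method name, and the map of column names to bucket counts are all hypothetical, and the tie-breaking behavior (first column encountered wins) is a choice of this sketch, since the description above does not specify it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class BucketSortColumnSketch {

  // Picks the bucketed column whose transform has the most buckets.
  // The input maps a source column name to its bucket count (hypothetical representation).
  // On ties, the first column encountered is kept (an assumption of this sketch).
  static Optional<String> sortColumn(Map<String, Integer> bucketCounts) {
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String, Integer> entry : bucketCounts.entrySet()) {
      if (entry.getValue() > bestCount) {
        best = entry.getKey();
        bestCount = entry.getValue();
      }
    }
    // Empty when the spec has no bucketed columns.
    return Optional.ofNullable(best);
  }

  public static void main(String[] args) {
    Map<String, Integer> buckets = new LinkedHashMap<>();
    buckets.put("user_id", 16);
    buckets.put("order_id", 64);
    System.out.println(sortColumn(buckets)); // prints Optional[order_id]
  }
}
```

An empty result corresponds to a spec without bucketed columns, in which case the sort order would only group data by the partition fields.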
    • groupingKeyType

      public static Types.StructType groupingKeyType(Schema schema, Collection<PartitionSpec> specs)
      Builds a grouping key type considering the provided schema and specs.

      A grouping key defines how data is split between files and consists of partition fields with non-void transforms that are present in each provided spec. Iceberg guarantees that records with different values for the grouping key are disjoint and are stored in separate files.

      If there is only one spec, the grouping key includes all partition fields with non-void transforms from that spec. When there are multiple specs, the grouping key is the intersection of their partition fields with non-void transforms. If a partition field is present in only a subset of the specs, Iceberg cannot guarantee data distribution on that field, so it is excluded from the grouping key. Unpartitioned tables and tables with non-overlapping specs have empty grouping keys.

      When partition fields are dropped in v1 tables, they are replaced with new partition fields that have the same field ID but use a void transform under the hood. Such fields cannot be part of the grouping key as void transforms always return null.

      If the provided schema is not null, this method will only take into account partition fields on top of columns present in the schema. Otherwise, all partition fields will be considered.

      Parameters:
      schema - a schema specifying a set of source columns to consider (null to consider all)
      specs - one or many specs
      Returns:
      the constructed grouping key type
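
The intersection rule for groupingKeyType can be modeled in plain Java as follows. This is a simplified sketch under stated assumptions, not the library's code: the `Field` class, the method name, the transform strings, and the matching of fields by ID alone are hypothetical simplifications made for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class GroupingKeySketch {

  // Hypothetical stand-in for a partition field: field ID, source column, transform name.
  static final class Field {
    final int id;
    final String sourceColumn;
    final String transform;

    Field(int id, String sourceColumn, String transform) {
      this.id = id;
      this.sourceColumn = sourceColumn;
      this.transform = transform;
    }
  }

  // Returns the IDs of fields that appear with a non-void transform in every spec.
  // A null schemaColumns means "consider all source columns".
  static Set<Integer> groupingKeyFieldIds(
      Set<String> schemaColumns, Collection<List<Field>> specs) {
    Set<Integer> result = null;
    for (List<Field> spec : specs) {
      Set<Integer> ids = new LinkedHashSet<>();
      for (Field field : spec) {
        boolean isVoid = "void".equals(field.transform);
        boolean inSchema =
            schemaColumns == null || schemaColumns.contains(field.sourceColumn);
        if (!isVoid && inSchema) {
          ids.add(field.id);
        }
      }
      if (result == null) {
        result = ids; // first spec seeds the intersection
      } else {
        result.retainAll(ids); // keep only fields shared with this spec
      }
    }
    return result == null ? new LinkedHashSet<Integer>() : result;
  }

  public static void main(String[] args) {
    List<Field> spec1 = Arrays.asList(
        new Field(1000, "ts", "day"), new Field(1001, "id", "bucket[16]"));
    // Field 1001 was dropped, so it keeps its ID but uses a void transform.
    List<Field> spec2 = Arrays.asList(
        new Field(1000, "ts", "day"),
        new Field(1001, "id", "void"),
        new Field(1002, "region", "identity"));
    System.out.println(groupingKeyFieldIds(null, Arrays.asList(spec1, spec2))); // prints [1000]
  }
}
```

In this model only field 1000 survives: field 1001 is void in the second spec, and field 1002 is missing from the first.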
    • partitionType

      public static Types.StructType partitionType(Table table)
      Builds a unified partition type considering all specs in a table.

      If there is only one spec, the partition type is that spec's partition type. Whenever there are multiple specs, the partition type is a struct containing all fields that have ever been a part of any spec in the table. In other words, the struct fields represent a union of all known partition fields.

      Parameters:
      table - a table with one or many specs
      Returns:
      the constructed unified partition type
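
The union behavior of partitionType can be modeled the same way. The sketch below is a plain-Java illustration, not Iceberg's implementation: each spec is represented as a hypothetical map from field ID to field name, and ordering the result by field ID is a choice made here for determinism rather than something the description above specifies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UnifiedPartitionTypeSketch {

  // Merges specs, each given as fieldId -> fieldName, into one view
  // that contains every field that appears in any spec.
  static Map<Integer, String> unifiedFields(List<Map<Integer, String>> specs) {
    Map<Integer, String> all = new TreeMap<>();
    for (Map<Integer, String> spec : specs) {
      for (Map.Entry<Integer, String> field : spec.entrySet()) {
        // Keep the first name seen for a given field ID.
        all.putIfAbsent(field.getKey(), field.getValue());
      }
    }
    return all;
  }

  public static void main(String[] args) {
    Map<Integer, String> spec1 = new HashMap<>();
    spec1.put(1000, "ts_day");
    spec1.put(1001, "id_bucket");
    Map<Integer, String> spec2 = new HashMap<>();
    spec2.put(1000, "ts_day");
    spec2.put(1002, "region");
    System.out.println(unifiedFields(Arrays.asList(spec1, spec2)));
    // prints {1000=ts_day, 1001=id_bucket, 1002=region}
  }
}
```

Unlike the grouping key, which narrows to the fields shared by all specs, the unified partition type widens to every field any spec has used.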