Class Partitioning
-
Method Summary
Modifier and Type | Method | Description
static Types.StructType | groupingKeyType(Schema schema, Collection<PartitionSpec> specs) | Builds a grouping key type considering the provided schema and specs.
static boolean | hasBucketField(PartitionSpec spec) | Check whether the spec contains a bucketed partition field.
static Types.StructType | partitionType(Table table) | Builds a unified partition type considering all specs in a table.
static SortOrder | sortOrderFor(PartitionSpec spec) | Create a sort order that will group data for a partition spec.
-
Method Details
-
hasBucketField
Check whether the spec contains a bucketed partition field.
- Parameters:
spec - a partition spec
- Returns:
true if the spec has a field with a bucket transform
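The check above can be illustrated with a minimal, self-contained Java sketch. It does not call the Iceberg library: BucketCheckSketch, PartitionField, and PartitionSpec are hypothetical stand-ins, and transforms are assumed to be represented as strings such as "identity", "bucket[16]", "day", or "void".

```java
import java.util.List;

// Hypothetical model of Partitioning.hasBucketField; it does not depend on the Iceberg library.
class BucketCheckSketch {
    // Assumed representation of a partition field: a name, a source column ID, and a transform
    // string such as "identity", "bucket[16]", "day", or "void".
    record PartitionField(String name, int sourceId, String transform) {}

    // Assumed representation of a partition spec: an ordered list of partition fields.
    record PartitionSpec(List<PartitionField> fields) {}

    // Returns true if at least one field in the spec uses a bucket transform.
    static boolean hasBucketField(PartitionSpec spec) {
        return spec.fields().stream().anyMatch(f -> f.transform().startsWith("bucket["));
    }

    public static void main(String[] args) {
        PartitionSpec bucketed = new PartitionSpec(List.of(
                new PartitionField("id_bucket", 1, "bucket[16]"),
                new PartitionField("ts_day", 2, "day")));
        PartitionSpec daily = new PartitionSpec(List.of(new PartitionField("ts_day", 2, "day")));

        System.out.println(hasBucketField(bucketed)); // prints true
        System.out.println(hasBucketField(daily));    // prints false
    }
}
```

In real code the transform would be inspected through Iceberg's own types; the string prefix test here only keeps the sketch dependency-free.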
-
sortOrderFor
Create a sort order that will group data for a partition spec.
If the partition spec contains bucket columns, the sort order will also have a field to sort by a column that is bucketed in the spec. If more than one column is bucketed, the column whose bucket transform has the highest number of buckets is selected.
- Parameters:
spec - a partition spec
- Returns:
a sort order that will cluster data for the spec
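To make the column selection concrete, here is a hypothetical Java model that returns sort terms as strings instead of building an Iceberg SortOrder. It assumes that the order first sorts by each non-void partition field's transform in spec order and then, if any field is bucketed, appends the source column of the bucket transform with the most buckets. The tie-breaking rule (the first such field wins) is also an assumption of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Partitioning.sortOrderFor; sort terms are plain strings, not Iceberg objects.
class SortOrderSketch {
    // Assumed representation of a partition field: source column name plus transform string.
    record PartitionField(String sourceColumn, String transform) {}

    // Parses "bucket[N]" into N; any other transform yields 0.
    static int bucketCount(String transform) {
        if (!transform.startsWith("bucket[") || !transform.endsWith("]")) {
            return 0;
        }
        return Integer.parseInt(transform.substring("bucket[".length(), transform.length() - 1));
    }

    static List<String> sortOrderFor(List<PartitionField> spec) {
        List<String> sortTerms = new ArrayList<>();
        String bucketColumn = null;
        int highestNumBuckets = 0;
        for (PartitionField field : spec) {
            if (field.transform().equals("void")) {
                continue; // void transforms always produce null, so they cannot cluster data
            }
            // Assumption: sort by each partition transform so rows of one partition are adjacent.
            sortTerms.add(field.transform() + "(" + field.sourceColumn() + ")");
            int numBuckets = bucketCount(field.transform());
            if (numBuckets > highestNumBuckets) { // strict ">" keeps the first field on ties
                highestNumBuckets = numBuckets;
                bucketColumn = field.sourceColumn();
            }
        }
        if (bucketColumn != null) {
            // Extra sort on the raw column whose bucket transform has the most buckets.
            sortTerms.add(bucketColumn);
        }
        return sortTerms;
    }

    public static void main(String[] args) {
        List<PartitionField> spec = List.of(
                new PartitionField("id", "bucket[16]"),
                new PartitionField("user", "bucket[64]"),
                new PartitionField("ts", "day"));
        // prints [bucket[16](id), bucket[64](user), day(ts), user]
        System.out.println(sortOrderFor(spec));
    }
}
```

Here user wins over id because its transform has 64 buckets rather than 16.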
-
groupingKeyType
Builds a grouping key type considering the provided schema and specs.
A grouping key defines how data is split between files and consists of partition fields with non-void transforms that are present in each provided spec. Iceberg guarantees that records with different values for the grouping key are disjoint, that is, they are stored in separate files.
If there is only one spec, the grouping key includes all partition fields with non-void transforms from that spec. When there are multiple specs, the grouping key is the intersection of the partition fields with non-void transforms across all specs. If a partition field is present in only a subset of specs, Iceberg cannot guarantee how data is distributed on that field, so the field is excluded from the grouping key. Unpartitioned tables and tables with non-overlapping specs have empty grouping keys.
When partition fields are dropped in v1 tables, they are replaced with new partition fields that have the same field ID but use a void transform under the hood. Such fields cannot be part of the grouping key as void transforms always return null.
If the provided schema is not null, this method only considers partition fields whose source columns are present in the schema. Otherwise, all partition fields are considered.
- Parameters:
schema - a schema specifying a set of source columns to consider (null to consider all)
specs - one or many specs
- Returns:
the constructed grouping key type
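The intersection rule can be illustrated with a small, hypothetical Java model that works on partition field IDs rather than Iceberg types. It assumes that a partition field keeps the same field ID across specs (as the v1 void-replacement behavior described above implies), that transforms are strings where "void" marks a dropped field, and that a null set of column IDs stands in for a null schema. It returns the sorted IDs of the grouping key fields instead of a Types.StructType.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the grouping key rule; it does not use the Iceberg library.
class GroupingKeySketch {
    // Assumed representation: partition field ID, source column ID, name, and transform string.
    record PartitionField(int fieldId, int sourceId, String name, String transform) {}

    // IDs of fields in one spec that have a non-void transform and, if schemaColumnIds is
    // non-null, whose source column is in that set.
    static Set<Integer> activeFieldIds(Set<Integer> schemaColumnIds, List<PartitionField> spec) {
        Set<Integer> ids = new HashSet<>();
        for (PartitionField field : spec) {
            boolean inSchema = schemaColumnIds == null || schemaColumnIds.contains(field.sourceId());
            if (inSchema && !field.transform().equals("void")) {
                ids.add(field.fieldId());
            }
        }
        return ids;
    }

    // Intersects the active field IDs of every spec and returns them sorted.
    static List<Integer> groupingKeyFieldIds(Set<Integer> schemaColumnIds,
                                             Collection<List<PartitionField>> specs) {
        Set<Integer> common = null;
        for (List<PartitionField> spec : specs) {
            Set<Integer> active = activeFieldIds(schemaColumnIds, spec);
            if (common == null) {
                common = active;
            } else {
                common.retainAll(active); // keep only fields present in every spec so far
            }
        }
        List<Integer> result = new ArrayList<>();
        if (common != null) {
            result.addAll(common);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // v1 table: spec 0 partitions by bucket(id) and day(ts); spec 1 drops day(ts), which v1
        // keeps as a void field with the same ID, and adds identity(category).
        List<PartitionField> spec0 = List.of(
                new PartitionField(1000, 1, "id_bucket", "bucket[16]"),
                new PartitionField(1001, 2, "ts_day", "day"));
        List<PartitionField> spec1 = List.of(
                new PartitionField(1000, 1, "id_bucket", "bucket[16]"),
                new PartitionField(1001, 2, "ts_day", "void"),
                new PartitionField(1002, 3, "category", "identity"));

        System.out.println(groupingKeyFieldIds(null, List.of(spec0, spec1)));   // prints [1000]
        System.out.println(groupingKeyFieldIds(null, List.of(spec1)));          // prints [1000, 1002]
        System.out.println(groupingKeyFieldIds(Set.of(2, 3), List.of(spec1)));  // prints [1002]
    }
}
```

The first call shows why a field present in only some specs (category) or voided in some specs (ts_day) drops out: only id_bucket is active in both.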
-
partitionType
Builds a unified partition type considering all specs in a table.
If there is only one spec, the partition type is that spec's partition type. Whenever there are multiple specs, the partition type is a struct containing all fields that have ever been a part of any spec in the table. In other words, the struct fields represent a union of all known partition fields.
- Parameters:
table - a table with one or many specs
- Returns:
the constructed unified partition type
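The union rule can be sketched in the same hypothetical style, modeling partition fields by ID with string transforms rather than using Iceberg's Table or Types.StructType. Ordering the result by field ID, and preferring a non-void definition when a v1 table has replaced a field with a void transform, are assumptions of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the unified partition type; it does not use the Iceberg library.
class PartitionTypeSketch {
    // Assumed representation: partition field ID, name, and transform string.
    record PartitionField(int fieldId, String name, String transform) {}

    // Returns the union of all partition fields across specs, keyed and ordered by field ID.
    static List<PartitionField> unifiedPartitionFields(List<List<PartitionField>> specs) {
        Map<Integer, PartitionField> byId = new TreeMap<>();
        for (List<PartitionField> spec : specs) {
            for (PartitionField field : spec) {
                PartitionField existing = byId.get(field.fieldId());
                // Assumption: a real (non-void) definition wins over a void placeholder.
                boolean replacesVoid = existing != null
                        && existing.transform().equals("void")
                        && !field.transform().equals("void");
                if (existing == null || replacesVoid) {
                    byId.put(field.fieldId(), field);
                }
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        // The later spec no longer contains ts_day and adds category.
        List<PartitionField> spec0 = List.of(
                new PartitionField(1000, "id_bucket", "bucket[16]"),
                new PartitionField(1001, "ts_day", "day"));
        List<PartitionField> spec1 = List.of(
                new PartitionField(1000, "id_bucket", "bucket[16]"),
                new PartitionField(1002, "category", "identity"));

        // prints [id_bucket, ts_day, category]
        System.out.println(unifiedPartitionFields(List.of(spec0, spec1)).stream()
                .map(PartitionField::name)
                .toList());
    }
}
```

Unlike the grouping key, ts_day stays in the result even though the current spec no longer uses it, because the partition type has to describe data written under every spec.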
-