org.apache.iceberg.util.ZOrderByteUtils

public class ZOrderByteUtils extends Object

Within Z-Ordering the byte representations of objects being compared must be ordered, this requires several types to be transformed when converted to bytes. The goal is to map object's whose byte representation are not lexicographically ordered into representations that are lexicographically ordered. Bytes produced should be compared lexicographically as unsigned bytes, big-endian.

All types except for String are stored within an 8 Byte Buffer

Most of these techniques are derived from https://aws.amazon.com/blogs/database/z-order-indexing-for-multifaceted-queries-in-amazon-dynamodb-part-2/

Some implementation is taken from https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/util/OrderedBytes.java

Field Summary

Fields

Modifier and Type

Field

Description

static final int

PRIMITIVE_BUFFER_SIZE
Method Summary

Modifier and Type

Method

Description

static ByteBuffer

byteTruncateOrFill(byte[] val, int length, ByteBuffer reuse)

Return a bytebuffer with the given bytes truncated to length, or filled with 0's to length depending on whether the given bytes are larger or smaller than the given length.

static ByteBuffer

doubleToOrderedBytes(double val, ByteBuffer reuse)

Internally just calls floatingPointOrderedBytes(double, ByteBuffer)

static ByteBuffer

floatingPointOrderedBytes(double val, ByteBuffer reuse)

IEEE 754 : “If two floating-point numbers in the same format are ordered (say, x < y), they are ordered the same way when their bits are reinterpreted as sign-magnitude integers.”

static ByteBuffer

floatToOrderedBytes(float val, ByteBuffer reuse)

Internally just calls floatingPointOrderedBytes(double, ByteBuffer)

static byte[]

interleaveBits(byte[][] columnsBinary, int interleavedSize, ByteBuffer reuse)

Interleave bits using a naive loop.

static ByteBuffer

intToOrderedBytes(int val, ByteBuffer reuse)

Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)

static ByteBuffer

longToOrderedBytes(long val, ByteBuffer reuse)

Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)

static ByteBuffer

shortToOrderedBytes(short val, ByteBuffer reuse)

Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)

static ByteBuffer

stringToOrderedBytes(String val, int length, ByteBuffer reuse, CharsetEncoder encoder)

Strings are lexicographically sortable BUT if different byte array lengths will ruin the Z-Ordering.

static ByteBuffer

tinyintToOrderedBytes(byte val, ByteBuffer reuse)

Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)

static ByteBuffer

wholeNumberOrderedBytes(long val, ByteBuffer reuse)

Signed longs do not have their bytes in magnitude order because of the sign bit.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- PRIMITIVE_BUFFER_SIZE
  
  public static final int PRIMITIVE_BUFFER_SIZE
  See Also:
  
  Constant Field Values
Method Details
- intToOrderedBytes
  
  public static ByteBuffer intToOrderedBytes(int val, ByteBuffer reuse)
  
  Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)
- longToOrderedBytes
  
  public static ByteBuffer longToOrderedBytes(long val, ByteBuffer reuse)
  
  Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)
- shortToOrderedBytes
  
  public static ByteBuffer shortToOrderedBytes(short val, ByteBuffer reuse)
  
  Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)
- tinyintToOrderedBytes
  
  public static ByteBuffer tinyintToOrderedBytes(byte val, ByteBuffer reuse)
  
  Internally just calls wholeNumberOrderedBytes(long, ByteBuffer)
- wholeNumberOrderedBytes
  
  public static ByteBuffer wholeNumberOrderedBytes(long val, ByteBuffer reuse)
  
  Signed longs do not have their bytes in magnitude order because of the sign bit. To fix this, flip the sign bit so that all negatives are ordered before positives. This essentially shifts the 0 value so that we don't break our ordering when we cross the new 0 value.
- floatToOrderedBytes
  
  public static ByteBuffer floatToOrderedBytes(float val, ByteBuffer reuse)
  
  Internally just calls floatingPointOrderedBytes(double, ByteBuffer)
- doubleToOrderedBytes
  
  public static ByteBuffer doubleToOrderedBytes(double val, ByteBuffer reuse)
  
  Internally just calls floatingPointOrderedBytes(double, ByteBuffer)
- floatingPointOrderedBytes
  
  public static ByteBuffer floatingPointOrderedBytes(double val, ByteBuffer reuse)
  
  IEEE 754 : “If two floating-point numbers in the same format are ordered (say, x < y), they are ordered the same way when their bits are reinterpreted as sign-magnitude integers.”
  Which means doubles can be treated as sign magnitude integers which can then be converted into lexicographically comparable bytes
- stringToOrderedBytes
  
  public static ByteBuffer stringToOrderedBytes(String val, int length, ByteBuffer reuse, CharsetEncoder encoder)
  
  Strings are lexicographically sortable BUT if different byte array lengths will ruin the Z-Ordering. (ZOrder requires that a given column contribute the same number of bytes every time). This implementation just uses a set size to for all output byte representations. Truncating longer strings and right padding 0 for shorter strings.
- byteTruncateOrFill
  
  public static ByteBuffer byteTruncateOrFill(byte[] val, int length, ByteBuffer reuse)
  
  Return a bytebuffer with the given bytes truncated to length, or filled with 0's to length depending on whether the given bytes are larger or smaller than the given length.
- interleaveBits
  
  public static byte[] interleaveBits(byte[][] columnsBinary, int interleavedSize, ByteBuffer reuse)
  
  Interleave bits using a naive loop. Variable length inputs are allowed but to get a consistent ordering it is required that every column contribute the same number of bytes in each invocation. Bits are interleaved from all columns that have a bit available at that position. Once a Column has no more bits to produce it is skipped in the interleaving.
  
  Parameters:
  
  columnsBinary - an array of ordered byte representations of the columns being ZOrdered
  
  interleavedSize - the number of bytes to use in the output
  
  Returns:
  
  the columnbytes interleaved

Class ZOrderByteUtils

Field Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

PRIMITIVE_BUFFER_SIZE

Method Details

intToOrderedBytes

longToOrderedBytes

shortToOrderedBytes

tinyintToOrderedBytes

wholeNumberOrderedBytes

floatToOrderedBytes

doubleToOrderedBytes

floatingPointOrderedBytes

stringToOrderedBytes

byteTruncateOrFill

interleaveBits