java.lang.Object

org.apache.iceberg.arrow.vectorized.VectorizedArrowReader

All Implemented Interfaces: VectorizedReader<VectorHolder>

Direct Known Subclasses: VectorizedArrowReader.ConstantVectorReader, VectorizedArrowReader.DeletedVectorReader

public class VectorizedArrowReader extends Object implements VectorizedReader<VectorHolder>

A vectorized reader that reads a batch of values into Arrow vectors. It also allocates the appropriate kind of Arrow vector for the corresponding Iceberg/Parquet data type.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

VectorizedArrowReader.ConstantVectorReader<T>

A dummy vector reader that does not read files. Instead, it returns a dummy VectorHolder that indicates the constant value to use for this column.

static class

VectorizedArrowReader.DeletedVectorReader

A dummy vector reader that does not read files.
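The idea behind a constant reader can be illustrated with a minimal, JDK-only sketch. It is not Iceberg's implementation: ConstantHolder and ConstantReader are hypothetical stand-ins for VectorHolder and VectorizedArrowReader.ConstantVectorReader<T>, and the only point shown is that a read call can return a holder carrying one value and a row count without touching any file or materializing a vector.

```java
public class ConstantReaderSketch {

  /** Hypothetical holder: records one constant value and how many rows it stands for. */
  static final class ConstantHolder<T> {
    private final T value;
    private final int numRows;

    ConstantHolder(T value, int numRows) {
      this.value = value;
      this.numRows = numRows;
    }

    int numRows() {
      return numRows;
    }

    /** Every valid row index yields the same constant. */
    T valueAt(int row) {
      if (row < 0 || row >= numRows) {
        throw new IndexOutOfBoundsException("row " + row + " out of range [0, " + numRows + ")");
      }
      return value;
    }
  }

  /** Hypothetical reader: never reads a file, only wraps the constant in a holder. */
  static final class ConstantReader<T> {
    private final T value;

    ConstantReader(T value) {
      this.value = value;
    }

    /** The reuse argument is ignored in this sketch; a fresh holder is cheap to build. */
    ConstantHolder<T> read(ConstantHolder<T> reuse, int numValsToRead) {
      return new ConstantHolder<>(value, numValsToRead);
    }
  }

  public static void main(String[] args) {
    ConstantReader<String> reader = new ConstantReader<>("us-east");
    ConstantHolder<String> holder = reader.read(null, 5);
    System.out.println(holder.numRows() + " rows, each with value " + holder.valueAt(0));
  }
}
```

The consumer of such a holder is expected to check for the constant case and fill its output column itself, which is why no per-row storage is needed here.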
Field Summary

Fields

Modifier and Type

Field

Description

static final int

DEFAULT_BATCH_SIZE
Constructor Summary

Constructors

Constructor

Description

VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
Method Summary

Modifier and Type

Method

Description

void

close()

Release any resources allocated.

protected Types.NestedField

icebergField()

static VectorizedArrowReader

nulls()

static VectorizedArrowReader

positions()

static VectorizedArrowReader

positionsWithSetArrowValidityVector()

VectorHolder

read(VectorHolder reuse, int numValsToRead)

Reads a batch of numValsToRead values and returns it as a VectorHolder.

void

setBatchSize(int batchSize)

void

setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)

Sets the row group information to be used with this reader

String

toString()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- DEFAULT_BATCH_SIZE
  
  public static final int DEFAULT_BATCH_SIZE
  See Also:
  
  Constant Field Values
Constructor Details
- VectorizedArrowReader
  
  public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
Method Details
- icebergField
  
  protected Types.NestedField icebergField()
- setBatchSize
  
  public void setBatchSize(int batchSize)
  
  Specified by:
  
  setBatchSize in interface VectorizedReader<VectorHolder>
- read
  
  public VectorHolder read(VectorHolder reuse, int numValsToRead)
  
  Description copied from interface: VectorizedReader
  
  Reads a batch of type T and of size numValsToRead. For this class, T is VectorHolder.
  
  Specified by:
  
  read in interface VectorizedReader<VectorHolder>
  
  Parameters:
  
  reuse - container for the last batch to be reused for next batch
  
  numValsToRead - number of rows to read
  
  Returns:
  
  batch of records of type T (for this class, VectorHolder)
- setRowGroupInfo
  
  public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)
  
  Description copied from interface: VectorizedReader
  
  Sets the row group information to be used with this reader
  
  Specified by:
  
  setRowGroupInfo in interface VectorizedReader<VectorHolder>
  
  Parameters:
  
  source - row group information for all the columns
  
  metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
  
  rowPosition - the row group's row offset in the Parquet file
- close
  
  public void close()
  
  Description copied from interface: VectorizedReader
  
  Release any resources allocated.
  
  Specified by:
  
  close in interface VectorizedReader<VectorHolder>
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
- nulls
  
  public static VectorizedArrowReader nulls()
- positions
  
  public static VectorizedArrowReader positions()
- positionsWithSetArrowValidityVector
  
  public static VectorizedArrowReader positionsWithSetArrowValidityVector()
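
The methods specified by VectorizedReader suggest a call sequence: set the batch size, supply the row group, read batches while passing the previous container back as reuse, then close. The sketch below shows that sequence using only the JDK. BatchReader and ArrayBatchReader are hypothetical stand-ins, an int[] replaces both PageReadStore and VectorHolder, and the call order is an assumption drawn from the method descriptions above rather than a statement of how Iceberg drives this class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchReadSketch {

  /** Hypothetical stand-in that mirrors the method shapes of VectorizedReader<T>. */
  interface BatchReader<T> extends AutoCloseable {
    void setBatchSize(int batchSize);

    void setRowGroupInfo(int[] rowGroupValues, long rowPosition);

    T read(T reuse, int numValsToRead);

    @Override
    void close();
  }

  /** Toy reader that copies values from an in-memory "row group" into an int[] batch. */
  static final class ArrayBatchReader implements BatchReader<int[]> {
    private int batchSize = 4; // arbitrary default for this sketch only
    private int[] source = new int[0];
    private int next = 0;
    private boolean closed = false;

    @Override
    public void setBatchSize(int batchSize) {
      this.batchSize = batchSize;
    }

    @Override
    public void setRowGroupInfo(int[] rowGroupValues, long rowPosition) {
      // rowPosition is accepted to mirror the real signature but unused here.
      this.source = rowGroupValues;
      this.next = 0;
    }

    @Override
    public int[] read(int[] reuse, int numValsToRead) {
      // Reuse the previous container when it is large enough, otherwise allocate.
      int[] out = (reuse != null && reuse.length >= batchSize) ? reuse : new int[batchSize];
      System.arraycopy(source, next, out, 0, numValsToRead);
      next += numValsToRead;
      return out;
    }

    @Override
    public void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Drives one row group through the reader and collects a copy of every batch. */
  static List<int[]> readAll(BatchReader<int[]> reader, int[] rowGroup, int batchSize) {
    List<int[]> batches = new ArrayList<>();
    reader.setBatchSize(batchSize);
    reader.setRowGroupInfo(rowGroup, 0L);
    int remaining = rowGroup.length;
    int[] holder = null;
    while (remaining > 0) {
      int n = Math.min(batchSize, remaining); // the last batch may be short
      holder = reader.read(holder, n);
      batches.add(Arrays.copyOf(holder, n)); // copy, because holder is overwritten next round
      remaining -= n;
    }
    return batches;
  }

  public static void main(String[] args) {
    ArrayBatchReader reader = new ArrayBatchReader();
    try {
      for (int[] batch : readAll(reader, new int[] {1, 2, 3, 4, 5, 6, 7}, 3)) {
        System.out.println(Arrays.toString(batch));
      }
    } finally {
      reader.close();
    }
  }
}
```

Copying each batch before the next read matters in this sketch because the same container is handed back as reuse and overwritten; a caller that consumes each batch immediately could skip the copy.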
