Class VectorizedArrowReader
- java.lang.Object
-
- org.apache.iceberg.arrow.vectorized.VectorizedArrowReader
-
- All Implemented Interfaces:
VectorizedReader<VectorHolder>
- Direct Known Subclasses:
VectorizedArrowReader.ConstantVectorReader, VectorizedArrowReader.DeletedVectorReader
public class VectorizedArrowReader extends java.lang.Object implements VectorizedReader<VectorHolder>
VectorReader(s) that read a batch of values into Arrow vectors. This class also allocates the right kind of Arrow vector for the corresponding Iceberg/Parquet data type.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class
VectorizedArrowReader.ConstantVectorReader<T>
A dummy vector reader that does not read files; instead, it returns a dummy VectorHolder indicating the constant value to use for this column.
static class
VectorizedArrowReader.DeletedVectorReader
A dummy vector reader that does not read files.
-
Field Summary
Fields (Modifier and Type, Field, Description)
static int
DEFAULT_BATCH_SIZE
-
Constructor Summary
Constructors (Constructor, Description)
VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
Method Summary
Methods (Modifier and Type, Method, Description)
void
close()
Release any resources allocated.
static VectorizedArrowReader
nulls()
static VectorizedArrowReader
positions()
static VectorizedArrowReader
positionsWithSetArrowValidityVector()
VectorHolder
read(VectorHolder reuse, int numValsToRead)
Reads a batch of type T (here, VectorHolder) and of size numValsToRead.
void
setBatchSize(int batchSize)
void
setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)
Sets the row group information to be used with this reader.
java.lang.String
toString()
-
-
-
Field Detail
-
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
VectorizedArrowReader
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
-
-
Method Detail
-
setBatchSize
public void setBatchSize(int batchSize)
- Specified by:
setBatchSize
in interface VectorizedReader<VectorHolder>
-
read
public VectorHolder read(VectorHolder reuse, int numValsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T (here, VectorHolder) and of size numValsToRead.
- Specified by:
read
in interface VectorizedReader<VectorHolder>
- Parameters:
reuse
- container for the last batch, to be reused for the next batch
numValsToRead
- number of rows to read
- Returns:
- batch of records of type T (here, VectorHolder)
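The reuse parameter suggests a loop in which the holder returned by one call is passed back into the next, so buffers are not reallocated for every batch. The following is a minimal sketch of that pattern only. It does not use Iceberg or Arrow: IntBatch, IntBatchReader, and batchSizes are hypothetical stand-ins invented for illustration, and the choice to cap the final batch at the number of remaining rows is an assumption, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these types are part of Iceberg or Arrow.
final class BatchReuseSketch {

  /** Stand-in for a vector holder: a fixed-capacity buffer plus a count of valid values. */
  static final class IntBatch {
    final int[] values;
    int size;

    IntBatch(int capacity) {
      this.values = new int[capacity];
    }
  }

  /** Stand-in reader that copies consecutive values out of an in-memory array. */
  static final class IntBatchReader implements AutoCloseable {
    private final int[] source;
    private int position = 0;
    private int batchSize = 4;
    int allocations = 0; // how many holders this reader had to allocate

    IntBatchReader(int[] source) {
      this.source = source;
    }

    void setBatchSize(int batchSize) {
      this.batchSize = batchSize;
    }

    /** Fills the reused holder when it is large enough, otherwise allocates a new one. */
    IntBatch read(IntBatch reuse, int numValsToRead) {
      IntBatch batch = reuse;
      if (batch == null || batch.values.length < numValsToRead) {
        batch = new IntBatch(Math.max(batchSize, numValsToRead));
        allocations++;
      }
      System.arraycopy(source, position, batch.values, 0, numValsToRead);
      batch.size = numValsToRead;
      position += numValsToRead;
      return batch;
    }

    @Override
    public void close() {
      // Nothing to release in this sketch; a real reader would free its buffers here.
    }
  }

  /** Drains totalRows values in batches and returns the size of each batch read. */
  static List<Integer> batchSizes(IntBatchReader reader, int totalRows, int batchSize) {
    reader.setBatchSize(batchSize);
    List<Integer> sizes = new ArrayList<>();
    IntBatch last = null; // the first call has nothing to reuse
    int rowsRead = 0;
    while (rowsRead < totalRows) {
      int numValsToRead = Math.min(batchSize, totalRows - rowsRead);
      last = reader.read(last, numValsToRead); // hand the previous holder back in
      sizes.add(last.size);
      rowsRead += last.size;
    }
    return sizes;
  }

  public static void main(String[] args) {
    int[] source = new int[10];
    for (int i = 0; i < source.length; i++) {
      source[i] = i;
    }
    try (IntBatchReader reader = new IntBatchReader(source)) {
      System.out.println(batchSizes(reader, source.length, 4));
      System.out.println("holders allocated: " + reader.allocations);
    }
  }
}
```

With ten values and a batch size of four, the sketch reads batches of 4, 4, and 2 values while allocating a single holder.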
-
setRowGroupInfo
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
- Specified by:
setRowGroupInfo
in interface VectorizedReader<VectorHolder>
- Parameters:
source
- row group information for all the columns
metadata
- map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition
- the row group's row offset in the Parquet file
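As an illustration of the rowPosition parameter, the sketch below computes a row offset for each row group from per-group row counts. It assumes that the offset of a row group is the total number of rows in all row groups before it; that reading, the class name RowGroupOffsetsSketch, and the helper rowOffsets are assumptions made for this example and are not part of the Iceberg or Parquet APIs.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Iceberg or Parquet.
final class RowGroupOffsetsSketch {

  /**
   * Returns, for each row group, the number of rows that precede it in the file.
   * Assumption: this running total is what a caller would pass as rowPosition.
   */
  static long[] rowOffsets(long[] rowCounts) {
    long[] offsets = new long[rowCounts.length];
    long next = 0L;
    for (int i = 0; i < rowCounts.length; i++) {
      offsets[i] = next;    // rows before group i
      next += rowCounts[i]; // advance past group i
    }
    return offsets;
  }

  public static void main(String[] args) {
    // Three row groups holding 100, 50, and 75 rows.
    System.out.println(Arrays.toString(rowOffsets(new long[] {100, 50, 75})));
  }
}
```

For row groups of 100, 50, and 75 rows, the computed offsets are 0, 100, and 150.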
-
close
public void close()
Description copied from interface: VectorizedReader
Release any resources allocated.
- Specified by:
close
in interface VectorizedReader<VectorHolder>
-
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
-
nulls
public static VectorizedArrowReader nulls()
-
positions
public static VectorizedArrowReader positions()
-
positionsWithSetArrowValidityVector
public static VectorizedArrowReader positionsWithSetArrowValidityVector()
-
-