Class VectorizedArrowReader
java.lang.Object
org.apache.iceberg.arrow.vectorized.VectorizedArrowReader
- All Implemented Interfaces:
VectorizedReader<VectorHolder>
- Direct Known Subclasses:
VectorizedArrowReader.ConstantVectorReader, VectorizedArrowReader.DeletedVectorReader
VectorReader(s) that read a batch of values into Arrow vectors. It also takes care of allocating the right kind of Arrow vector for the corresponding Iceberg/Parquet data type.
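To illustrate what "the right kind of Arrow vector" can mean, here is a minimal sketch that maps a bare Iceberg primitive type name to the Arrow Java vector class that typically stores it. The helper ArrowVectorKinds and its plain-string input are hypothetical. The real reader works from a Parquet ColumnDescriptor and an Iceberg Types.NestedField, and it also has to deal with encodings such as dictionary pages, which this sketch ignores. The returned names (IntVector, BigIntVector, VarCharVector, and so on) are real classes in org.apache.arrow.vector.

```java
import java.util.Locale;

// Hypothetical helper (not part of Iceberg): maps a bare Iceberg primitive type
// name to the Arrow Java vector class that typically holds values of that type.
// Simplified: ignores Parquet encodings and type parameters such as precision.
final class ArrowVectorKinds {
  private ArrowVectorKinds() {}

  static String vectorClassFor(String icebergTypeName) {
    switch (icebergTypeName.toLowerCase(Locale.ROOT)) {
      case "boolean": return "BitVector";
      case "int": return "IntVector";            // 32-bit signed integers
      case "long": return "BigIntVector";        // 64-bit signed integers
      case "float": return "Float4Vector";
      case "double": return "Float8Vector";
      case "date": return "DateDayVector";       // days since the epoch
      case "time": return "TimeMicroVector";     // microseconds since midnight
      case "timestamp": return "TimeStampMicroVector";
      case "timestamptz": return "TimeStampMicroTZVector";
      case "string": return "VarCharVector";     // UTF-8, variable width
      case "binary": return "VarBinaryVector";
      case "fixed":
      case "uuid": return "FixedSizeBinaryVector"; // fixed-width byte values
      case "decimal": return "DecimalVector";
      default: throw new IllegalArgumentException("Unsupported type: " + icebergTypeName);
    }
  }
}
```

For example, vectorClassFor("long") returns "BigIntVector", while nested types such as "map" are rejected by this simplified helper.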
Nested Class Summary
Modifier and Type | Class | Description
static class | VectorizedArrowReader.ConstantVectorReader | A dummy vector reader that doesn't actually read files; instead, it returns a dummy VectorHolder that indicates the constant value to use for this column.
static class | VectorizedArrowReader.DeletedVectorReader | A dummy vector reader that doesn't actually read files.
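ConstantVectorReader avoids file I/O by returning a holder that describes a constant rather than a filled vector. The sketch below models that idea in plain Java. ConstantBatch and ConstantReaderSketch are hypothetical names; the actual fields of VectorHolder are not shown on this page.

```java
import java.util.Objects;

// Hypothetical stand-in for a "dummy" holder: it records a single constant and
// a row count instead of materializing one value per row.
final class ConstantBatch<T> {
  private final T value;
  private final int numRows;

  ConstantBatch(T value, int numRows) {
    if (numRows < 0) throw new IllegalArgumentException("numRows must be >= 0");
    this.value = value;
    this.numRows = numRows;
  }

  int numRows() { return numRows; }

  // Every row in the batch reports the same constant value.
  T get(int row) {
    Objects.checkIndex(row, numRows);
    return value;
  }
}

// Hypothetical reader that never touches a file: read() only wraps the constant.
final class ConstantReaderSketch<T> {
  private final T constant;

  ConstantReaderSketch(T constant) { this.constant = constant; }

  ConstantBatch<T> read(int numValsToRead) {
    return new ConstantBatch<>(constant, numValsToRead);
  }
}
```

The cost of such a batch is independent of its row count, which is the point of not reading the column from disk.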
Field Summary
Modifier and Type | Field
static final int | DEFAULT_BATCH_SIZE
Constructor Summary
Constructor | Description
VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
Method Summary
Modifier and Type | Method | Description
void | close() | Release any resources allocated.
protected Types.NestedField | icebergField()
static VectorizedArrowReader | nulls()
static VectorizedArrowReader | positions()
static VectorizedArrowReader | positionsWithSetArrowValidityVector()
VectorHolder | read(VectorHolder reuse, int numValsToRead) | Reads a batch of type T (here VectorHolder) and of size numValsToRead.
void | setBatchSize(int batchSize)
void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition) | Sets the row group information to be used with this reader.
String | toString()
Field Details
DEFAULT_BATCH_SIZE
public static final int DEFAULT_BATCH_SIZE
Constructor Details
VectorizedArrowReader
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
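The setArrowValidityVector flag is not described on this page; its name suggests it controls whether the reader populates the Arrow validity buffer. Assuming that reading, this sketch shows the buffer layout defined by the Arrow columnar format: one bit per slot, least-significant bit first, where 1 means the slot holds a value and 0 means null. ValidityBitmap is a hypothetical class, not part of Arrow Java; Arrow Java vectors expose the same information through methods such as isNull(int).

```java
import java.util.Objects;

// Hypothetical model of an Arrow validity bitmap: bit i lives in byte i / 8 at
// bit position i % 8 (least-significant bit first); 1 = valid, 0 = null.
final class ValidityBitmap {
  private final byte[] bits;
  private final int length;

  ValidityBitmap(int length) {
    if (length < 0) throw new IllegalArgumentException("length must be >= 0");
    this.length = length;
    this.bits = new byte[(length + 7) / 8]; // every slot starts out null (0)
  }

  void setValid(int index) {
    Objects.checkIndex(index, length);
    bits[index >> 3] |= (byte) (1 << (index & 7));
  }

  void setNull(int index) {
    Objects.checkIndex(index, length);
    bits[index >> 3] &= (byte) ~(1 << (index & 7));
  }

  boolean isValid(int index) {
    Objects.checkIndex(index, length);
    return (bits[index >> 3] & (1 << (index & 7))) != 0;
  }

  // Counts slots whose validity bit is 0.
  int nullCount() {
    int nulls = 0;
    for (int i = 0; i < length; i++) {
      if (!isValid(i)) nulls++;
    }
    return nulls;
  }
}
```

A 10-slot bitmap needs two bytes; slot 9 is bit 1 of the second byte.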
Method Details
icebergField
protected Types.NestedField icebergField()
setBatchSize
public void setBatchSize(int batchSize)
- Specified by:
setBatchSize in interface VectorizedReader<VectorHolder>
read
public VectorHolder read(VectorHolder reuse, int numValsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T (here VectorHolder) and of size numValsToRead.
- Specified by:
read in interface VectorizedReader<VectorHolder>
- Parameters:
reuse - container for the last batch, to be reused for the next batch
numValsToRead - number of rows to read
- Returns:
batch of records of type T (here VectorHolder)
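One plausible calling sequence, inferred from the method signatures rather than documented here, is: set the batch size once, then for each row group set the row group information and call read repeatedly (passing the previous batch back as reuse), and finally call close. The sketch below models that loop. BatchReaderSketch and CountingReader are hypothetical stand-ins for VectorizedReader<T>, setRowGroup(long rowCount) is a simplified replacement for setRowGroupInfo's Parquet arguments, and capping each request at min(batchSize, rows remaining) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the shape of VectorizedReader<T>.
interface BatchReaderSketch<T> {
  void setBatchSize(int batchSize);
  void setRowGroup(long rowCount); // simplified stand-in for setRowGroupInfo
  T read(T reuse, int numValsToRead);
  void close();
}

// Toy reader: its "batch" is just the number of values requested.
final class CountingReader implements BatchReaderSketch<Integer> {
  boolean closed;
  @Override public void setBatchSize(int batchSize) {}
  @Override public void setRowGroup(long rowCount) {}
  @Override public Integer read(Integer reuse, int numValsToRead) { return numValsToRead; }
  @Override public void close() { closed = true; }
}

final class BatchLoop {
  private BatchLoop() {}

  // Reads every row group in batches of at most batchSize rows and returns the
  // size of each requested batch, closing the reader even if a read fails.
  static <T> List<Integer> readAll(BatchReaderSketch<T> reader, long[] rowGroupRowCounts, int batchSize) {
    if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
    List<Integer> batchSizes = new ArrayList<>();
    try {
      reader.setBatchSize(batchSize);
      T reuse = null; // there is no previous batch on the first call
      for (long rowCount : rowGroupRowCounts) {
        reader.setRowGroup(rowCount);
        long remaining = rowCount;
        while (remaining > 0) {
          int n = (int) Math.min(batchSize, remaining);
          reuse = reader.read(reuse, n); // the next call may recycle this batch
          batchSizes.add(n);
          remaining -= n;
        }
      }
    } finally {
      reader.close();
    }
    return batchSizes;
  }
}
```

With row groups of 12000 and 3 rows and a batch size of 5000, the loop requests batches of 5000, 5000, 2000 and 3 rows; batches never span two row groups in this sketch.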
setRowGroupInfo
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
- Specified by:
setRowGroupInfo in interface VectorizedReader<VectorHolder>
- Parameters:
source - row group information for all the columns
metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file
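rowPosition is the row group's row offset in the Parquet file. Assuming a position-style reader (such as the one positions() appears to return; it is not documented on this page) reports file-level row positions, row i of the current batch would sit at rowPosition + rowsAlreadyRead + i. RowPositions.forBatch is a hypothetical helper that illustrates only this arithmetic.

```java
// Hypothetical helper: converts a row group's starting offset (rowPosition),
// plus the rows already consumed from that row group, into file-level positions.
final class RowPositions {
  private RowPositions() {}

  static long[] forBatch(long rowGroupStart, long rowsAlreadyRead, int batchSize) {
    if (rowGroupStart < 0 || rowsAlreadyRead < 0 || batchSize < 0) {
      throw new IllegalArgumentException("arguments must be non-negative");
    }
    long first = rowGroupStart + rowsAlreadyRead; // position of the batch's first row
    long[] positions = new long[batchSize];
    for (int i = 0; i < batchSize; i++) {
      positions[i] = first + i;
    }
    return positions;
  }
}
```

For a row group starting at row 1000 with 5000 rows already read, a 3-row batch yields positions 6000, 6001 and 6002.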
close
public void close()
Description copied from interface: VectorizedReader
Release any resources allocated.
- Specified by:
close in interface VectorizedReader<VectorHolder>
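toString
public String toString()
- Overrides:
toString in class Object
nulls
public static VectorizedArrowReader nulls()
positions
public static VectorizedArrowReader positions()
positionsWithSetArrowValidityVector
public static VectorizedArrowReader positionsWithSetArrowValidityVector()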