public class VectorizedArrowReader extends java.lang.Object implements VectorizedReader<VectorHolder>
VectorReader(s) that read a batch of values into Arrow vectors. It also takes care of allocating the right kind of Arrow vectors depending on the corresponding Iceberg/Parquet data types.

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `VectorizedArrowReader.ConstantVectorReader<T>`: A dummy vector reader that doesn't actually read files; instead, it returns a dummy `VectorHolder` indicating the constant value to use for this column. |
| `static class` | `VectorizedArrowReader.DeletedVectorReader`: A dummy vector reader that doesn't actually read files. |

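
The class description says the reader allocates "the right kind of Arrow vectors" for each Iceberg/Parquet type. The sketch below illustrates that idea under a simplifying assumption: it looks only at Parquet physical types, whereas the real class also has to account for Iceberg logical types (dates, decimals, timestamps, and so on). The `PhysicalType` enum and `VectorKindChooser` helper are hypothetical; the returned strings are the simple names of real Arrow Java vector classes, but this table is an illustration, not the mapping Iceberg actually uses.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for Parquet's physical types (INT96 deliberately omitted).
enum PhysicalType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, BINARY, FIXED_LEN_BYTE_ARRAY }

final class VectorKindChooser {
  // Simple name of the Arrow vector class this sketch would allocate per physical type.
  private static final Map<PhysicalType, String> VECTOR_FOR = new EnumMap<>(PhysicalType.class);

  static {
    VECTOR_FOR.put(PhysicalType.BOOLEAN, "BitVector");
    VECTOR_FOR.put(PhysicalType.INT32, "IntVector");
    VECTOR_FOR.put(PhysicalType.INT64, "BigIntVector");
    VECTOR_FOR.put(PhysicalType.FLOAT, "Float4Vector");
    VECTOR_FOR.put(PhysicalType.DOUBLE, "Float8Vector");
    VECTOR_FOR.put(PhysicalType.BINARY, "VarBinaryVector");
    VECTOR_FOR.put(PhysicalType.FIXED_LEN_BYTE_ARRAY, "FixedSizeBinaryVector");
  }

  private VectorKindChooser() {}

  // Returns the Arrow vector class name for a column's physical type, or throws if unmapped.
  static String vectorFor(PhysicalType type) {
    String name = VECTOR_FOR.get(type);
    if (name == null) {
      throw new IllegalArgumentException("Unsupported type: " + type);
    }
    return name;
  }
}
```

A lookup table keyed by type keeps the allocation decision in one place; a real implementation would allocate the vector from a `BufferAllocator` rather than return a name.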
| Modifier and Type | Field and Description |
|---|---|
| `static int` | `DEFAULT_BATCH_SIZE` |

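
The field summary lists `DEFAULT_BATCH_SIZE` without its value, and `setBatchSize` lets a caller override it. Whatever the value, the caller-side arithmetic is the same: a row group of n rows read in batches of size b takes ⌈n/b⌉ calls, each requesting min(b, remaining) values. The sketch below shows only that arithmetic; `BatchPlanner` is a hypothetical helper, not part of Iceberg.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPlanner {
  private BatchPlanner() {}

  // Splits totalRows into per-call read sizes of at most batchSize each.
  static List<Integer> plan(long totalRows, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<Integer> sizes = new ArrayList<>();
    long remaining = totalRows;
    while (remaining > 0) {
      int next = (int) Math.min(batchSize, remaining);
      sizes.add(next);
      remaining -= next;
    }
    return sizes;
  }

  // Number of read calls needed: ceil(totalRows / batchSize) using integer arithmetic.
  static long numBatches(long totalRows, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    return (totalRows + batchSize - 1) / batchSize;
  }
}
```

For example, 10 rows with a batch size of 4 yield read sizes 4, 4 and 2, i.e. three calls.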
| Constructor and Description |
|---|
| `VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)` |

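
The constructor's `setArrowValidityVector` flag is undocumented here; its name suggests it controls whether the reader populates the Arrow validity buffer of the vectors it fills, but treat that as an assumption. For context, Arrow records nullness in a validity bitmap with one bit per slot, least-significant bit first within each byte, where 1 means the slot holds a value and 0 means null. The sketch below builds such a bitmap from a nullable Java array without using any Arrow classes; `ValidityBitmap` is a hypothetical helper.

```java
final class ValidityBitmap {
  private ValidityBitmap() {}

  // Builds an Arrow-style validity bitmap: bit i (LSB-first within each byte)
  // is 1 when values[i] is non-null and 0 when it is null.
  static byte[] build(Integer[] values) {
    byte[] bitmap = new byte[(values.length + 7) / 8];
    for (int i = 0; i < values.length; i++) {
      if (values[i] != null) {
        bitmap[i >> 3] |= (byte) (1 << (i & 7));
      }
    }
    return bitmap;
  }

  // Reports whether slot i is valid (non-null) according to the bitmap.
  static boolean isValid(byte[] bitmap, int i) {
    return (bitmap[i >> 3] & (1 << (i & 7))) != 0;
  }
}
```

For the values `{1, null, 3}`, bits 0 and 2 are set, so the single bitmap byte is `0b101`.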
| Modifier and Type | Method and Description |
|---|---|
| `void` | `close()`: Releases any resources allocated. |
| `static VectorizedArrowReader` | `nulls()` |
| `static VectorizedArrowReader` | `positions()` |
| `static VectorizedArrowReader` | `positionsWithSetArrowValidityVector()` |
| `VectorHolder` | `read(VectorHolder reuse, int numValsToRead)`: Reads a batch of type `T` (here `VectorHolder`) of size `numValsToRead`. |
| `void` | `setBatchSize(int batchSize)` |
| `void` | `setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)`: Sets the row group information to be used with this reader. |
| `java.lang.String` | `toString()` |

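
Taken together, the method summary implies a lifecycle: set the batch size, hand the reader each row group through `setRowGroupInfo`, call `read` repeatedly while passing the previous holder back as `reuse`, and finally `close`. Because the real methods need Parquet and Arrow objects, the sketch below models that call order with hypothetical stand-in types (`Holder`, `FakeColumnReader`) over a plain `long[]` column. The stand-in implements `AutoCloseable` for convenience; whether the real interface does is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for VectorHolder: a reusable value buffer plus a row count.
final class Holder {
  final long[] values;
  int numRows;

  Holder(int capacity) {
    this.values = new long[capacity];
  }
}

// Hypothetical single-column reader following the call order
// setBatchSize -> setRowGroupInfo -> read (repeatedly) -> close.
final class FakeColumnReader implements AutoCloseable {
  private int batchSize = 1;
  private long[] rowGroup = new long[0];
  private int offset;
  private boolean closed;
  int allocations; // how many Holder objects read() had to create

  void setBatchSize(int batchSize) {
    this.batchSize = batchSize;
  }

  // Stand-in for setRowGroupInfo: switch to a new row group and rewind.
  void setRowGroupInfo(long[] rowGroupValues) {
    this.rowGroup = rowGroupValues;
    this.offset = 0;
  }

  // Copies up to numValsToRead values into reuse, allocating a new holder
  // only when reuse is missing or too small.
  Holder read(Holder reuse, int numValsToRead) {
    if (closed) {
      throw new IllegalStateException("reader is closed");
    }
    int n = Math.min(numValsToRead, rowGroup.length - offset);
    Holder holder = reuse;
    if (holder == null || holder.values.length < n) {
      holder = new Holder(Math.max(n, batchSize));
      allocations++;
    }
    System.arraycopy(rowGroup, offset, holder.values, 0, n);
    holder.numRows = n;
    offset += n;
    return holder;
  }

  @Override
  public void close() {
    closed = true; // a real reader would release its Arrow buffers here
  }

  // Drives the whole lifecycle over several row groups and collects every value read.
  static List<Long> readAll(long[][] rowGroups, int batchSize) {
    List<Long> out = new ArrayList<>();
    try (FakeColumnReader reader = new FakeColumnReader()) {
      reader.setBatchSize(batchSize);
      Holder holder = null;
      for (long[] group : rowGroups) {
        reader.setRowGroupInfo(group);
        int remaining = group.length;
        while (remaining > 0) {
          holder = reader.read(holder, Math.min(batchSize, remaining));
          for (int i = 0; i < holder.numRows; i++) {
            out.add(holder.values[i]);
          }
          remaining -= holder.numRows;
        }
      }
    }
    return out;
  }
}
```

Passing the previous holder back in lets the reader fill the same buffer for every batch, so only the first call allocates.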
public static final int DEFAULT_BATCH_SIZE
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc,
Types.NestedField icebergField,
org.apache.arrow.memory.BufferAllocator ra,
boolean setArrowValidityVector)

public void setBatchSize(int batchSize)
Specified by: setBatchSize in interface VectorizedReader<VectorHolder>

public VectorHolder read(VectorHolder reuse, int numValsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T (here VectorHolder) of size numValsToRead.
Specified by: read in interface VectorizedReader<VectorHolder>
Parameters:
reuse - container for the last batch, to be reused for the next batch
numValsToRead - number of rows to read

public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source,
java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata,
long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
Specified by: setRowGroupInfo in interface VectorizedReader<VectorHolder>
Parameters:
source - row group information for all the columns
metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file

public void close()
Description copied from interface: VectorizedReader
Releases any resources allocated.
Specified by: close in interface VectorizedReader<VectorHolder>

public java.lang.String toString()
Overrides: toString in class java.lang.Object

public static VectorizedArrowReader nulls()

public static VectorizedArrowReader positions()

public static VectorizedArrowReader positionsWithSetArrowValidityVector()
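
The factories `nulls()`, `positions()` and `positionsWithSetArrowValidityVector()` carry no descriptions. Judging by their names and by `setRowGroupInfo`'s `rowPosition` parameter (the row group's row offset in the Parquet file), a positions reader plausibly emits file-level row positions, i.e. `rowPosition + i` for the i-th row of the current row group; this is an inference, not documented behavior. The sketch below shows only that position arithmetic in plain Java; `RowPositions` and its parameters are hypothetical.

```java
final class RowPositions {
  private RowPositions() {}

  // Fills one batch of file-level row positions. rowGroupOffset is the row
  // group's first row in the file (like setRowGroupInfo's rowPosition), and
  // rowsAlreadyRead counts rows consumed from this row group by earlier batches.
  static long[] nextBatch(long rowGroupOffset, long rowsAlreadyRead, int batchSize) {
    long[] positions = new long[batchSize];
    long start = rowGroupOffset + rowsAlreadyRead;
    for (int i = 0; i < batchSize; i++) {
      positions[i] = start + i;
    }
    return positions;
  }
}
```

For a row group starting at file row 1000, after 4 rows have already been read, a batch of 3 would cover positions 1004, 1005 and 1006.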