public class ColumnarBatchReader extends java.lang.Object implements VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. The ColumnarBatch returned is created by passing in the Arrow vectors populated via delegated read calls to VectorReader(s).

| Constructor and Description |
|---|
| ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers) |
| Modifier and Type | Method and Description |
|---|---|
| void | close() Release any resources allocated. |
| org.apache.spark.sql.vectorized.ColumnarBatch | read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead) Reads a batch of numRowsToRead rows. |
| void | setBatchSize(int batchSize) |
| void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition) Sets the row group information to be used with this reader. |
public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)
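The constructor takes one delegate reader per projected column; Iceberg builds these internally when it plans a vectorized Parquet scan. A minimal setup sketch, assuming the per-column readers (`columnReaders`) are supplied from elsewhere and that the classes live in their usual iceberg-spark and iceberg-parquet packages, might look like this:

```java
// Hypothetical setup sketch -- the helper name and import packages are assumptions;
// Iceberg normally builds the per-column readers itself when planning a vectorized scan.
import java.util.List;

import org.apache.iceberg.parquet.VectorizedReader;
import org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader;

class ReaderSetupSketch {
  static ColumnarBatchReader newBatchReader(List<VectorizedReader<?>> columnReaders,
                                            int batchSize) {
    ColumnarBatchReader reader = new ColumnarBatchReader(columnReaders);
    reader.setBatchSize(batchSize);  // rows per ColumnarBatch returned by read()
    return reader;
  }
}
```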
public final void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition)

Specified by: setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

Parameters:
pageStore - row group information for all the columns
metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the parquet file
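A rough sketch of pointing the reader at one Parquet row group, using parquet-mr's ParquetFileReader to obtain the PageReadStore and column chunk metadata. The helper name, the rowGroupIndex/rowPosition bookkeeping, and the ColumnarBatchReader import package are assumptions, not part of this API:

```java
// Sketch only: parquet-mr classes are the ones named in the signature above;
// everything else (helper, bookkeeping, import package) is assumed.
import java.io.IOException;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.metadata.ColumnPath;

class RowGroupSketch {
  // Advances to the file's next row group and points the batch reader at it.
  // Returns that row group's row count, or -1 when the file is exhausted.
  static long advance(ParquetFileReader fileReader, ColumnarBatchReader batchReader,
                      int rowGroupIndex, long rowPosition) throws IOException {
    PageReadStore pageStore = fileReader.readNextRowGroup();
    if (pageStore == null) {
      return -1;
    }
    BlockMetaData rowGroup = fileReader.getRowGroups().get(rowGroupIndex);

    // Column chunk metadata keyed by column path, as setRowGroupInfo expects.
    Map<ColumnPath, ColumnChunkMetaData> metaData = rowGroup.getColumns().stream()
        .collect(Collectors.toMap(ColumnChunkMetaData::getPath, Function.identity()));

    batchReader.setRowGroupInfo(pageStore, metaData, rowPosition);
    return pageStore.getRowCount();
  }
}
```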
public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)

Specified by: read in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

Parameters:
reuse - container for the last batch to be reused for next batch
numRowsToRead - number of rows to read
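A sketch of the per-row-group read loop, passing each returned batch back in as `reuse` so the previous batch can be recycled. The row count is assumed to come from the current row group (for example, PageReadStore.getRowCount() in the sketch above):

```java
// Sketch only: drains one row group in batches, reusing the previous ColumnarBatch.
import org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader;
import org.apache.spark.sql.vectorized.ColumnarBatch;

class ReadLoopSketch {
  static void drainRowGroup(ColumnarBatchReader batchReader, long rowCount, int batchSize) {
    long remaining = rowCount;
    ColumnarBatch batch = null;
    while (remaining > 0) {
      int toRead = (int) Math.min(batchSize, remaining);
      batch = batchReader.read(batch, toRead);  // previous batch passed back as 'reuse'
      remaining -= batch.numRows();
      // hand 'batch' to Spark here, or iterate it via batch.rowIterator()
    }
  }
}
```

Once the last row group has been drained, close() should be called so the delegated readers can release whatever resources they allocated.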
public void close()

Specified by: close in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
public void setBatchSize(int batchSize)

Specified by: setBatchSize in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>