public class ColumnarBatchReader extends java.lang.Object implements VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. The ColumnarBatch returned is created by passing in the Arrow vectors populated via delegated read calls to VectorReader(s).

| Constructor and Description |
|---|
| ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers) |
| Modifier and Type | Method and Description |
|---|---|
| void | close() Release any resources allocated. |
| org.apache.spark.sql.vectorized.ColumnarBatch | read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead) Reads a batch of numRowsToRead rows into a ColumnarBatch. |
| void | setBatchSize(int batchSize) |
| void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition) Sets the row group information to be used with this reader. |
public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)

public final void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore,
                                  java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData,
                                  long rowPosition)
Sets the row group information to be used with this reader.
Specified by: setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Parameters:
pageStore - row group information for all the columns
metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the parquet file

public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse,
                                                                int numRowsToRead)
Reads a batch of numRowsToRead rows into a ColumnarBatch.
Specified by: read in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Parameters:
reuse - container for the last batch to be reused for next batch
numRowsToRead - number of rows to read

public void close()
Release any resources allocated.
Specified by: close in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

public void setBatchSize(int batchSize)
Specified by: setBatchSize in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
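The methods above describe a reuse-oriented batch lifecycle: a driver sets a batch size, calls read repeatedly with the previous batch as a reusable container, and closes the reader when done. The sketch below illustrates that pattern with a simplified stand-in interface; `BatchReader` and `RowIdBatchReader` are hypothetical names for illustration only, not part of the Iceberg or Spark API, and a real `ColumnarBatchReader` is wired up by Iceberg's Parquet read path rather than constructed by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the VectorizedReader lifecycle described above.
interface BatchReader<T> {
    void setBatchSize(int batchSize);
    T read(T reuse, int numRowsToRead);
    void close();
}

// Toy reader that fills a reusable int[] with consecutive row ids, mirroring
// how the 'reuse' parameter lets the caller recycle the last batch's container.
class RowIdBatchReader implements BatchReader<int[]> {
    private final int totalRows;
    private int position = 0;

    RowIdBatchReader(int totalRows) { this.totalRows = totalRows; }

    @Override public void setBatchSize(int batchSize) { /* no-op in this toy */ }

    @Override public int[] read(int[] reuse, int numRowsToRead) {
        int n = Math.min(numRowsToRead, totalRows - position);
        if (n == 0) { return null; }  // exhausted
        // Reuse the caller's container when it is large enough.
        int[] batch = (reuse != null && reuse.length >= n) ? reuse : new int[n];
        for (int i = 0; i < n; i++) {
            batch[i] = position + i;
        }
        position += n;
        return batch;
    }

    @Override public void close() { /* a real reader would release Arrow buffers here */ }
}

public class Main {
    public static void main(String[] args) {
        BatchReader<int[]> reader = new RowIdBatchReader(10);
        reader.setBatchSize(4);
        List<Integer> seen = new ArrayList<>();
        int[] reuse = new int[4];
        int[] batch;
        // Typical driver loop: read fixed-size batches until exhausted, then close.
        while ((batch = reader.read(reuse, 4)) != null) {
            int rows = Math.min(4, 10 - seen.size());
            for (int i = 0; i < rows; i++) {
                seen.add(batch[i]);
            }
            reuse = batch;
        }
        reader.close();
        System.out.println(seen.size()); // 10 rows read across batches of 4, 4, 2
    }
}
```

The reuse convention avoids allocating a fresh batch container per call, which matters when a scan issues many small reads; the real reader applies the same idea to Arrow-backed column vectors inside a Spark ColumnarBatch.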