public class ColumnarBatchReader extends BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
A VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. The ColumnarBatch returned is created by passing in the Arrow vectors populated via delegated read calls to VectorReader(s).
Fields inherited from class BaseBatchReader: readers, vectorHolders
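The delegation described in the class summary can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `ColumnReader`, `Batch`, `readBatch`, and `counting` are hypothetical stand-ins invented for illustration, plain `long[]` arrays take the place of Arrow vectors, and nothing here is the actual Iceberg or Spark implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in model; none of these types are the Iceberg or Spark classes.
public class DelegatingBatchSketch {

  /** Hypothetical per-column reader: fills one vector with numRows values. */
  interface ColumnReader {
    long[] read(int numRows);
  }

  /** Hypothetical batch: one vector per column plus a row count. */
  static final class Batch {
    final List<long[]> columns;
    final int numRows;

    Batch(List<long[]> columns, int numRows) {
      this.columns = columns;
      this.numRows = numRows;
    }
  }

  /** Delegates the read to every column reader, then wraps the vectors in a batch. */
  static Batch readBatch(List<ColumnReader> readers, int numRows) {
    List<long[]> vectors = new ArrayList<>();
    for (ColumnReader reader : readers) {
      vectors.add(reader.read(numRows));
    }
    return new Batch(vectors, numRows);
  }

  /** Builds a toy column reader that emits start, start + 1, and so on. */
  static ColumnReader counting(long start) {
    return numRows -> {
      long[] values = new long[numRows];
      for (int i = 0; i < numRows; i++) {
        values[i] = start + i;
      }
      return values;
    };
  }

  public static void main(String[] args) {
    Batch batch = readBatch(Arrays.asList(counting(0), counting(100)), 3);
    System.out.println(batch.columns.size() + " columns, " + batch.numRows + " rows");
  }
}
```

The point of the model is the shape of the call: the batch reader owns no decoding logic of its own and only assembles whatever its per-column readers produce.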
Constructor and Description |
---|
ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers) |
Modifier and Type | Method and Description |
---|---|
org.apache.spark.sql.vectorized.ColumnarBatch | read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead) - Reads a batch of type ColumnarBatch and of size numRowsToRead |
void | setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter) |
void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition) - Sets the row group information to be used with this reader |
Methods inherited from class BaseBatchReader: close, closeVectors, setBatchSize
public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader
Specified by: setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Overrides: setRowGroupInfo in class BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Parameters:
pageStore - row group information for all the columns
metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file
public void setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter)
public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
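Description copied from interface: VectorizedReader
Reads a batch of type ColumnarBatch and of size numRowsToRead
Parameters:
reuse - container for the last batch, to be reused for the next batch
numRowsToRead - number of rows to read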
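The calling pattern implied by the parameters above (set the row group's starting offset, then read repeatedly while handing the previous batch back as `reuse`) might look like the following self-contained sketch. It is a model under stated assumptions, not the library's code: `Reader`, `Batch`, and `readRowGroup` are hypothetical, the simplified `setRowGroupInfo` takes only a row offset, and whether the real reader actually recycles the container is not something this page confirms.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in model; these types are hypothetical, not the Iceberg or Spark classes.
public class ReuseLoopSketch {

  /** Hypothetical reusable batch container holding absolute row positions. */
  static final class Batch {
    long[] positions = new long[0];
    int numRows;
  }

  /** Hypothetical reader following the setRowGroupInfo / read(reuse, n) call pattern. */
  static final class Reader {
    private long nextPosition;

    /** Records the row group's starting row offset within the file. */
    void setRowGroupInfo(long rowPosition) {
      this.nextPosition = rowPosition;
    }

    /** Fills the reused container if one is given, otherwise allocates a new one. */
    Batch read(Batch reuse, int numRowsToRead) {
      Batch batch = (reuse != null) ? reuse : new Batch();
      if (batch.positions.length < numRowsToRead) {
        batch.positions = new long[numRowsToRead];
      }
      for (int i = 0; i < numRowsToRead; i++) {
        batch.positions[i] = nextPosition++;
      }
      batch.numRows = numRowsToRead;
      return batch;
    }
  }

  /** Reads one row group of rowCount rows in batches; returns the size of each batch. */
  static List<Integer> readRowGroup(Reader reader, long rowPosition, int rowCount, int batchSize) {
    reader.setRowGroupInfo(rowPosition);
    List<Integer> sizes = new ArrayList<>();
    Batch batch = null;
    int remaining = rowCount;
    while (remaining > 0) {
      int numRowsToRead = Math.min(batchSize, remaining);
      batch = reader.read(batch, numRowsToRead); // pass the previous batch back in
      sizes.add(batch.numRows);
      remaining -= numRowsToRead;
    }
    return sizes;
  }

  public static void main(String[] args) {
    // A 10-row group starting at file row 1000, read 4 rows at a time.
    System.out.println(readRowGroup(new Reader(), 1000L, 10, 4));
  }
}
```

In this model the last batch of a row group is smaller than the batch size, which is why the caller passes an explicit `numRowsToRead` on every call rather than relying on a fixed size.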