public class ColumnarBatchReader extends BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized
read path. The ColumnarBatch returned is created by passing in the Arrow vectors
populated via delegated read calls to VectorReader(s).

Fields inherited from class BaseBatchReader: readers, vectorHolders

| Constructor and Description |
|---|
| ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers) |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.spark.sql.vectorized.ColumnarBatch | read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead) Reads a batch of type ColumnarBatch and of size numRowsToRead |
| void | setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter) |
| void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition) Sets the row group information to be used with this reader |

Methods inherited from class BaseBatchReader: close, closeVectors, setBatchSize

public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)

public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore,
java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData,
long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader.
Specified by: setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Overrides: setRowGroupInfo in class BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
Parameters:
pageStore - row group information for all the columns
metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file

public void setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter)

public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse,
int numRowsToRead)
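Description copied from interface: VectorizedReader
Reads a batch of type ColumnarBatch and of size numRowsToRead.
Parameters:
reuse - container for the last batch to be reused for the next batch
numRowsToRead - number of rows to read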
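
The delegation described in the class summary (one batch assembled from vectors filled by per-column readers, with the previous batch's storage reused) can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical: `ColumnReader`, `CountingColumnReader`, `Batch`, and `DelegatingBatchReader` are simplified stand-ins written for this example, plain `int[]` arrays take the place of Arrow vectors, and the handling of a null `reuse` argument is an assumption. It is not the actual ColumnarBatchReader implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchReaderSketch {

  /** Hypothetical stand-in for a per-column vectorized reader. */
  interface ColumnReader {
    /** Fills and returns a vector of numRowsToRead values, reusing the given array when possible. */
    int[] read(int[] reuse, int numRowsToRead);
  }

  /** Hypothetical column reader that emits consecutive integers from a start value. */
  static final class CountingColumnReader implements ColumnReader {
    private int next;

    CountingColumnReader(int start) {
      this.next = start;
    }

    @Override
    public int[] read(int[] reuse, int numRowsToRead) {
      // Reuse the previous vector only if it has the requested size.
      int[] vector =
          (reuse != null && reuse.length == numRowsToRead) ? reuse : new int[numRowsToRead];
      for (int i = 0; i < numRowsToRead; i++) {
        vector[i] = next++;
      }
      return vector;
    }
  }

  /** Hypothetical stand-in for a columnar batch: one vector per column plus a row count. */
  static final class Batch {
    final int[][] columns;
    final int numRows;

    Batch(int[][] columns, int numRows) {
      this.columns = columns;
      this.numRows = numRows;
    }
  }

  /** Assembles a batch by delegating one read call to each column reader. */
  static final class DelegatingBatchReader {
    private final List<ColumnReader> readers;
    private final int[][] vectors; // vectors kept from the previous read, one per column

    DelegatingBatchReader(List<ColumnReader> readers) {
      this.readers = new ArrayList<>(readers);
      this.vectors = new int[readers.size()][];
    }

    Batch read(Batch reuse, int numRowsToRead) {
      if (numRowsToRead <= 0) {
        throw new IllegalArgumentException("Invalid number of rows to read: " + numRowsToRead);
      }
      if (reuse == null) {
        // Assumption for this sketch: without a batch to reuse, drop the cached vectors.
        Arrays.fill(vectors, null);
      }
      for (int i = 0; i < readers.size(); i++) {
        vectors[i] = readers.get(i).read(vectors[i], numRowsToRead);
      }
      // The batch shares the underlying vectors; only the outer array is copied.
      return new Batch(vectors.clone(), numRowsToRead);
    }
  }

  public static void main(String[] args) {
    DelegatingBatchReader reader =
        new DelegatingBatchReader(
            Arrays.asList(new CountingColumnReader(0), new CountingColumnReader(100)));

    Batch batch = null;
    for (int i = 0; i < 2; i++) {
      batch = reader.read(batch, 3);
      System.out.println(
          Arrays.toString(batch.columns[0]) + " " + Arrays.toString(batch.columns[1]));
    }
  }
}
```

Because the vectors are reused, a batch returned by an earlier call is overwritten by the next one in this sketch; a caller that needs to keep rows across calls would have to copy them out first.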