Class ColumnarBatchReader
java.lang.Object
org.apache.iceberg.arrow.vectorized.BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader
- All Implemented Interfaces:
VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
public class ColumnarBatchReader
extends BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. The ColumnarBatch returned is created by passing in the Arrow vectors populated via delegated read calls to VectorReader(s).
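The delegation described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: ColumnReader, Batch, readBatch, and sequence are illustrative names, not Iceberg, Arrow, or Spark types, and plain long[] arrays take the place of Arrow vectors. It only shows the shape of the pattern: one reader per column fills a vector, and the vectors are then wrapped in a single batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: none of these types are Iceberg, Arrow, or Spark classes.
public class DelegatingBatchSketch {

  /** Stand-in for a per-column reader that fills one column vector. */
  interface ColumnReader {
    long[] read(int numRowsToRead);
  }

  /** Stand-in for a columnar batch: a list of column vectors plus a row count. */
  static final class Batch {
    final List<long[]> columns;
    final int numRows;

    Batch(List<long[]> columns, int numRows) {
      this.columns = columns;
      this.numRows = numRows;
    }
  }

  /** Delegates to each column reader, then wraps the resulting vectors in one batch. */
  static Batch readBatch(List<ColumnReader> readers, int numRowsToRead) {
    List<long[]> columns = new ArrayList<>();
    for (ColumnReader reader : readers) {
      columns.add(reader.read(numRowsToRead));
    }
    return new Batch(columns, numRowsToRead);
  }

  /** Builds a toy reader that produces start, start + 1, start + 2, ... */
  static ColumnReader sequence(final long start) {
    return new ColumnReader() {
      @Override
      public long[] read(int numRowsToRead) {
        long[] vector = new long[numRowsToRead];
        for (int i = 0; i < numRowsToRead; i++) {
          vector[i] = start + i;
        }
        return vector;
      }
    };
  }

  public static void main(String[] args) {
    List<ColumnReader> readers = new ArrayList<>();
    readers.add(sequence(0L));
    readers.add(sequence(100L));
    Batch batch = readBatch(readers, 3);
    // prints 2 columns, 3 rows
    System.out.println(batch.columns.size() + " columns, " + batch.numRows + " rows");
  }
}
```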
Field Summary
Fields inherited from class org.apache.iceberg.arrow.vectorized.BaseBatchReader
readers, vectorHolders
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, and Description:
final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead) - Reads a batch of type T (here ColumnarBatch) and of size numRowsToRead
void setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter)
void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition) - Sets the row group information to be used with this reader
Methods inherited from class org.apache.iceberg.arrow.vectorized.BaseBatchReader
close, closeVectors, setBatchSize
-
Constructor Details
-
ColumnarBatchReader
-
-
Method Details
-
setRowGroupInfo
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, Map<org.apache.parquet.hadoop.metadata.ColumnPath, org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition)
Description copied from interface: VectorizedReader
Sets the row group information to be used with this reader
- Specified by:
setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
- Overrides:
setRowGroupInfo in class BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
- Parameters:
pageStore - row group information for all the columns
metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file
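As an illustration of what the rowPosition parameter means, the sketch below computes row offsets from per-row-group row counts: a row group's offset is the number of rows that precede it in the file. RowOffsetSketch, rowGroupOffsets, and filePosition are hypothetical names introduced here, and the row counts are made-up inputs. How a caller actually obtains these values from Parquet metadata is not covered by this page.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Iceberg or Parquet.
public class RowOffsetSketch {

  /** Returns, for each row group, the number of rows that precede it in the file. */
  static long[] rowGroupOffsets(long[] rowCounts) {
    long[] offsets = new long[rowCounts.length];
    long running = 0L;
    for (int i = 0; i < rowCounts.length; i++) {
      offsets[i] = running;
      running += rowCounts[i];
    }
    return offsets;
  }

  /** File-level position of a row, given its row group's offset and its index in that group. */
  static long filePosition(long rowGroupOffset, long indexInGroup) {
    return rowGroupOffset + indexInGroup;
  }

  public static void main(String[] args) {
    long[] offsets = rowGroupOffsets(new long[] {1000L, 1000L, 500L});
    System.out.println(Arrays.toString(offsets));       // prints [0, 1000, 2000]
    System.out.println(filePosition(offsets[2], 42L));  // prints 2042
  }
}
```

In this sketch, the value at offsets[k] is what would be passed as rowPosition for row group k.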
-
setDeleteFilter
-
read
public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T (here ColumnarBatch) and of size numRowsToRead
- Parameters:
reuse - container for the last batch to be reused for next batch
numRowsToRead - number of rows to read
- Returns:
batch of records of type T (here ColumnarBatch)
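The reuse and numRowsToRead parameters suggest a calling loop in which the previous batch is handed back to the reader and the final batch may be smaller than the rest. The sketch below models that loop with hypothetical stand-ins (ToyReader, Batch, countBatches); it does not use ColumnarBatchReader, and the rule it applies for when a container is reused is an assumption made for illustration, not the documented behavior of this class.

```java
// Hypothetical stand-ins only: ToyReader and Batch are not Iceberg or Spark classes.
public class ReuseLoopSketch {

  /** Stand-in for a batch container whose buffer can be refilled. */
  static final class Batch {
    final long[] values;
    int numRows;

    Batch(int capacity) {
      this.values = new long[capacity];
    }
  }

  /** Toy reader over the values 0 .. totalRows - 1. */
  static final class ToyReader {
    private final long totalRows;
    private long nextRow = 0L;

    ToyReader(long totalRows) {
      this.totalRows = totalRows;
    }

    /** Assumed policy: refill the reused container if it is large enough, else allocate. */
    Batch read(Batch reuse, int numRowsToRead) {
      if (nextRow + numRowsToRead > totalRows) {
        throw new IllegalArgumentException("Cannot read past the end of the data");
      }
      Batch batch = (reuse != null && reuse.values.length >= numRowsToRead)
          ? reuse
          : new Batch(numRowsToRead);
      for (int i = 0; i < numRowsToRead; i++) {
        batch.values[i] = nextRow++;
      }
      batch.numRows = numRowsToRead;
      return batch;
    }
  }

  /** Reads every row in batches of at most batchSize and returns the number of batches. */
  static int countBatches(long totalRows, int batchSize) {
    ToyReader reader = new ToyReader(totalRows);
    Batch batch = null;
    int batches = 0;
    long remaining = totalRows;
    while (remaining > 0) {
      int numRowsToRead = (int) Math.min(batchSize, remaining);
      batch = reader.read(batch, numRowsToRead); // hand the last batch back for reuse
      remaining -= batch.numRows;
      batches++;
    }
    return batches;
  }

  public static void main(String[] args) {
    System.out.println(countBatches(10L, 4)); // prints 3 (batches of 4, 4, and 2 rows)
  }
}
```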
-