Class ColumnarBatchReader
- java.lang.Object
  - org.apache.iceberg.spark.data.vectorized.ColumnarBatchReader
- All Implemented Interfaces:
VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
public class ColumnarBatchReader extends java.lang.Object implements VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
A VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path. The ColumnarBatch returned is created by passing in the Arrow vectors populated via delegated read calls to VectorReader(s).
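The delegation described above can be illustrated with a deliberately simplified, dependency-free model. Everything in it is hypothetical: `ToyVectorizedReader`, `ToyIntColumnReader`, `ToyBatch` and `ToyColumnarBatchReader` are invented names, an `int[]` stands in for an Arrow vector, and `ToyBatch` stands in for Spark's ColumnarBatch. It only shows the pattern of one read call per column reader, with the results wrapped into a single batch. It is not Iceberg's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Iceberg's VectorizedReader<T>; only read() is modeled.
interface ToyVectorizedReader<T> {
  T read(T reuse, int numRows);
}

// Hypothetical column reader whose "vector" is an int[] (standing in for an Arrow vector).
final class ToyIntColumnReader implements ToyVectorizedReader<int[]> {
  private final int[] values;
  private int position = 0;

  ToyIntColumnReader(int... values) {
    this.values = values;
  }

  @Override
  public int[] read(int[] reuse, int numRows) {
    // Refill the previous vector when it has the right size; otherwise allocate a new one.
    int[] vector = (reuse != null && reuse.length == numRows) ? reuse : new int[numRows];
    System.arraycopy(values, position, vector, 0, numRows);
    position += numRows;
    return vector;
  }
}

// Hypothetical batch: one vector per column plus a row count (standing in for ColumnarBatch).
final class ToyBatch {
  final List<int[]> columns;
  final int numRows;

  ToyBatch(List<int[]> columns, int numRows) {
    this.columns = columns;
    this.numRows = numRows;
  }
}

// Delegates one read() per column and wraps the returned vectors in a single batch.
final class ToyColumnarBatchReader implements ToyVectorizedReader<ToyBatch> {
  private final List<ToyVectorizedReader<int[]>> readers;

  ToyColumnarBatchReader(List<? extends ToyVectorizedReader<int[]>> readers) {
    this.readers = new ArrayList<>(readers);
  }

  @Override
  public ToyBatch read(ToyBatch reuse, int numRows) {
    List<int[]> vectors = new ArrayList<>(readers.size());
    for (int i = 0; i < readers.size(); i++) {
      // Hand each column reader its own vector from the previous batch, if any.
      int[] previous = (reuse == null) ? null : reuse.columns.get(i);
      vectors.add(readers.get(i).read(previous, numRows));
    }
    return new ToyBatch(vectors, numRows);
  }
}
```

In this sketch, passing the previous batch as `reuse` refills its vectors in place. A caller must therefore be finished with one batch before requesting the next.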
Constructor Summary
- ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)
Method Summary
- void close()
  Release any resources allocated.
- org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
  Reads a batch of type T (here, ColumnarBatch) containing numRowsToRead rows.
- void setBatchSize(int batchSize)
- void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData)
Constructor Detail
ColumnarBatchReader
public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)
Method Detail
setRowGroupInfo
public final void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData)
- Specified by:
  setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
- Parameters:
  pageStore - row group information for all the columns
  metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
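Because this reader delegates to per-column readers, one plausible reading is that setRowGroupInfo forwards the same row-group information to every column reader, and each reader picks out the chunk metadata for its own column. The sketch below models that assumption without Parquet. It is hypothetical throughout: a `String` row-group id replaces PageReadStore, a dotted `String` path replaces ColumnPath, a `Long` value count replaces ColumnChunkMetaData, and `ToyColumnReader` and `ToyRowGroupBatchReader` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical simplification: a column path is a dotted String and the per-chunk
// metadata is just a value count, in place of Parquet's ColumnPath and ColumnChunkMetaData.
final class ToyColumnReader {
  final String path;
  String rowGroupId;
  long valueCount = -1;

  ToyColumnReader(String path) {
    this.path = path;
  }

  void setRowGroupInfo(String rowGroupId, Map<String, Long> metaData) {
    // Each column reader looks up the chunk metadata for its own column only.
    Long count = metaData.get(path);
    if (count == null) {
      throw new IllegalArgumentException("No column chunk metadata for " + path);
    }
    this.rowGroupId = rowGroupId;
    this.valueCount = count;
  }
}

// Hypothetical batch reader: forwards the same row-group information to every column reader.
final class ToyRowGroupBatchReader {
  private final List<ToyColumnReader> readers;

  ToyRowGroupBatchReader(List<ToyColumnReader> readers) {
    this.readers = new ArrayList<>(readers);
  }

  void setRowGroupInfo(String rowGroupId, Map<String, Long> metaData) {
    for (ToyColumnReader reader : readers) {
      reader.setRowGroupInfo(rowGroupId, metaData);
    }
  }
}
```

Failing fast when a column has no metadata entry is a choice made for this sketch. It is not documented behavior of ColumnarBatchReader.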
read
public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
Description copied from interface: VectorizedReader
Reads a batch of type T (here, ColumnarBatch) containing numRowsToRead rows.
- Specified by:
  read in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
- Parameters:
  reuse - container for the last batch, to be reused for the next batch
  numRowsToRead - number of rows to read
- Returns:
  batch of records of type T
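A common way to drive a reader like this is to pass the previous batch back as `reuse` and request at most a fixed batch size per call. The final call in a row group may then ask for fewer rows. The sketch below is a hypothetical, dependency-free model: `ToyBatchSource` and `ToyReadLoop` are invented names, and an `int[]` replaces ColumnarBatch. In a real reader the row total would plausibly come from the row group, for example PageReadStore#getRowCount(), but that wiring is an assumption here.

```java
// Hypothetical single-column reader standing in for ColumnarBatchReader;
// a "batch" is an int[] whose first numRowsToRead entries are valid.
final class ToyBatchSource {
  private final int[] rows;
  private int position = 0;

  ToyBatchSource(int[] rows) {
    this.rows = rows;
  }

  int[] read(int[] reuse, int numRowsToRead) {
    // Reuse the previous container when it can hold the rows; otherwise allocate one.
    int[] batch = (reuse != null && reuse.length >= numRowsToRead) ? reuse : new int[numRowsToRead];
    System.arraycopy(rows, position, batch, 0, numRowsToRead);
    position += numRowsToRead;
    return batch;
  }
}

final class ToyReadLoop {
  // Reads totalRows rows in batches of at most batchSize, feeding each batch back in
  // as `reuse`, and returns the sum of every value read.
  static long sumAll(ToyBatchSource source, long totalRows, int batchSize) {
    int[] batch = null;
    long remaining = totalRows;
    long sum = 0;
    while (remaining > 0) {
      int numRowsToRead = (int) Math.min(batchSize, remaining);
      batch = source.read(batch, numRowsToRead);
      // Only the first numRowsToRead slots belong to this batch; later slots may be stale.
      for (int i = 0; i < numRowsToRead; i++) {
        sum += batch[i];
      }
      remaining -= numRowsToRead;
    }
    return sum;
  }
}
```

A reused container can be larger than the number of rows actually read. For that reason the loop only trusts the first `numRowsToRead` entries. Spark's ColumnarBatch tracks this count itself through `numRows()`.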
close
public void close()
Description copied from interface: VectorizedReader
Release any resources allocated.
- Specified by:
  close in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
setBatchSize
public void setBatchSize(int batchSize)
- Specified by:
  setBatchSize in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
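Neither setBatchSize nor close carries a class-specific description here. Given the class's delegating design, one plausible reading is that both calls are forwarded to each column reader. The sketch below models that assumption with the hypothetical types `ToyColumn` and `ToyLifecycleReader`. Two details are design choices of this sketch, not documented behavior: non-positive batch sizes are rejected up front, and close attempts every column before rethrowing the first failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column reader that records the settings and cleanup it receives.
final class ToyColumn {
  int batchSize = -1;
  boolean closed = false;

  void setBatchSize(int batchSize) {
    this.batchSize = batchSize;
  }

  void close() {
    closed = true;
  }
}

// Hypothetical batch reader that forwards setBatchSize and close to every column.
final class ToyLifecycleReader {
  private final List<ToyColumn> columns;

  ToyLifecycleReader(List<ToyColumn> columns) {
    this.columns = new ArrayList<>(columns);
  }

  void setBatchSize(int batchSize) {
    // Assumption for this sketch: non-positive sizes are rejected before any column is touched.
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }
    for (ToyColumn column : columns) {
      column.setBatchSize(batchSize);
    }
  }

  void close() {
    // Attempt to close every column even if one fails, then rethrow the first failure.
    RuntimeException failure = null;
    for (ToyColumn column : columns) {
      try {
        column.close();
      } catch (RuntimeException e) {
        if (failure == null) {
          failure = e;
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    if (failure != null) {
      throw failure;
    }
  }
}
```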