Class ColumnarBatchReader

  • All Implemented Interfaces:
    VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>

    public class ColumnarBatchReader
    extends BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
    A VectorizedReader that returns Spark's ColumnarBatch, supporting Spark's vectorized read path. Each returned ColumnarBatch is built from the Arrow vectors populated by delegating read calls to the underlying VectorReader(s).
    • Method Summary

      Modifier and Type Method Description
      org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse, int numRowsToRead)
      Reads a batch of type T (here, ColumnarBatch) of size numRowsToRead
      void setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter)
      void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData, long rowPosition)
      Sets the row group information to be used with this reader
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ColumnarBatchReader

        public ColumnarBatchReader(java.util.List<VectorizedReader<?>> readers)
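The constructor takes a list of readers, and the class description says each batch is assembled from vectors filled by delegated read calls. The following is a minimal sketch of that delegation pattern only. It assumes one child reader per column, and every type in it (SketchColumnReader, SketchBatch, SketchBatchReader) is a hypothetical stand-in built on plain long[] arrays, not the Iceberg, Arrow, or Spark classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-column reader; not the Iceberg interface.
interface SketchColumnReader {
  long[] read(int numRowsToRead);
}

// Hypothetical stand-in for a batch: one vector per column plus a row count.
final class SketchBatch {
  final List<long[]> columns;
  final int numRows;

  SketchBatch(List<long[]> columns, int numRows) {
    this.columns = columns;
    this.numRows = numRows;
  }
}

// Holds the child readers and delegates every read call to each of them.
final class SketchBatchReader {
  private final List<SketchColumnReader> readers;

  SketchBatchReader(List<SketchColumnReader> readers) {
    this.readers = new ArrayList<>(readers);
  }

  SketchBatch read(int numRowsToRead) {
    List<long[]> columns = new ArrayList<>(readers.size());
    for (SketchColumnReader reader : readers) {
      // Each child fills one column vector for the requested number of rows.
      columns.add(reader.read(numRowsToRead));
    }
    return new SketchBatch(columns, numRowsToRead);
  }
}
```

In this sketch the batch reader owns no decoding logic of its own; it only collects what the children produce, which is the relationship the class description suggests.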
    • Method Detail

      • setRowGroupInfo

        public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore pageStore,
                                    java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metaData,
                                    long rowPosition)
        Description copied from interface: VectorizedReader
        Sets the row group information to be used with this reader
        Specified by:
        setRowGroupInfo in interface VectorizedReader<org.apache.spark.sql.vectorized.ColumnarBatch>
        Overrides:
        setRowGroupInfo in class BaseBatchReader<org.apache.spark.sql.vectorized.ColumnarBatch>
        Parameters:
        pageStore - row group information for all the columns
        metaData - map of ColumnPath -> ColumnChunkMetaData for the row group
        rowPosition - the row group's row offset in the Parquet file
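To illustrate the rowPosition parameter: if the row offset is read as the number of rows in all preceding row groups of the file (an interpretation, not something this page states), the offsets form a running sum of the per-group row counts. The helper below is hypothetical and is not part of Iceberg or Parquet.

```java
// Hypothetical helper for illustration only.
final class RowGroupOffsets {
  private RowGroupOffsets() {}

  // Returns the starting row position of each row group, given the
  // number of rows in each group in file order.
  static long[] startPositions(long[] rowCounts) {
    long[] positions = new long[rowCounts.length];
    long next = 0L;
    for (int i = 0; i < rowCounts.length; i++) {
      positions[i] = next;   // rows before this group
      next += rowCounts[i];  // advance past this group
    }
    return positions;
  }
}
```

For row groups of 100, 250, and 50 rows, this yields starting positions 0, 100, and 350; under the stated assumption, those are the values a caller would pass as rowPosition for each group in turn.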
      • setDeleteFilter

        public void setDeleteFilter(DeleteFilter<org.apache.spark.sql.catalyst.InternalRow> deleteFilter)
      • read

        public final org.apache.spark.sql.vectorized.ColumnarBatch read(org.apache.spark.sql.vectorized.ColumnarBatch reuse,
                                                                        int numRowsToRead)
        Description copied from interface: VectorizedReader
        Reads a batch of type T (here, ColumnarBatch) of size numRowsToRead
        Parameters:
        reuse - container for the last batch to be reused for next batch
        numRowsToRead - number of rows to read
        Returns:
        batch of records of type T (here, ColumnarBatch)
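The read(reuse, numRowsToRead) signature suggests a calling loop in which the previous batch is handed back as the container for the next one. The sketch below shows one plausible shape for such a loop. SketchReader, RowBuffer, CountingReader, and ReadLoop are hypothetical stand-ins that mirror only the method shape documented above; how the real reader treats the reuse argument is not stated on this page.

```java
import java.util.function.Consumer;

// Hypothetical interface mirroring only the read(reuse, numRowsToRead) shape.
interface SketchReader<T> {
  T read(T reuse, int numRowsToRead);
}

// Toy batch container: a fixed-capacity buffer plus the number of valid rows.
final class RowBuffer {
  final long[] values;
  int numRows;

  RowBuffer(int capacity) {
    this.values = new long[capacity];
  }
}

// Toy reader that emits consecutive row ids, reusing the buffer when it fits.
final class CountingReader implements SketchReader<RowBuffer> {
  private long nextRow = 0L;

  @Override
  public RowBuffer read(RowBuffer reuse, int numRowsToRead) {
    RowBuffer batch = (reuse != null && reuse.values.length >= numRowsToRead)
        ? reuse
        : new RowBuffer(numRowsToRead);
    for (int i = 0; i < numRowsToRead; i++) {
      batch.values[i] = nextRow++;
    }
    batch.numRows = numRowsToRead;
    return batch;
  }
}

final class ReadLoop {
  private ReadLoop() {}

  // Reads totalRows rows in batches of at most batchSize, passing each batch
  // to sink, and returns the number of batches read.
  static <T> int readAll(SketchReader<T> reader, long totalRows, int batchSize, Consumer<T> sink) {
    T batch = null;
    long remaining = totalRows;
    int batches = 0;
    while (remaining > 0) {
      int numRowsToRead = (int) Math.min(batchSize, remaining);
      batch = reader.read(batch, numRowsToRead); // hand the last batch back for reuse
      sink.accept(batch);
      remaining -= numRowsToRead;
      batches++;
    }
    return batches;
  }
}
```

Because the same container may come back on every call in this sketch, the sink must finish with a batch before the next read; a consumer that needs to keep rows would have to copy them out first.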