Class SparkOrcReader

java.lang.Object
org.apache.iceberg.spark.data.SparkOrcReader
All Implemented Interfaces:
OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>

public class SparkOrcReader extends Object implements OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
Converts the output of the OrcIterator, which returns ORC's VectorizedRowBatch, into Spark's UnsafeRows.

It minimizes allocations by reusing most of its internal objects.
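
The object reuse described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg, ORC, or Spark APIs: `LongBatch`, `MutableRow`, `ReusingReader`, and `collect` are hypothetical names invented for this illustration, and the sketch makes no claim about which objects SparkOrcReader actually reuses. It only shows the general pattern of overwriting one row object per call, and why a caller that retains rows would then need to copy them.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {
    /** Hypothetical stand-in for a columnar batch: one long column with `size` valid rows. */
    static final class LongBatch {
        final long[] values;
        final int size;

        LongBatch(long[] values, int size) {
            this.values = values;
            this.size = size;
        }
    }

    /** Hypothetical mutable row; a single instance is overwritten on every read. */
    static final class MutableRow {
        long value;

        MutableRow copy() {
            MutableRow r = new MutableRow();
            r.value = value;
            return r;
        }
    }

    /** Hypothetical reader that avoids a per-row allocation by reusing one row object. */
    static final class ReusingReader {
        private final MutableRow reused = new MutableRow();

        MutableRow read(LongBatch batch, int row) {
            reused.value = batch.values[row];
            return reused; // the same instance is returned on every call
        }
    }

    /** Collects every row of the batch, copying each one because the reader reuses its row. */
    static List<MutableRow> collect(ReusingReader reader, LongBatch batch) {
        List<MutableRow> out = new ArrayList<>();
        for (int row = 0; row < batch.size; row++) {
            out.add(reader.read(batch, row).copy());
        }
        return out;
    }

    public static void main(String[] args) {
        LongBatch batch = new LongBatch(new long[] {10L, 20L, 30L}, 3);
        for (MutableRow r : collect(new ReusingReader(), batch)) {
            System.out.println(r.value);
        }
    }
}
```

Without the `copy()` call in `collect`, every list element in this sketch would refer to the same object and show only the last row's value.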

  • Constructor Details

    • SparkOrcReader

      public SparkOrcReader(Schema expectedSchema, org.apache.orc.TypeDescription readSchema)
    • SparkOrcReader

      public SparkOrcReader(Schema expectedSchema, org.apache.orc.TypeDescription readOrcSchema, Map<Integer,?> idToConstant)
  • Method Details

    • read

      public org.apache.spark.sql.catalyst.InternalRow read(org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch batch, int row)
      Description copied from interface: OrcRowReader
      Reads a row.
      Specified by:
      read in interface OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
    • setBatchContext

      public void setBatchContext(long batchOffsetInFile)
      Specified by:
      setBatchContext in interface OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
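
The two methods above suggest a calling sequence: set the batch context once per batch, then read each row of that batch by index. The sketch below illustrates that sequence with hypothetical stand-ins only. `Batch`, `RowReader`, `PositionReader`, and `readAll` are not Iceberg types, and the use of `batchOffsetInFile + row` as a row's position in the file is an assumption about how such an offset might be used, not documented behavior of SparkOrcReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchContextSketch {
    /** Hypothetical stand-in for a batch: only the row count matters here. */
    static final class Batch {
        final int size;

        Batch(int size) {
            this.size = size;
        }
    }

    /** Hypothetical interface mirroring the two method shapes documented above. */
    interface RowReader<T> {
        T read(Batch batch, int row);

        void setBatchContext(long batchOffsetInFile);
    }

    /** Assumption: derives a row's file position as batch offset plus row index. */
    static final class PositionReader implements RowReader<Long> {
        private long batchOffsetInFile;

        @Override
        public void setBatchContext(long batchOffsetInFile) {
            this.batchOffsetInFile = batchOffsetInFile;
        }

        @Override
        public Long read(Batch batch, int row) {
            return batchOffsetInFile + row;
        }
    }

    /** Sets the context once per batch, then reads every row of that batch. */
    static <T> List<T> readAll(RowReader<T> reader, List<Batch> batches) {
        List<T> out = new ArrayList<>();
        long offset = 0L;
        for (Batch batch : batches) {
            reader.setBatchContext(offset);
            for (int row = 0; row < batch.size; row++) {
                out.add(reader.read(batch, row));
            }
            offset += batch.size; // the next batch starts after this batch's rows
        }
        return out;
    }

    public static void main(String[] args) {
        List<Batch> batches = Arrays.asList(new Batch(2), new Batch(3));
        System.out.println(readAll(new PositionReader(), batches)); // prints [0, 1, 2, 3, 4]
    }
}
```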