Class SparkOrcReader

  • All Implemented Interfaces:
    OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>

    public class SparkOrcReader
    extends java.lang.Object
    implements OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
    Converts the ORC VectorizedRowBatch instances returned by the OrcIterator into a set of Spark's UnsafeRows. It minimizes allocations by reusing most of the objects in the implementation.
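The allocation-reuse idea in the description above can be sketched in isolation. This is a minimal illustration and not the Iceberg implementation: `ColumnBatch`, `MutableRow`, and `ReusingRowReader` are hypothetical stand-ins for a columnar batch, a Spark row, and the reader. The sketch assumes a reader that overwrites a single row instance on each call.

```java
import java.util.ArrayList;
import java.util.List;

public class RowReuseSketch {

    /** Hypothetical stand-in for a columnar batch: one long[] per column. */
    static final class ColumnBatch {
        final long[][] columns;
        final int size;

        ColumnBatch(long[][] columns, int size) {
            this.columns = columns;
            this.size = size;
        }
    }

    /** Hypothetical mutable row whose fields are overwritten on every read. */
    static final class MutableRow {
        final long[] values;

        MutableRow(int numFields) {
            this.values = new long[numFields];
        }

        /** Returns an independent snapshot of the current field values. */
        MutableRow copy() {
            MutableRow snapshot = new MutableRow(values.length);
            System.arraycopy(values, 0, snapshot.values, 0, values.length);
            return snapshot;
        }
    }

    /** Converts one batch position into a row, reusing a single row instance. */
    static final class ReusingRowReader {
        private final MutableRow reused;

        ReusingRowReader(int numFields) {
            this.reused = new MutableRow(numFields);
        }

        MutableRow read(ColumnBatch batch, int row) {
            for (int col = 0; col < reused.values.length; col++) {
                reused.values[col] = batch.columns[col][row];
            }
            return reused; // the same instance is returned on every call
        }
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(new long[][] {{1, 2, 3}, {10, 20, 30}}, 3);
        ReusingRowReader reader = new ReusingRowReader(2);

        // Rows that outlive the next read() call are copied first.
        List<MutableRow> retained = new ArrayList<>();
        for (int row = 0; row < batch.size; row++) {
            retained.add(reader.read(batch, row).copy());
        }
        System.out.println("retained rows: " + retained.size());
    }
}
```

In a design like this sketch, a caller that keeps a row past the next `read` call has to copy it first. Whether the same holds for `SparkOrcReader` is not stated on this page.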
    • Constructor Summary

      Constructors 
      Constructor Description
      SparkOrcReader(Schema expectedSchema, org.apache.orc.TypeDescription readSchema)
      SparkOrcReader(Schema expectedSchema, org.apache.orc.TypeDescription readOrcSchema, java.util.Map<java.lang.Integer,?> idToConstant)
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      org.apache.spark.sql.catalyst.InternalRow read(org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch batch, int row)
      Reads a row.
      void setBatchContext(long batchOffsetInFile)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • SparkOrcReader

        public SparkOrcReader(Schema expectedSchema,
                              org.apache.orc.TypeDescription readSchema)
      • SparkOrcReader

        public SparkOrcReader(Schema expectedSchema,
                              org.apache.orc.TypeDescription readOrcSchema,
                              java.util.Map<java.lang.Integer,?> idToConstant)
    • Method Detail

      • read

        public org.apache.spark.sql.catalyst.InternalRow read(org.apache.orc.storage.ql.exec.vector.VectorizedRowBatch batch,
                                                              int row)
        Description copied from interface: OrcRowReader
        Reads a row.
        Specified by:
        read in interface OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
      • setBatchContext

        public void setBatchContext(long batchOffsetInFile)
        Specified by:
        setBatchContext in interface OrcRowReader<org.apache.spark.sql.catalyst.InternalRow>
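The two methods above suggest a calling pattern of one `setBatchContext` call per batch followed by one `read` call per row index. The sketch below illustrates that pattern with hypothetical stand-in types (`RowReader`, `PositionedReader`, `readAll`), not the Iceberg classes. It assumes that `batchOffsetInFile` is the file-level index of the batch's first row, which the parameter name suggests but this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchContextSketch {

    /** Hypothetical stand-in mirroring the two methods listed above. */
    interface RowReader<T> {
        T read(long[] batch, int row);

        void setBatchContext(long batchOffsetInFile);
    }

    /** Emits "position=value", where position is the assumed file-level row index. */
    static final class PositionedReader implements RowReader<String> {
        private long batchOffsetInFile = 0L;

        @Override
        public void setBatchContext(long batchOffsetInFile) {
            this.batchOffsetInFile = batchOffsetInFile;
        }

        @Override
        public String read(long[] batch, int row) {
            // Assumption: offset of the batch plus the row index gives the file position.
            return (batchOffsetInFile + row) + "=" + batch[row];
        }
    }

    /** Drives a reader over consecutive batches, setting the context once per batch. */
    static List<String> readAll(RowReader<String> reader, long[][] batches) {
        List<String> rows = new ArrayList<>();
        long offset = 0L;
        for (long[] batch : batches) {
            reader.setBatchContext(offset);
            for (int row = 0; row < batch.length; row++) {
                rows.add(reader.read(batch, row));
            }
            offset += batch.length;
        }
        return rows;
    }

    public static void main(String[] args) {
        long[][] batches = {{7, 8}, {9}};
        System.out.println(readAll(new PositionedReader(), batches));
    }
}
```

Keeping the offset in the reader rather than passing it to every `read` call keeps the per-row signature small. That matches the shape of the interface above, though the motivation is a guess.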