Class ArrowReader
- java.lang.Object
  - org.apache.iceberg.io.CloseableGroup
    - org.apache.iceberg.arrow.vectorized.ArrowReader
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
 
public class ArrowReader extends CloseableGroup

Vectorized reader that returns an iterator of ColumnarBatch. See open(CloseableIterable) to learn about the behavior of the iterator.

The following Iceberg data types are supported and have been tested:
- Iceberg: Types.BooleanType, Arrow: Types.MinorType.BIT
- Iceberg: Types.IntegerType, Arrow: Types.MinorType.INT
- Iceberg: Types.LongType, Arrow: Types.MinorType.BIGINT
- Iceberg: Types.FloatType, Arrow: Types.MinorType.FLOAT4
- Iceberg: Types.DoubleType, Arrow: Types.MinorType.FLOAT8
- Iceberg: Types.StringType, Arrow: Types.MinorType.VARCHAR
- Iceberg: Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMEMICRO
- Iceberg: Types.BinaryType, Arrow: Types.MinorType.VARBINARY
- Iceberg: Types.DateType, Arrow: Types.MinorType.DATEDAY
- Iceberg: Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
- Iceberg: Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)
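The tested mapping can be pictured as a simple lookup table. The sketch below is a hypothetical plain-Java illustration, not part of Iceberg: it keys on Iceberg type names written as strings (an assumption; the real reader works with Iceberg `Types.*` objects and Arrow's `Types.MinorType` enum) and only restates the list above.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical lookup table restating the documented Iceberg-to-Arrow mapping.
 * Type names are plain strings here; this is NOT an Iceberg or Arrow API.
 */
final class SupportedTypeTable {
  // Iceberg type name -> Arrow Types.MinorType name, as listed in the class docs.
  private static final Map<String, String> ICEBERG_TO_ARROW = Map.ofEntries(
      Map.entry("boolean", "BIT"),
      Map.entry("int", "INT"),
      Map.entry("long", "BIGINT"),
      Map.entry("float", "FLOAT4"),
      Map.entry("double", "FLOAT8"),
      Map.entry("string", "VARCHAR"),
      Map.entry("timestamp", "TIMEMICRO"),
      Map.entry("timestamptz", "TIMEMICRO"),
      Map.entry("binary", "VARBINARY"),
      Map.entry("date", "DATEDAY"),
      Map.entry("time", "TIMEMICRO"),
      Map.entry("uuid", "FIXEDSIZEBINARY(16)"));

  private SupportedTypeTable() {}

  /** Returns the Arrow minor type name, or empty if the type is not in the tested list. */
  static Optional<String> arrowTypeFor(String icebergTypeName) {
    return Optional.ofNullable(ICEBERG_TO_ARROW.get(icebergTypeName));
  }
}
```

For example, `arrowTypeFor("long")` yields `BIGINT`, while a decimal type is absent from the table, matching the limitations listed below.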
Features that don't work in this implementation:
- Type promotion: in case of type promotion, the Arrow vector corresponding to the data type in the Parquet file is returned instead of the one for the data type in the latest schema. See https://github.com/apache/iceberg/issues/2483.
- Columns with constant values are physically encoded as a dictionary. The Arrow vector type is int32 instead of the type given by the schema. See https://github.com/apache/iceberg/issues/2484.
- Data types: Types.ListType, Types.MapType, Types.StructType, Types.FixedType and Types.DecimalType. See https://github.com/apache/iceberg/issues/2485 and https://github.com/apache/iceberg/issues/2486.
- Delete files are not supported. See https://github.com/apache/iceberg/issues/2487.
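A caller could screen a scan against these limitations before reading. The following is a hypothetical pre-flight check in plain Java, not an Iceberg API. It assumes the projected column types are available as strings formatted like Iceberg's `Type.toString()` (for example `list<int>`, `decimal(9, 2)`, `fixed[16]`) and that the caller already knows whether any task carries delete files. Type promotion and dictionary-encoded constant columns are not checked, because they change the returned vector types rather than making the read fail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-flight check for the documented limitations; not part of Iceberg. */
final class ReaderPreflight {
  // Type families listed as unsupported by ArrowReader.
  private static final Set<String> UNSUPPORTED_BASE_TYPES =
      Set.of("list", "map", "struct", "fixed", "decimal");

  private ReaderPreflight() {}

  /** Returns human-readable problems; an empty list means no documented limitation applies. */
  static List<String> problems(List<String> projectedTypeNames, boolean anyDeleteFiles) {
    List<String> out = new ArrayList<>();
    for (String type : projectedTypeNames) {
      // Parameterized types look like "list<int>", "decimal(9, 2)" or "fixed[16]",
      // so compare only the leading type name before '<', '(' or '['.
      String base = type.split("[<(\\[]", 2)[0].trim();
      if (UNSUPPORTED_BASE_TYPES.contains(base)) {
        out.add("unsupported type: " + type);
      }
    }
    if (anyDeleteFiles) {
      out.add("delete files are not supported");
    }
    return out;
  }
}
```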
Constructor Summary
- ArrowReader(TableScan scan, int batchSize, boolean reuseContainers): Create a new instance of the reader.
Method Summary
- void close(): Close all the registered resources.
- CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks): Returns a new iterator of ColumnarBatch objects.

Methods inherited from class org.apache.iceberg.io.CloseableGroup: addCloseable, addCloseable, setSuppressCloseFailure
Constructor Detail

ArrowReader
public ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)
Create a new instance of the reader.
Parameters:
- scan - the table scan object.
- batchSize - the maximum number of rows per Arrow batch.
- reuseContainers - whether to reuse Arrow vectors when iterating through the data. If set to false, every Iterator.next() call creates new instances of Arrow vectors. If set to true, the Arrow vectors from the previous Iterator.next() call may be reused for the data returned by the current Iterator.next(). This option avoids allocating memory again and again. Irrespective of the value of reuseContainers, the Arrow vectors from the previous Iterator.next() call are closed before new instances are created in the current Iterator.next().
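The reuseContainers contract can be modeled without Arrow. In this hypothetical sketch (not Iceberg code), `IntBatch` stands in for a batch of Arrow vectors and its `close()` merely sets a flag. With reuse disabled, each `next()` closes the previous batch and allocates a new one; with reuse enabled, the same batch object is overwritten in place. The class names and the `allocations` counter are illustrative assumptions.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a columnar batch: values plus the number of valid rows. */
final class IntBatch {
  final int[] values;
  int rowCount;
  boolean closed;

  IntBatch(int capacity) {
    this.values = new int[capacity];
  }

  void close() {
    closed = true; // stand-in for releasing Arrow buffers
  }
}

/** Hypothetical model of reuseContainers; fills batches with consecutive integers. */
final class BatchIteratorModel implements Iterator<IntBatch> {
  private final int totalRows;
  private final int batchSize;
  private final boolean reuseContainers;
  private int nextRow = 0;
  private IntBatch previous; // batch returned by the last next() call
  int allocations = 0;       // number of batches created so far

  BatchIteratorModel(int totalRows, int batchSize, boolean reuseContainers) {
    this.totalRows = totalRows;
    this.batchSize = batchSize;
    this.reuseContainers = reuseContainers;
  }

  @Override
  public boolean hasNext() {
    return nextRow < totalRows;
  }

  @Override
  public IntBatch next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    IntBatch batch;
    if (reuseContainers && previous != null) {
      batch = previous; // overwrite the previous batch in place
    } else {
      if (previous != null) {
        previous.close(); // previous batch is closed before a new one is created
      }
      batch = new IntBatch(batchSize);
      allocations++;
    }
    int rows = Math.min(batchSize, totalRows - nextRow); // last batch may be short
    for (int i = 0; i < rows; i++) {
      batch.values[i] = nextRow + i;
    }
    batch.rowCount = rows;
    nextRow += rows;
    previous = batch;
    return batch;
  }
}
```

With 10 rows and a batch size of 4, the non-reusing model allocates three batches while the reusing model allocates one, which is the memory saving the parameter description refers to.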
Method Detail
open
public CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks)
Returns a new iterator of ColumnarBatch objects.

Note that the reader owns the ColumnarBatch objects and takes care of closing them. The caller should not hold onto a ColumnarBatch or try to close it.

If reuseContainers is false, the Arrow vectors in the previous ColumnarBatch are closed before the next ColumnarBatch object is returned. This implies that the caller should either use the ColumnarBatch or transfer its ownership before getting the next ColumnarBatch.

If reuseContainers is true, the Arrow vectors in the previous ColumnarBatch may be reused for the next ColumnarBatch. This implies that the caller should either use the ColumnarBatch or deep copy it before getting the next ColumnarBatch.

This method works only when the following conditions are true:
- At least one column is queried,
- There are no delete files, and
- Supported data types are queried (see SUPPORTED_TYPES).
When any of these conditions is not met, an UnsupportedOperationException is thrown.
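The ownership rules above come down to one habit: consume or copy what you need from each batch before advancing the iterator. The plain-Java sketch below is hypothetical and uses `int[]` buffers as a stand-in for ColumnarBatch. Deep-copying a real ColumnarBatch would mean copying its Arrow vectors, which this sketch does not show. The producer reuses one buffer, as with reuseContainers set to true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical caller-side pattern; not Iceberg code. */
final class CopyBeforeNext {
  private CopyBeforeNext() {}

  /** A producer that overwrites a single shared buffer for every batch. */
  static Iterator<int[]> reusingProducer(int batches, int batchSize) {
    int[] shared = new int[batchSize];
    return new Iterator<int[]>() {
      private int emitted = 0;

      @Override
      public boolean hasNext() {
        return emitted < batches;
      }

      @Override
      public int[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        Arrays.fill(shared, emitted); // same buffer, new contents
        emitted++;
        return shared;
      }
    };
  }

  /** Correct: deep-copies each batch so the data survives later next() calls. */
  static List<int[]> collectCopies(Iterator<int[]> batches) {
    List<int[]> kept = new ArrayList<>();
    while (batches.hasNext()) {
      kept.add(batches.next().clone()); // clone() copies the primitive array
    }
    return kept;
  }

  /** Incorrect: keeps references to a buffer the producer will overwrite. */
  static List<int[]> collectReferences(Iterator<int[]> batches) {
    List<int[]> kept = new ArrayList<>();
    while (batches.hasNext()) {
      kept.add(batches.next());
    }
    return kept;
  }
}
```

Collecting references ends with every entry showing the contents of the last batch, which is the failure mode the open() documentation warns about.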
close
public void close() throws java.io.IOException
Description copied from class: CloseableGroup
Close all the registered resources. The close method of each resource is called only once. Checked exceptions from AutoCloseable are wrapped in a runtime exception.
Specified by:
- close in interface java.lang.AutoCloseable
Specified by:
- close in interface java.io.Closeable
Overrides:
- close in class CloseableGroup
Throws:
- java.io.IOException
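The close contract described above (each resource closed once, other checked exceptions wrapped) can be sketched as follows. `ResourceGroupSketch` is hypothetical and is not Iceberg's CloseableGroup. The following behaviors are assumptions of the sketch:
- It closes resources in reverse registration order.
- It lets IOException and unchecked exceptions propagate as declared.
- It wraps any other checked exception in a RuntimeException.
- It stops at the first failure.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical resource group illustrating close-once semantics; not Iceberg's CloseableGroup. */
class ResourceGroupSketch implements Closeable {
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();

  void add(AutoCloseable resource) {
    resources.push(resource); // LIFO: last registered is closed first (sketch's choice)
  }

  @Override
  public void close() throws IOException {
    while (!resources.isEmpty()) {
      // Removed before closing, so a second close() call never reaches it again.
      AutoCloseable resource = resources.pop();
      try {
        resource.close();
      } catch (IOException | RuntimeException e) {
        throw e; // declared or unchecked: propagate unchanged
      } catch (Exception e) {
        throw new RuntimeException(e); // other checked exceptions are wrapped
      }
    }
  }
}
```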