public class ArrowReader extends CloseableGroup
Reader that returns an iterator of ColumnarBatch objects. See open(CloseableIterable) to learn about the behavior of the iterator.
The following Iceberg data types are supported and have been tested:
Iceberg: Types.BooleanType, Arrow: Types.MinorType.BIT
Iceberg: Types.IntegerType, Arrow: Types.MinorType.INT
Iceberg: Types.LongType, Arrow: Types.MinorType.BIGINT
Iceberg: Types.FloatType, Arrow: Types.MinorType.FLOAT4
Iceberg: Types.DoubleType, Arrow: Types.MinorType.FLOAT8
Iceberg: Types.StringType, Arrow: Types.MinorType.VARCHAR
Iceberg: Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMEMICRO
Iceberg: Types.BinaryType, Arrow: Types.MinorType.VARBINARY
Iceberg: Types.DateType, Arrow: Types.MinorType.DATEDAY
Iceberg: Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
Iceberg: Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)
Features that don't work in this implementation:
Data types: Types.ListType, Types.MapType, Types.StructType, Types.FixedType and Types.DecimalType. See
https://github.com/apache/iceberg/issues/2485 and
https://github.com/apache/iceberg/issues/2486.
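Because unsupported types cause the reader to fail, a caller may want to check the projected column types before opening a scan. The following is a minimal, self-contained sketch of such a pre-check. The class `SupportedTypeCheck`, the method `requireSupported`, and the use of plain strings for type names are all hypothetical and exist only for illustration; they are not part of the Iceberg API, and the reader's own validation may work differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Iceberg. Type names are plain strings here
// so that the sketch runs without Iceberg or Arrow on the classpath.
final class SupportedTypeCheck {
  // Names taken from the list of supported Iceberg types above.
  static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
      "BooleanType", "IntegerType", "LongType", "FloatType", "DoubleType",
      "StringType", "TimestampType", "BinaryType", "DateType", "TimeType",
      "UUIDType"));

  // Throws UnsupportedOperationException for the first type not in SUPPORTED.
  static void requireSupported(List<String> columnTypes) {
    for (String type : columnTypes) {
      if (!SUPPORTED.contains(type)) {
        throw new UnsupportedOperationException("Unsupported type: " + type);
      }
    }
  }

  public static void main(String[] args) {
    requireSupported(Arrays.asList("LongType", "StringType")); // passes silently
    try {
      requireSupported(Arrays.asList("LongType", "DecimalType"));
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage()); // prints "Unsupported type: DecimalType"
    }
  }
}
```

Failing early in this way keeps the error close to the schema that caused it, rather than surfacing it partway through iteration.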
| Constructor and Description |
|---|
| ArrowReader(TableScan scan, int batchSize, boolean reuseContainers): Create a new instance of the reader. |
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Close all the registered resources. |
| CloseableIterator<ColumnarBatch> | open(CloseableIterable<CombinedScanTask> tasks): Returns a new iterator of ColumnarBatch objects. |
Methods inherited from class CloseableGroup: addCloseable, addCloseable, setSuppressCloseFailure

public ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)

Create a new instance of the reader.

Parameters:
scan - the table scan object.
batchSize - the maximum number of rows per Arrow batch.
reuseContainers - whether to reuse Arrow vectors when iterating through the data. If set to false, every Iterator.next() call creates new instances of Arrow vectors. If set to true, the Arrow vectors from the previous Iterator.next() call may be reused for the data returned by the current Iterator.next() call, which avoids allocating memory repeatedly. Irrespective of the value of reuseContainers, the Arrow vectors from the previous Iterator.next() call are closed before new instances are created in the current Iterator.next() call.

public CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks)
Returns a new iterator of ColumnarBatch objects.
Note that the reader owns the ColumnarBatch objects and takes care of closing them.
The caller should not hold onto a ColumnarBatch or try to close it.
If reuseContainers is false, the Arrow vectors in the previous ColumnarBatch are closed before returning the next ColumnarBatch object. This implies
that the caller should either use the ColumnarBatch or transfer the ownership of ColumnarBatch before getting the next ColumnarBatch.
If reuseContainers is true, the Arrow vectors in the previous ColumnarBatch may be reused for the next ColumnarBatch. This implies that the caller
should either use the ColumnarBatch or deep copy the ColumnarBatch before
getting the next ColumnarBatch.
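The need to copy a batch before advancing the iterator can be illustrated with a small, self-contained sketch. The classes `ReusingBatchIterator` and `BatchReuseDemo` below are hypothetical stand-ins that use a plain `long[]` in place of Arrow vectors; they model only the reuse behavior described above and are not the Iceberg implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a batch iterator that reuses its container.
final class ReusingBatchIterator implements Iterator<long[]> {
  private final long[] buffer; // shared across all batches, like a reused vector
  private final int batches;
  private int produced = 0;

  ReusingBatchIterator(int batchSize, int batches) {
    this.buffer = new long[batchSize];
    this.batches = batches;
  }

  @Override
  public boolean hasNext() {
    return produced < batches;
  }

  @Override
  public long[] next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Overwrite the shared buffer in place with the next batch's values.
    for (int i = 0; i < buffer.length; i++) {
      buffer[i] = (long) produced * buffer.length + i;
    }
    produced++;
    return buffer;
  }
}

final class BatchReuseDemo {
  // Holds onto each returned batch: every entry aliases the same buffer.
  static List<long[]> collectByReference(Iterator<long[]> batches) {
    List<long[]> out = new ArrayList<>();
    while (batches.hasNext()) {
      out.add(batches.next());
    }
    return out;
  }

  // Deep copies each batch before asking for the next one.
  static List<long[]> collectByCopy(Iterator<long[]> batches) {
    List<long[]> out = new ArrayList<>();
    while (batches.hasNext()) {
      long[] batch = batches.next();
      out.add(Arrays.copyOf(batch, batch.length));
    }
    return out;
  }

  public static void main(String[] args) {
    List<long[]> aliased = collectByReference(new ReusingBatchIterator(2, 3));
    List<long[]> copied = collectByCopy(new ReusingBatchIterator(2, 3));
    System.out.println(Arrays.toString(aliased.get(0))); // prints [4, 5]
    System.out.println(Arrays.toString(copied.get(0)));  // prints [0, 1]
  }
}
```

In the aliased case the first batch appears to contain the data of the last batch, because all list entries point at the same reused buffer. Copying before the next call preserves each batch's contents.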
This method works only when the following conditions are true:
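The queried columns use supported data types (see SUPPORTED_TYPES).
If any condition is not met, an UnsupportedOperationException is thrown.

public void close() throws java.io.IOException

Description copied from class: CloseableGroup
Close all the registered resources.
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class CloseableGroup
Throws: java.io.IOException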