public class ArrowReader extends CloseableGroup
Vectorized reader that returns an iterator of ColumnarBatch. See open(CloseableIterable) to learn about the behavior of the iterator.
The following Iceberg data types are supported and have been tested:
- Types.BooleanType, Arrow: Types.MinorType.BIT
- Types.IntegerType, Arrow: Types.MinorType.INT
- Types.LongType, Arrow: Types.MinorType.BIGINT
- Types.FloatType, Arrow: Types.MinorType.FLOAT4
- Types.DoubleType, Arrow: Types.MinorType.FLOAT8
- Types.StringType, Arrow: Types.MinorType.VARCHAR
- Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMEMICRO
- Types.BinaryType, Arrow: Types.MinorType.VARBINARY
- Types.DateType, Arrow: Types.MinorType.DATEDAY
- Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
- Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)

Features that don't work in this implementation:
- Types.ListType, Types.MapType, Types.StructType, Types.FixedType and Types.DecimalType. See https://github.com/apache/iceberg/issues/2485 and https://github.com/apache/iceberg/issues/2486.

Constructor and Description |
---|
ArrowReader(TableScan scan, int batchSize, boolean reuseContainers) Create a new instance of the reader. |
Modifier and Type | Method and Description |
---|---|
void | close() Close all the registered resources. |
CloseableIterator<ColumnarBatch> | open(CloseableIterable<CombinedScanTask> tasks) Returns a new iterator of ColumnarBatch objects. |

Methods inherited from class CloseableGroup: addCloseable, addCloseable, setSuppressCloseFailure

public ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)
scan - the table scan object.
batchSize - the maximum number of rows per Arrow batch.
reuseContainers - whether to reuse Arrow vectors when iterating through the data. If set to false, every Iterator.next() call creates new instances of Arrow vectors. If set to true, the Arrow vectors in the previous Iterator.next() may be reused for the data returned in the current Iterator.next(). This option avoids allocating memory again and again. Irrespective of the value of reuseContainers, the Arrow vectors in the previous Iterator.next() call are closed before creating new instances in the current Iterator.next().

public CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks)
Returns a new iterator of ColumnarBatch objects.

Note that the reader owns the ColumnarBatch objects and takes care of closing them. The caller should not hold onto a ColumnarBatch or try to close it.

If reuseContainers is false, the Arrow vectors in the previous ColumnarBatch are closed before returning the next ColumnarBatch object. This implies that the caller should either use the ColumnarBatch or transfer the ownership of the ColumnarBatch before getting the next ColumnarBatch.

If reuseContainers is true, the Arrow vectors in the previous ColumnarBatch may be reused for the next ColumnarBatch. This implies that the caller should either use the ColumnarBatch or deep copy the ColumnarBatch before getting the next ColumnarBatch.
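
The ownership rules above can be illustrated with a small toy model. This is only a sketch: ToyBatch and ToyBatchIterator are hypothetical stand-ins written for this example, not Iceberg's ColumnarBatch or the iterator returned by open(...), and the real reader may behave differently in detail. It shows why a batch held across a call to next() is unsafe to read in either mode.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy model only: every class here is a hypothetical stand-in,
// not part of Iceberg or Arrow.
class OwnershipDemo {
  public static void main(String[] args) {
    // reuseContainers = true: the held batch is overwritten by the next call.
    ToyBatchIterator reusing = new ToyBatchIterator(2, 3, true);
    ToyBatch first = reusing.next();
    ToyBatch copy = first.deepCopy(); // deep copy taken before advancing
    reusing.next();
    System.out.println("held after next(): " + first.values[0]);
    System.out.println("deep copy:         " + copy.values[0]);

    // reuseContainers = false: the held batch is closed by the next call.
    ToyBatchIterator fresh = new ToyBatchIterator(2, 3, false);
    ToyBatch held = fresh.next();
    fresh.next();
    System.out.println("held batch closed: " + held.closed);
  }
}

// Stand-in for a batch backed by reusable storage.
final class ToyBatch {
  final int[] values;
  boolean closed = false;

  ToyBatch(int size) {
    this.values = new int[size];
  }

  ToyBatch deepCopy() {
    ToyBatch copy = new ToyBatch(values.length);
    System.arraycopy(values, 0, copy.values, 0, values.length);
    return copy;
  }
}

// Stand-in iterator that owns its batches, mimicking the documented contract.
final class ToyBatchIterator implements Iterator<ToyBatch> {
  private final int batches;
  private final int batchSize;
  private final boolean reuseContainers;
  private ToyBatch previous;
  private int produced = 0;

  ToyBatchIterator(int batches, int batchSize, boolean reuseContainers) {
    this.batches = batches;
    this.batchSize = batchSize;
    this.reuseContainers = reuseContainers;
  }

  @Override
  public boolean hasNext() {
    return produced < batches;
  }

  @Override
  public ToyBatch next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    ToyBatch batch;
    if (reuseContainers && previous != null) {
      batch = previous; // same storage, about to be overwritten
    } else {
      if (previous != null) {
        previous.closed = true; // the iterator, not the caller, closes it
      }
      batch = new ToyBatch(batchSize);
    }
    Arrays.fill(batch.values, produced); // fill with the batch index
    produced++;
    previous = batch;
    return batch;
  }
}
```

In this model the caller either finishes with a batch before calling next() or takes a deep copy first, which mirrors the guidance given for both values of reuseContainers.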

This method works only when certain conditions are true, including that only supported data types are queried (see SUPPORTED_TYPES). When any of these conditions fails, an UnsupportedOperationException is thrown.

public void close() throws java.io.IOException
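Description copied from class: CloseableGroup
Close all the registered resources.
Specified by: close in interface java.io.Closeable
Specified by: close in interface java.lang.AutoCloseable
Overrides: close in class CloseableGroup
Throws: java.io.IOException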
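
As a rough illustration of what "close all the registered resources" means in practice, the sketch below uses a hypothetical ToyCloseableGroup. It is not Iceberg's CloseableGroup, whose closing order and error handling (for example, the effect of setSuppressCloseFailure) may differ. The point is only that a single close() call, here triggered by try-with-resources, releases everything registered with the group.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only: ToyCloseableGroup is a hypothetical stand-in,
// not Iceberg's CloseableGroup.
class CloseableGroupDemo {
  public static void main(String[] args) throws IOException {
    List<String> log = new ArrayList<>();
    // try-with-resources calls close() on the group when the block exits,
    // and the group in turn closes each registered resource.
    try (ToyCloseableGroup group = new ToyCloseableGroup()) {
      group.addCloseable(() -> log.add("first resource closed"));
      group.addCloseable(() -> log.add("second resource closed"));
    }
    System.out.println(log);
  }
}

final class ToyCloseableGroup implements Closeable {
  private final List<Closeable> closeables = new ArrayList<>();

  void addCloseable(Closeable closeable) {
    closeables.add(closeable);
  }

  @Override
  public void close() throws IOException {
    // Assumption for this sketch: close in registration order and
    // stop at the first failure.
    for (Closeable closeable : closeables) {
      closeable.close();
    }
    closeables.clear();
  }
}
```

Because ArrowReader extends CloseableGroup and therefore implements java.io.Closeable, the same try-with-resources shape applies to a reader instance.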