public class ArrowReader extends CloseableGroup
Reader that returns an iterator of ColumnarBatch.
See open(CloseableIterable) to learn about the
behavior of the iterator.
The following Iceberg data types are supported and have been tested:
- Types.BooleanType, Arrow: Types.MinorType.BIT
- Types.IntegerType, Arrow: Types.MinorType.INT
- Types.LongType, Arrow: Types.MinorType.BIGINT
- Types.FloatType, Arrow: Types.MinorType.FLOAT4
- Types.DoubleType, Arrow: Types.MinorType.FLOAT8
- Types.StringType, Arrow: Types.MinorType.VARCHAR
- Types.TimestampType (both with and without timezone), Arrow: Types.MinorType.TIMEMICRO
- Types.BinaryType, Arrow: Types.MinorType.VARBINARY
- Types.DateType, Arrow: Types.MinorType.DATEDAY
- Types.TimeType, Arrow: Types.MinorType.TIMEMICRO
- Types.UUIDType, Arrow: Types.MinorType.FIXEDSIZEBINARY(16)

Features that don't work in this implementation:
Types.ListType, Types.MapType,
Types.StructType, Types.FixedType and
Types.DecimalType
See https://github.com/apache/iceberg/issues/2485 and https://github.com/apache/iceberg/issues/2486.

| Constructor and Description |
|---|
| ArrowReader(TableScan scan, int batchSize, boolean reuseContainers): Create a new instance of the reader. |

| Modifier and Type | Method and Description |
|---|---|
| void | close(): Close all the registered resources. |
| CloseableIterator<ColumnarBatch> | open(CloseableIterable<CombinedScanTask> tasks): Returns a new iterator of ColumnarBatch objects. |

Methods inherited from class CloseableGroup: addCloseable, addCloseable, setSuppressCloseFailure

public ArrowReader(TableScan scan, int batchSize, boolean reuseContainers)

Parameters:

- scan - the table scan object.
- batchSize - the maximum number of rows per Arrow batch.
- reuseContainers - whether to reuse Arrow vectors when iterating through the data.
If set to false, every Iterator.next() call creates
new instances of Arrow vectors.
If set to true, the Arrow vectors in the previous
Iterator.next() may be reused for the data returned
in the current Iterator.next().
This option avoids allocating memory again and again.
Irrespective of the value of reuseContainers, the Arrow vectors
in the previous Iterator.next() call are closed before creating
new instances in the current Iterator.next().

public CloseableIterator<ColumnarBatch> open(CloseableIterable<CombinedScanTask> tasks)

Returns a new iterator of ColumnarBatch objects.
Note that the reader owns the ColumnarBatch objects and takes care of closing them.
The caller should not hold on to a ColumnarBatch or try to close it.
If reuseContainers is false, the Arrow vectors in the
previous ColumnarBatch are closed before returning the next ColumnarBatch object.
This implies that the caller should either use the ColumnarBatch or transfer the ownership of
ColumnarBatch before getting the next ColumnarBatch.
If reuseContainers is true, the Arrow vectors in the
previous ColumnarBatch may be reused for the next ColumnarBatch.
This implies that the caller should either use the ColumnarBatch or deep copy the
ColumnarBatch before getting the next ColumnarBatch.
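
The ownership rule described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: it does not use the Iceberg or Arrow classes, and ToyBatch, ToyBatchIterator and readAll are hypothetical stand-ins invented for illustration. The toy iterator closes the previous batch before handing out the next one, so the caller copies the values it needs before advancing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class BatchOwnershipSketch {

    // Hypothetical stand-in for ColumnarBatch: a block of rows that becomes unusable once closed.
    static final class ToyBatch implements AutoCloseable {
        private int[] values;
        private boolean closed;

        ToyBatch(int[] values) {
            this.values = values;
        }

        int numRows() {
            checkOpen();
            return values.length;
        }

        // Deep copy of the rows; the copy stays valid after the reader closes this batch.
        int[] copyValues() {
            checkOpen();
            return values.clone();
        }

        private void checkOpen() {
            if (closed) {
                throw new IllegalStateException("batch was already closed by the reader");
            }
        }

        @Override
        public void close() {
            closed = true;
            values = null;
        }
    }

    // Hypothetical stand-in for the iterator returned by open(): it owns every batch it hands out.
    static final class ToyBatchIterator implements Iterator<ToyBatch>, AutoCloseable {
        private final int[] data;
        private final int batchSize;
        private int offset;
        private ToyBatch current;

        ToyBatchIterator(int[] data, int batchSize) {
            this.data = data;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            return offset < data.length;
        }

        @Override
        public ToyBatch next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            closeCurrent(); // the previous batch is closed before the next one is returned
            int end = Math.min(offset + batchSize, data.length);
            current = new ToyBatch(Arrays.copyOfRange(data, offset, end));
            offset = end;
            return current;
        }

        @Override
        public void close() {
            closeCurrent();
        }

        private void closeCurrent() {
            if (current != null) {
                current.close();
                current = null;
            }
        }
    }

    // Caller side: consume (here, copy) each batch before asking for the next one, and never close a batch.
    static List<Integer> readAll(int[] data, int batchSize) {
        List<Integer> rows = new ArrayList<>();
        try (ToyBatchIterator batches = new ToyBatchIterator(data, batchSize)) {
            while (batches.hasNext()) {
                ToyBatch batch = batches.next();
                for (int value : batch.copyValues()) {
                    rows.add(value);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new int[] {1, 2, 3, 4, 5}, 2)); // prints [1, 2, 3, 4, 5]
    }
}
```

In this model, a batch reference kept across a call to next() throws IllegalStateException when used, which mirrors why the caller should use or copy each batch before requesting the next one.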
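This method works only when certain conditions hold, including that only supported data types are queried (see SUPPORTED_TYPES).
When a condition is not met, an UnsupportedOperationException is thrown.

public void close() throws java.io.IOException

Description copied from class CloseableGroup: Close all the registered resources.

- Specified by: close in interface java.io.Closeable
- Specified by: close in interface java.lang.AutoCloseable
- Overrides: close in class CloseableGroup
- Throws: java.io.IOException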