Package org.apache.iceberg.formats
Interface ReadBuilder<D,S>
- Type Parameters:
D - the output data type produced by the reader
S - the type of the schema for the output data type
public interface ReadBuilder<D,S>
Builder interface for creating file readers across supported data file formats. The
FormatModel implementations provide appropriate ReadBuilder instances.
The ReadBuilder follows the builder pattern to configure and create CloseableIterable instances that read data from source files. Configuration options include
schema projection, predicate filtering, record batching, and encryption settings.
This interface is directly exposed to users for parameterizing readers.
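The chained configuration style described above can be sketched without any Iceberg dependency. ToyReadBuilder below is a hypothetical stand-in, not the Iceberg interface: it reads from an in-memory list, its split works on row positions rather than file ranges, and its filter is a plain java.util.function.Predicate instead of an Expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in that only mimics the shape of a read builder.
// Assumptions: rows live in memory, split() uses row positions, filter() takes a Predicate.
final class ToyReadBuilder<D> {
  private final List<D> source;
  private Predicate<D> filter = row -> true;
  private int start = 0;
  private int length = Integer.MAX_VALUE;

  ToyReadBuilder(List<D> source) {
    this.source = source;
  }

  // Restricts the read to positions [newStart, newStart + newLength).
  ToyReadBuilder<D> split(int newStart, int newLength) {
    this.start = newStart;
    this.length = newLength;
    return this;
  }

  // Records the predicate; it is applied when the reader is built.
  ToyReadBuilder<D> filter(Predicate<D> newFilter) {
    this.filter = newFilter;
    return this;
  }

  // Applies the configured range and filter and returns the matching rows.
  List<D> build() {
    long end = Math.min((long) start + length, source.size());
    List<D> rows = new ArrayList<>();
    for (int i = start; i < end; i++) {
      D row = source.get(i);
      if (filter.test(row)) {
        rows.add(row);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ToyReadBuilder<Integer>(List.of(1, 2, 3, 4, 5, 6))
        .split(1, 4)
        .filter(value -> value % 2 == 0)
        .build();
    System.out.println(rows);
  }
}
```

Each configuration method returns the builder itself, so options can be set in any order before build() is called.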
-
Method Summary
Method: Description
- build(): Builds the reader.
- caseSensitive(boolean caseSensitive): Configures whether filtering should be case-sensitive.
- engineProjection(S schema): Sets the engine's representation of the projected schema.
- filter(Expression filter): Pushes down the Expression filter for the reader to prevent reading unnecessary records.
- idToConstant(Map<Integer, ?> idToConstant): Contains the values in the result objects which are coming from metadata and not coming from the data files themselves.
- project: Set the projection schema.
- recordsPerBatch(int rowsPerBatch): Sets the batch size for vectorized readers.
- reuseContainers(): Enables reusing the containers returned by the reader.
- set: Set a reader configuration property which affects the reader behavior.
- setAll (default ReadBuilder<D, S>): Sets multiple reader configuration properties that affect the reader behavior.
- split(long start, long length): Restricts the read to the given range: [start, start + length).
- withNameMapping(NameMapping nameMapping): Sets a mapping from external schema names to Iceberg type IDs.
-
Method Details
-
split
Restricts the read to the given range: [start, start + length).
- Parameters:
start - the start position for this read
length - the length of the range this read should scan
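Because the range is half-open, adjacent splits can share a boundary without overlapping. The sketch below illustrates that arithmetic only; SplitPlanner and its targetSize parameter are hypothetical names, and how a given file format assigns records that straddle a boundary is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating half-open ranges; not part of the Iceberg API.
final class SplitPlanner {
  // Returns {start, length} pairs that cover [0, fileLength) with no gaps or overlaps.
  static List<long[]> plan(long fileLength, long targetSize) {
    if (targetSize <= 0) {
      throw new IllegalArgumentException("targetSize must be positive");
    }
    List<long[]> splits = new ArrayList<>();
    for (long start = 0; start < fileLength; start += targetSize) {
      long length = Math.min(targetSize, fileLength - start);
      splits.add(new long[] {start, length});
    }
    return splits;
  }

  // True when position lies in [start, start + length): start is included, the end is not.
  static boolean contains(long start, long length, long position) {
    return position >= start && position < start + length;
  }

  public static void main(String[] args) {
    for (long[] split : plan(250, 100)) {
      System.out.println("start=" + split[0] + " length=" + split[1]);
    }
  }
}
```

With a 250-unit file and a target size of 100, position 100 belongs to the second range only, since the first range ends just before it.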
-
project
Set the projection schema. This must be set before the reader is instantiated.
-
engineProjection
Sets the engine's representation of the projected schema. When provided, this schema should be consistent with the requested Iceberg projection, while allowing representation differences. Examples include:
- using a long to represent an Iceberg int column,
- requesting a shredded representation for a variant type, or
- selecting specific concrete classes for Iceberg structs.
-
caseSensitive
Configures whether filtering should be case-sensitive. If the reader supports filtering, it must respect this setting. The default value is true.
- Parameters:
caseSensitive - indicates if filtering is case-sensitive
-
filter
Pushes down the Expression filter for the reader to prevent reading unnecessary records. Some readers may not support filtering, or may support it only for certain expressions. In those cases the reader might return unfiltered or partially filtered rows, and it is the caller's responsibility to apply the filter again.
- Parameters:
filter - the filter to set
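Since pushdown is best-effort, a caller typically evaluates the same condition again on whatever the reader returns. A minimal sketch of that second pass follows; ResidualFilter is a hypothetical helper, and a java.util.function.Predicate stands in for an evaluated Expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: re-applies a filter to rows that a reader may have only partially filtered.
final class ResidualFilter {
  // Assumption: the Predicate encodes the same condition that was pushed down to the reader.
  static <D> List<D> reapply(Iterable<D> rowsFromReader, Predicate<D> filter) {
    List<D> kept = new ArrayList<>();
    for (D row : rowsFromReader) {
      if (filter.test(row)) {
        kept.add(row);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Pretend the reader ignored the pushed-down condition "value > 5".
    List<Integer> unfiltered = List.of(1, 5, 10, 20);
    System.out.println(ResidualFilter.<Integer>reapply(unfiltered, value -> value > 5));
  }
}
```

Re-applying the filter is harmless when the reader already filtered fully, because rows that passed once pass again.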
-
set
Sets a reader configuration property that affects the reader behavior. Reader builders should ignore configuration keys they do not recognize.
- Parameters:
key - a reader config property name
value - config value
- Returns:
this, for method chaining
-
setAll
Sets multiple reader configuration properties that affect the reader behavior. Reader builders should ignore configuration keys they do not recognize.
- Parameters:
properties - reader config properties to set
- Returns:
this, for method chaining
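A default method such as setAll can plausibly be expressed in terms of set. The sketch below shows one way to do that, together with a builder that silently skips unknown keys. PropertyBuilder, ToyBuilder, the String key and value types, and the key "toy.batch-size" are all assumptions made for illustration; the actual Iceberg implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical interface: setAll is a default method that delegates to set for each entry.
interface PropertyBuilder {
  PropertyBuilder set(String key, String value);

  default PropertyBuilder setAll(Map<String, String> properties) {
    properties.forEach(this::set);
    return this;
  }
}

// Hypothetical builder that keeps only the keys it understands and ignores the rest.
final class ToyBuilder implements PropertyBuilder {
  private static final Set<String> KNOWN_KEYS = Set.of("toy.batch-size");

  final Map<String, String> applied = new LinkedHashMap<>();

  @Override
  public PropertyBuilder set(String key, String value) {
    if (KNOWN_KEYS.contains(key)) {
      applied.put(key, value);
    }
    return this; // unknown keys are skipped without raising an error
  }

  public static void main(String[] args) {
    ToyBuilder builder = new ToyBuilder();
    builder.setAll(Map.of("toy.batch-size", "128", "other-format.option", "x"));
    System.out.println(builder.applied);
  }
}
```

Ignoring unknown keys lets one property map be passed to builders for several formats, each picking out only the options it understands.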
-
reuseContainers
ReadBuilder<D,S> reuseContainers()
Enables reusing the containers returned by the reader. This reduces garbage-collection pressure.
-
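Container reuse generally means a returned object may be overwritten by the next record, so a caller that retains rows would need to copy them first. The sketch below demonstrates that effect with a hypothetical in-memory reader (ReuseDemo); it does not show how any Iceberg reader implements reuse.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical reader that hands out one shared int[] and overwrites it for every row.
final class ReuseDemo {
  static Iterable<int[]> reusingReader(int[][] data, int width) {
    return () -> new Iterator<int[]>() {
      private final int[] container = new int[width]; // single container per iteration
      private int index = 0;

      @Override
      public boolean hasNext() {
        return index < data.length;
      }

      @Override
      public int[] next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        System.arraycopy(data[index], 0, container, 0, width);
        index++;
        return container; // same instance every time
      }
    };
  }

  public static void main(String[] args) {
    int[][] data = {{1, 2}, {3, 4}};
    int[] firstSeen = null;
    for (int[] row : reusingReader(data, 2)) {
      if (firstSeen == null) {
        firstSeen = row; // retained without copying
      }
    }
    // firstSeen now holds the values of the last row, not the first.
    System.out.println(firstSeen[0] + ", " + firstSeen[1]);
  }
}
```

Copying a row (for example with clone() on the array here) before storing it avoids the overwrite, at the cost of the allocation that reuse was meant to save.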
recordsPerBatch
Sets the batch size for vectorized readers.
-
idToConstant
Supplies the values in the result objects that come from metadata rather than from the data files themselves. The keys of the map are the column IDs; the values are the constant values to be used in the result.
-
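One way to picture the effect of this map: for each projected column ID, the result takes the constant when the map has an entry and the value read from the file otherwise. ConstantInjection, the column IDs, and the sample values below are hypothetical and only illustrate that lookup order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing how constants keyed by column ID could be merged into a row.
final class ConstantInjection {
  static Map<Integer, Object> materialize(
      List<Integer> projectedIds, Map<Integer, Object> fileValues, Map<Integer, ?> idToConstant) {
    Map<Integer, Object> row = new LinkedHashMap<>();
    for (Integer id : projectedIds) {
      // Prefer the metadata-provided constant; fall back to the value read from the file.
      Object value = idToConstant.containsKey(id) ? idToConstant.get(id) : fileValues.get(id);
      row.put(id, value);
    }
    return row;
  }

  public static void main(String[] args) {
    Map<Integer, Object> fileValues = new LinkedHashMap<>();
    fileValues.put(1, "a");
    fileValues.put(2, 42);
    Map<Integer, Object> constants = Map.of(1000, "from-metadata");
    System.out.println(materialize(List.of(1, 2, 1000), fileValues, constants));
  }
}
```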
withNameMapping
Sets a mapping from external schema names to Iceberg type IDs.
-
build
CloseableIterable<D> build()
Builds the reader.
-