Interface ReadBuilder<D,S>

Type Parameters:
D - the output data type produced by the reader
S - the type of the schema for the output data type

public interface ReadBuilder<D,S>
Builder interface for creating file readers across supported data file formats. FormatModel implementations provide the appropriate ReadBuilder instances.

The ReadBuilder follows the builder pattern to configure and create CloseableIterable instances that read data from source files. Configuration options include schema projection, predicate filtering, record batching, and encryption settings.

This interface is directly exposed to users for parameterizing readers.

  • Method Details

    • split

      ReadBuilder<D,S> split(long start, long length)
      Restricts the read to the given range: [start, start + length).
      Parameters:
      start - the start position for this read
      length - the length of the range this read should scan
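The half-open range convention above means adjacent splits can share a boundary without overlapping. The following is a minimal, standalone sketch of how a caller might carve a file into such ranges before passing each (start, length) pair to split. The SplitRanges class, the Range type, and the plan method are hypothetical names used for illustration only and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

class SplitRanges {
    /** Hypothetical holder for a half-open range [start, start + length). */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long end() {
            return start + length; // exclusive upper bound
        }
    }

    /** Divides fileLength bytes into consecutive ranges of at most targetSize bytes. */
    static List<Range> plan(long fileLength, long targetSize) {
        if (targetSize <= 0) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += targetSize) {
            // The last range is shortened so it never extends past the end of the file.
            long length = Math.min(targetSize, fileLength - start);
            ranges.add(new Range(start, length));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints [0, 4), [4, 8) and [8, 10), one per line.
        for (Range r : plan(10, 4)) {
            System.out.println("[" + r.start + ", " + r.end() + ")");
        }
    }
}
```

Because each range excludes its end position, the end of one range can be used directly as the start of the next.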
    • project

      ReadBuilder<D,S> project(Schema schema)
      Sets the projection schema. This must be set before the reader is instantiated.
    • engineProjection

      ReadBuilder<D,S> engineProjection(S schema)
      Sets the engine's representation of the projected schema.

      When provided, this schema should be consistent with the requested Iceberg projection, while allowing representation differences. Examples include:

      • using a long to represent an Iceberg int column,
      • requesting a shredded representation for a variant type, or
      • selecting specific concrete classes for Iceberg structs.
    • caseSensitive

      ReadBuilder<D,S> caseSensitive(boolean caseSensitive)
      Configures whether filtering should be case-sensitive. If the reader supports filtering, it must respect this setting. The default value is true.
      Parameters:
      caseSensitive - indicates if filtering is case-sensitive
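As an illustration of what this flag could affect, the sketch below resolves a column name referenced by a filter against a list of column names, either exactly or ignoring case. That the setting governs name resolution in this way is an assumption made for the example; the NameResolution class and its resolve method are hypothetical and not part of this API.

```java
import java.util.List;
import java.util.Optional;

class NameResolution {
    /** Finds the first column matching the requested name under the given case rule. */
    static Optional<String> resolve(List<String> columns, String name, boolean caseSensitive) {
        return columns.stream()
            .filter(c -> caseSensitive ? c.equals(name) : c.equalsIgnoreCase(name))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> columns = List.of("id", "eventTime");
        // Case-sensitive lookup fails because the case differs.
        System.out.println(resolve(columns, "EVENTTIME", true).isPresent());  // false
        // Case-insensitive lookup finds the column as it is spelled in the schema.
        System.out.println(resolve(columns, "EVENTTIME", false).get());       // eventTime
    }
}
```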
    • filter

      ReadBuilder<D,S> filter(Expression filter)
      Pushes down the Expression filter so that the reader can avoid reading unnecessary records. Some readers may not support filtering, or may support it only for certain expressions. In these cases the reader may return unfiltered or partially filtered rows, so the caller is responsible for applying the filter again.
      Parameters:
      filter - the filter to set
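Since pushdown is best effort, callers typically evaluate the same condition once more over whatever the reader returns. The sketch below shows that pattern with a plain java.util.function.Predicate standing in for the Expression; the ResidualFilter class and its reapply method are hypothetical helpers, not part of this API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ResidualFilter {
    /** Re-applies the caller's condition to rows that a reader may have filtered only partially. */
    static <T> List<T> reapply(List<T> rowsFromReader, Predicate<? super T> filter) {
        return rowsFromReader.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Simulated reader output for the condition "id > 5": some non-matching rows slipped through.
        List<Integer> partiallyFiltered = List.of(3, 6, 9, 4, 12);
        System.out.println(reapply(partiallyFiltered, id -> id > 5)); // [6, 9, 12]
    }
}
```

Re-applying the condition is safe whether or not the reader filtered anything, because rows that already satisfy it pass through unchanged.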
    • set

      ReadBuilder<D,S> set(String key, String value)
      Sets a reader configuration property that affects the reader behavior. Reader builders should ignore configuration keys that are unknown to them.
      Parameters:
      key - a reader config property name
      value - config value
      Returns:
      this for method chaining
    • setAll

      default ReadBuilder<D,S> setAll(Map<String,String> properties)
      Sets multiple reader configuration properties that affect the reader behavior. Reader builders should ignore configuration keys that are unknown to them.
      Parameters:
      properties - reader config properties to set
      Returns:
      this for method chaining
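The relationship between set and setAll, and the rule that unknown keys are ignored, can be sketched with a cut-down stand-in interface. This is one plausible arrangement, not the library's actual implementation: PropertyBuilder, DemoBuilder, and the demo.* keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ConfigSketch {
    /** Hypothetical stand-in that models only the two configuration methods. */
    interface PropertyBuilder {
        PropertyBuilder set(String key, String value);

        /** Default implementation that forwards every entry to set. */
        default PropertyBuilder setAll(Map<String, String> properties) {
            properties.forEach(this::set);
            return this;
        }
    }

    /** Hypothetical builder that understands two made-up keys and silently skips the rest. */
    static final class DemoBuilder implements PropertyBuilder {
        private static final Set<String> KNOWN_KEYS = Set.of("demo.batch-size", "demo.buffer-bytes");
        final Map<String, String> applied = new HashMap<>();

        @Override
        public PropertyBuilder set(String key, String value) {
            if (KNOWN_KEYS.contains(key)) {
                applied.put(key, value);
            }
            // Unknown keys are ignored rather than rejected.
            return this;
        }
    }

    public static void main(String[] args) {
        DemoBuilder builder = new DemoBuilder();
        builder.setAll(Map.of("demo.batch-size", "128", "other-format.option", "x"));
        System.out.println(builder.applied); // {demo.batch-size=128}
    }
}
```

Ignoring unknown keys lets a caller pass one shared property map to builders for different file formats without filtering it first.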
    • reuseContainers

      ReadBuilder<D,S> reuseContainers()
      Enables reuse of the containers returned by the reader, which reduces garbage-collection pressure.
    • recordsPerBatch

      ReadBuilder<D,S> recordsPerBatch(int rowsPerBatch)
      Sets the batch size for vectorized readers.
    • idToConstant

      ReadBuilder<D,S> idToConstant(Map<Integer,?> idToConstant)
      Sets the constant values for result columns whose values come from metadata rather than from the data files themselves. The map keys are column IDs, and the map values are the constants to use in the result.
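To make the shape of this map concrete, the sketch below combines the columns decoded from a file with constants keyed by column ID. It models a row as a plain Map for brevity; the ConstantColumns class, the withConstants method, the column IDs, and the sample values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantColumns {
    /** Returns a new row in which the listed column IDs take their values from the constants. */
    static Map<Integer, Object> withConstants(Map<Integer, Object> rowFromFile, Map<Integer, ?> idToConstant) {
        Map<Integer, Object> result = new HashMap<>(rowFromFile); // copy, so the input row is untouched
        result.putAll(idToConstant);
        return result;
    }

    public static void main(String[] args) {
        // Columns 1 and 2 were decoded from the data file.
        Map<Integer, Object> rowFromFile = new HashMap<>();
        rowFromFile.put(1, "alice");
        rowFromFile.put(2, 30);

        // Column 1000 is not stored in the file; its value is known from metadata.
        Map<Integer, ?> idToConstant = Map.of(1000, "2024-01-01");

        Map<Integer, Object> row = withConstants(rowFromFile, idToConstant);
        System.out.println(row.get(1) + " " + row.get(2) + " " + row.get(1000));
    }
}
```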
    • withNameMapping

      ReadBuilder<D,S> withNameMapping(NameMapping nameMapping)
      Sets a mapping from external schema names to Iceberg type IDs.
    • build

      Builds the reader.