Class BatchScanAdapter

java.lang.Object
org.apache.iceberg.BatchScanAdapter
All Implemented Interfaces:
BatchScan, Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>

public class BatchScanAdapter extends Object implements BatchScan
An adapter that allows a TableScan to be used as a BatchScan.
  • Constructor Details

    • BatchScanAdapter

      public BatchScanAdapter(TableScan scan)
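The delegation idea behind this adapter can be sketched without Iceberg on the classpath. In this minimal sketch, `LegacyScan` and `SnapshotScan` are hypothetical stand-ins for TableScan and BatchScan, and `ScanAdapter` is not Iceberg's source. Read-only calls are forwarded to the wrapped scan, and refinement calls forward and then re-wrap the returned scan so callers keep working with the target interface.

```java
import java.util.Objects;

// Hypothetical stand-in for TableScan: immutable, each refinement returns a new scan.
interface LegacyScan {
    long snapshotId();
    LegacyScan useSnapshot(long snapshotId);
}

// Hypothetical stand-in for BatchScan.
interface SnapshotScan {
    long snapshotId();
    SnapshotScan useSnapshot(long snapshotId);
}

// Simple immutable implementation of the legacy interface.
final class SimpleLegacyScan implements LegacyScan {
    private final long snapshotId;

    SimpleLegacyScan(long snapshotId) {
        this.snapshotId = snapshotId;
    }

    @Override
    public long snapshotId() {
        return snapshotId;
    }

    @Override
    public LegacyScan useSnapshot(long newSnapshotId) {
        return new SimpleLegacyScan(newSnapshotId);
    }
}

// Adapter: exposes a LegacyScan through the SnapshotScan interface.
final class ScanAdapter implements SnapshotScan {
    private final LegacyScan scan;

    ScanAdapter(LegacyScan scan) {
        this.scan = Objects.requireNonNull(scan, "scan");
    }

    @Override
    public long snapshotId() {
        // Read-only calls delegate directly to the wrapped scan.
        return scan.snapshotId();
    }

    @Override
    public SnapshotScan useSnapshot(long newSnapshotId) {
        // Refinements delegate, then wrap the new legacy scan in a new adapter.
        return new ScanAdapter(scan.useSnapshot(newSnapshotId));
    }
}
```

With Iceberg available, the real adapter is constructed from an existing table scan, for example `new BatchScanAdapter(table.newScan())`.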
  • Method Details

    • table

      public Table table()
      Description copied from interface: BatchScan
      Returns the Table from which this scan loads data.
      Specified by:
      table in interface BatchScan
      Returns:
      this scan's table
    • fileIO

      public Supplier<FileIO> fileIO()
      Description copied from interface: BatchScan
      Returns the FileIO instance to use when reading data files for this scan.
      Specified by:
      fileIO in interface BatchScan
      Specified by:
      fileIO in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
    • useSnapshot

      public BatchScan useSnapshot(long snapshotId)
      Description copied from interface: BatchScan
      Create a new BatchScan from this scan's configuration that will use a snapshot with the given ID.
      Specified by:
      useSnapshot in interface BatchScan
      Parameters:
      snapshotId - a snapshot ID
      Returns:
      a new scan based on this with the given snapshot ID
    • useRef

      public BatchScan useRef(String ref)
      Description copied from interface: BatchScan
      Create a new BatchScan from this scan's configuration that will use the given reference.
      Specified by:
      useRef in interface BatchScan
      Parameters:
      ref - a reference
      Returns:
      a new scan based on this with the given reference
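Conceptually, a reference is a named pointer (a branch or a tag) to a snapshot. The sketch below models that lookup with a plain map. `RefResolver` is hypothetical, and rejecting an unknown name with an exception is an assumption of this sketch rather than documented behavior of useRef.

```java
import java.util.Map;

// Hypothetical model of ref resolution: each named reference (branch or tag)
// points at a snapshot ID.
final class RefResolver {
    private final Map<String, Long> refs;

    RefResolver(Map<String, Long> refs) {
        this.refs = Map.copyOf(refs);
    }

    // Returns the snapshot ID the ref points to; unknown names are rejected (assumption).
    long resolve(String ref) {
        Long snapshotId = refs.get(ref);
        if (snapshotId == null) {
            throw new IllegalArgumentException("Cannot find ref: " + ref);
        }
        return snapshotId;
    }
}
```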
    • asOfTime

      public BatchScan asOfTime(long timestampMillis)
      Description copied from interface: BatchScan
      Create a new BatchScan from this scan's configuration that will use the most recent snapshot as of the given time in milliseconds, on the branch configured for the scan or on main if no branch is set.
      Specified by:
      asOfTime in interface BatchScan
      Parameters:
      timestampMillis - a timestamp in milliseconds
      Returns:
      a new scan based on this with the current snapshot at the given time
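The "most recent snapshot as of the given time" rule can be sketched as a walk over one branch's snapshot log. `LogEntry` and `TimeTravel` are hypothetical; the sketch assumes the log is ordered by commit time and does not model branch selection.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical snapshot log entry: commit time in milliseconds and snapshot ID.
final class LogEntry {
    final long timestampMillis;
    final long snapshotId;

    LogEntry(long timestampMillis, long snapshotId) {
        this.timestampMillis = timestampMillis;
        this.snapshotId = snapshotId;
    }
}

final class TimeTravel {
    // Returns the ID of the last snapshot committed at or before the given time,
    // or empty if no snapshot existed yet. Assumes the log is sorted by time.
    static OptionalLong snapshotAsOf(List<LogEntry> log, long timestampMillis) {
        OptionalLong result = OptionalLong.empty();
        for (LogEntry entry : log) {
            if (entry.timestampMillis <= timestampMillis) {
                result = OptionalLong.of(entry.snapshotId);
            }
        }
        return result;
    }
}
```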
    • snapshot

      public Snapshot snapshot()
      Description copied from interface: BatchScan
      Returns the Snapshot that will be used by this scan.

      If the snapshot was not configured using BatchScan.asOfTime(long) or BatchScan.useSnapshot(long), the current table snapshot will be used.

      Specified by:
      snapshot in interface BatchScan
      Returns:
      the Snapshot this scan will use
    • option

      public BatchScan option(String property, String value)
      Description copied from interface: Scan
      Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair. Unknown properties will be ignored.
      Specified by:
      option in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      property - name of the table property to be overridden
      value - value to override with
      Returns:
      a new scan based on this with overridden behavior
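A minimal sketch of how scan options could be layered over table properties, assuming options take precedence and unknown keys are simply dropped. `ScanOptions` and its `KNOWN` set are hypothetical; the two keys are examples of Iceberg split-planning table properties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ScanOptions {
    // Hypothetical set of properties this sketch knows how to override.
    static final Set<String> KNOWN = Set.of("read.split.target-size", "read.split.open-file-cost");

    // Scan options win over table properties; option keys outside KNOWN are ignored.
    static Map<String, String> effective(Map<String, String> tableProps, Map<String, String> scanOptions) {
        Map<String, String> merged = new HashMap<>(tableProps);
        for (Map.Entry<String, String> e : scanOptions.entrySet()) {
            if (KNOWN.contains(e.getKey())) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```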
    • project

      public BatchScan project(Schema schema)
      Description copied from interface: Scan
      Create a new scan from this with the schema as its projection.
      Specified by:
      project in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      schema - a projection schema
      Returns:
      a new scan based on this with the given projection
    • caseSensitive

      public BatchScan caseSensitive(boolean caseSensitive)
      Description copied from interface: Scan
      Create a new scan from this that, if data columns were selected via Scan.select(java.util.Collection), controls whether the match to the schema is case-sensitive. Default is true.
      Specified by:
      caseSensitive in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      a new scan based on this with case sensitivity as stated
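The effect of the case-sensitivity flag on column selection can be sketched as name resolution against the schema's column names. `ColumnResolver` is hypothetical and handles flat, top-level names only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ColumnResolver {
    // Maps each requested name to the schema column it matches, comparing
    // exactly or ignoring case; unmatched names are rejected.
    static List<String> resolve(List<String> schemaColumns, Collection<String> requested, boolean caseSensitive) {
        List<String> resolved = new ArrayList<>();
        for (String name : requested) {
            String match = null;
            for (String column : schemaColumns) {
                boolean equal = caseSensitive ? column.equals(name) : column.equalsIgnoreCase(name);
                if (equal) {
                    match = column;
                    break;
                }
            }
            if (match == null) {
                throw new IllegalArgumentException("Cannot find field: " + name);
            }
            resolved.add(match);
        }
        return resolved;
    }
}
```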
    • isCaseSensitive

      public boolean isCaseSensitive()
      Description copied from interface: Scan
      Returns whether this scan is case-sensitive with respect to column names.
      Specified by:
      isCaseSensitive in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      true if case-sensitive, false otherwise.
    • includeColumnStats

      public BatchScan includeColumnStats()
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      a new scan based on this that loads column stats.
    • includeColumnStats

      public BatchScan includeColumnStats(Collection<String> requestedColumns)
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats for the specific columns with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      requestedColumns - column names for which to keep the stats.
      Returns:
      a new scan based on this that loads column stats for specific columns.
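Restricting stats to requested columns amounts to filtering each per-column stats map (value counts, null value counts, bounds). `StatsFilter` is a hypothetical helper that keys the map by column name for readability, whereas Iceberg's DataFile stats maps are keyed by field ID.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class StatsFilter {
    // Keeps only the entries of a per-column stats map whose column was requested.
    static <V> Map<String, V> keep(Map<String, V> stats, Collection<String> requestedColumns) {
        Set<String> wanted = new HashSet<>(requestedColumns);
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<String, V> e : stats.entrySet()) {
            if (wanted.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```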
    • select

      public BatchScan select(Collection<String> columns)
      Description copied from interface: Scan
      Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
      Specified by:
      select in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      columns - column names from the table's schema
      Returns:
      a new scan based on this with the given projection columns
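The expected-schema rule described above (selected columns plus any columns used by the filter) can be sketched over flat column names. `ExpectedSchema` is hypothetical, and returning the result in table-schema order is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExpectedSchema {
    // Columns the scan must read: selected columns plus filter-referenced columns,
    // listed in the order they appear in the table schema.
    static List<String> expectedColumns(List<String> schemaColumns,
                                        Collection<String> selected,
                                        Collection<String> filterColumns) {
        Set<String> needed = new HashSet<>(selected);
        needed.addAll(filterColumns);
        List<String> result = new ArrayList<>();
        for (String column : schemaColumns) {
            if (needed.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }
}
```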
    • filter

      public BatchScan filter(Expression expr)
      Description copied from interface: Scan
      Create a new scan from this whose results are filtered by the given Expression.
      Specified by:
      filter in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      expr - a filter expression
      Returns:
      a new scan based on this with results filtered by the expression
    • filter

      public Expression filter()
      Description copied from interface: Scan
      Returns this scan's filter Expression.
      Specified by:
      filter in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      this scan's filter expression
    • ignoreResiduals

      public BatchScan ignoreResiduals()
      Description copied from interface: Scan
      Create a new scan from this that applies data filtering to files but not to rows in those files.
      Specified by:
      ignoreResiduals in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      a new scan based on this that does not filter rows in files.
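The difference between file-level and row-level filtering can be illustrated with a single-column predicate `value >= threshold`. `DataFileStub` and `ResidualDemo` are hypothetical. In Iceberg, the row-level part of a filter is carried as each task's residual expression and applied by the reader; this sketch folds both steps into one loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data file: its rows for one int column plus the column's upper bound.
final class DataFileStub {
    final List<Integer> rows;
    final int upperBound;

    DataFileStub(List<Integer> rows) {
        this.rows = List.copyOf(rows);
        this.upperBound = rows.stream().mapToInt(Integer::intValue).max().orElse(Integer.MIN_VALUE);
    }
}

final class ResidualDemo {
    // Reads rows for "value >= threshold". Files whose upper bound is below the
    // threshold are always skipped; rows are checked only when residuals apply.
    static List<Integer> scan(List<DataFileStub> files, int threshold, boolean ignoreResiduals) {
        List<Integer> out = new ArrayList<>();
        for (DataFileStub file : files) {
            if (file.upperBound < threshold) {
                continue; // pruned at file level using its bounds
            }
            for (int row : file.rows) {
                if (ignoreResiduals || row >= threshold) {
                    out.add(row);
                }
            }
        }
        return out;
    }
}
```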
    • planWith

      public BatchScan planWith(ExecutorService executorService)
      Description copied from interface: Scan
      Create a new scan that uses a particular executor for planning. If no executor is provided, the default worker pool is used.
      Specified by:
      planWith in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      executorService - the provided executor
      Returns:
      a table scan that uses the provided executor to access manifests
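A minimal sketch of planning with a caller-supplied executor: each manifest is read on the executor and the results are collected in submission order. `ParallelPlanner` and the `readManifest` function are hypothetical stand-ins for Iceberg's manifest reading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelPlanner {
    // Submits one read per manifest to the executor, then concatenates the
    // resulting file lists in manifest order.
    static List<String> plan(List<String> manifests,
                             Function<String, List<String>> readManifest,
                             ExecutorService executor) {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (String manifest : manifests) {
            futures.add(executor.submit(() -> readManifest.apply(manifest)));
        }
        List<String> files = new ArrayList<>();
        for (Future<List<String>> future : futures) {
            try {
                files.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Planning interrupted", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Failed to read manifest", e.getCause());
            }
        }
        return files;
    }
}
```

In this sketch the caller keeps ownership of the executor and shuts it down when done; with the real API the corresponding call is `scan.planWith(executorService)`.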
    • schema

      public Schema schema()
      Description copied from interface: Scan
      Returns this scan's projection Schema.

      If the projection schema was set directly using Scan.project(Schema), returns that schema.

      If the projection schema was set by calling Scan.select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.

      Specified by:
      schema in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      this scan's projection schema
    • planFiles

      public CloseableIterable<ScanTask> planFiles()
      Description copied from interface: Scan
      Plan tasks for this scan where each task reads a single file.

      Use Scan.planTasks() for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.

      Specified by:
      planFiles in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      an Iterable of tasks scanning entire files required by this scan
    • planTasks

      public CloseableIterable<ScanTaskGroup<ScanTask>> planTasks()
      Description copied from interface: Scan
      Plan balanced task groups for this scan by splitting large and combining small tasks.

      Task groups created by this method may read partial input files, multiple input files or both.

      Specified by:
      planTasks in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Returns:
      an Iterable of balanced task groups required by this scan
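The splitting half of planTasks can be sketched as cutting a file's byte range into pieces no larger than the target split size. `Splitter` is hypothetical and cuts at fixed byte offsets; Iceberg may instead align splits to offsets recorded for the file (for example, Parquet row groups) when they are available.

```java
import java.util.ArrayList;
import java.util.List;

final class Splitter {
    // Splits [0, fileLength) into contiguous ranges of at most targetSplitSize bytes.
    // Each range is returned as {offset, length}.
    static List<long[]> split(long fileLength, long targetSplitSize) {
        if (targetSplitSize <= 0) {
            throw new IllegalArgumentException("targetSplitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += targetSplitSize) {
            ranges.add(new long[] {offset, Math.min(targetSplitSize, fileLength - offset)});
        }
        return ranges;
    }
}
```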
    • targetSplitSize

      public long targetSplitSize()
      Description copied from interface: Scan
      Returns the target split size for this scan.
      Specified by:
      targetSplitSize in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
    • splitLookback

      public int splitLookback()
      Description copied from interface: Scan
      Returns the split lookback for this scan.
      Specified by:
      splitLookback in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
    • splitOpenFileCost

      public long splitOpenFileCost()
      Description copied from interface: Scan
      Returns the split open file cost for this scan.
      Specified by:
      splitOpenFileCost in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
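The three settings above work together when small tasks are combined into groups. The sketch below assumes first-fit packing in which each task weighs `max(length, openFileCost)`, at most `lookback` groups stay open, and the oldest open group is emitted when none has room. `LookbackPacker` is hypothetical and only approximates Iceberg's bin packing; the eviction order in particular is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class LookbackPacker {
    private static final class Bin {
        final List<Long> items = new ArrayList<>();
        long weight;
    }

    // Packs task lengths into groups whose total weight stays within targetSplitSize.
    // An item heavier than the target still gets a group of its own.
    static List<List<Long>> pack(List<Long> lengths, long targetSplitSize, int lookback, long openFileCost) {
        if (lookback < 1) {
            throw new IllegalArgumentException("lookback must be at least 1");
        }
        List<List<Long>> emitted = new ArrayList<>();
        Deque<Bin> open = new ArrayDeque<>();
        for (long length : lengths) {
            long weight = Math.max(length, openFileCost); // small files still cost something
            Bin target = null;
            for (Bin bin : open) { // first open group with room
                if (bin.weight + weight <= targetSplitSize) {
                    target = bin;
                    break;
                }
            }
            if (target == null) {
                if (open.size() >= lookback) {
                    emitted.add(open.removeFirst().items); // evict the oldest open group
                }
                target = new Bin();
                open.addLast(target);
            }
            target.items.add(length);
            target.weight += weight;
        }
        for (Bin bin : open) {
            emitted.add(bin.items);
        }
        return emitted;
    }
}
```

With a target of 100, a lookback of 2 and an open-file cost of 10, the lengths 60, 50, 30, 5, 70 are grouped as [60, 30, 5], [50], [70]: the 5-byte task counts as 10, filling the first group exactly.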
    • metricsReporter

      public BatchScan metricsReporter(MetricsReporter reporter)
      Description copied from interface: Scan
      Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
      Specified by:
      metricsReporter in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
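Adding a reporter without dropping existing ones can be sketched as an immutable list of reporters that grows with each call. `ReportingScan` is hypothetical, and `Consumer<String>` stands in for Iceberg's MetricsReporter and its reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ReportingScan {
    private final List<Consumer<String>> reporters;

    ReportingScan(List<Consumer<String>> reporters) {
        this.reporters = List.copyOf(reporters);
    }

    // Returns a new scan that keeps the existing reporters and adds one more.
    ReportingScan metricsReporter(Consumer<String> reporter) {
        List<Consumer<String>> combined = new ArrayList<>(reporters);
        combined.add(reporter);
        return new ReportingScan(combined);
    }

    // Sends the report to every registered reporter.
    void report(String report) {
        for (Consumer<String> reporter : reporters) {
            reporter.accept(report);
        }
    }
}
```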
    • minRowsRequested

      public BatchScan minRowsRequested(long numRows)
      Description copied from interface: Scan
      Create a new scan that returns files with at least the given number of rows. This is an optional hint that lets the scan avoid returning more rows than necessary. The scan may return fewer rows if it does not contain that many, or more rows than requested.
      Specified by:
      minRowsRequested in interface Scan<BatchScan,ScanTask,ScanTaskGroup<ScanTask>>
      Parameters:
      numRows - The minimum number of rows requested
      Returns:
      a new scan based on this that requests at least the given number of rows
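Treating the row count as a hint can be sketched as taking files until their record counts reach the request. `MinRowsPlanner` is hypothetical; because the request is only a hint, a real implementation is free to return a different set of files.

```java
import java.util.ArrayList;
import java.util.List;

final class MinRowsPlanner {
    // Returns a prefix of the files whose record counts add up to at least numRows,
    // or all files if the table holds fewer rows. The last chosen file may overshoot.
    static List<Long> filesFor(List<Long> recordCounts, long numRows) {
        List<Long> chosen = new ArrayList<>();
        long total = 0;
        for (long count : recordCounts) {
            if (total >= numRows) {
                break;
            }
            chosen.add(count);
            total += count;
        }
        return chosen;
    }
}
```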