Class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>

java.lang.Object
org.apache.iceberg.SnapshotScan<ThisT,T,G>
Type Parameters:
ThisT - actual BaseScan implementation class type
T - type of ScanTask returned
G - type of ScanTaskGroup returned
All Implemented Interfaces:
Scan<ThisT,T,G>
Direct Known Subclasses:
AllDataFilesTable.AllDataFilesTableScan, AllDeleteFilesTable.AllDeleteFilesTableScan, AllFilesTable.AllFilesTableScan, AllManifestsTable.AllManifestsTableScan, DataFilesTable.DataFilesTableScan, DataTableScan, DeleteFilesTable.DeleteFilesTableScan, FilesTable.FilesTableScan, PositionDeletesTable.PositionDeletesBatchScan, SparkDistributedDataScan

public abstract class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>> extends Object
This is a common base class to share code between different BaseScan implementations that handle scans of a particular snapshot.
  • Field Details

    • SCAN_COLUMNS

      protected static final List<String> SCAN_COLUMNS
    • SCAN_WITH_STATS_COLUMNS

      protected static final List<String> SCAN_WITH_STATS_COLUMNS
    • DELETE_SCAN_COLUMNS

      protected static final List<String> DELETE_SCAN_COLUMNS
    • DELETE_SCAN_WITH_STATS_COLUMNS

      protected static final List<String> DELETE_SCAN_WITH_STATS_COLUMNS
    • PLAN_SCANS_WITH_WORKER_POOL

      protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
  • Constructor Details

    • SnapshotScan

      protected SnapshotScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
  • Method Details

    • snapshotId

      protected Long snapshotId()
    • doPlanFiles

      protected abstract CloseableIterable<T> doPlanFiles()
    • useSnapshotSchema

      protected boolean useSnapshotSchema()
    • scanMetrics

      protected ScanMetrics scanMetrics()
    • useSnapshot

      public ThisT useSnapshot(long scanSnapshotId)
    • useRef

      public ThisT useRef(String name)
    • asOfTime

      public ThisT asOfTime(long timestampMillis)
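useSnapshot, useRef, and asOfTime each pin the scan to a single snapshot. The sketch below models only the asOfTime lookup. It assumes, as the TableScan interface documents, that asOfTime selects the most recent snapshot that was current at or before the given time. The snapshot log is modeled as a hypothetical list of (snapshot id, timestamp) entries sorted oldest first. `LogEntry` and `snapshotIdAsOfTime` are illustrative names, not Iceberg API.

```java
import java.util.List;

// Hypothetical model of asOfTime(timestampMillis); not Iceberg's implementation.
class AsOfTimeSketch {

  // One snapshot-log entry: a snapshot id and the time it became current.
  static final class LogEntry {
    final long snapshotId;
    final long timestampMillis;

    LogEntry(long snapshotId, long timestampMillis) {
      this.snapshotId = snapshotId;
      this.timestampMillis = timestampMillis;
    }
  }

  // Returns the id of the latest entry at or before timestampMillis.
  // Assumes the log is sorted oldest first.
  static long snapshotIdAsOfTime(List<LogEntry> log, long timestampMillis) {
    Long found = null;
    for (LogEntry entry : log) {
      if (entry.timestampMillis <= timestampMillis) {
        found = entry.snapshotId;
      }
    }
    if (found == null) {
      throw new IllegalArgumentException("No snapshot at or before " + timestampMillis);
    }
    return found;
  }

  public static void main(String[] args) {
    List<LogEntry> log = List.of(
        new LogEntry(1L, 1_000L), new LogEntry(2L, 2_000L), new LogEntry(3L, 3_000L));
    System.out.println(snapshotIdAsOfTime(log, 2_500L)); // prints 2
  }
}
```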
    • planFiles

      public CloseableIterable<T> planFiles()
      Description copied from interface: Scan
      Plan tasks for this scan where each task reads a single file.

      Use Scan.planTasks() for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.

      Returns:
      an Iterable of tasks scanning entire files required by this scan
    • snapshot

      public Snapshot snapshot()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • table

      public Table table()
    • io

      protected FileIO io()
    • tableSchema

      protected Schema tableSchema()
    • context

      protected org.apache.iceberg.TableScanContext context()
    • options

      protected Map<String,String> options()
    • scanColumns

      protected List<String> scanColumns()
    • shouldReturnColumnStats

      protected boolean shouldReturnColumnStats()
    • columnsToKeepStats

      protected Set<Integer> columnsToKeepStats()
    • shouldIgnoreResiduals

      protected boolean shouldIgnoreResiduals()
    • residualFilter

      protected Expression residualFilter()
    • shouldPlanWithExecutor

      protected boolean shouldPlanWithExecutor()
    • planExecutor

      protected ExecutorService planExecutor()
    • newRefinedScan

      protected abstract ThisT newRefinedScan(Table newTable, Schema newSchema, org.apache.iceberg.TableScanContext newContext)
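The ThisT type parameter and the abstract newRefinedScan method suggest a self-typed, copy-on-refine design. In this design, refinement methods such as option return a new scan of the concrete subclass type instead of mutating the current one. The sketch below models that pattern in plain Java. `RefinableScan`, `DemoScan`, and the options map (standing in for TableScanContext) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the ThisT / newRefinedScan pattern; not Iceberg code.
abstract class RefinableScan<ThisT> {
  // Immutable scan state; stands in for TableScanContext.
  private final Map<String, String> options;

  protected RefinableScan(Map<String, String> options) {
    this.options = Map.copyOf(options);
  }

  protected Map<String, String> options() {
    return options;
  }

  // Each concrete scan builds a new instance of its own type from refined state.
  protected abstract ThisT newRefinedScan(Map<String, String> newOptions);

  // Refinement copies the state and never mutates this scan.
  public ThisT option(String property, String value) {
    Map<String, String> refined = new HashMap<>(options);
    refined.put(property, value);
    return newRefinedScan(refined);
  }
}

final class DemoScan extends RefinableScan<DemoScan> {
  DemoScan(Map<String, String> options) {
    super(options);
  }

  @Override
  protected DemoScan newRefinedScan(Map<String, String> newOptions) {
    return new DemoScan(newOptions);
  }

  public static void main(String[] args) {
    DemoScan base = new DemoScan(Map.of());
    DemoScan refined = base.option("example.key", "v"); // typed as DemoScan, no cast needed
    System.out.println(base.options() + " -> " + refined.options());
  }
}
```

Because ThisT is bound to the subclass, chained calls keep the concrete type, so subclass-specific methods remain available after refinement.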
    • option

      public ThisT option(String property, String value)
      Description copied from interface: Scan
      Create a new scan from this scan's configuration that overrides the Table's behavior for the given property and value. Unknown properties are ignored.
      Specified by:
      option in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      property - name of the table property to be overridden
      value - value to override with
      Returns:
      a new scan based on this with overridden behavior
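Scan options override table properties for a single scan. The sketch below models that lookup for one setting, assuming a scan option wins over a table property with the same key, and that a built-in default applies when neither is set. The key `read.split.target-size` and the 128 MB default mirror Iceberg's split-size table property. `OptionOverrideSketch` and its `targetSplitSize` helper are hypothetical. Because the lookup reads only keys it knows, an unrecognized option is never consulted, which is one way unknown properties end up ignored.

```java
import java.util.Map;

// Hypothetical model of option(property, value) overriding a table property.
class OptionOverrideSketch {
  static final String SPLIT_SIZE = "read.split.target-size";
  static final long SPLIT_SIZE_DEFAULT = 128L * 1024 * 1024; // 128 MB

  // Scan option first, then table property, then the default.
  static long targetSplitSize(Map<String, String> scanOptions, Map<String, String> tableProperties) {
    String value = scanOptions.get(SPLIT_SIZE);
    if (value == null) {
      value = tableProperties.get(SPLIT_SIZE);
    }
    return value == null ? SPLIT_SIZE_DEFAULT : Long.parseLong(value);
  }

  public static void main(String[] args) {
    Map<String, String> table = Map.of(SPLIT_SIZE, "268435456");
    Map<String, String> scan = Map.of(SPLIT_SIZE, "67108864", "unknown.key", "x");
    System.out.println(targetSplitSize(scan, table)); // prints 67108864
  }
}
```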
    • project

      public ThisT project(Schema projectedSchema)
      Description copied from interface: Scan
      Create a new scan from this with the schema as its projection.
      Specified by:
      project in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      projectedSchema - a projection schema
      Returns:
      a new scan based on this with the given projection
    • caseSensitive

      public ThisT caseSensitive(boolean caseSensitive)
      Description copied from interface: Scan
      Create a new scan from this that, if data columns were selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. Default is true.
      Specified by:
      caseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this with case sensitivity as stated
    • isCaseSensitive

      public boolean isCaseSensitive()
      Description copied from interface: Scan
      Returns whether this scan is case-sensitive with respect to column names.
      Specified by:
      isCaseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      true if case-sensitive, false otherwise.
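The case-sensitivity flag only changes how selected column names are matched against the schema. The minimal sketch below models the schema as a list of top-level column names and leaves nested fields out of scope. It uses `String.equalsIgnoreCase` for the case-insensitive match, although Iceberg's own lookup may normalize names differently. `findColumn` is a hypothetical helper.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of case-sensitive vs. case-insensitive column resolution.
class CaseSensitivitySketch {
  static Optional<String> findColumn(List<String> schemaColumns, String name, boolean caseSensitive) {
    for (String column : schemaColumns) {
      boolean matches = caseSensitive ? column.equals(name) : column.equalsIgnoreCase(name);
      if (matches) {
        return Optional.of(column); // the name as spelled in the schema
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> schema = List.of("id", "Event_Time");
    System.out.println(findColumn(schema, "event_time", true));  // prints Optional.empty
    System.out.println(findColumn(schema, "event_time", false)); // prints Optional[Event_Time]
  }
}
```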
    • includeColumnStats

      public ThisT includeColumnStats()
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this that loads column stats.
    • includeColumnStats

      public ThisT includeColumnStats(Collection<String> requestedColumns)
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats for the specific columns with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      requestedColumns - column names for which to keep the stats.
      Returns:
      a new scan based on this that loads column stats for specific columns.
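The protected columnsToKeepStats() returns a `Set<Integer>`, which suggests that requested column names are resolved to field IDs before stats are trimmed. The minimal sketch below models that trimming step. It assumes per-file stats are keyed by field ID, as Iceberg's per-column value-count map is, and that a null set means every column's stats are kept. `keepStats` is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of includeColumnStats(requestedColumns) trimming per-file stats.
class ColumnStatsSketch {
  // Keeps only entries whose field id is in columnsToKeep; null keeps all of them.
  static Map<Integer, Long> keepStats(Map<Integer, Long> valueCounts, Set<Integer> columnsToKeep) {
    Map<Integer, Long> kept = new HashMap<>();
    for (Map.Entry<Integer, Long> entry : valueCounts.entrySet()) {
      if (columnsToKeep == null || columnsToKeep.contains(entry.getKey())) {
        kept.put(entry.getKey(), entry.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<Integer, Long> valueCounts = Map.of(1, 100L, 2, 100L, 3, 98L);
    System.out.println(keepStats(valueCounts, Set.of(1, 3))); // stats for field ids 1 and 3 only
  }
}
```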
    • select

      public ThisT select(Collection<String> columns)
      Description copied from interface: Scan
      Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
      Specified by:
      select in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      columns - column names from the table's schema
      Returns:
      a new scan based on this with the given projection columns
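The select rule says the expected schema covers both the selected columns and any column the filter expression references. That rule can be modeled with plain collections. The minimal sketch below assumes flat top-level columns and returns the result in table-schema order. `projectedColumns` is a hypothetical helper, not Iceberg API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of select(columns) combined with a filter's referenced columns.
class SelectProjectionSketch {
  static List<String> projectedColumns(List<String> tableColumns, Set<String> selected, Set<String> filterColumns) {
    List<String> projected = new ArrayList<>();
    for (String column : tableColumns) { // walk the table schema to keep its order
      if (selected.contains(column) || filterColumns.contains(column)) {
        projected.add(column);
      }
    }
    return projected;
  }

  public static void main(String[] args) {
    List<String> table = List.of("id", "ts", "level", "message");
    // Selecting "message" while filtering on "level" also projects "level".
    System.out.println(projectedColumns(table, Set.of("message"), Set.of("level"))); // prints [level, message]
  }
}
```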
    • filter

      public ThisT filter(Expression expr)
      Description copied from interface: Scan
      Create a new scan from the results of this scan, filtered by the given Expression.
      Specified by:
      filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      expr - a filter expression
      Returns:
      a new scan based on this with results filtered by the expression
    • filter

      public Expression filter()
      Description copied from interface: Scan
      Returns this scan's filter Expression.
      Specified by:
      filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      this scan's filter expression
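Calling filter more than once narrows the scan: the Iceberg base scan ANDs each new expression with the existing row filter. The sketch below models that behavior using `java.util.function.Predicate` over rows represented as maps. `FilterChainSketch` is hypothetical. A real scan combines Iceberg `Expression` objects, not Java predicates.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of chained filter(expr) calls; not Iceberg's Expression API.
final class FilterChainSketch {
  private final Predicate<Map<String, Object>> rowFilter;

  // A fresh scan starts with an always-true filter.
  FilterChainSketch() {
    this(row -> true);
  }

  private FilterChainSketch(Predicate<Map<String, Object>> rowFilter) {
    this.rowFilter = rowFilter;
  }

  // Returns a new scan whose filter is (existing AND expr).
  FilterChainSketch filter(Predicate<Map<String, Object>> expr) {
    return new FilterChainSketch(rowFilter.and(expr));
  }

  // Returns the combined filter.
  Predicate<Map<String, Object>> filter() {
    return rowFilter;
  }

  public static void main(String[] args) {
    FilterChainSketch scan = new FilterChainSketch()
        .filter(row -> (Integer) row.get("x") > 1)
        .filter(row -> (Integer) row.get("x") < 5);
    System.out.println(scan.filter().test(Map.<String, Object>of("x", 3))); // prints true
    System.out.println(scan.filter().test(Map.<String, Object>of("x", 7))); // prints false
  }
}
```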
    • ignoreResiduals

      public ThisT ignoreResiduals()
      Description copied from interface: Scan
      Create a new scan from this that applies data filtering to files but not to rows in those files.
      Specified by:
      ignoreResiduals in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this that does not filter rows in files.
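In this simplified model, a scan filter is used twice. First, it skips whole files that cannot contain a match. Then, as a residual, it filters rows inside the files that remain. ignoreResiduals keeps the first step and drops the second, so the caller may receive rows that do not match the filter and must re-apply it if needed. The sketch below models this for a single `value > threshold` predicate, using a hypothetical per-file upper bound. `DataFileModel` and `scan` are illustrative names, not Iceberg API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of file pruning plus residual row filtering.
class ResidualSketch {

  // A data file with an upper bound for one long column and its row values.
  static final class DataFileModel {
    final long max;
    final List<Long> values;

    DataFileModel(long max, List<Long> values) {
      this.max = max;
      this.values = values;
    }
  }

  // Returns rows for the predicate value > threshold.
  static List<Long> scan(List<DataFileModel> files, long threshold, boolean ignoreResiduals) {
    List<Long> rows = new ArrayList<>();
    for (DataFileModel file : files) {
      if (file.max <= threshold) {
        continue; // file-level pruning: no row in this file can match
      }
      for (long value : file.values) {
        // The residual is the row-level check; ignoreResiduals skips it.
        if (ignoreResiduals || value > threshold) {
          rows.add(value);
        }
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<DataFileModel> files = List.of(
        new DataFileModel(3, List.of(1L, 2L, 3L)),
        new DataFileModel(20, List.of(5L, 20L)));
    System.out.println(scan(files, 10, false)); // prints [20]
    System.out.println(scan(files, 10, true));  // prints [5, 20]
  }
}
```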
    • planWith

      public ThisT planWith(ExecutorService executorService)
      Description copied from interface: Scan
      Create a new scan that uses a particular executor for planning. If no executor is provided, the default worker pool is used.
      Specified by:
      planWith in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      executorService - the provided executor
      Returns:
      a table scan that uses the provided executor to access manifests
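planWith lets the caller supply the executor that the scan uses to access manifests during planning. The sketch below models planning as reading each manifest on the supplied executor. Manifests are represented as hypothetical lists of file names, and `planFiles` here is an illustrative helper, not Iceberg's method. In this sketch, the caller creates and shuts down the executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of planning with a caller-supplied executor.
class PlanWithSketch {
  // Reads every manifest on the executor and concatenates the files it lists.
  static List<String> planFiles(List<List<String>> manifests, ExecutorService executor) {
    List<Callable<List<String>>> reads = new ArrayList<>();
    for (List<String> manifest : manifests) {
      reads.add(() -> List.copyOf(manifest)); // stands in for opening and filtering one manifest
    }
    List<String> files = new ArrayList<>();
    try {
      // invokeAll returns futures in submission order, so this sketch keeps manifest order.
      for (Future<List<String>> result : executor.invokeAll(reads)) {
        files.addAll(result.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while planning", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to read manifest", e.getCause());
    }
    return files;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<List<String>> manifests = List.of(List.of("a.parquet"), List.of("b.parquet", "c.parquet"));
      System.out.println(planFiles(manifests, pool)); // prints [a.parquet, b.parquet, c.parquet]
    } finally {
      pool.shutdown(); // the caller owns the executor's lifecycle
    }
  }
}
```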
    • schema

      public Schema schema()
      Description copied from interface: Scan
      Returns this scan's projection Schema.

      If the projection schema was set directly using Scan.project(Schema), returns that schema.

      If the projection schema was set by calling Scan.select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.

      Specified by:
      schema in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      this scan's projection schema
    • targetSplitSize

      public long targetSplitSize()
      Description copied from interface: Scan
      Returns the target split size for this scan.
      Specified by:
      targetSplitSize in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
    • splitLookback

      public int splitLookback()
      Description copied from interface: Scan
      Returns the split lookback for this scan.
      Specified by:
      splitLookback in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
    • splitOpenFileCost

      public long splitOpenFileCost()
      Description copied from interface: Scan
      Returns the split open file cost for this scan.
      Specified by:
      splitOpenFileCost in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
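The target split size, split lookback, and split open file cost drive planTasks, which, unlike planFiles, may split a large file or combine several small ones into one task. The sketch below is a simplified illustration that assumes there are no delete files. Each file is cut into pieces of at most targetSplitSize bytes; real planning may use split offsets recorded in file metadata instead. Each piece is weighted as max(length, openFileCost), so small pieces still carry a minimum cost. Pieces are then first-fit packed into at most `lookback` open groups, and the oldest group is emitted whenever that limit is exceeded. `planGroups` is a hypothetical helper; Iceberg's own bin packing differs in detail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of split planning; not Iceberg's implementation.
class SplitPlanningSketch {

  // An open group of split lengths and its accumulated weight.
  private static final class Bin {
    final List<Long> splits = new ArrayList<>();
    long weight;
  }

  static List<List<Long>> planGroups(List<Long> fileSizes, long targetSplitSize, int lookback, long openFileCost) {
    // 1. Cut each file into pieces of at most targetSplitSize bytes.
    List<Long> pieces = new ArrayList<>();
    for (long size : fileSizes) {
      for (long offset = 0; offset < size; offset += targetSplitSize) {
        pieces.add(Math.min(targetSplitSize, size - offset));
      }
    }

    // 2. First-fit pack the pieces, keeping at most `lookback` groups open.
    Deque<Bin> open = new ArrayDeque<>();
    List<List<Long>> groups = new ArrayList<>();
    for (long piece : pieces) {
      long weight = Math.max(piece, openFileCost); // small pieces still cost one file open
      Bin target = null;
      for (Bin bin : open) {
        if (bin.weight + weight <= targetSplitSize) {
          target = bin;
          break;
        }
      }
      if (target == null) {
        target = new Bin();
        open.addLast(target);
      }
      target.splits.add(piece);
      target.weight += weight;
      if (open.size() > lookback) {
        groups.add(open.removeFirst().splits); // emit the oldest group
      }
    }

    // 3. Emit whatever is still open, oldest first.
    while (!open.isEmpty()) {
      groups.add(open.removeFirst().splits);
    }
    return groups;
  }

  public static void main(String[] args) {
    // target 100, lookback 2, open cost 10: the 250-byte file becomes 100 + 100 + 50
    System.out.println(planGroups(List.of(250L, 30L, 5L), 100L, 2, 10L)); // prints [[100], [100], [50, 30, 5]]
  }
}
```

A larger lookback gives the packer more chances to fill existing groups before emitting them, at the cost of holding more groups open.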
    • metricsReporter

      public ThisT metricsReporter(MetricsReporter reporter)
      Description copied from interface: Scan
      Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
      Specified by:
      metricsReporter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
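metricsReporter adds a reporter rather than replacing the ones the scan already has. The sketch below models that "in addition to" behavior by combining two reporters so that each report reaches both. The single-method `Reporter` interface and the `combine` helper are hypothetical stand-ins, not Iceberg's MetricsReporter API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of adding a reporter alongside an existing one.
class MetricsReporterSketch {

  // Stand-in for a metrics reporter; receives one report at a time.
  interface Reporter {
    void report(String report);
  }

  // Returns a reporter that forwards each report to both reporters, existing one first.
  static Reporter combine(Reporter existing, Reporter added) {
    return report -> {
      existing.report(report);
      added.report(report);
    };
  }

  public static void main(String[] args) {
    List<String> existingSink = new ArrayList<>();
    List<String> addedSink = new ArrayList<>();
    Reporter reporter = combine(existingSink::add, addedSink::add);
    reporter.report("example scan report");
    System.out.println(existingSink + " " + addedSink);
  }
}
```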