Class AllDataFilesTable.AllDataFilesTableScan

java.lang.Object
org.apache.iceberg.SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
org.apache.iceberg.AllDataFilesTable.AllDataFilesTableScan
All Implemented Interfaces:
Scan<TableScan,FileScanTask,CombinedScanTask>, TableScan
Enclosing class:
AllDataFilesTable

public static class AllDataFilesTable.AllDataFilesTableScan extends SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
  • Field Details

    • SCAN_COLUMNS

      protected static final List<String> SCAN_COLUMNS
    • SCAN_WITH_STATS_COLUMNS

      protected static final List<String> SCAN_WITH_STATS_COLUMNS
    • DELETE_SCAN_COLUMNS

      protected static final List<String> DELETE_SCAN_COLUMNS
    • DELETE_SCAN_WITH_STATS_COLUMNS

      protected static final List<String> DELETE_SCAN_WITH_STATS_COLUMNS
    • PLAN_SCANS_WITH_WORKER_POOL

      protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
  • Method Details

    • newRefinedScan

      protected TableScan newRefinedScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
    • manifests

      protected CloseableIterable<ManifestFile> manifests()
      Returns an iterable of the manifest files to explore for this all-files metadata table scan.
    • doPlanFiles

      protected CloseableIterable<FileScanTask> doPlanFiles()
      Specified by:
      doPlanFiles in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
    • useSnapshot

      public TableScan useSnapshot(long scanSnapshotId)
      Description copied from interface: TableScan
      Create a new TableScan from this scan's configuration that will use the given snapshot by ID.
      Specified by:
      useSnapshot in interface TableScan
      Overrides:
      useSnapshot in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
      Parameters:
      scanSnapshotId - a snapshot ID
      Returns:
      a new scan based on this with the given snapshot ID
    • useRef

      public TableScan useRef(String ref)
      Description copied from interface: TableScan
      Create a new TableScan from this scan's configuration that will use the given reference.
      Specified by:
      useRef in interface TableScan
      Overrides:
      useRef in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
      Parameters:
      ref - reference
      Returns:
      a new scan based on the given reference.
    • asOfTime

      public TableScan asOfTime(long timestampMillis)
      Description copied from interface: TableScan
      Create a new TableScan from this scan's configuration that will use the most recent snapshot as of the given time, in milliseconds, on the scan's branch, or on main if no branch is set.
      Specified by:
      asOfTime in interface TableScan
      Overrides:
      asOfTime in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
      Parameters:
      timestampMillis - a timestamp in milliseconds.
      Returns:
      a new scan based on this with the current snapshot at the given time
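      The lookup described above can be pictured with a small model. This is only a sketch and not Iceberg's implementation: the Snap class and the snapshotIdAsOf helper are hypothetical names, and the sketch assumes "as of" means the latest snapshot committed at or before the given timestamp.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of an "as of time" lookup; not Iceberg's implementation. */
final class AsOfTimeSketch {
  /** Hypothetical stand-in for a snapshot: an ID plus its commit timestamp. */
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  /** Returns the ID of the latest snapshot committed at or before the given time (assumed semantics). */
  static long snapshotIdAsOf(List<Snap> history, long timestampMillis) {
    Snap best = null;
    for (Snap s : history) {
      boolean notAfter = s.timestampMillis <= timestampMillis;
      if (notAfter && (best == null || s.timestampMillis > best.timestampMillis)) {
        best = s;
      }
    }
    if (best == null) {
      throw new IllegalArgumentException("No snapshot at or before " + timestampMillis);
    }
    return best.id;
  }

  public static void main(String[] args) {
    List<Snap> history =
        Arrays.asList(new Snap(1L, 1000L), new Snap(2L, 2000L), new Snap(3L, 3000L));
    System.out.println(snapshotIdAsOf(history, 2500L)); // prints 2
  }
}
```

      A timestamp earlier than every snapshot has no answer in this model, so the sketch throws instead of guessing.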
    • planFiles

      public CloseableIterable<FileScanTask> planFiles()
      Description copied from interface: Scan
      Plan tasks for this scan where each task reads a single file.

      Use Scan.planTasks() for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.

      Specified by:
      planFiles in interface Scan<TableScan,FileScanTask,CombinedScanTask>
      Overrides:
      planFiles in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
      Returns:
      an Iterable of tasks scanning entire files required by this scan
    • reachableManifests

      protected CloseableIterable<ManifestFile> reachableManifests(org.apache.iceberg.relocated.com.google.common.base.Function<Snapshot,Iterable<ManifestFile>> toManifests)
    • tableType

      protected MetadataTableType tableType()
      Type of scan being performed, such as MetadataTableType.ALL_DATA_FILES when scanning a table's AllDataFilesTable.

      Used for logging and error messages.

    • appendsBetween

      public TableScan appendsBetween(long fromSnapshotId, long toSnapshotId)
      Description copied from interface: TableScan
      Create a new TableScan to read appended data from fromSnapshotId exclusive to toSnapshotId inclusive.
      Specified by:
      appendsBetween in interface TableScan
      Parameters:
      fromSnapshotId - the last snapshot id read by the user, exclusive
      toSnapshotId - read append data up to this snapshot id
      Returns:
      a table scan which can read append data from fromSnapshotId exclusive and up to toSnapshotId inclusive
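      The exclusive/inclusive bounds above can be illustrated with a small model of snapshot ancestry. This is a sketch of the interval semantics only, not Iceberg's implementation, and it says nothing about whether this metadata table scan supports incremental reads. The parentOf map and the snapshotsBetween helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of a (from, to] snapshot range; not Iceberg's implementation. */
final class AppendRangeSketch {
  /**
   * Walks the parent chain from toInclusive back to fromExclusive.
   * parentOf maps a snapshot ID to its parent ID; the root has no entry.
   */
  static List<Long> snapshotsBetween(Map<Long, Long> parentOf, long fromExclusive, long toInclusive) {
    List<Long> ids = new ArrayList<>();
    Long current = toInclusive;
    while (current != null && current != fromExclusive) {
      ids.add(current); // toInclusive is collected, fromExclusive never is
      current = parentOf.get(current);
    }
    if (current == null) {
      throw new IllegalArgumentException(fromExclusive + " is not an ancestor of " + toInclusive);
    }
    Collections.reverse(ids); // oldest first
    return ids;
  }

  public static void main(String[] args) {
    Map<Long, Long> parentOf = new HashMap<>();
    parentOf.put(2L, 1L);
    parentOf.put(3L, 2L);
    parentOf.put(4L, 3L);
    System.out.println(snapshotsBetween(parentOf, 1L, 3L)); // prints [2, 3]
  }
}
```

      Passing the same ID for both bounds yields an empty range in this model, since the start is excluded.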
    • appendsAfter

      public TableScan appendsAfter(long fromSnapshotId)
      Description copied from interface: TableScan
      Create a new TableScan to read appended data from fromSnapshotId exclusive to the current snapshot inclusive.
      Specified by:
      appendsAfter in interface TableScan
      Parameters:
      fromSnapshotId - the last snapshot id read by the user, exclusive
      Returns:
      a table scan which can read append data from fromSnapshotId exclusive and up to current snapshot inclusive
    • targetSplitSize

      public long targetSplitSize()
      Description copied from interface: Scan
      Returns the target split size for this scan.
      Specified by:
      targetSplitSize in interface Scan<TableScan,FileScanTask,CombinedScanTask>
    • planTasks

      public CloseableIterable<CombinedScanTask> planTasks()
      Description copied from interface: Scan
      Plan balanced task groups for this scan by splitting large and combining small tasks.

      Task groups created by this method may read partial input files, multiple input files or both.

      Specified by:
      planTasks in interface Scan<TableScan,FileScanTask,CombinedScanTask>
      Returns:
      an Iterable of balanced task groups required by this scan
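      To make "splitting large and combining small tasks" concrete, here is a deliberately simplified model. It is not Iceberg's planner: it works on file sizes only, packs splits greedily in order, and ignores the split lookback and open file cost settings that the real scan also exposes. The plan helper and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of balanced task planning; not Iceberg's implementation. */
final class BalancedPlanningSketch {
  /** Returns groups of split sizes, each group holding at most targetSplitSize bytes. */
  static List<List<Long>> plan(List<Long> fileSizes, long targetSplitSize) {
    if (targetSplitSize <= 0) {
      throw new IllegalArgumentException("targetSplitSize must be positive");
    }
    // Step 1: split every file into pieces no larger than the target size.
    List<Long> splits = new ArrayList<>();
    for (long size : fileSizes) {
      long remaining = size;
      while (remaining > targetSplitSize) {
        splits.add(targetSplitSize);
        remaining -= targetSplitSize;
      }
      if (remaining > 0) {
        splits.add(remaining);
      }
    }
    // Step 2: combine consecutive pieces until adding one more would exceed the target.
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long split : splits) {
      if (!current.isEmpty() && currentSize + split > targetSplitSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(split);
      currentSize += split;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    // One 250-byte file and two small files, with a 100-byte target.
    System.out.println(plan(Arrays.asList(250L, 30L, 40L), 100L));
    // prints [[100], [100], [50, 30], [40]]
  }
}
```

      In the printed result, the third group mixes the tail of the large file with a whole small file, which matches the note that a task group may read partial files, multiple files, or both.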
    • table

      public Table table()
    • io

      protected FileIO io()
    • tableSchema

      protected Schema tableSchema()
    • context

      protected org.apache.iceberg.TableScanContext context()
    • options

      protected Map<String,String> options()
    • scanColumns

      protected List<String> scanColumns()
    • shouldReturnColumnStats

      protected boolean shouldReturnColumnStats()
    • columnsToKeepStats

      protected Set<Integer> columnsToKeepStats()
    • shouldIgnoreResiduals

      protected boolean shouldIgnoreResiduals()
    • residualFilter

      protected Expression residualFilter()
    • shouldPlanWithExecutor

      protected boolean shouldPlanWithExecutor()
    • planExecutor

      protected ExecutorService planExecutor()
    • option

      public TableScan option(String property, String value)
      Description copied from interface: Scan
      Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair. Unknown properties will be ignored.
      Specified by:
      option in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      property - name of the table property to be overridden
      value - value to override with
      Returns:
      a new scan based on this with overridden behavior
    • project

      public TableScan project(Schema projectedSchema)
      Description copied from interface: Scan
      Create a new scan from this with the schema as its projection.
      Specified by:
      project in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      projectedSchema - a projection schema
      Returns:
      a new scan based on this with the given projection
    • caseSensitive

      public TableScan caseSensitive(boolean caseSensitive)
      Description copied from interface: Scan
      Create a new scan from this that, if data columns were selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. Default is true.
      Specified by:
      caseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this with case sensitivity as stated
    • isCaseSensitive

      public boolean isCaseSensitive()
      Description copied from interface: Scan
      Returns whether this scan is case-sensitive with respect to column names.
      Specified by:
      isCaseSensitive in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      true if case-sensitive, false otherwise.
    • includeColumnStats

      public TableScan includeColumnStats()
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this that loads column stats.
    • includeColumnStats

      public TableScan includeColumnStats(Collection<String> requestedColumns)
      Description copied from interface: Scan
      Create a new scan from this that loads the column stats for the specific columns with each data file.

      Column stats include: value count, null value count, lower bounds, and upper bounds.

      Specified by:
      includeColumnStats in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      requestedColumns - column names for which to keep the stats.
      Returns:
      a new scan based on this that loads column stats for specific columns.
    • select

      public TableScan select(Collection<String> columns)
      Description copied from interface: Scan
      Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
      Specified by:
      select in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      columns - column names from the table's schema
      Returns:
      a new scan based on this with the given projection columns
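      The rule that the expected schema contains every field that is either selected or used by the filter can be modeled with plain name lists. This is a sketch, not Iceberg's implementation: the expectedColumns helper is hypothetical, its caseSensitive flag stands in for the scan's case-sensitivity setting, and the sketch assumes unmatched names are simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Simplified model of column selection; not Iceberg's implementation. */
final class SelectSketch {
  private static boolean containsName(Collection<String> names, String column, boolean caseSensitive) {
    for (String name : names) {
      if (caseSensitive ? name.equals(column) : name.equalsIgnoreCase(column)) {
        return true;
      }
    }
    return false;
  }

  /** Keeps, in table order, every column that is selected or referenced by the filter. */
  static List<String> expectedColumns(
      List<String> tableColumns,
      Collection<String> selected,
      Collection<String> filterColumns,
      boolean caseSensitive) {
    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      if (containsName(selected, column, caseSensitive)
          || containsName(filterColumns, column, caseSensitive)) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative column names only.
    List<String> table = Arrays.asList("file_path", "file_format", "record_count", "file_size_in_bytes");
    List<String> selected = Arrays.asList("FILE_PATH");
    List<String> usedByFilter = Arrays.asList("record_count");
    System.out.println(expectedColumns(table, selected, usedByFilter, false)); // prints [file_path, record_count]
    System.out.println(expectedColumns(table, selected, usedByFilter, true)); // prints [record_count]
  }
}
```

      With case-sensitive matching, FILE_PATH no longer matches file_path in this model, so only the filter column remains.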
    • filter

      public TableScan filter(Expression expr)
      Description copied from interface: Scan
      Create a new scan from the results of this filtered by the Expression.
      Specified by:
      filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      expr - a filter expression
      Returns:
      a new scan based on this with results filtered by the expression
    • filter

      public Expression filter()
      Description copied from interface: Scan
      Returns this scan's filter Expression.
      Specified by:
      filter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      this scan's filter expression
    • ignoreResiduals

      public TableScan ignoreResiduals()
      Description copied from interface: Scan
      Create a new scan from this that applies data filtering to files but not to rows in those files.
      Specified by:
      ignoreResiduals in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      a new scan based on this that does not filter rows in files.
    • planWith

      public TableScan planWith(ExecutorService executorService)
      Description copied from interface: Scan
      Create a new scan that uses the given executor for planning. If no executor is set, the default worker pool is used.
      Specified by:
      planWith in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Parameters:
      executorService - the provided executor
      Returns:
      a table scan that uses the provided executor to access manifests
    • schema

      public Schema schema()
      Description copied from interface: Scan
      Returns this scan's projection Schema.

      If the projection schema was set directly using Scan.project(Schema), returns that schema.

      If the projection schema was set by calling Scan.select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.

      Specified by:
      schema in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
      Returns:
      this scan's projection schema
    • splitLookback

      public int splitLookback()
      Description copied from interface: Scan
      Returns the split lookback for this scan.
      Specified by:
      splitLookback in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
    • splitOpenFileCost

      public long splitOpenFileCost()
      Description copied from interface: Scan
      Returns the split open file cost for this scan.
      Specified by:
      splitOpenFileCost in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
    • metricsReporter

      public TableScan metricsReporter(MetricsReporter reporter)
      Description copied from interface: Scan
      Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
      Specified by:
      metricsReporter in interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>