Type Parameters: ThisT - the child Java API class, returned by method chaining; T - the Java type of tasks produced by this scan; G - the Java type of task groups produced by this scan

All Known Subinterfaces: BatchScan, IncrementalAppendScan, IncrementalChangelogScan, IncrementalScan<ThisT,T,G>, TableScan

All Known Implementing Classes: AllDataFilesTable.AllDataFilesTableScan, AllDeleteFilesTable.AllDeleteFilesTableScan, AllFilesTable.AllFilesTableScan, AllManifestsTable.AllManifestsTableScan, DataFilesTable.DataFilesTableScan, DataTableScan, DeleteFilesTable.DeleteFilesTableScan, FilesTable.FilesTableScan, PositionDeletesTable.PositionDeletesBatchScan, SnapshotScan, SparkDistributedDataScan

public interface Scan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>

Scan objects are immutable and can be shared between threads. Refinement methods, like select(Collection) and filter(Expression), do not modify the scan they are called on; each returns a new scan instance of type ThisT.
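
The immutability contract and the role of the ThisT type parameter can be illustrated with a minimal sketch. MiniScan, MiniTableScan, and MiniScanDemo are hypothetical stand-ins written for this example, not Iceberg classes, and the filter is modeled as a plain string rather than an Expression.

```java
import java.util.List;

// Small driver: refine a base scan and show that the base is unchanged.
class MiniScanDemo {
  public static void main(String[] args) {
    MiniTableScan base = new MiniTableScan();
    MiniTableScan refined = base.select(List.of("id", "ts")).filter("id > 10");
    System.out.println(base.columns() + " / " + base.predicate());
    System.out.println(refined.columns() + " / " + refined.predicate());
  }
}

// Hypothetical scan interface; ThisT lets chained calls keep the concrete type.
interface MiniScan<ThisT> {
  ThisT select(List<String> columns);

  ThisT filter(String predicate);

  List<String> columns();

  String predicate();
}

// Immutable implementation: every refinement returns a new instance.
final class MiniTableScan implements MiniScan<MiniTableScan> {
  private final List<String> columns;
  private final String predicate;

  MiniTableScan() {
    this(List.of(), "true");
  }

  private MiniTableScan(List<String> columns, String predicate) {
    this.columns = List.copyOf(columns); // unmodifiable copy, safe to share
    this.predicate = predicate;
  }

  @Override
  public MiniTableScan select(List<String> newColumns) {
    return new MiniTableScan(newColumns, predicate);
  }

  @Override
  public MiniTableScan filter(String newPredicate) {
    return new MiniTableScan(columns, newPredicate);
  }

  @Override
  public List<String> columns() {
    return columns;
  }

  @Override
  public String predicate() {
    return predicate;
  }
}
```

Because all fields are final and no method mutates state, an instance of the sketch can be handed to several threads without synchronization, which mirrors the guarantee stated above.
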

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| ThisT | caseSensitive(boolean caseSensitive) | Create a new scan from this that, if data columns were selected via select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. |
| Expression | filter() | Returns this scan's filter Expression. |
| ThisT | filter(Expression expr) | Create a new scan from the results of this filtered by the Expression. |
| ThisT | ignoreResiduals() | Create a new scan from this that applies data filtering to files but not to rows in those files. |
| ThisT | includeColumnStats() | Create a new scan from this that loads the column stats with each data file. |
| default ThisT | includeColumnStats(Collection<String> requestedColumns) | Create a new scan from this that loads the column stats for the specific columns with each data file. |
| boolean | isCaseSensitive() | Returns whether this scan is case-sensitive with respect to column names. |
| default ThisT | metricsReporter(MetricsReporter reporter) | Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan. |
| ThisT | option(String property, String value) | Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair. |
| CloseableIterable<T> | planFiles() | Plan tasks for this scan where each task reads a single file. |
| CloseableIterable<G> | planTasks() | Plan balanced task groups for this scan by splitting large and combining small tasks. |
| ThisT | planWith(ExecutorService executorService) | Create a new scan to use a particular executor to plan. |
| ThisT | project(Schema schema) | Create a new scan from this with the schema as its projection. |
| Schema | schema() | Returns this scan's projection Schema. |
| default ThisT | select(String... columns) | Create a new scan from this that will read the given columns. |
| ThisT | select(Collection<String> columns) | Create a new scan from this that will read the given data columns. |
| int | splitLookback() | Returns the split lookback for this scan. |
| long | splitOpenFileCost() | Returns the split open file cost for this scan. |
| long | targetSplitSize() | Returns the target split size for this scan. |

Method Details
- option
  
  ThisT option(String property, String value)
  
  Create a new scan from this scan's configuration that overrides the table's behavior with the given property and value. Unknown properties are ignored.
  
  Parameters:
  
  property - name of the table property to be overridden
  
  value - value to override with
  
  Returns:
  
  a new scan based on this with overridden behavior
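
One way to picture the override semantics of option is as a per-scan map that shadows the table's properties. The sketch below is a hypothetical model (OptionOverlay and propertyAsLong are names invented here, not Iceberg API). The key read.split.target-size is an Iceberg table property used only as an illustrative key, and the values are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a scan whose options shadow table properties.
final class OptionOverlay {
  private final Map<String, String> tableProperties;
  private final Map<String, String> scanOptions;

  OptionOverlay(Map<String, String> tableProperties) {
    this(tableProperties, Map.of());
  }

  private OptionOverlay(Map<String, String> tableProperties, Map<String, String> scanOptions) {
    this.tableProperties = Map.copyOf(tableProperties);
    this.scanOptions = Map.copyOf(scanOptions);
  }

  // Returns a new overlay with one more override; the receiver is left untouched.
  OptionOverlay option(String property, String value) {
    Map<String, String> merged = new HashMap<>(scanOptions);
    merged.put(property, value);
    return new OptionOverlay(tableProperties, merged);
  }

  // Scan options win over table properties; if neither is set, the default applies.
  long propertyAsLong(String property, long defaultValue) {
    String value = scanOptions.getOrDefault(property, tableProperties.get(property));
    return value == null ? defaultValue : Long.parseLong(value);
  }
}
```

In this model an unknown property is simply stored and never read, so it has no effect on planning. A call such as `new OptionOverlay(tableProps).option("read.split.target-size", "250")` leaves `tableProps` and the original overlay unchanged.
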
- project
  
  ThisT project(Schema schema)
  
  Create a new scan from this with the schema as its projection.
  
  Parameters:
  
  schema - a projection schema
  
  Returns:
  
  a new scan based on this with the given projection
- caseSensitive
  
  ThisT caseSensitive(boolean caseSensitive)
  
  Create a new scan from this that, if data columns were selected via select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. Default is true.
  
  Returns:
  
  a new scan based on this with case sensitivity as stated
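
The effect of the flag on column matching can be sketched as follows. ColumnResolver is a hypothetical helper for this example and the schema is modeled as a list of column names. Iceberg's own name resolution is not shown here.

```java
import java.util.List;
import java.util.Optional;

final class ColumnResolver {
  private ColumnResolver() {}

  // Finds the schema column that matches the requested name, honoring case sensitivity.
  static Optional<String> resolve(
      List<String> schemaColumns, String requested, boolean caseSensitive) {
    for (String column : schemaColumns) {
      boolean matches =
          caseSensitive ? column.equals(requested) : column.equalsIgnoreCase(requested);
      if (matches) {
        return Optional.of(column); // return the name as spelled in the schema
      }
    }
    return Optional.empty();
  }
}
```

With a schema of `id` and `eventTime`, a request for `EVENTTIME` resolves to `eventTime` only when `caseSensitive` is false.
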
- isCaseSensitive
  
  boolean isCaseSensitive()
  
  Returns whether this scan is case-sensitive with respect to column names.
  
  Returns:
  
  true if case-sensitive, false otherwise.
- includeColumnStats
  
  ThisT includeColumnStats()
  
  Create a new scan from this that loads the column stats with each data file.
  Column stats include: value count, null value count, lower bounds, and upper bounds.
  
  Returns:
  
  a new scan based on this that loads column stats.
- includeColumnStats
  
  default ThisT includeColumnStats(Collection<String> requestedColumns)
  
  Create a new scan from this that loads the column stats for the specific columns with each data file.
  Column stats include: value count, null value count, lower bounds, and upper bounds.
  
  Parameters:
  
  requestedColumns - column names for which to keep the stats.
  
  Returns:
  
  a new scan based on this that loads column stats for specific columns.
- select
  
  ThisT select(Collection<String> columns)
  
  Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
  
  Parameters:
  
  columns - column names from the table's schema
  
  Returns:
  
  a new scan based on this with the given projection columns
- select
  
  default ThisT select(String... columns)
  
  Create a new scan from this that will read the given columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
  
  Parameters:
  
  columns - column names
  
  Returns:
  
  a new scan based on this with the given projection columns
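
The rule that the expected schema covers both selected fields and fields used by the filter can be sketched as a set union over column names. ExpectedSchema is a hypothetical helper. Keeping the table's column order in the result is an assumption of this sketch, and the filter is reduced to the set of columns it references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ExpectedSchema {
  private ExpectedSchema() {}

  // Retains every table column that is either selected or referenced by the filter.
  static List<String> expectedColumns(
      List<String> tableColumns, Set<String> selected, Set<String> filterReferences) {
    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      if (selected.contains(column) || filterReferences.contains(column)) {
        result.add(column);
      }
    }
    return List.copyOf(result);
  }
}
```

For a table with columns `id`, `ts`, `level`, `message`, selecting `message` while filtering on `level` yields `level` and `message`: the filter column is carried along even though it was not selected.
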
- filter
  
  ThisT filter(Expression expr)
  
  Create a new scan from the results of this filtered by the Expression.
  
  Parameters:
  
  expr - a filter expression
  
  Returns:
  
  a new scan based on this with results filtered by the expression
- filter
  
  Expression filter()
  
  Returns this scan's filter Expression.
  
  Returns:
  
  this scan's filter expression
- ignoreResiduals
  
  ThisT ignoreResiduals()
  
  Create a new scan from this that applies data filtering to files but not to rows in those files.
  
  Returns:
  
  a new scan based on this that does not filter rows in files.
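
The difference between file-level filtering and row-level (residual) filtering can be sketched with a toy model. MiniFile and ResidualDemo are hypothetical. The filter is fixed to "value >= threshold", files carry only an upper bound, and the row check that a reader would normally perform is folded into one method for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data file: rows are plain ints and the upper bound is derived from them.
final class MiniFile {
  final List<Integer> rows;
  final int upperBound;

  MiniFile(List<Integer> rows) {
    this.rows = List.copyOf(rows);
    this.upperBound = rows.stream().mapToInt(Integer::intValue).max().orElse(Integer.MIN_VALUE);
  }
}

final class ResidualDemo {
  private ResidualDemo() {}

  // Evaluates the filter "value >= threshold" over a set of files.
  static List<Integer> read(List<MiniFile> files, int threshold, boolean ignoreResiduals) {
    List<Integer> out = new ArrayList<>();
    for (MiniFile file : files) {
      if (file.upperBound < threshold) {
        continue; // file-level pruning: no row in this file can match
      }
      for (int row : file.rows) {
        // the row-level (residual) check is skipped when residuals are ignored
        if (ignoreResiduals || row >= threshold) {
          out.add(row);
        }
      }
    }
    return out;
  }
}
```

With files holding `[1, 2, 3]` and `[4, 10, 20]` and a threshold of 10, the first file is pruned in both modes. The second file contributes `[10, 20]` when residuals are applied and `[4, 10, 20]` when they are ignored.
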
- planWith
  
  ThisT planWith(ExecutorService executorService)
  
  Create a new scan that uses a particular executor for planning. If this method is not called, the default worker pool is used.
  
  Parameters:
  
  executorService - the provided executor
  
  Returns:
  
  a table scan that uses the provided executor to access manifests
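
How a caller-supplied executor might be used to read manifests in parallel can be sketched with the standard java.util.concurrent API. ParallelPlanner is a hypothetical class: each inner list stands in for one manifest, and "reading" it is simulated by copying its entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPlanner {
  private ParallelPlanner() {}

  // Submits one task per manifest and gathers the data file names they track.
  static List<String> plan(List<List<String>> manifests, ExecutorService executor)
      throws InterruptedException, ExecutionException {
    List<Future<List<String>>> pending = new ArrayList<>();
    for (List<String> manifest : manifests) {
      // Simulated manifest read, executed on a worker thread of the given executor.
      pending.add(executor.submit(() -> List.copyOf(manifest)));
    }
    List<String> files = new ArrayList<>();
    for (Future<List<String>> result : pending) {
      files.addAll(result.get()); // joined in submission order, so output is deterministic
    }
    return files;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      List<List<String>> manifests =
          List.of(List.of("a.parquet"), List.of("b.parquet", "c.parquet"));
      System.out.println(plan(manifests, executor));
    } finally {
      executor.shutdown(); // the caller owns the executor and shuts it down
    }
  }
}
```

In the sketch the caller creates and shuts down the executor, so the planner never manages the pool's lifecycle.
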
- schema
  
  Schema schema()
  
  Returns this scan's projection Schema.
  If the projection schema was set directly using project(Schema), returns that schema.
  If the projection schema was set by calling select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.
  
  Returns:
  
  this scan's projection schema
- planFiles
  
  CloseableIterable<T> planFiles()
  
  Plan tasks for this scan where each task reads a single file.
  Use planTasks() for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.
  
  Returns:
  
  an Iterable of tasks scanning entire files required by this scan
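
Since the result is a CloseableIterable, callers typically consume it inside a try-with-resources block so that it is closed after iteration. The sketch below shows the pattern with a hypothetical stand-in, TrackedTasks, that records whether it was closed. Tasks are modeled as file names.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an iterable that holds resources until closed.
final class TrackedTasks implements Iterable<String>, Closeable {
  private final List<String> tasks;
  private boolean closed = false;

  TrackedTasks(List<String> tasks) {
    this.tasks = List.copyOf(tasks);
  }

  @Override
  public Iterator<String> iterator() {
    if (closed) {
      throw new IllegalStateException("already closed");
    }
    return tasks.iterator();
  }

  @Override
  public void close() { // declared without a checked exception, so callers need no catch
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

final class PlanFilesDemo {
  private PlanFilesDemo() {}

  // Counts tasks and always closes the iterable, even if iteration throws.
  static int countTasks(TrackedTasks planned) {
    try (TrackedTasks tasks = planned) {
      int count = 0;
      for (String ignored : tasks) {
        count++;
      }
      return count;
    }
  }
}
```
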
- planTasks
  
  CloseableIterable<G> planTasks()
  
  Plan balanced task groups for this scan by splitting large and combining small tasks.
  Task groups created by this method may read partial input files, multiple input files or both.
  
  Returns:
  
  an Iterable of balanced task groups required by this scan
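
The idea of splitting large tasks and combining small ones can be sketched as a two-step procedure over file sizes. SplitPlanner is a hypothetical, simplified model and not Iceberg's planner: it packs chunks sequentially into one open group at a time, and it leaves out the lookback parameter, which would let a packer consider more than one open group. Weighting each chunk by at least the open file cost is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPlanner {
  private SplitPlanner() {}

  // Splits each file into chunks of at most targetSplitSize bytes.
  static List<Long> split(List<Long> fileSizes, long targetSplitSize) {
    if (targetSplitSize <= 0) {
      throw new IllegalArgumentException("targetSplitSize must be positive");
    }
    List<Long> chunks = new ArrayList<>();
    for (long size : fileSizes) {
      long remaining = size;
      while (remaining > targetSplitSize) {
        chunks.add(targetSplitSize);
        remaining -= targetSplitSize;
      }
      if (remaining > 0) {
        chunks.add(remaining);
      }
    }
    return chunks;
  }

  // Packs chunks into groups in order; each chunk weighs at least openFileCost.
  static List<List<Long>> combine(List<Long> chunks, long targetSplitSize, long openFileCost) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentWeight = 0;
    for (long chunk : chunks) {
      long weight = Math.max(chunk, openFileCost);
      // Start a new group when adding this chunk would exceed the target.
      if (!current.isEmpty() && currentWeight + weight > targetSplitSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentWeight = 0;
      }
      current.add(chunk);
      currentWeight += weight;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

For file sizes 250, 30, 40, and 10 with a target of 100 and an open file cost of 20, `split` yields chunks 100, 100, 50, 30, 40, 10, and `combine` groups them as [100], [100], [50, 30], [40, 10]. The last chunk is weighed as 20 rather than 10, which is how a per-file cost discourages groups made of very many tiny files.
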
- targetSplitSize
  
  long targetSplitSize()
  
  Returns the target split size for this scan.
- splitLookback
  
  int splitLookback()
  
  Returns the split lookback for this scan.
- splitOpenFileCost
  
  long splitOpenFileCost()
  
  Returns the split open file cost for this scan.
- metricsReporter
  
  default ThisT metricsReporter(MetricsReporter reporter)
  
  Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
