Interface TableScan
All Known Implementing Classes:
AllDataFilesTable.AllDataFilesTableScan, AllManifestsTable.AllManifestsTableScan, DataFilesTable.FilesTableScan, DataTableScan
public interface TableScan
API for configuring a table scan.
TableScan objects are immutable and can be shared between threads. Refinement methods, like select(Collection) and filter(Expression), create new TableScan instances.
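The immutable refinement pattern described above can be sketched in plain Java. The `MiniScan` class below is a hypothetical stand-in, not Iceberg's implementation, and its string-based filter is an assumption made to keep the example self-contained. Each refinement method copies the current configuration into a new instance and leaves the receiver untouched, which is what makes sharing a base scan between threads safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable scan; it is not part of Iceberg.
final class MiniScan {
  private final List<String> columns; // selected column names
  private final String filter;        // filter text, or null for "no filter"

  MiniScan() {
    this(Collections.<String>emptyList(), null);
  }

  private MiniScan(List<String> columns, String filter) {
    // Defensive copy so callers cannot mutate the scan through their own list.
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    this.filter = filter;
  }

  // Returns a new scan with the given columns; this scan is unchanged.
  MiniScan select(Collection<String> selected) {
    return new MiniScan(new ArrayList<>(selected), filter);
  }

  // Returns a new scan with the given filter; this scan is unchanged.
  // Assumption of this sketch: the new filter simply replaces the old one.
  MiniScan filter(String expr) {
    return new MiniScan(columns, expr);
  }

  List<String> columns() {
    return columns;
  }

  String filter() {
    return filter;
  }

  public static void main(String[] args) {
    MiniScan base = new MiniScan();
    MiniScan refined = base.select(Arrays.asList("id", "data")).filter("id > 5");
    System.out.println(base.columns());    // prints []
    System.out.println(refined.columns()); // prints [id, data]
    System.out.println(refined.filter());  // prints id > 5
  }
}
```

Because every field is final and the column list is wrapped as unmodifiable, a reference to `base` can be handed to several threads, each of which derives its own refined scan.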
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| TableScan | appendsAfter(long fromSnapshotId) | Create a new TableScan to read appended data from fromSnapshotId exclusive to the current snapshot inclusive. |
| TableScan | appendsBetween(long fromSnapshotId, long toSnapshotId) | Create a new TableScan to read appended data from fromSnapshotId exclusive to toSnapshotId inclusive. |
| TableScan | asOfTime(long timestampMillis) | Create a new TableScan from this scan's configuration that will use the most recent snapshot as of the given time in milliseconds. |
| TableScan | caseSensitive(boolean caseSensitive) | Create a new TableScan from this that, if data columns were selected via select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. |
| Expression | filter() | Returns this scan's filter Expression. |
| TableScan | filter(Expression expr) | Create a new TableScan from the results of this filtered by the Expression. |
| TableScan | ignoreResiduals() | Create a new TableScan from this that applies data filtering to files but not to rows in those files. |
| TableScan | includeColumnStats() | Create a new TableScan from this that loads the column stats with each data file. |
| boolean | isCaseSensitive() | Returns whether this scan should apply column name case sensitivity as per caseSensitive(boolean). |
| TableScan | option(java.lang.String property, java.lang.String value) | Create a new TableScan from this scan's configuration that will override the Table's behavior based on the incoming pair. |
| CloseableIterable<FileScanTask> | planFiles() | Plan the files that will be read by this scan. |
| CloseableIterable<CombinedScanTask> | planTasks() | Plan the tasks for this scan. |
| TableScan | project(Schema schema) | Create a new TableScan from this with the schema as its projection. |
| Schema | schema() | Returns this scan's projection Schema. |
| default TableScan | select(java.lang.String... columns) | Create a new TableScan from this that will read the given data columns. |
| TableScan | select(java.util.Collection<java.lang.String> columns) | Create a new TableScan from this that will read the given data columns. |
| Snapshot | snapshot() | Returns the Snapshot that will be used by this scan. |
| int | splitLookback() | Returns the split lookback for this scan. |
| long | splitOpenFileCost() | Returns the split open file cost for this scan. |
| Table | table() | Returns the Table from which this scan loads data. |
| long | targetSplitSize() | Returns the target split size for this scan. |
| TableScan | useSnapshot(long snapshotId) | Create a new TableScan from this scan's configuration that will use the given snapshot by ID. |

Method Detail
-
useSnapshot
TableScan useSnapshot(long snapshotId)
Create a new TableScan from this scan's configuration that will use the given snapshot by ID.
Parameters:
snapshotId - a snapshot ID
Returns:
a new scan based on this with the given snapshot ID
Throws:
java.lang.IllegalArgumentException - if the snapshot cannot be found
-
asOfTime
TableScan asOfTime(long timestampMillis)
Create a new TableScan from this scan's configuration that will use the most recent snapshot as of the given time in milliseconds.
Parameters:
timestampMillis - a timestamp in milliseconds
Returns:
a new scan based on this with the current snapshot at the given time
Throws:
java.lang.IllegalArgumentException - if the snapshot cannot be found
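As a rough illustration of what "the most recent snapshot as of the given time" means, here is a minimal sketch in plain Java. The `AsOfTimeSketch` and `Snap` names are hypothetical, and treating the timestamp bound as inclusive (at or before) is an assumption of this sketch, not something the text above specifies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "most recent snapshot as of a time"; not Iceberg code.
final class AsOfTimeSketch {
  // Minimal stand-in for a snapshot: an ID and a commit timestamp in milliseconds.
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  // Returns the ID of the latest snapshot committed at or before timestampMillis.
  static long snapshotIdAsOf(List<Snap> history, long timestampMillis) {
    Snap best = null;
    for (Snap s : history) {
      if (s.timestampMillis <= timestampMillis
          && (best == null || s.timestampMillis > best.timestampMillis)) {
        best = s;
      }
    }
    if (best == null) {
      // Matches the documented contract: no suitable snapshot is an error.
      throw new IllegalArgumentException(
          "Cannot find a snapshot at or before " + timestampMillis);
    }
    return best.id;
  }

  public static void main(String[] args) {
    List<Snap> history =
        Arrays.asList(new Snap(10L, 1_000L), new Snap(11L, 2_000L), new Snap(12L, 3_000L));
    System.out.println(snapshotIdAsOf(history, 2_500L)); // prints 11
  }
}
```

A timestamp earlier than the first snapshot in `history` leaves nothing to select, so the sketch throws `IllegalArgumentException`, the same exception type listed under Throws above.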
-
option
TableScan option(java.lang.String property, java.lang.String value)
Create a new TableScan from this scan's configuration that will override the Table's behavior based on the incoming pair. Unknown properties will be ignored.
Parameters:
property - name of the table property to be overridden
value - value to override with
Returns:
a new scan based on this with overridden behavior
-
project
TableScan project(Schema schema)
Create a new TableScan from this with the schema as its projection.
Parameters:
schema - a projection schema
Returns:
a new scan based on this with the given projection
-
caseSensitive
TableScan caseSensitive(boolean caseSensitive)
Create a new TableScan from this that, if data columns were selected via select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity.
Returns:
a new scan based on this with case sensitivity as stated
-
includeColumnStats
TableScan includeColumnStats()
Create a new TableScan from this that loads the column stats with each data file.
Column stats include: value count, null value count, lower bounds, and upper bounds.
Returns:
a new scan based on this that loads column stats.
-
select
default TableScan select(java.lang.String... columns)
Create a new TableScan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
Parameters:
columns - column names from the table's schema
Returns:
a new scan based on this with the given projection columns
-
select
TableScan select(java.util.Collection<java.lang.String> columns)
Create a new TableScan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
Parameters:
columns - column names from the table's schema
Returns:
a new scan based on this with the given projection columns
-
filter
TableScan filter(Expression expr)
Create a new TableScan from the results of this filtered by the Expression.
Parameters:
expr - a filter expression
Returns:
a new scan based on this with results filtered by the expression
-
filter
Expression filter()
Returns this scan's filter Expression.
Returns:
this scan's filter expression
-
ignoreResiduals
TableScan ignoreResiduals()
Create a new TableScan from this that applies data filtering to files but not to rows in those files.
Returns:
a new scan based on this that does not filter rows in files.
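The difference between filtering files and filtering rows can be illustrated with the lower and upper bounds that column stats carry. The sketch below is hypothetical (`ResidualSketch`, `Decision`, and the single predicate "value > threshold" are all assumptions) and only shows the idea: bounds alone can rule a file out or prove that every row matches, and anything in between still needs a per-row check.

```java
// Hypothetical sketch of file-level versus row-level filtering; not Iceberg code.
final class ResidualSketch {
  enum Decision { SKIP_FILE, READ_ALL_ROWS, READ_AND_FILTER_ROWS }

  // Classifies a file for the predicate "value > threshold" using only the
  // file's lower and upper bounds for that column.
  static Decision classify(long lowerBound, long upperBound, long threshold) {
    if (upperBound <= threshold) {
      return Decision.SKIP_FILE;          // no row in the file can match
    }
    if (lowerBound > threshold) {
      return Decision.READ_ALL_ROWS;      // every row matches, nothing left to check
    }
    return Decision.READ_AND_FILTER_ROWS; // some rows may not match, a row filter remains
  }

  public static void main(String[] args) {
    System.out.println(classify(0L, 10L, 10L));  // prints SKIP_FILE
    System.out.println(classify(11L, 20L, 10L)); // prints READ_ALL_ROWS
    System.out.println(classify(5L, 20L, 10L));  // prints READ_AND_FILTER_ROWS
  }
}
```

In this model, a scan that ignores residuals would still skip files in the first case but would not apply the remaining row filter in the third, so callers may see rows that do not satisfy the expression.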
-
appendsBetween
TableScan appendsBetween(long fromSnapshotId, long toSnapshotId)
Create a new TableScan to read appended data from fromSnapshotId exclusive to toSnapshotId inclusive.
Parameters:
fromSnapshotId - the last snapshot id read by the user, exclusive
toSnapshotId - read append data up to this snapshot id
Returns:
a table scan which can read append data from fromSnapshotId exclusive and up to toSnapshotId inclusive
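The exclusive and inclusive bounds are easy to get wrong, so here is a minimal sketch of the range semantics. It assumes snapshots form a chain through parent links, represented by a hypothetical `parentOf` map. `AppendRangeSketch` is an illustrative name, and it does not show how Iceberg resolves the range internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (from, to] snapshot range; not Iceberg code.
final class AppendRangeSketch {
  // Walks parent links from toSnapshotId back to fromSnapshotId. The result
  // excludes fromSnapshotId, includes toSnapshotId, and is ordered oldest first.
  static List<Long> snapshotsBetween(
      Map<Long, Long> parentOf, long fromSnapshotId, long toSnapshotId) {
    List<Long> ids = new ArrayList<>();
    Long current = toSnapshotId;
    while (current != null && current != fromSnapshotId) {
      ids.add(current);
      current = parentOf.get(current); // null once the chain runs out
    }
    if (current == null) {
      throw new IllegalArgumentException(
          fromSnapshotId + " is not an ancestor of " + toSnapshotId);
    }
    Collections.reverse(ids);
    return ids;
  }

  public static void main(String[] args) {
    Map<Long, Long> parentOf = new HashMap<>();
    parentOf.put(2L, 1L);
    parentOf.put(3L, 2L);
    parentOf.put(4L, 3L);
    System.out.println(snapshotsBetween(parentOf, 1L, 3L)); // prints [2, 3]
  }
}
```

With the same model, appendsAfter(fromSnapshotId) corresponds to calling `snapshotsBetween` with the current snapshot as the upper bound.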
-
appendsAfter
TableScan appendsAfter(long fromSnapshotId)
Create a new TableScan to read appended data from fromSnapshotId exclusive to the current snapshot inclusive.
Parameters:
fromSnapshotId - the last snapshot id read by the user, exclusive
Returns:
a table scan which can read append data from fromSnapshotId exclusive and up to current snapshot inclusive
-
planFiles
CloseableIterable<FileScanTask> planFiles()
Plan the files that will be read by this scan.
Each file has a residual expression that should be applied to filter the file's rows.
This simple plan returns file scans for each file from position 0 to the file's length. For planning that will combine small files, split large files, and attempt to balance work, use planTasks() instead.
Returns:
an Iterable of file tasks that are required by this scan
-
planTasks
CloseableIterable<CombinedScanTask> planTasks()
Plan the tasks for this scan.
Tasks created by this method may read partial input files, multiple input files, or both.
Returns:
an Iterable of tasks for this scan
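To make "partial input files, multiple input files, or both" concrete, the following sketch splits large files and groups small pieces. It is a simplified greedy packer written for illustration only. `PlanTasksSketch` is a hypothetical name, the way it uses a target split size and an open file cost is an assumption, and it ignores lookback entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of split-and-combine task planning; not Iceberg code.
final class PlanTasksSketch {
  // Splits each file length into pieces of at most targetSplitSize bytes, then
  // greedily groups consecutive pieces into tasks whose total weight stays
  // within targetSplitSize. Each piece weighs at least openFileCost.
  static List<List<Long>> plan(List<Long> fileLengths, long targetSplitSize, long openFileCost) {
    if (targetSplitSize <= 0) {
      throw new IllegalArgumentException("targetSplitSize must be positive");
    }
    List<Long> splits = new ArrayList<>();
    for (long length : fileLengths) {
      for (long offset = 0; offset < length; offset += targetSplitSize) {
        splits.add(Math.min(targetSplitSize, length - offset));
      }
    }

    List<List<Long>> tasks = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentWeight = 0;
    for (long split : splits) {
      long weight = Math.max(split, openFileCost);
      if (!current.isEmpty() && currentWeight + weight > targetSplitSize) {
        tasks.add(current); // close the task that would overflow
        current = new ArrayList<>();
        currentWeight = 0;
      }
      current.add(split);
      currentWeight += weight;
    }
    if (!current.isEmpty()) {
      tasks.add(current);
    }
    return tasks;
  }

  public static void main(String[] args) {
    // One 250-byte file and three 30-byte files, target 100, open file cost 40.
    System.out.println(plan(Arrays.asList(250L, 30L, 30L, 30L), 100L, 40L));
    // prints [[100], [100], [50, 30], [30, 30]]
  }
}
```

In the output, the first two tasks each read part of the large file, the third reads the tail of the large file together with a small file, and the last combines two small files.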
-
schema
Schema schema()
Returns this scan's projection Schema.
If the projection schema was set directly using project(Schema), returns that schema.
If the projection schema was set by calling select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.
Returns:
this scan's projection schema
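The rule that a select-based projection includes both the selected fields and the fields used by the filter can be sketched as a small set computation. Everything here is hypothetical (`ProjectionSketch`, column names as plain strings, output kept in table order), and the case-insensitive matching is only one plausible reading of caseSensitive(boolean).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of deriving a projection from select plus filter; not Iceberg code.
final class ProjectionSketch {
  // Returns the table columns, in table order, that are either selected or
  // referenced by the filter. Names are compared case-insensitively when
  // caseSensitive is false.
  static List<String> projectedColumns(
      List<String> tableColumns,
      Collection<String> selected,
      Collection<String> filterColumns,
      boolean caseSensitive) {
    Set<String> needed = new HashSet<>();
    for (String name : selected) {
      needed.add(normalize(name, caseSensitive));
    }
    for (String name : filterColumns) {
      needed.add(normalize(name, caseSensitive));
    }
    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      if (needed.contains(normalize(column, caseSensitive))) {
        result.add(column);
      }
    }
    return result;
  }

  private static String normalize(String name, boolean caseSensitive) {
    return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("id", "data", "ts");
    List<String> selected = Arrays.asList("DATA");
    List<String> filterCols = Arrays.asList("id");
    System.out.println(projectedColumns(table, selected, filterCols, false)); // prints [id, data]
    System.out.println(projectedColumns(table, selected, filterCols, true));  // prints [id]
  }
}
```

Note that `id` appears in both results even though it was never selected, because the filter references it.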
-
snapshot
Snapshot snapshot()
Returns the Snapshot that will be used by this scan.
If the snapshot was not configured using asOfTime(long) or useSnapshot(long), the current table snapshot will be used.
Returns:
the Snapshot this scan will use
-
isCaseSensitive
boolean isCaseSensitive()
Returns whether this scan should apply column name case sensitivity as per caseSensitive(boolean).
Returns:
true if case sensitive, false otherwise.
-
targetSplitSize
long targetSplitSize()
Returns the target split size for this scan.
-
splitLookback
int splitLookback()
Returns the split lookback for this scan.
-
splitOpenFileCost
long splitOpenFileCost()
Returns the split open file cost for this scan.