Package org.apache.iceberg
Class DataFilesTable.DataFilesTableScan
java.lang.Object
org.apache.iceberg.SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
org.apache.iceberg.DataFilesTable.DataFilesTableScan
- All Implemented Interfaces:
Scan<TableScan,FileScanTask,CombinedScanTask>, TableScan
- Enclosing class:
- DataFilesTable
public static class DataFilesTable.DataFilesTableScan
extends SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
-
Field Summary
Modifier and Type, Field, Description
- protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
-
Method Summary
Modifier and Type, Method, Description
- appendsAfter(long fromSnapshotId): Create a new TableScan to read appended data from fromSnapshotId exclusive to the current snapshot inclusive.
- appendsBetween(long fromSnapshotId, long toSnapshotId): Create a new TableScan to read appended data from fromSnapshotId exclusive to toSnapshotId inclusive.
- caseSensitive(boolean caseSensitive): Create a new scan from this that, if data columns were selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity.
- protected org.apache.iceberg.TableScanContext context()
- protected CloseableIterable<FileScanTask> doPlanFiles()
- filter(): Returns this scan's filter Expression.
- filter(Expression expr): Create a new scan from the results of this filtered by the Expression.
- ignoreResiduals(): Create a new scan from this that applies data filtering to files but not to rows in those files.
- includeColumnStats(): Create a new scan from this that loads the column stats with each data file.
- includeColumnStats(Collection<String> requestedColumns): Create a new scan from this that loads the column stats for the specific columns with each data file.
- protected FileIO io()
- boolean isCaseSensitive(): Returns whether this scan is case-sensitive with respect to column names.
- protected CloseableIterable<ManifestFile> manifests(): Returns an iterable of manifest files to explore for this files metadata table scan.
- metricsReporter(MetricsReporter reporter): Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
- protected TableScan newRefinedScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
- option(String property, String value): Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair.
- options()
- protected ExecutorService planExecutor()
- planTasks(): Plan balanced task groups for this scan by splitting large and combining small tasks.
- planWith(ExecutorService executorService): Create a new scan to use a particular executor to plan.
- project(Schema projectedSchema): Create a new scan from this with the schema as its projection.
- protected Expression residualFilter()
- schema(): Returns this scan's projection Schema.
- select(Collection<String> columns): Create a new scan from this that will read the given data columns.
- protected boolean shouldIgnoreResiduals()
- protected boolean shouldPlanWithExecutor()
- protected boolean shouldReturnColumnStats()
- int splitLookback(): Returns the split lookback for this scan.
- long splitOpenFileCost(): Returns the split open file cost for this scan.
- table()
- protected Schema tableSchema()
- protected MetadataTableType tableType(): Type of scan being performed, such as MetadataTableType.ALL_DATA_FILES when scanning a table's AllDataFilesTable.
- long targetSplitSize(): Returns the target split size for this scan.
Methods inherited from class org.apache.iceberg.SnapshotScan
asOfTime, planFiles, scanMetrics, snapshot, snapshotId, toString, useRef, useSnapshot, useSnapshotSchema
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.iceberg.Scan
caseSensitive, filter, filter, ignoreResiduals, includeColumnStats, includeColumnStats, isCaseSensitive, metricsReporter, option, planFiles, planWith, project, schema, select, select, splitLookback, splitOpenFileCost
-
Field Details
-
SCAN_COLUMNS
-
SCAN_WITH_STATS_COLUMNS
-
DELETE_SCAN_COLUMNS
-
DELETE_SCAN_WITH_STATS_COLUMNS
-
PLAN_SCANS_WITH_WORKER_POOL
protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
-
-
Method Details
-
newRefinedScan
-
manifests
Returns an iterable of manifest files to explore for this files metadata table scan.
-
doPlanFiles
- Specified by:
doPlanFiles in class SnapshotScan<TableScan,FileScanTask,CombinedScanTask>
-
tableType
Type of scan being performed, such as MetadataTableType.ALL_DATA_FILES when scanning a table's AllDataFilesTable.
Used for logging and error messages.
-
appendsBetween
Description copied from interface: TableScan
Create a new TableScan to read appended data from fromSnapshotId exclusive to toSnapshotId inclusive.
- Specified by:
appendsBetween in interface TableScan
- Parameters:
fromSnapshotId - the last snapshot id read by the user, exclusive
toSnapshotId - read append data up to this snapshot id
- Returns:
a table scan which can read append data from fromSnapshotId exclusive and up to toSnapshotId inclusive
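The exclusive and inclusive bounds described above can be illustrated without any Iceberg dependency. The following is a minimal sketch, not Iceberg's implementation: it assumes a hypothetical map from each snapshot id to its parent id and walks back from toSnapshotId, stopping before fromSnapshotId. The class name SnapshotRangeSketch and the method snapshotsBetween are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Iceberg API.
class SnapshotRangeSketch {
  // Returns the snapshot ids in (fromSnapshotId, toSnapshotId], oldest first.
  // parentOf maps a snapshot id to its parent id; a root snapshot has no entry.
  static List<Long> snapshotsBetween(
      Map<Long, Long> parentOf, long fromSnapshotId, long toSnapshotId) {
    List<Long> result = new ArrayList<>();
    Long current = toSnapshotId;
    // Walk the lineage backwards until the exclusive lower bound is reached.
    while (current != null && current != fromSnapshotId) {
      result.add(current);
      current = parentOf.get(current);
    }
    if (current == null) {
      throw new IllegalArgumentException(
          fromSnapshotId + " is not an ancestor of " + toSnapshotId);
    }
    Collections.reverse(result);
    return result;
  }

  public static void main(String[] args) {
    // Lineage: 1 <- 2 <- 3 <- 4
    Map<Long, Long> parentOf = Map.of(2L, 1L, 3L, 2L, 4L, 3L);
    System.out.println(snapshotsBetween(parentOf, 1L, 3L)); // prints [2, 3]
  }
}
```

Snapshot 1 is excluded because the lower bound is exclusive, and snapshot 3 is included because the upper bound is inclusive.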
-
appendsAfter
Description copied from interface: TableScan
Create a new TableScan to read appended data from fromSnapshotId exclusive to the current snapshot inclusive.
- Specified by:
appendsAfter in interface TableScan
- Parameters:
fromSnapshotId - the last snapshot id read by the user, exclusive
- Returns:
a table scan which can read append data from fromSnapshotId exclusive and up to current snapshot inclusive
-
targetSplitSize
public long targetSplitSize()
Description copied from interface: Scan
Returns the target split size for this scan.
- Specified by:
targetSplitSize in interface Scan<TableScan,FileScanTask,CombinedScanTask>
-
planTasks
Description copied from interface: Scan
Plan balanced task groups for this scan by splitting large and combining small tasks.
Task groups created by this method may read partial input files, multiple input files or both.
- Specified by:
planTasks in interface Scan<TableScan,FileScanTask,CombinedScanTask>
- Returns:
an Iterable of balanced task groups required by this scan
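The idea of splitting large tasks and combining small ones can be sketched in plain Java. This is a simplified illustration, not Iceberg's planning code: it assumes files are represented only by their lengths, treats the open file cost as a minimum weight per chunk, and packs consecutive chunks sequentially without modeling the split lookback. The class name SplitPlanningSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Iceberg's planning code.
class SplitPlanningSketch {
  // Splits each file length into chunks of at most targetSplitSize bytes.
  static List<Long> split(List<Long> fileLengths, long targetSplitSize) {
    List<Long> chunks = new ArrayList<>();
    for (long length : fileLengths) {
      long remaining = length;
      while (remaining > 0) {
        long chunk = Math.min(remaining, targetSplitSize);
        chunks.add(chunk);
        remaining -= chunk;
      }
    }
    return chunks;
  }

  // Packs consecutive chunks into groups whose total weight stays within targetSplitSize.
  // Each chunk weighs at least openFileCost (an assumption of this sketch), so that
  // many tiny files do not all land in one group.
  static List<List<Long>> combine(List<Long> chunks, long targetSplitSize, long openFileCost) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentWeight = 0;
    for (long chunk : chunks) {
      long weight = Math.max(chunk, openFileCost);
      if (!current.isEmpty() && currentWeight + weight > targetSplitSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentWeight = 0;
      }
      current.add(chunk);
      currentWeight += weight;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Long> chunks = split(List.of(250L, 30L, 40L, 10L), 100L);
    System.out.println(chunks);                     // prints [100, 100, 50, 30, 40, 10]
    System.out.println(combine(chunks, 100L, 20L)); // prints [[100], [100], [50, 30], [40, 10]]
  }
}
```

The resulting groups read part of one file ([100]), several files ([40, 10]), or both ([50, 30]), which matches the description of task groups above.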
-
table
-
io
-
tableSchema
-
context
protected org.apache.iceberg.TableScanContext context()
-
options
-
scanColumns
-
shouldReturnColumnStats
protected boolean shouldReturnColumnStats()
-
columnsToKeepStats
-
shouldIgnoreResiduals
protected boolean shouldIgnoreResiduals()
-
residualFilter
-
shouldPlanWithExecutor
protected boolean shouldPlanWithExecutor()
-
planExecutor
-
option
Description copied from interface: Scan
Create a new scan from this scan's configuration that will override the Table's behavior based on the incoming pair. Unknown properties will be ignored.
- Specified by:
option in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
property - name of the table property to be overridden
value - value to override with
- Returns:
a new scan based on this with overridden behavior
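Most methods on this page return a new scan rather than modifying the current one. A minimal sketch of that refinement pattern follows, assuming a hypothetical class OptionSketch that holds only an options map; it is not the Iceberg implementation. The property name used in main, read.split.target-size, is an Iceberg table property chosen here purely as an example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the Iceberg API.
final class OptionSketch {
  private final Map<String, String> options;

  OptionSketch(Map<String, String> options) {
    // Defensive immutable copy so later changes to the argument cannot leak in.
    this.options = Map.copyOf(options);
  }

  // Returns a new instance with the property overridden; this instance is unchanged.
  OptionSketch option(String property, String value) {
    Map<String, String> updated = new HashMap<>(options);
    updated.put(property, value);
    return new OptionSketch(updated);
  }

  Map<String, String> options() {
    return options;
  }

  public static void main(String[] args) {
    OptionSketch base = new OptionSketch(Map.of());
    OptionSketch refined = base.option("read.split.target-size", "134217728");
    System.out.println(base.options());    // prints {}
    System.out.println(refined.options()); // prints {read.split.target-size=134217728}
  }
}
```

Because each call returns a fresh object, a base scan can be shared and refined in different ways without one caller's overrides affecting another.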
-
project
Description copied from interface: Scan
Create a new scan from this with the schema as its projection.
- Specified by:
project in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
projectedSchema - a projection schema
- Returns:
a new scan based on this with the given projection
-
caseSensitive
Description copied from interface: Scan
Create a new scan from this that, if data columns were selected via Scan.select(java.util.Collection), controls whether the match to the schema will be done with case sensitivity. Default is true.
- Specified by:
caseSensitive in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
a new scan based on this with case sensitivity as stated
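The effect of the case sensitivity flag on column matching can be shown with a small sketch. It assumes a schema represented as a plain list of column names; the class CaseSensitivitySketch and the method resolve are hypothetical and do not reflect how Iceberg resolves names internally.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of the Iceberg API.
class CaseSensitivitySketch {
  // Returns the schema column that matches the selected name, if any.
  static Optional<String> resolve(
      List<String> schemaColumns, String selected, boolean caseSensitive) {
    for (String column : schemaColumns) {
      boolean matches =
          caseSensitive ? column.equals(selected) : column.equalsIgnoreCase(selected);
      if (matches) {
        return Optional.of(column);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<String> schema = List.of("file_path", "record_count");
    System.out.println(resolve(schema, "FILE_PATH", true));  // prints Optional.empty
    System.out.println(resolve(schema, "FILE_PATH", false)); // prints Optional[file_path]
  }
}
```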
-
isCaseSensitive
public boolean isCaseSensitive()
Description copied from interface: Scan
Returns whether this scan is case-sensitive with respect to column names.
- Specified by:
isCaseSensitive in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
true if case-sensitive, false otherwise.
-
includeColumnStats
Description copied from interface: Scan
Create a new scan from this that loads the column stats with each data file.
Column stats include: value count, null value count, lower bounds, and upper bounds.
- Specified by:
includeColumnStats in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
a new scan based on this that loads column stats.
-
includeColumnStats
Description copied from interface: Scan
Create a new scan from this that loads the column stats for the specific columns with each data file.
Column stats include: value count, null value count, lower bounds, and upper bounds.
- Specified by:
includeColumnStats in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
requestedColumns - column names for which to keep the stats.
- Returns:
a new scan based on this that loads column stats for specific columns.
-
select
Description copied from interface: Scan
Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.
- Specified by:
select in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
columns - column names from the table's schema
- Returns:
a new scan based on this with the given projection columns
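The rule that the expected schema contains every field that is either selected or used by the filter can be sketched as a simple union over column names. This is an illustration under simplifying assumptions (flat column names, the filter reduced to the set of columns it references); the class SelectSketch and the method expectedColumns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Iceberg API.
class SelectSketch {
  // Keeps schema order; a column is kept if it was selected or is referenced by the filter.
  static List<String> expectedColumns(
      List<String> schemaColumns, Set<String> selected, Set<String> filterColumns) {
    List<String> result = new ArrayList<>();
    for (String column : schemaColumns) {
      if (selected.contains(column) || filterColumns.contains(column)) {
        result.add(column);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> schema = List.of("content", "file_path", "file_format", "record_count");
    // Only file_path is selected, but the filter references record_count.
    System.out.println(
        expectedColumns(schema, Set.of("file_path"), Set.of("record_count")));
    // prints [file_path, record_count]
  }
}
```

The filter column is kept even though it was not selected, since it is needed to evaluate the filter.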
-
filter
Description copied from interface: Scan
Create a new scan from the results of this filtered by the Expression.
- Specified by:
filter in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
expr - a filter expression
- Returns:
a new scan based on this with results filtered by the expression
-
filter
Description copied from interface: Scan
Returns this scan's filter Expression.
- Specified by:
filter in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
this scan's filter expression
-
ignoreResiduals
Description copied from interface: Scan
Create a new scan from this that applies data filtering to files but not to rows in those files.
- Specified by:
ignoreResiduals in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
a new scan based on this that does not filter rows in files.
-
planWith
Description copied from interface: Scan
Create a new scan to use a particular executor to plan. If no executor is provided, the default worker pool is used.
- Specified by:
planWith in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Parameters:
executorService - the provided executor
- Returns:
a table scan that uses the provided executor to access manifests
-
schema
Description copied from interface: Scan
Returns this scan's projection Schema.
If the projection schema was set directly using Scan.project(Schema), returns that schema.
If the projection schema was set by calling Scan.select(Collection), returns a projection schema that includes the selected data fields and any fields used in the filter expression.
- Specified by:
schema in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
- Returns:
this scan's projection schema
-
splitLookback
public int splitLookback()
Description copied from interface: Scan
Returns the split lookback for this scan.
- Specified by:
splitLookback in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
-
splitOpenFileCost
public long splitOpenFileCost()
Description copied from interface: Scan
Returns the split open file cost for this scan.
- Specified by:
splitOpenFileCost in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
-
metricsReporter
Description copied from interface: Scan
Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.
- Specified by:
metricsReporter in interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>>
-