Package org.apache.iceberg
Class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- java.lang.Object
-
- org.apache.iceberg.SnapshotScan<ThisT,T,G>
-
- Type Parameters:
ThisT
- actual BaseScan implementation class typeT
- type of ScanTask returnedG
- type of ScanTaskGroup returned
- All Implemented Interfaces:
Scan<ThisT,T,G>
- Direct Known Subclasses:
AllDataFilesTable.AllDataFilesTableScan
,AllDeleteFilesTable.AllDeleteFilesTableScan
,AllFilesTable.AllFilesTableScan
,AllManifestsTable.AllManifestsTableScan
,DataFilesTable.DataFilesTableScan
,DataTableScan
,DeleteFilesTable.DeleteFilesTableScan
,FilesTable.FilesTableScan
,PositionDeletesTable.PositionDeletesBatchScan
,SparkDistributedDataScan
public abstract class SnapshotScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>> extends java.lang.Object
This is a common base class to share code between different BaseScan implementations that handle scans of a particular snapshot.
-
-
Field Summary
Fields Modifier and Type Field Description protected static java.util.List<java.lang.String>
DELETE_SCAN_COLUMNS
protected static java.util.List<java.lang.String>
DELETE_SCAN_WITH_STATS_COLUMNS
protected static boolean
PLAN_SCANS_WITH_WORKER_POOL
protected static java.util.List<java.lang.String>
SCAN_COLUMNS
protected static java.util.List<java.lang.String>
SCAN_WITH_STATS_COLUMNS
-
Constructor Summary
Constructors Modifier Constructor Description protected
SnapshotScan(Table table, Schema schema, org.apache.iceberg.TableScanContext context)
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description ThisT
asOfTime(long timestampMillis)
ThisT
caseSensitive(boolean caseSensitive)
Create a new scan from this that, if data columns where selected viaScan.select(java.util.Collection)
, controls whether the match to the schema will be done with case sensitivity.protected org.apache.iceberg.TableScanContext
context()
protected abstract CloseableIterable<T>
doPlanFiles()
Expression
filter()
Returns this scan's filterExpression
.ThisT
filter(Expression expr)
Create a new scan from the results of this filtered by theExpression
.ThisT
ignoreResiduals()
Create a new scan from this that applies data filtering to files but not to rows in those files.ThisT
includeColumnStats()
Create a new scan from this that loads the column stats with each data file.protected FileIO
io()
boolean
isCaseSensitive()
Returns whether this scan is case-sensitive with respect to column names.ThisT
metricsReporter(MetricsReporter reporter)
Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.protected abstract ThisT
newRefinedScan(Table newTable, Schema newSchema, org.apache.iceberg.TableScanContext newContext)
ThisT
option(java.lang.String property, java.lang.String value)
Create a new scan from this scan's configuration that will override theTable
's behavior based on the incoming pair.protected java.util.Map<java.lang.String,java.lang.String>
options()
protected java.util.concurrent.ExecutorService
planExecutor()
CloseableIterable<T>
planFiles()
Plan tasks for this scan where each task reads a single file.ThisT
planWith(java.util.concurrent.ExecutorService executorService)
Create a new scan to use a particular executor to plan.ThisT
project(Schema projectedSchema)
Create a new scan from this with the schema as its projection.protected Expression
residualFilter()
protected java.util.List<java.lang.String>
scanColumns()
protected ScanMetrics
scanMetrics()
Schema
schema()
Returns this scan's projectionSchema
.ThisT
select(java.util.Collection<java.lang.String> columns)
Create a new scan from this that will read the given data columns.protected boolean
shouldIgnoreResiduals()
protected boolean
shouldPlanWithExecutor()
protected boolean
shouldReturnColumnStats()
Snapshot
snapshot()
protected java.lang.Long
snapshotId()
int
splitLookback()
Returns the split lookback for this scan.long
splitOpenFileCost()
Returns the split open file cost for this scan.Table
table()
protected Schema
tableSchema()
long
targetSplitSize()
Returns the target split size for this scan.java.lang.String
toString()
ThisT
useRef(java.lang.String name)
ThisT
useSnapshot(long scanSnapshotId)
protected boolean
useSnapshotSchema()
-
-
-
Field Detail
-
SCAN_COLUMNS
protected static final java.util.List<java.lang.String> SCAN_COLUMNS
-
SCAN_WITH_STATS_COLUMNS
protected static final java.util.List<java.lang.String> SCAN_WITH_STATS_COLUMNS
-
DELETE_SCAN_COLUMNS
protected static final java.util.List<java.lang.String> DELETE_SCAN_COLUMNS
-
DELETE_SCAN_WITH_STATS_COLUMNS
protected static final java.util.List<java.lang.String> DELETE_SCAN_WITH_STATS_COLUMNS
-
PLAN_SCANS_WITH_WORKER_POOL
protected static final boolean PLAN_SCANS_WITH_WORKER_POOL
-
-
Method Detail
-
snapshotId
protected java.lang.Long snapshotId()
-
doPlanFiles
protected abstract CloseableIterable<T> doPlanFiles()
-
useSnapshotSchema
protected boolean useSnapshotSchema()
-
scanMetrics
protected ScanMetrics scanMetrics()
-
useSnapshot
public ThisT useSnapshot(long scanSnapshotId)
-
useRef
public ThisT useRef(java.lang.String name)
-
asOfTime
public ThisT asOfTime(long timestampMillis)
-
planFiles
public CloseableIterable<T> planFiles()
Description copied from interface:Scan
Plan tasks for this scan where each task reads a single file.Use
Scan.planTasks()
for planning balanced tasks where each task will read either a single file, a part of a file, or multiple files.- Returns:
- an Iterable of tasks scanning entire files required by this scan
-
snapshot
public Snapshot snapshot()
-
toString
public java.lang.String toString()
- Overrides:
toString
in classjava.lang.Object
-
table
public Table table()
-
io
protected FileIO io()
-
tableSchema
protected Schema tableSchema()
-
context
protected org.apache.iceberg.TableScanContext context()
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
scanColumns
protected java.util.List<java.lang.String> scanColumns()
-
shouldReturnColumnStats
protected boolean shouldReturnColumnStats()
-
shouldIgnoreResiduals
protected boolean shouldIgnoreResiduals()
-
residualFilter
protected Expression residualFilter()
-
shouldPlanWithExecutor
protected boolean shouldPlanWithExecutor()
-
planExecutor
protected java.util.concurrent.ExecutorService planExecutor()
-
newRefinedScan
protected abstract ThisT newRefinedScan(Table newTable, Schema newSchema, org.apache.iceberg.TableScanContext newContext)
-
option
public ThisT option(java.lang.String property, java.lang.String value)
Description copied from interface:Scan
Create a new scan from this scan's configuration that will override theTable
's behavior based on the incoming pair. Unknown properties will be ignored.- Specified by:
option
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Parameters:
property
- name of the table property to be overriddenvalue
- value to override with- Returns:
- a new scan based on this with overridden behavior
-
project
public ThisT project(Schema projectedSchema)
Description copied from interface:Scan
Create a new scan from this with the schema as its projection.- Specified by:
project
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Parameters:
projectedSchema
- a projection schema- Returns:
- a new scan based on this with the given projection
-
caseSensitive
public ThisT caseSensitive(boolean caseSensitive)
Description copied from interface:Scan
Create a new scan from this that, if data columns where selected viaScan.select(java.util.Collection)
, controls whether the match to the schema will be done with case sensitivity. Default is true.- Specified by:
caseSensitive
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- a new scan based on this with case sensitivity as stated
-
isCaseSensitive
public boolean isCaseSensitive()
Description copied from interface:Scan
Returns whether this scan is case-sensitive with respect to column names.- Specified by:
isCaseSensitive
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- true if case-sensitive, false otherwise.
-
includeColumnStats
public ThisT includeColumnStats()
Description copied from interface:Scan
Create a new scan from this that loads the column stats with each data file.Column stats include: value count, null value count, lower bounds, and upper bounds.
- Specified by:
includeColumnStats
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- a new scan based on this that loads column stats.
-
select
public ThisT select(java.util.Collection<java.lang.String> columns)
Description copied from interface:Scan
Create a new scan from this that will read the given data columns. This produces an expected schema that includes all fields that are either selected or used by this scan's filter expression.- Specified by:
select
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Parameters:
columns
- column names from the table's schema- Returns:
- a new scan based on this with the given projection columns
-
filter
public ThisT filter(Expression expr)
Description copied from interface:Scan
Create a new scan from the results of this filtered by theExpression
.- Specified by:
filter
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Parameters:
expr
- a filter expression- Returns:
- a new scan based on this with results filtered by the expression
-
filter
public Expression filter()
Description copied from interface:Scan
Returns this scan's filterExpression
.- Specified by:
filter
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- this scan's filter expression
-
ignoreResiduals
public ThisT ignoreResiduals()
Description copied from interface:Scan
Create a new scan from this that applies data filtering to files but not to rows in those files.- Specified by:
ignoreResiduals
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- a new scan based on this that does not filter rows in files.
-
planWith
public ThisT planWith(java.util.concurrent.ExecutorService executorService)
Description copied from interface:Scan
Create a new scan to use a particular executor to plan. The default worker pool will be used by default.- Specified by:
planWith
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Parameters:
executorService
- the provided executor- Returns:
- a table scan that uses the provided executor to access manifests
-
schema
public Schema schema()
Description copied from interface:Scan
Returns this scan's projectionSchema
.If the projection schema was set directly using
Scan.project(Schema)
, returns that schema.If the projection schema was set by calling
Scan.select(Collection)
, returns a projection schema that includes the selected data fields and any fields used in the filter expression.- Specified by:
schema
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
- Returns:
- this scan's projection schema
-
targetSplitSize
public long targetSplitSize()
Description copied from interface:Scan
Returns the target split size for this scan.- Specified by:
targetSplitSize
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
-
splitLookback
public int splitLookback()
Description copied from interface:Scan
Returns the split lookback for this scan.- Specified by:
splitLookback
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
-
splitOpenFileCost
public long splitOpenFileCost()
Description copied from interface:Scan
Returns the split open file cost for this scan.- Specified by:
splitOpenFileCost
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
-
metricsReporter
public ThisT metricsReporter(MetricsReporter reporter)
Description copied from interface:Scan
Create a new scan that will report scan metrics to the provided reporter in addition to reporters maintained by the scan.- Specified by:
metricsReporter
in interfaceScan<ThisT,T extends ScanTask,G extends ScanTaskGroup<T>>
-
-