Class BaseExpireSnapshotsSparkAction
- java.lang.Object
-
- org.apache.iceberg.spark.actions.BaseExpireSnapshotsSparkAction
-
- All Implemented Interfaces:
Action<ExpireSnapshots,ExpireSnapshots.Result>
,ExpireSnapshots
public class BaseExpireSnapshotsSparkAction extends java.lang.Object implements ExpireSnapshots
An action that performs the same operation asExpireSnapshots
but uses Spark to determine the delta in files between the pre and post-expiration table metadata. All of the same restrictions ofExpireSnapshots
also apply to this action.This action first leverages
ExpireSnapshots
to expire snapshots and then uses metadata tables to find files that can be safely deleted. This is done by anti-joining two Datasets that contain all manifest and data files before and after the expiration. The snapshot expiration will be fully committed before any deletes are issued.This operation performs a shuffle so the parallelism can be controlled through 'spark.sql.shuffle.partitions'.
Deletes are still performed locally after retrieving the results from the Spark executors.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.ExpireSnapshots
ExpireSnapshots.Result
-
-
Constructor Summary
Constructors Constructor Description BaseExpireSnapshotsSparkAction(org.apache.spark.sql.SparkSession spark, Table table)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildManifestListDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildOtherMetadataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidDataFileDF(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
buildValidMetadataFileDF(Table table)
BaseExpireSnapshotsSparkAction
deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)
Passes an alternative delete implementation that will be used for manifests and data files.ExpireSnapshots.Result
execute()
Executes this action.BaseExpireSnapshotsSparkAction
executeDeleteWith(java.util.concurrent.ExecutorService executorService)
Passes an alternative executor service that will be used for manifests and data files deletion.org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
expire()
Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.BaseExpireSnapshotsSparkAction
expireOlderThan(long timestampMillis)
Expires all snapshots older than the given timestamp.BaseExpireSnapshotsSparkAction
expireSnapshotId(long snapshotId)
Expires a specificSnapshot
identified by id.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadMetadataTable(Table table, MetadataTableType type)
protected JobGroupInfo
newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
protected Table
newStaticTable(TableMetadata metadata, FileIO io)
ThisT
option(java.lang.String name, java.lang.String value)
Configures this action with an extra option.protected java.util.Map<java.lang.String,java.lang.String>
options()
ThisT
options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Configures this action with extra options.BaseExpireSnapshotsSparkAction
retainLast(int numSnapshots)
Retains the most recent ancestors of the current snapshot.protected ExpireSnapshots
self()
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
sparkContext()
protected <T> T
withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
-
-
Constructor Detail
-
BaseExpireSnapshotsSparkAction
public BaseExpireSnapshotsSparkAction(org.apache.spark.sql.SparkSession spark, Table table)
-
-
Method Detail
-
self
protected ExpireSnapshots self()
-
executeDeleteWith
public BaseExpireSnapshotsSparkAction executeDeleteWith(java.util.concurrent.ExecutorService executorService)
Description copied from interface:ExpireSnapshots
Passes an alternative executor service that will be used for manifests and data files deletion.If this method is not called, unnecessary manifests and data files will still be deleted in the current thread.
Identical to
ExpireSnapshots.executeDeleteWith(ExecutorService)
- Specified by:
executeDeleteWith
in interfaceExpireSnapshots
- Parameters:
executorService
- the service to use- Returns:
- this for method chaining
-
expireSnapshotId
public BaseExpireSnapshotsSparkAction expireSnapshotId(long snapshotId)
Description copied from interface:ExpireSnapshots
Expires a specificSnapshot
identified by id.Identical to
ExpireSnapshots.expireSnapshotId(long)
- Specified by:
expireSnapshotId
in interfaceExpireSnapshots
- Parameters:
snapshotId
- id of the snapshot to expire- Returns:
- this for method chaining
-
expireOlderThan
public BaseExpireSnapshotsSparkAction expireOlderThan(long timestampMillis)
Description copied from interface:ExpireSnapshots
Expires all snapshots older than the given timestamp.Identical to
ExpireSnapshots.expireOlderThan(long)
- Specified by:
expireOlderThan
in interfaceExpireSnapshots
- Parameters:
timestampMillis
- a long timestamp, as returned bySystem.currentTimeMillis()
- Returns:
- this for method chaining
-
retainLast
public BaseExpireSnapshotsSparkAction retainLast(int numSnapshots)
Description copied from interface:ExpireSnapshots
Retains the most recent ancestors of the current snapshot.If a snapshot would be expired because it is older than the expiration timestamp, but is one of the
numSnapshots
most recent ancestors of the current state, it will be retained. This will not cause snapshots explicitly identified by id from expiring.Identical to
ExpireSnapshots.retainLast(int)
- Specified by:
retainLast
in interfaceExpireSnapshots
- Parameters:
numSnapshots
- the number of snapshots to retain- Returns:
- this for method chaining
-
deleteWith
public BaseExpireSnapshotsSparkAction deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)
Description copied from interface:ExpireSnapshots
Passes an alternative delete implementation that will be used for manifests and data files.Manifest files that are no longer used by valid snapshots will be deleted. Data files that were deleted by snapshots that are expired will be deleted.
If this method is not called, unnecessary manifests and data files will still be deleted.
Identical to
ExpireSnapshots.deleteWith(Consumer)
- Specified by:
deleteWith
in interfaceExpireSnapshots
- Parameters:
newDeleteFunc
- a function that will be called to delete manifests and data files- Returns:
- this for method chaining
-
expire
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> expire()
Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.This does not delete data files. To delete data files, run
execute()
.This may be called before or after
execute()
is called to return the expired file list.- Returns:
- a Dataset of files that are no longer referenced by the table
-
execute
public ExpireSnapshots.Result execute()
Description copied from interface:Action
Executes this action.- Specified by:
execute
in interfaceAction<ExpireSnapshots,ExpireSnapshots.Result>
- Returns:
- the result of this action
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
Description copied from interface:Action
Configures this action with an extra option.Certain actions allow users to control internal details of their execution via options.
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
Description copied from interface:Action
Configures this action with extra options.Certain actions allow users to control internal details of their execution via options.
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
buildValidDataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(Table table)
-
buildManifestFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(Table table)
-
buildManifestListDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(Table table)
-
buildOtherMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(Table table)
-
buildValidMetadataFileDF
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
-