Class ExpireSnapshotsSparkAction
- java.lang.Object
- 
- org.apache.iceberg.spark.actions.ExpireSnapshotsSparkAction
 
- 
- All Implemented Interfaces:
- Action<ExpireSnapshots,ExpireSnapshots.Result>,- ExpireSnapshots
 
 public class ExpireSnapshotsSparkAction extends java.lang.Object implements ExpireSnapshots An action that performs the same operation asExpireSnapshotsbut uses Spark to determine the delta in files between the pre and post-expiration table metadata. All of the same restrictions ofExpireSnapshotsalso apply to this action.This action first leverages ExpireSnapshotsto expire snapshots and then uses metadata tables to find files that can be safely deleted. This is done by anti-joining two Datasets that contain all manifest and content files before and after the expiration. The snapshot expiration will be fully committed before any deletes are issued.This operation performs a shuffle so the parallelism can be controlled through 'spark.sql.shuffle.partitions'. Deletes are still performed locally after retrieving the results from the Spark executors. 
- 
- 
Nested Class Summary- 
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.ExpireSnapshotsExpireSnapshots.Result
 
- 
 - 
Field SummaryFields Modifier and Type Field Description protected static org.apache.iceberg.relocated.com.google.common.base.JoinerCOMMA_JOINERprotected static org.apache.iceberg.relocated.com.google.common.base.SplitterCOMMA_SPLITTERprotected static java.lang.StringFILE_PATHprotected static java.lang.StringLAST_MODIFIEDprotected static java.lang.StringMANIFESTprotected static java.lang.StringMANIFEST_LISTprotected static java.lang.StringOTHERSprotected static java.lang.StringSTATISTICS_FILESstatic java.lang.StringSTREAM_RESULTSstatic booleanSTREAM_RESULTS_DEFAULT
 - 
Method SummaryAll Methods Instance Methods Concrete Methods Modifier and Type Method Description protected org.apache.spark.sql.Dataset<FileInfo>allReachableOtherMetadataFileDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)ExpireSnapshotsSparkActiondeleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)Passes an alternative delete implementation that will be used for manifests, data and delete files.ExpireSnapshots.Resultexecute()Executes this action.ExpireSnapshotsSparkActionexecuteDeleteWith(java.util.concurrent.ExecutorService executorService)Passes an alternative executor service that will be used for files removal.org.apache.spark.sql.Dataset<FileInfo>expireFiles()Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.ExpireSnapshotsSparkActionexpireOlderThan(long timestampMillis)Expires all snapshots older than the given timestamp.ExpireSnapshotsSparkActionexpireSnapshotId(long snapshotId)Expires a specificSnapshotidentified by id.protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>loadMetadataTable(Table table, MetadataTableType type)protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table)protected org.apache.spark.sql.Dataset<FileInfo>manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected JobGroupInfonewJobGroupInfo(java.lang.String groupId, java.lang.String desc)protected TablenewStaticTable(TableMetadata metadata, FileIO io)ThisToption(java.lang.String name, java.lang.String value)protected java.util.Map<java.lang.String,java.lang.String>options()ThisToptions(java.util.Map<java.lang.String,java.lang.String> newOptions)protected org.apache.spark.sql.Dataset<FileInfo>otherMetadataFileDS(Table table)ExpireSnapshotsSparkActionretainLast(int numSnapshots)Retains the most recent ancestors of the current snapshot.protected ExpireSnapshotsSparkActionself()protected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextsparkContext()protected org.apache.spark.sql.Dataset<FileInfo>statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)protected <T> TwithJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
 
- 
- 
- 
Field Detail- 
STREAM_RESULTSpublic static final java.lang.String STREAM_RESULTS - See Also:
- Constant Field Values
 
 - 
STREAM_RESULTS_DEFAULTpublic static final boolean STREAM_RESULTS_DEFAULT - See Also:
- Constant Field Values
 
 - 
MANIFESTprotected static final java.lang.String MANIFEST - See Also:
- Constant Field Values
 
 - 
MANIFEST_LISTprotected static final java.lang.String MANIFEST_LIST - See Also:
- Constant Field Values
 
 - 
STATISTICS_FILESprotected static final java.lang.String STATISTICS_FILES - See Also:
- Constant Field Values
 
 - 
OTHERSprotected static final java.lang.String OTHERS - See Also:
- Constant Field Values
 
 - 
FILE_PATHprotected static final java.lang.String FILE_PATH - See Also:
- Constant Field Values
 
 - 
LAST_MODIFIEDprotected static final java.lang.String LAST_MODIFIED - See Also:
- Constant Field Values
 
 - 
COMMA_SPLITTERprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER 
 - 
COMMA_JOINERprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER 
 
- 
 - 
Method Detail- 
selfprotected ExpireSnapshotsSparkAction self() 
 - 
executeDeleteWithpublic ExpireSnapshotsSparkAction executeDeleteWith(java.util.concurrent.ExecutorService executorService) Description copied from interface:ExpireSnapshotsPasses an alternative executor service that will be used for files removal. This service will only be used if a custom delete function is provided byExpireSnapshots.deleteWith(Consumer)or if the FileIO does notsupport bulk deletes. Otherwise, parallelism should be controlled by the IO specificdeleteFilesmethod.If this method is not called and bulk deletes are not supported, unnecessary manifests and content files will still be deleted in the current thread. Identical to ExpireSnapshots.executeDeleteWith(ExecutorService)- Specified by:
- executeDeleteWithin interface- ExpireSnapshots
- Parameters:
- executorService- the service to use
- Returns:
- this for method chaining
 
 - 
expireSnapshotIdpublic ExpireSnapshotsSparkAction expireSnapshotId(long snapshotId) Description copied from interface:ExpireSnapshotsExpires a specificSnapshotidentified by id.Identical to ExpireSnapshots.expireSnapshotId(long)- Specified by:
- expireSnapshotIdin interface- ExpireSnapshots
- Parameters:
- snapshotId- id of the snapshot to expire
- Returns:
- this for method chaining
 
 - 
expireOlderThanpublic ExpireSnapshotsSparkAction expireOlderThan(long timestampMillis) Description copied from interface:ExpireSnapshotsExpires all snapshots older than the given timestamp.Identical to ExpireSnapshots.expireOlderThan(long)- Specified by:
- expireOlderThanin interface- ExpireSnapshots
- Parameters:
- timestampMillis- a long timestamp, as returned by- System.currentTimeMillis()
- Returns:
- this for method chaining
 
 - 
retainLastpublic ExpireSnapshotsSparkAction retainLast(int numSnapshots) Description copied from interface:ExpireSnapshotsRetains the most recent ancestors of the current snapshot.If a snapshot would be expired because it is older than the expiration timestamp, but is one of the numSnapshotsmost recent ancestors of the current state, it will be retained. This will not cause snapshots explicitly identified by id from expiring.Identical to ExpireSnapshots.retainLast(int)- Specified by:
- retainLastin interface- ExpireSnapshots
- Parameters:
- numSnapshots- the number of snapshots to retain
- Returns:
- this for method chaining
 
 - 
deleteWithpublic ExpireSnapshotsSparkAction deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc) Description copied from interface:ExpireSnapshotsPasses an alternative delete implementation that will be used for manifests, data and delete files.Manifest files that are no longer used by valid snapshots will be deleted. Content files that were marked as logically deleted by snapshots that are expired will be deleted as well. If this method is not called, unnecessary manifests and content files will still be deleted. Identical to ExpireSnapshots.deleteWith(Consumer)- Specified by:
- deleteWithin interface- ExpireSnapshots
- Parameters:
- newDeleteFunc- a function that will be called to delete manifests and data files
- Returns:
- this for method chaining
 
 - 
expireFilespublic org.apache.spark.sql.Dataset<FileInfo> expireFiles() Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.This does not delete data files. To delete data files, run execute().This may be called before or after execute()to return the expired files.- Returns:
- a Dataset of files that are no longer referenced by the table
 
 - 
executepublic ExpireSnapshots.Result execute() Description copied from interface:ActionExecutes this action.- Specified by:
- executein interface- Action<ExpireSnapshots,ExpireSnapshots.Result>
- Returns:
- the result of this action
 
 - 
sparkprotected org.apache.spark.sql.SparkSession spark() 
 - 
sparkContextprotected org.apache.spark.api.java.JavaSparkContext sparkContext() 
 - 
optionpublic ThisT option(java.lang.String name, java.lang.String value)
 - 
optionspublic ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions) 
 - 
optionsprotected java.util.Map<java.lang.String,java.lang.String> options() 
 - 
withJobGroupInfoprotected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier) 
 - 
newJobGroupInfoprotected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc) 
 - 
newStaticTableprotected Table newStaticTable(TableMetadata metadata, FileIO io) 
 - 
contentFileDSprotected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds) 
 - 
manifestDSprotected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds) 
 - 
manifestListDSprotected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds) 
 - 
statisticsFileDSprotected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds) 
 - 
otherMetadataFileDSprotected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table) 
 - 
allReachableOtherMetadataFileDSprotected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table) 
 - 
loadMetadataTableprotected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) 
 - 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)Deletes files and keeps track of how many files were removed for each file type.- Parameters:
- executorService- an executor service to use for parallel deletes
- deleteFunc- a delete func
- files- an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
 
 - 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files) 
 
- 
 
-