Package org.apache.iceberg.spark.actions
Class DeleteReachableFilesSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.DeleteReachableFilesSparkAction
- All Implemented Interfaces:
Action<DeleteReachableFiles,
,DeleteReachableFiles.Result> DeleteReachableFiles
An implementation of
DeleteReachableFiles
that uses metadata tables in Spark to determine
which files should be deleted.-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.DeleteReachableFiles
DeleteReachableFiles.Result
-
Field Summary
Modifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
static final String
static final boolean
-
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset
<FileInfo> protected org.apache.spark.sql.Dataset
<FileInfo> contentFileDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> contentFileDS
(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles
(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles
(SupportsBulkOperations io, Iterator<FileInfo> files) deleteWith
(Consumer<String> newDeleteFunc) Passes an alternative delete implementation that will be used for files.execute()
Executes this action.executeDeleteWith
(ExecutorService executorService) Passes an alternative executor service that will be used for files removal.Set theFileIO
to be used for files removalprotected org.apache.spark.sql.Dataset
<org.apache.spark.sql.Row> loadMetadataTable
(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset
<FileInfo> manifestDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> manifestDS
(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset
<FileInfo> manifestListDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> manifestListDS
(Table table, Set<Long> snapshotIds) protected JobGroupInfo
newJobGroupInfo
(String groupId, String desc) protected Table
newStaticTable
(TableMetadata metadata, FileIO io) options()
protected org.apache.spark.sql.Dataset
<FileInfo> otherMetadataFileDS
(Table table) protected DeleteReachableFilesSparkAction
self()
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
protected org.apache.spark.sql.Dataset
<FileInfo> statisticsFileDS
(Table table, Set<Long> snapshotIds) protected <T> T
withJobGroupInfo
(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
-
STREAM_RESULTS
- See Also:
-
STREAM_RESULTS_DEFAULT
public static final boolean STREAM_RESULTS_DEFAULT- See Also:
-
MANIFEST
- See Also:
-
MANIFEST_LIST
- See Also:
-
STATISTICS_FILES
- See Also:
-
OTHERS
- See Also:
-
FILE_PATH
- See Also:
-
LAST_MODIFIED
- See Also:
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER -
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Details
-
self
-
io
Description copied from interface:DeleteReachableFiles
Set theFileIO
to be used for files removal- Specified by:
io
in interfaceDeleteReachableFiles
- Parameters:
fileIO
- FileIO to use for files removal- Returns:
- this for method chaining
-
deleteWith
Description copied from interface:DeleteReachableFiles
Passes an alternative delete implementation that will be used for files.- Specified by:
deleteWith
in interfaceDeleteReachableFiles
- Parameters:
newDeleteFunc
- a function that will be called to delete files. The function accepts path to file as an argument.- Returns:
- this for method chaining
-
executeDeleteWith
Description copied from interface:DeleteReachableFiles
Passes an alternative executor service that will be used for files removal. This service will only be used if a custom delete function is provided byDeleteReachableFiles.deleteWith(Consumer)
or if the FileIO does notsupport bulk deletes
. Otherwise, parallelism should be controlled by the IO specificdeleteFiles
method.- Specified by:
executeDeleteWith
in interfaceDeleteReachableFiles
- Parameters:
executorService
- the service to use- Returns:
- this for method chaining
-
execute
Description copied from interface:Action
Executes this action.- Specified by:
execute
in interfaceAction<DeleteReachableFiles,
DeleteReachableFiles.Result> - Returns:
- the result of this action
-
spark
protected org.apache.spark.sql.SparkSession spark() -
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext() -
option
-
options
-
options
-
withJobGroupInfo
-
newJobGroupInfo
-
newStaticTable
-
contentFileDS
-
contentFileDS
-
manifestDS
-
manifestDS
-
manifestListDS
-
manifestListDS
-
statisticsFileDS
-
otherMetadataFileDS
-
allReachableOtherMetadataFileDS
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) -
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
executorService
- an executor service to use for parallel deletesdeleteFunc
- a delete funcfiles
- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-