public class RemoveOrphanFilesAction
extends java.lang.Object

An action that removes orphan metadata and data files by listing a given location and comparing the actual files in that location with the data and metadata files referenced by the table. The location must be accessible for listing via the Hadoop FileSystem.
By default, this action cleans up the table location returned by Table.location() and removes unreachable files that are older than 3 days using Table.io(). The behavior can be modified by passing a custom location to location(String) and a custom timestamp to olderThan(long). For example, someone might point this action to the data folder to clean up only orphan data files. In addition, there is a way to configure an alternative delete method via deleteWith(Consumer).
Note: It is dangerous to call this action with a short retention interval as it might corrupt the state of the table if another operation is writing at the same time.
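The snippet below is a minimal usage sketch, not part of this class's documentation. It assumes the action is obtained through an entry point such as Actions.forTable(table).removeOrphanFiles() (available in the same actions package in older Iceberg Spark releases); the loadTable() helper is hypothetical and stands in for whatever catalog lookup your application uses.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.iceberg.Table;
import org.apache.iceberg.actions.Actions;

public class RemoveOrphanFilesExample {
  public static void main(String[] args) {
    Table table = loadTable();  // hypothetical helper that resolves the Iceberg table

    // Scan only the data folder and delete unreachable files older than 7 days.
    List<String> removed = Actions.forTable(table)
        .removeOrphanFiles()
        .location(table.location() + "/data")
        .olderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7))
        .execute();

    removed.forEach(path -> System.out.println("Removed orphan file: " + path));
  }

  private static Table loadTable() {
    throw new UnsupportedOperationException("load the table via your catalog");
  }
}
```

Keeping the olderThan interval well above the duration of any in-flight writes is what makes the action safe, per the note above.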
Modifier and Type | Method and Description |
---|---|
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildManifestFileDF(org.apache.spark.sql.SparkSession spark, java.lang.String tableName) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildManifestListDF(org.apache.spark.sql.SparkSession spark, java.lang.String metadataFileLocation) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildManifestListDF(org.apache.spark.sql.SparkSession spark, Table table) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildOtherMetadataFileDF(org.apache.spark.sql.SparkSession spark, TableOperations ops) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildValidDataFileDF(org.apache.spark.sql.SparkSession spark) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildValidDataFileDF(org.apache.spark.sql.SparkSession spark, java.lang.String tableName) |
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | buildValidMetadataFileDF(org.apache.spark.sql.SparkSession spark, Table table, TableOperations ops) |
RemoveOrphanFilesAction | deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc) Passes an alternative delete implementation that will be used to delete orphan files. |
java.util.List<java.lang.String> | execute() Executes this action. |
protected static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> | loadMetadataTable(org.apache.spark.sql.SparkSession spark, java.lang.String tableName, java.lang.String tableLocation, MetadataTableType type) |
RemoveOrphanFilesAction | location(java.lang.String newLocation) Removes orphan files in the given location. |
RemoveOrphanFilesAction | olderThan(long newOlderThanTimestamp) Removes orphan files that are older than the given timestamp. |
protected Table | table() |
protected Table table()

public RemoveOrphanFilesAction location(java.lang.String newLocation)
Removes orphan files in the given location.
Parameters:
newLocation - a location

public RemoveOrphanFilesAction olderThan(long newOlderThanTimestamp)
Removes orphan files that are older than the given timestamp.
Parameters:
newOlderThanTimestamp - a timestamp in milliseconds

public RemoveOrphanFilesAction deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)
Passes an alternative delete implementation that will be used to delete orphan files.
Parameters:
newDeleteFunc - a delete function

public java.util.List<java.lang.String> execute()
Executes this action.
Specified by:
execute in interface Action
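As a sketch of how the deleteWith(Consumer) hook can be used, the consumer below only records candidate paths instead of deleting them, which previews what the action would remove. It is not part of this class's documentation; the Actions.forTable(table).removeOrphanFiles() entry point is assumed as in the earlier example, and a thread-safe list is used in case deletes are issued concurrently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

import org.apache.iceberg.Table;
import org.apache.iceberg.actions.Actions;

public class DryRunOrphanFiles {
  public static List<String> previewOrphans(Table table) {
    // Collect candidate paths instead of deleting them.
    List<String> candidates = new CopyOnWriteArrayList<>();

    Actions.forTable(table)
        .removeOrphanFiles()
        .olderThan(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3))
        .deleteWith(candidates::add)   // no files are actually removed
        .execute();

    return candidates;
  }
}
```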
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(org.apache.spark.sql.SparkSession spark)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidDataFileDF(org.apache.spark.sql.SparkSession spark, java.lang.String tableName)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestFileDF(org.apache.spark.sql.SparkSession spark, java.lang.String tableName)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(org.apache.spark.sql.SparkSession spark, Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildManifestListDF(org.apache.spark.sql.SparkSession spark, java.lang.String metadataFileLocation)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildOtherMetadataFileDF(org.apache.spark.sql.SparkSession spark, TableOperations ops)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> buildValidMetadataFileDF(org.apache.spark.sql.SparkSession spark, Table table, TableOperations ops)
protected static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(org.apache.spark.sql.SparkSession spark, java.lang.String tableName, java.lang.String tableLocation, MetadataTableType type)