java.lang.Object
- org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure

All Implemented Interfaces:

Procedure
```
public class RemoveOrphanFilesProcedure
extends java.lang.Object
```
A procedure that removes orphan files in a table.

See Also:

SparkActions.deleteOrphanFiles(Table)

Field Summary

Fields
Modifier and Type Field Description

protected static org.apache.spark.sql.types.DataType STRING_ARRAY

protected static org.apache.spark.sql.types.DataType STRING_MAP

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`protected SparkActions`	`actions()`
`static SparkProcedures.ProcedureBuilder`	`builder()`
`org.apache.spark.sql.catalyst.InternalRow[]`	`call(org.apache.spark.sql.catalyst.InternalRow args)`	Executes this procedure.
`protected void`	`closeService()`	Closes this procedure's executor service if a new one was created with `BaseProcedure.executorService(int, String)`.
`java.lang.String`	`description()`	Returns the description of this procedure.
`protected java.util.concurrent.ExecutorService`	`executorService(int threadPoolSize, java.lang.String nameFormat)`	Starts a new executor service which can be used by this procedure in its work.
`protected Expression`	`filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, java.lang.String where)`
`protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, java.util.Map<java.lang.String,java.lang.String> options)`
`protected SparkTable`	`loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)`
`protected <T> T`	`modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)`
`protected org.apache.spark.sql.catalyst.InternalRow`	`newInternalRow(java.lang.Object... values)`
`org.apache.spark.sql.types.StructType`	`outputType()`	Returns the type of rows produced by this procedure.
`ProcedureParameter[]`	`parameters()`	Returns the input parameters of this procedure.
`protected void`	`refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)`
`protected org.apache.spark.sql.SparkSession`	`spark()`
`protected org.apache.spark.sql.connector.catalog.TableCatalog`	`tableCatalog()`
`protected Spark3Util.CatalogAndIdentifier`	`toCatalogAndIdentifier(java.lang.String identifierAsString, java.lang.String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)`
`protected org.apache.spark.sql.connector.catalog.Identifier`	`toIdentifier(java.lang.String identifierAsString, java.lang.String argName)`
`protected <T> T`	`withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

STRING_MAP

protected static final org.apache.spark.sql.types.DataType STRING_MAP

STRING_ARRAY

protected static final org.apache.spark.sql.types.DataType STRING_ARRAY

Method Detail

builder

public static SparkProcedures.ProcedureBuilder builder()

parameters
```
public ProcedureParameter[] parameters()
```
Description copied from interface: Procedure

Returns the input parameters of this procedure.

outputType
```
public org.apache.spark.sql.types.StructType outputType()
```
Description copied from interface: Procedure

Returns the type of rows produced by this procedure.

call
```
public org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args)
```
Description copied from interface: Procedure

Executes this procedure.
Spark will align the provided arguments according to the input parameters defined in Procedure.parameters() either by position or by name before execution.
Implementations may provide a summary of execution by returning one or many rows as a result. The schema of output rows must match the defined output type in Procedure.outputType().

Parameters:

args - input arguments

Returns:

the result of executing this procedure with the given arguments

description
```
public java.lang.String description()
```
Description copied from interface: Procedure

Returns the description of this procedure.

spark

protected org.apache.spark.sql.SparkSession spark()

actions
```
protected SparkActions actions()
```

tableCatalog

protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()

modifyIcebergTable

protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident,
                                   java.util.function.Function<Table,T> func)

withIcebergTable

protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident,
                                 java.util.function.Function<Table,T> func)

toIdentifier

protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(java.lang.String identifierAsString,
                                                                         java.lang.String argName)

toCatalogAndIdentifier

protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(java.lang.String identifierAsString,
                                                                 java.lang.String argName,
                                                                 org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

loadSparkTable

protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)

loadRows

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent,
                                                                          java.util.Map<java.lang.String,java.lang.String> options)

refreshSparkCache

protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident,
                                 org.apache.spark.sql.connector.catalog.Table table)

filterExpression

protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident,
                                      java.lang.String where)

newInternalRow

protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(java.lang.Object... values)

closeService
```
protected void closeService()
```
Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String). Does not block for any remaining tasks.

executorService
```
protected java.util.concurrent.ExecutorService executorService(int threadPoolSize,
                                                               java.lang.String nameFormat)
```
Starts a new executor service which can be used by this procedure in its work. The pool will be automatically shut down if withIcebergTable(Identifier, Function) or modifyIcebergTable(Identifier, Function) are called. If these methods are not used then the service can be shut down with closeService() or left to be closed when this class is finalized.

Parameters:

threadPoolSize - number of threads in the service

nameFormat - name prefix for threads created in this service

Returns:

the new executor service owned by this procedure

Modifier and Type	Field	Description
`protected static org.apache.spark.sql.types.DataType`	`STRING_ARRAY`
`protected static org.apache.spark.sql.types.DataType`	`STRING_MAP`

Class RemoveOrphanFilesProcedure

Field Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

STRING_MAP

STRING_ARRAY

Method Detail

builder

parameters

outputType

call

description

spark

actions

tableCatalog

modifyIcebergTable

withIcebergTable

toIdentifier

toCatalogAndIdentifier

loadSparkTable

loadRows

refreshSparkCache

filterExpression

newInternalRow

closeService

executorService