Class RemoveOrphanFilesProcedure
- java.lang.Object
-
- org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure
-
- All Implemented Interfaces:
Procedure
public class RemoveOrphanFilesProcedure extends java.lang.Object
A procedure that removes orphan files in a table.- See Also:
SparkActions.deleteOrphanFiles(Table)
-
-
Field Summary
Fields Modifier and Type Field Description protected static org.apache.spark.sql.types.DataType
STRING_ARRAY
protected static org.apache.spark.sql.types.DataType
STRING_MAP
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected SparkActions
actions()
static SparkProcedures.ProcedureBuilder
builder()
org.apache.spark.sql.catalyst.InternalRow[]
call(org.apache.spark.sql.catalyst.InternalRow args)
Executes this procedure.protected void
closeService()
Closes this procedure's executor service if a new one was created withexecutorService(int, String)
.java.lang.String
description()
Returns the description of this procedure.protected java.util.concurrent.ExecutorService
executorService(int threadPoolSize, java.lang.String nameFormat)
Starts a new executor service which can be used by this procedure in its work.protected Expression
filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, java.lang.String where)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, java.util.Map<java.lang.String,java.lang.String> options)
protected SparkTable
loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
protected <T> T
modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
protected org.apache.spark.sql.catalyst.InternalRow
newInternalRow(java.lang.Object... values)
org.apache.spark.sql.types.StructType
outputType()
Returns the type of rows produced by this procedure.ProcedureParameter[]
parameters()
Returns the input parameters of this procedure.protected void
refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.sql.connector.catalog.TableCatalog
tableCatalog()
protected Spark3Util.CatalogAndIdentifier
toCatalogAndIdentifier(java.lang.String identifierAsString, java.lang.String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.connector.catalog.Identifier
toIdentifier(java.lang.String identifierAsString, java.lang.String argName)
protected <T> T
withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
-
-
-
Method Detail
-
builder
public static SparkProcedures.ProcedureBuilder builder()
-
parameters
public ProcedureParameter[] parameters()
Description copied from interface:Procedure
Returns the input parameters of this procedure.
-
outputType
public org.apache.spark.sql.types.StructType outputType()
Description copied from interface:Procedure
Returns the type of rows produced by this procedure.
-
call
public org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args)
Description copied from interface:Procedure
Executes this procedure.Spark will align the provided arguments according to the input parameters defined in
Procedure.parameters()
either by position or by name before execution.Implementations may provide a summary of execution by returning one or many rows as a result. The schema of output rows must match the defined output type in
Procedure.outputType()
.- Parameters:
args
- input arguments- Returns:
- the result of executing this procedure with the given arguments
-
description
public java.lang.String description()
Description copied from interface:Procedure
Returns the description of this procedure.
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
actions
protected SparkActions actions()
-
tableCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
-
modifyIcebergTable
protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
-
withIcebergTable
protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
-
toIdentifier
protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(java.lang.String identifierAsString, java.lang.String argName)
-
toCatalogAndIdentifier
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(java.lang.String identifierAsString, java.lang.String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
-
loadSparkTable
protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
-
loadRows
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, java.util.Map<java.lang.String,java.lang.String> options)
-
refreshSparkCache
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
-
filterExpression
protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, java.lang.String where)
-
newInternalRow
protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(java.lang.Object... values)
-
closeService
protected void closeService()
Closes this procedure's executor service if a new one was created withexecutorService(int, String)
. Does not block for any remaining tasks.
-
executorService
protected java.util.concurrent.ExecutorService executorService(int threadPoolSize, java.lang.String nameFormat)
Starts a new executor service which can be used by this procedure in its work. The pool will be automatically shut down ifwithIcebergTable(Identifier, Function)
ormodifyIcebergTable(Identifier, Function)
are called. If these methods are not used then the service can be shut down withcloseService()
or left to be closed when this class is finalized.- Parameters:
threadPoolSize
- number of threads in the servicenameFormat
- name prefix for threads created in this service- Returns:
- the new executor service owned by this procedure
-
-