public class RemoveOrphanFilesProcedure
extends java.lang.Object
SparkActions.deleteOrphanFiles(Table)
Modifier and Type | Class and Description |
---|---|
protected static class |
org.apache.iceberg.spark.procedures.BaseProcedure.Builder<T extends org.apache.iceberg.spark.procedures.BaseProcedure> |
Modifier and Type | Field and Description |
---|---|
protected static org.apache.spark.sql.types.DataType |
STRING_MAP |
Modifier and Type | Method and Description |
---|---|
protected SparkActions |
actions() |
static SparkProcedures.ProcedureBuilder |
builder() |
org.apache.spark.sql.catalyst.InternalRow[] |
call(org.apache.spark.sql.catalyst.InternalRow args)
Executes this procedure.
|
protected void |
closeService()
Closes this procedure's executor service if a new one was created with
executorService(int, String) . |
java.lang.String |
description()
Returns the description of this procedure.
|
protected java.util.concurrent.ExecutorService |
executorService(int threadPoolSize,
java.lang.String nameFormat)
Starts a new executor service which can be used by this procedure in its work.
|
protected SparkTable |
loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident) |
protected <T> T |
modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident,
java.util.function.Function<Table,T> func) |
protected org.apache.spark.sql.catalyst.InternalRow |
newInternalRow(java.lang.Object... values) |
org.apache.spark.sql.types.StructType |
outputType()
Returns the type of rows produced by this procedure.
|
ProcedureParameter[] |
parameters()
Returns the input parameters of this procedure.
|
protected void |
refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident,
org.apache.spark.sql.connector.catalog.Table table) |
protected org.apache.spark.sql.SparkSession |
spark() |
protected org.apache.spark.sql.connector.catalog.TableCatalog |
tableCatalog() |
protected Spark3Util.CatalogAndIdentifier |
toCatalogAndIdentifier(java.lang.String identifierAsString,
java.lang.String argName,
org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) |
protected org.apache.spark.sql.connector.catalog.Identifier |
toIdentifier(java.lang.String identifierAsString,
java.lang.String argName) |
protected <T> T |
withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident,
java.util.function.Function<Table,T> func) |
public static SparkProcedures.ProcedureBuilder builder()
public ProcedureParameter[] parameters()
Procedure
public org.apache.spark.sql.types.StructType outputType()
Procedure
public org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args)
Procedure
Spark will align the provided arguments according to the input parameters defined in Procedure.parameters()
either by position or by name before execution.
Implementations may provide a summary of execution by returning one or many rows as a
result. The schema of output rows must match the defined output type in Procedure.outputType()
.
args
- input argumentspublic java.lang.String description()
Procedure
protected org.apache.spark.sql.SparkSession spark()
protected SparkActions actions()
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, java.util.function.Function<Table,T> func)
protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(java.lang.String identifierAsString, java.lang.String argName)
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(java.lang.String identifierAsString, java.lang.String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(java.lang.Object... values)
protected void closeService()
executorService(int, String)
. Does not block for any remaining tasks.protected java.util.concurrent.ExecutorService executorService(int threadPoolSize, java.lang.String nameFormat)
withIcebergTable(Identifier, Function)
or modifyIcebergTable(Identifier, Function)
are called. If these methods are not used then the
service can be shut down with closeService()
or left to be closed when this class is
finalized.threadPoolSize
- number of threads in the servicenameFormat
- name prefix for threads created in this service