java.lang.Object

org.apache.iceberg.spark.procedures.RewritePositionDeleteFilesProcedure

All Implemented Interfaces:: Procedure

public class RewritePositionDeleteFilesProcedure extends Object

A procedure that rewrites position delete files in a table.

See Also:

Field Summary

Fields

Modifier and Type

Field

Description

protected static final org.apache.spark.sql.types.DataType

STRING_ARRAY

protected static final org.apache.spark.sql.types.DataType

STRING_MAP
Method Summary

Modifier and Type

Method

Description

protected SparkActions

actions()

static SparkProcedures.ProcedureBuilder

builder()

org.apache.spark.sql.catalyst.InternalRow[]

call(org.apache.spark.sql.catalyst.InternalRow args)

Executes this procedure.

protected void

closeService()

Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String).

String

description()

Returns the description of this procedure.

protected ExecutorService

executorService(int threadPoolSize, String nameFormat)

Starts a new executor service which can be used by this procedure in its work.

protected Expression

filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>

loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, Map<String,String> options)

protected SparkTable

loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)

protected <T> T

modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)

protected org.apache.spark.sql.catalyst.InternalRow

newInternalRow(Object... values)

org.apache.spark.sql.types.StructType

outputType()

Returns the type of rows produced by this procedure.

ProcedureParameter[]

parameters()

Returns the input parameters of this procedure.

protected void

refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)

protected org.apache.spark.sql.SparkSession

spark()

protected org.apache.spark.sql.connector.catalog.TableCatalog

tableCatalog()

protected Spark3Util.CatalogAndIdentifier

toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

protected org.apache.spark.sql.connector.catalog.Identifier

toIdentifier(String identifierAsString, String argName)

protected <T> T

withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- STRING_MAP
  
  protected static final org.apache.spark.sql.types.DataType STRING_MAP
- STRING_ARRAY
  
  protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
Method Details
- builder
  
  public static SparkProcedures.ProcedureBuilder builder()
- parameters
  
  public ProcedureParameter[] parameters()
  
  Description copied from interface: Procedure
  
  Returns the input parameters of this procedure.
- outputType
  
  public org.apache.spark.sql.types.StructType outputType()
  
  Description copied from interface: Procedure
  
  Returns the type of rows produced by this procedure.
- call
  
  public org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args)
  
  Description copied from interface: Procedure
  
  Executes this procedure.
  Spark will align the provided arguments according to the input parameters defined in Procedure.parameters() either by position or by name before execution.
  Implementations may provide a summary of execution by returning one or many rows as a result. The schema of output rows must match the defined output type in Procedure.outputType().
  
  Parameters:
  
  args - input arguments
  
  Returns:
  
  the result of executing this procedure with the given arguments
- description
  
  public String description()
  
  Description copied from interface: Procedure
  
  Returns the description of this procedure.
- spark
  
  protected org.apache.spark.sql.SparkSession spark()
- actions
  
  protected SparkActions actions()
- tableCatalog
  
  protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
- modifyIcebergTable
  
  protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)
- withIcebergTable
  
  protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)
- toIdentifier
  
  protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(String identifierAsString, String argName)
- toCatalogAndIdentifier
  
  protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- loadSparkTable
  
  protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
- loadRows
  
  protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, Map<String,String> options)
- refreshSparkCache
  
  protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
- filterExpression
  
  protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)
- newInternalRow
  
  protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(Object... values)
- closeService
  
  protected void closeService()
  
  Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String). Does not block for any remaining tasks.
- executorService
  
  protected ExecutorService executorService(int threadPoolSize, String nameFormat)
  
  Starts a new executor service which can be used by this procedure in its work. The pool will be automatically shut down if withIcebergTable(Identifier, Function) or modifyIcebergTable(Identifier, Function) are called. If these methods are not used then the service can be shut down with closeService() or left to be closed when this class is finalized.
  
  Parameters:
  
  threadPoolSize - number of threads in the service
  
  nameFormat - name prefix for threads created in this service
  
  Returns:
  
  the new executor service owned by this procedure

Class RewritePositionDeleteFilesProcedure

Field Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

STRING_MAP

STRING_ARRAY

Method Details

builder

parameters

outputType

call

description

spark

actions

tableCatalog

modifyIcebergTable

withIcebergTable

toIdentifier

toCatalogAndIdentifier

loadSparkTable

loadRows

refreshSparkCache

filterExpression

newInternalRow

closeService

executorService