Class RewriteTablePathSparkAction

java.lang.Object
org.apache.iceberg.spark.actions.RewriteTablePathSparkAction
All Implemented Interfaces:
Action<RewriteTablePath,RewriteTablePath.Result>, RewriteTablePath

public class RewriteTablePathSparkAction extends Object implements RewriteTablePath
  • Field Details

  • Method Details

    • self

      protected RewriteTablePath self()
    • rewriteLocationPrefix

      public RewriteTablePath rewriteLocationPrefix(String sPrefix, String tPrefix)
      Description copied from interface: RewriteTablePath
      Configure a source prefix that will be replaced by the specified target prefix in all paths
      Specified by:
      rewriteLocationPrefix in interface RewriteTablePath
      Parameters:
      sPrefix - the source prefix to be replaced
      tPrefix - the target prefix
      Returns:
      this for method chaining
    • startVersion

      public RewriteTablePath startVersion(String sVersion)
      Description copied from interface: RewriteTablePath
      First metadata version to rewrite, identified by name of a metadata.json file in the table's metadata log. It is optional, if provided then this action will only rewrite metadata files added after this version.
      Specified by:
      startVersion in interface RewriteTablePath
      Parameters:
      sVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
      Returns:
      this for method chaining
    • endVersion

      public RewriteTablePath endVersion(String eVersion)
      Description copied from interface: RewriteTablePath
      Last metadata version to rewrite, identified by name of a metadata.json file in the table's metadata log. It is optional, if provided then this action will only rewrite metadata files added before this file, including the file itself.
      Specified by:
      endVersion in interface RewriteTablePath
      Parameters:
      eVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
      Returns:
      this for method chaining
    • stagingLocation

      public RewriteTablePath stagingLocation(String stagingLocation)
      Description copied from interface: RewriteTablePath
      Custom staging location. It is optional. By default, staging location is a subdirectory under table's metadata directory.
      Specified by:
      stagingLocation in interface RewriteTablePath
      Parameters:
      stagingLocation - the staging location
      Returns:
      this for method chaining
    • createFileList

      public RewriteTablePath createFileList(boolean createFileListFlag)
      Description copied from interface: RewriteTablePath
      Whether to create the file list.

      The default value is true, which means the file list will be created. If set to false, the file list will not be created.

      Specified by:
      createFileList in interface RewriteTablePath
      Parameters:
      createFileListFlag - true to create the file list, false to skip it
      Returns:
      this instance for method chaining
    • executeWith

      public RewriteTablePath executeWith(ExecutorService service)
      Description copied from interface: RewriteTablePath
      Passes an alternative executor service that will be used for version file and manifest list rewriting. If this method is not called, these operations will be performed sequentially.
      Specified by:
      executeWith in interface RewriteTablePath
      Parameters:
      service - an executor service to parallelize metadata rewriting
      Returns:
      this for method chaining
    • execute

      public RewriteTablePath.Result execute()
      Description copied from interface: Action
      Executes this action.
      Specified by:
      execute in interface Action<RewriteTablePath,RewriteTablePath.Result>
      Returns:
      the result of this action
    • spark

      protected org.apache.spark.sql.SparkSession spark()
    • sparkContext

      protected org.apache.spark.api.java.JavaSparkContext sparkContext()
    • option

      public RewriteTablePath option(String name, String value)
    • options

      public RewriteTablePath options(Map<String,String> newOptions)
    • options

      protected Map<String,String> options()
    • withJobGroupInfo

      protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
    • newJobGroupInfo

      protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
    • newStaticTable

      protected Table newStaticTable(TableMetadata metadata, FileIO io)
    • newStaticTable

      protected Table newStaticTable(String metadataFileLocation, FileIO io)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
    • statisticsFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
    • otherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
    • allReachableOtherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
    • loadMetadataTable

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
      Deletes files and keeps track of how many files were removed for each file type.
      Parameters:
      executorService - an executor service to use for parallel deletes
      deleteFunc - a delete func
      files - an iterator of Spark rows of the structure (path: String, type: String)
      Returns:
      stats on which files were deleted
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)