Class RewriteDataFilesSparkAction

java.lang.Object
org.apache.iceberg.spark.actions.RewriteDataFilesSparkAction
All Implemented Interfaces:
Action<RewriteDataFiles,RewriteDataFiles.Result>, RewriteDataFiles, SnapshotUpdate<RewriteDataFiles,RewriteDataFiles.Result>

public class RewriteDataFilesSparkAction extends Object implements RewriteDataFiles
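This action is normally obtained through SparkActions rather than constructed directly. A minimal usage sketch, assuming an active SparkSession spark and a loaded Iceberg Table table (both placeholders, not part of this class):

  import org.apache.iceberg.Table;
  import org.apache.iceberg.actions.RewriteDataFiles;
  import org.apache.iceberg.spark.actions.SparkActions;

  // Rewrite (compact) the table's data files and capture the result summary.
  RewriteDataFiles.Result result =
      SparkActions.get(spark)
          .rewriteDataFiles(table)  // returns RewriteDataFilesSparkAction
          .binPack()                // BINPACK is also the default strategy
          .execute();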
  • Method Details

    • self

      protected RewriteDataFilesSparkAction self()
    • binPack

      public RewriteDataFilesSparkAction binPack()
      Description copied from interface: RewriteDataFiles
      Choose BINPACK as the strategy for this rewrite operation.
      Specified by:
      binPack in interface RewriteDataFiles
      Returns:
      this for method chaining
    • sort

      public RewriteDataFilesSparkAction sort(SortOrder sortOrder)
      Description copied from interface: RewriteDataFiles
      Choose SORT as the strategy for this rewrite operation, manually specifying the sortOrder to use.
      Specified by:
      sort in interface RewriteDataFiles
      Parameters:
      sortOrder - the user-defined SortOrder to apply
      Returns:
      this for method chaining
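      A sketch of passing a user-defined order; the column name "event_ts" is a placeholder for a column in the table's schema:

        import org.apache.iceberg.SortOrder;

        // Build a sort order over the table's schema and rewrite with SORT.
        SortOrder order = SortOrder.builderFor(table.schema())
            .asc("event_ts")
            .build();

        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .sort(order)
            .execute();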
    • sort

      public RewriteDataFilesSparkAction sort()
      Description copied from interface: RewriteDataFiles
      Choose SORT as the strategy for this rewrite operation, using the table's sortOrder.
      Specified by:
      sort in interface RewriteDataFiles
      Returns:
      this for method chaining
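      A sketch of the no-argument variant; it assumes the table already declares a sort order (for example, one set with ALTER TABLE ... WRITE ORDERED BY):

        // Rewrite using the sort order already defined on the table.
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .sort()
            .execute();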
    • zOrder

      public RewriteDataFilesSparkAction zOrder(String... columnNames)
      Description copied from interface: RewriteDataFiles
      Choose Z-ORDER as the strategy for this rewrite operation, with a specified list of columns to use.
      Specified by:
      zOrder in interface RewriteDataFiles
      Parameters:
      columnNames - the columns used to generate Z-values
      Returns:
      this for method chaining
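      A sketch of Z-order clustering; both column names are placeholders:

        // Cluster files by interleaving the Z-values of two columns.
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .zOrder("device_id", "event_ts")
            .execute();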
    • filter

      public RewriteDataFilesSparkAction filter(Expression expression)
      Description copied from interface: RewriteDataFiles
      A user-provided filter for determining which files will be considered by the rewrite strategy. It is applied in addition to whatever rules the rewrite strategy generates; for example, it can restrict the rewrite to a specific partition.
      Specified by:
      filter in interface RewriteDataFiles
      Parameters:
      expression - an Iceberg Expression used to determine which files will be considered for rewriting
      Returns:
      this for method chaining
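      A sketch that limits the rewrite to a single partition; the column "date" and its value are placeholders:

        import org.apache.iceberg.expressions.Expressions;

        // Only files matching the expression are considered for rewriting.
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .filter(Expressions.equal("date", "2024-01-01"))
            .execute();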
    • execute

      public RewriteDataFiles.Result execute()
      Description copied from interface: Action
      Executes this action.
      Specified by:
      execute in interface Action<RewriteDataFiles,RewriteDataFiles.Result>
      Returns:
      the result of this action
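      A sketch of inspecting the result after the action completes:

        // rewrittenDataFilesCount/addedDataFilesCount are defined on
        // RewriteDataFiles.Result.
        RewriteDataFiles.Result result = SparkActions.get(spark)
            .rewriteDataFiles(table)
            .execute();

        System.out.printf("rewrote %d files, added %d files%n",
            result.rewrittenDataFilesCount(),
            result.addedDataFilesCount());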
    • snapshotProperty

      public RewriteDataFilesSparkAction snapshotProperty(String property, String value)
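      Inherited from the SnapshotUpdate interface: sets a summary property in the snapshot produced by this action. A sketch; the key and value are placeholders:

        // The property lands in the commit snapshot's summary map.
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .snapshotProperty("triggered-by", "nightly-compaction")
            .execute();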
    • commit

      protected void commit(SnapshotUpdate<?> update)
    • commitSummary

      protected Map<String,String> commitSummary()
    • spark

      protected org.apache.spark.sql.SparkSession spark()
    • sparkContext

      protected org.apache.spark.api.java.JavaSparkContext sparkContext()
    • option

      public RewriteDataFilesSparkAction option(String name, String value)
    • options

      public RewriteDataFilesSparkAction options(Map<String,String> newOptions)
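      A sketch of tuning the rewrite through options; the constants are declared on the RewriteDataFiles interface, and the values shown are illustrative:

        // Target 512 MB output files and allow partial-progress commits.
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            .option(RewriteDataFiles.TARGET_FILE_SIZE_BYTES,
                String.valueOf(512L * 1024 * 1024))
            .option(RewriteDataFiles.PARTIAL_PROGRESS_ENABLED, "true")
            .execute();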
    • options

      protected Map<String,String> options()
    • withJobGroupInfo

      protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
    • newJobGroupInfo

      protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
    • newStaticTable

      protected Table newStaticTable(TableMetadata metadata, FileIO io)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
    • statisticsFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
    • otherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
    • allReachableOtherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
    • loadMetadataTable

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
      Deletes files and keeps track of how many files were removed for each file type.
      Parameters:
      executorService - an executor service to use for parallel deletes
      deleteFunc - a function that deletes a file given its location string
      files - an iterator of FileInfo entries, each carrying a file path and a file type
      Returns:
      stats on which files were deleted
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)