Class BaseRewriteDataFilesAction<ThisT>

java.lang.Object
org.apache.iceberg.actions.BaseRewriteDataFilesAction<ThisT>
All Implemented Interfaces:
Action<ThisT,RewriteDataFilesActionResult>, SnapshotUpdateAction<ThisT,RewriteDataFilesActionResult>
Direct Known Subclasses:
RewriteDataFilesAction

public abstract class BaseRewriteDataFilesAction<ThisT> extends Object
  • Constructor Details

    • BaseRewriteDataFilesAction

      protected BaseRewriteDataFilesAction(Table table)
  • Method Details

    • table

      protected Table table()
    • spec

      protected PartitionSpec spec()
    • encryptionManager

      protected EncryptionManager encryptionManager()
    • caseSensitive

      protected boolean caseSensitive()
    • caseSensitive

      public BaseRewriteDataFilesAction<ThisT> caseSensitive(boolean newCaseSensitive)
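      Sets whether expressions, such as the row filter, are bound to the table schema case-sensitively.
      Parameters:
      newCaseSensitive - true to match column names case-sensitively, false otherwise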
      Returns:
      this for method chaining
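      Case sensitivity decides whether a column reference such as "TS" in a filter matches a schema column named "ts". The sketch below shows that difference on a plain list of column names. `ColumnResolver` is a hypothetical helper for illustration only, not Iceberg's expression-binding code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper (not Iceberg's expression binding) showing how a column
// reference in a filter could resolve against schema column names.
final class ColumnResolver {
  private ColumnResolver() {}

  // Returns the schema column that `ref` resolves to, or empty if none matches.
  static Optional<String> resolve(List<String> columns, String ref, boolean caseSensitive) {
    for (String column : columns) {
      boolean matches = caseSensitive
          ? column.equals(ref)
          : column.toLowerCase(Locale.ROOT).equals(ref.toLowerCase(Locale.ROOT));
      if (matches) {
        return Optional.of(column);
      }
    }
    return Optional.empty();
  }
}
```

      With case-insensitive matching, "TS" resolves to "ts"; with case-sensitive matching it resolves to nothing.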
    • outputSpecId

      public BaseRewriteDataFilesAction<ThisT> outputSpecId(int specId)
      Sets the id of the PartitionSpec to use when writing the rewritten DataFiles.
      Parameters:
      specId - id of the PartitionSpec used for the rewritten data files
      Returns:
      this for method chaining
    • targetSizeInBytes

      public BaseRewriteDataFilesAction<ThisT> targetSizeInBytes(long targetSize)
      Sets the target size, in bytes, of the rewritten data files.
      Parameters:
      targetSize - target size in bytes of each rewritten data file
      Returns:
      this for method chaining
    • splitLookback

      public BaseRewriteDataFilesAction<ThisT> splitLookback(int lookback)
      Sets the number of "bins" considered when packing the next file split into a task. Increasing this value usually makes tasks somewhat more even, because more ways to pack file regions into a single task are considered, at the cost of extra planning time.

      This setting can reorder the incoming file regions. To preserve their order, for example to keep the lower/upper bounds in file metadata meaningful, use a lookback of 1.

      Parameters:
      lookback - number of "bins" considered when trying to pack the next file split into a task.
      Returns:
      this for method chaining
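      The effect of the lookback can be seen in a simplified first-fit packer that keeps at most `lookback` bins open and closes the oldest one whenever it has to open a new bin. This is an illustrative sketch: the `LookbackPacker` name and its oldest-first eviction policy are assumptions, not Iceberg's own packing code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified sketch of lookback-based bin packing over plain item weights.
final class LookbackPacker {
  private LookbackPacker() {}

  static List<List<Long>> pack(List<Long> weights, long capacity, int lookback) {
    if (lookback < 1) {
      throw new IllegalArgumentException("lookback must be >= 1");
    }
    List<List<Long>> finished = new ArrayList<>();
    Deque<List<Long>> open = new ArrayDeque<>(); // open bins, oldest first
    for (long w : weights) {
      List<Long> target = null;
      for (List<Long> bin : open) { // first fit among the open bins
        if (sum(bin) + w <= capacity) {
          target = bin;
          break;
        }
      }
      if (target == null) { // nothing fits: open a new bin
        target = new ArrayList<>();
        open.addLast(target);
        if (open.size() > lookback) { // too many open bins: close the oldest
          finished.add(open.removeFirst());
        }
      }
      target.add(w);
    }
    finished.addAll(open); // flush the remaining bins in order
    return finished;
  }

  private static long sum(List<Long> bin) {
    long total = 0;
    for (long w : bin) {
      total += w;
    }
    return total;
  }
}
```

      For weights [60, 60, 40, 40] and a capacity of 100, a lookback of 1 keeps the original order but produces three bins, while a lookback of 2 produces two full bins by placing the third item in the first bin, ahead of the second item.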
    • splitOpenFileCost

      public BaseRewriteDataFilesAction<ThisT> splitOpenFileCost(long openFileCost)
      Sets the minimum size counted for each file when packing files into one "bin". If a file is smaller than this threshold, Iceberg counts it as this size instead.

      This setting controls how many files are compacted in each task: a smaller value lets more small files be packed into one task, which results in more aggressive compaction. The default value is 4 MB.

      Parameters:
      openFileCost - minimum size counted for each file when packing files into one "bin".
      Returns:
      this for method chaining
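      The counting rule described above amounts to weighting each file by the larger of its size and the open-file cost. The sketch below applies that rule to estimate how many equally sized small files fit into one bin. `OpenCostWeights` is a hypothetical helper, and the 128 MB bin capacity used in the note is an arbitrary example value.

```java
// Sketch of the rule described above: a file smaller than openFileCost is
// counted as openFileCost bytes when packing. Hypothetical helper class.
final class OpenCostWeights {
  private OpenCostWeights() {}

  // Weight a single file contributes to a bin.
  static long weight(long fileSizeBytes, long openFileCost) {
    return Math.max(fileSizeBytes, openFileCost);
  }

  // Number of equally sized files that fit into one bin of `binCapacity` bytes.
  static long filesPerBin(long fileSizeBytes, long openFileCost, long binCapacity) {
    long w = weight(fileSizeBytes, openFileCost);
    if (w <= 0) {
      throw new IllegalArgumentException("file weight must be positive");
    }
    return binCapacity / w;
  }
}
```

      With 1 MB files and a 128 MB bin, an open-file cost of 4 MB caps each bin at 32 files, while a cost of 1 MB allows 128 files, so the smaller value compacts more files per task.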
    • filter

      public BaseRewriteDataFilesAction<ThisT> filter(Expression expr)
      Pass a row Expression to select the DataFiles to rewrite. Note that any file that may contain data matching the filter may be rewritten.
      Parameters:
      expr - row filter Expression used to select DataFiles to rewrite
      Returns:
      this for method chaining
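      The "may contain" wording reflects that files are selected from metadata such as per-column lower and upper bounds, not by reading rows. The sketch below illustrates that idea for a single long column and a range filter: any file whose bounds overlap the range is selected in full. `FileBounds` and `RangeFileFilter` are hypothetical types, not Iceberg's DataFile or metrics evaluator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-file statistics: lower and upper bounds of one long column.
final class FileBounds {
  final String path;
  final long lower;
  final long upper;

  FileBounds(String path, long lower, long upper) {
    this.path = path;
    this.lower = lower;
    this.upper = upper;
  }
}

final class RangeFileFilter {
  private RangeFileFilter() {}

  // Selects every file whose [lower, upper] range overlaps the filter range [lo, hi].
  // Such a file *may* contain matching rows, so the whole file is selected.
  static List<String> mayMatch(List<FileBounds> files, long lo, long hi) {
    List<String> selected = new ArrayList<>();
    for (FileBounds f : files) {
      if (f.upper >= lo && f.lower <= hi) {
        selected.add(f.path);
      }
    }
    return selected;
  }
}
```

      For a filter of 12 to 15, a file with bounds [5, 25] is selected even though only some of its rows can match, which is why more data than the filter covers may be rewritten.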
    • useStartingSequenceNumber

      public BaseRewriteDataFilesAction<ThisT> useStartingSequenceNumber(boolean useStarting)
      Sets whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of the sequence number of the newly produced snapshot.

      This avoids commit conflicts with concurrent updates that add newer equality deletes at a higher sequence number.

      Parameters:
      useStarting - use starting sequence number if set to true
      Returns:
      this for method chaining
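      The conflict can be traced with the Iceberg spec rule that an equality delete applies to a data file only when the data file's data sequence number is strictly lower than the delete's. The sketch below assumes a compaction that starts at sequence number 10, a concurrent equality delete committed at 11, and a compaction commit at 12. `SequenceNumbers` is a hypothetical helper, and it ignores the spec's partition and spec-id conditions.

```java
// Simplified sketch of equality-delete scoping by sequence number.
final class SequenceNumbers {
  private SequenceNumbers() {}

  // Spec rule (simplified): an equality delete applies to a data file only when
  // the data file's data sequence number is strictly less than the delete's.
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteSeq) {
    return dataFileSeq < deleteSeq;
  }

  // Sequence number given to rewritten files under each option.
  static long rewrittenFileSeq(boolean useStarting, long startingSeq, long commitSeq) {
    return useStarting ? startingSeq : commitSeq;
  }
}
```

      With the starting sequence number (10), the concurrent delete at 11 still applies to the rewritten files. With the commit sequence number (12), it would no longer apply, so the deleted rows would reappear, and the commit has to be treated as a conflict.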
    • execute

      public RewriteDataFilesActionResult execute()
      Description copied from interface: Action
      Executes this action.
      Returns:
      the result of this action
    • fileIO

      protected abstract FileIO fileIO()
    • rewriteDataForTasks

      protected abstract List<DataFile> rewriteDataForTasks(List<CombinedScanTask> combinedScanTask)
    • self

      protected abstract ThisT self()
    • set

      public ThisT set(String property, String value)
      Specified by:
      set in interface SnapshotUpdateAction<ThisT,R>
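      The abstract self() method supports a self-typed builder pattern: a method declared in the generic base class, such as set, can return the concrete subclass type ThisT, so chained calls keep access to subclass-only options. The sketch below shows the pattern in isolation. `BaseAction` and `CompactionAction` are hypothetical classes, and set here simply records a key/value pair in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the self-typed builder pattern behind `ThisT self()`.
abstract class BaseAction<ThisT> {
  private final Map<String, String> summary = new LinkedHashMap<>();

  // Implemented by each subclass to return `this` with its concrete type.
  protected abstract ThisT self();

  // Returns ThisT (not BaseAction) so callers can keep chaining subclass methods.
  public ThisT set(String property, String value) {
    summary.put(property, value);
    return self();
  }

  Map<String, String> summary() {
    return summary;
  }
}

final class CompactionAction extends BaseAction<CompactionAction> {
  private long targetSizeInBytes;

  @Override
  protected CompactionAction self() {
    return this;
  }

  // Subclass-only option, reachable directly after set(...).
  CompactionAction targetSizeInBytes(long size) {
    this.targetSizeInBytes = size;
    return this;
  }

  long targetSizeInBytes() {
    return targetSizeInBytes;
  }
}
```

      A call such as `new CompactionAction().set("k", "v").targetSizeInBytes(512L)` compiles because set returns CompactionAction rather than the base type.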
    • commit

      protected void commit(SnapshotUpdate<?> update)
    • metadataTableName

      protected String metadataTableName(MetadataTableType type)
    • metadataTableName

      protected String metadataTableName(String tableName, MetadataTableType type)