Class RewriteDataFiles.Builder

java.lang.Object
org.apache.iceberg.flink.maintenance.api.MaintenanceTaskBuilder<RewriteDataFiles.Builder>
org.apache.iceberg.flink.maintenance.api.RewriteDataFiles.Builder
Enclosing class:
RewriteDataFiles

public static class RewriteDataFiles.Builder extends MaintenanceTaskBuilder<RewriteDataFiles.Builder>
  • Constructor Details

    • Builder

      public Builder()
  • Method Details

    • partialProgressEnabled

      public RewriteDataFiles.Builder partialProgressEnabled(boolean newPartialProgressEnabled)
      Allows committing compacted data files in batches. See RewriteDataFiles.PARTIAL_PROGRESS_ENABLED for more details.
      Parameters:
      newPartialProgressEnabled - true to enable partial commits
    • partialProgressMaxCommits

      public RewriteDataFiles.Builder partialProgressMaxCommits(int newPartialProgressMaxCommits)
      Configures how the batches are sized when partialProgressEnabled(boolean) is set, by giving a target number of commits per run. See RewriteDataFiles.PARTIAL_PROGRESS_MAX_COMMITS for more details.
      Parameters:
      newPartialProgressMaxCommits - the target number of commits per run
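      The relationship between a target number of commits and the resulting batch size can be pictured with simple ceiling division. The sketch below is only an illustration of that arithmetic: the class PartialProgressSketch and the helper groupsPerCommit are hypothetical names, and the actual batching performed by the Flink maintenance task may differ.

```java
/** Illustrative arithmetic only; this is not the Iceberg Flink implementation. */
final class PartialProgressSketch {
  private PartialProgressSketch() {}

  /**
   * Hypothetical helper: number of rewritten file groups per commit, using ceiling
   * division so that at most maxCommits commits are needed.
   */
  static int groupsPerCommit(int totalGroups, int maxCommits) {
    if (totalGroups < 0 || maxCommits <= 0) {
      throw new IllegalArgumentException("totalGroups must be >= 0 and maxCommits > 0");
    }
    return (totalGroups + maxCommits - 1) / maxCommits;
  }

  public static void main(String[] args) {
    // 10 file groups with a target of 3 commits: batches of 4, 4, and 2 groups.
    System.out.println(groupsPerCommit(10, 3));
  }
}
```

      Under this assumption, a larger target number of commits yields smaller batches, so less work is lost if a single commit fails.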
    • maxRewriteBytes

      public RewriteDataFiles.Builder maxRewriteBytes(long newMaxRewriteBytes)
      Configures the maximum number of bytes rewritten by one scheduled compaction. This can be used to limit the resources consumed by the compaction.
      Parameters:
      newMaxRewriteBytes - the maximum number of bytes to rewrite per compaction
    • targetFileSizeBytes

      public RewriteDataFiles.Builder targetFileSizeBytes(long targetFileSizeBytes)
      Configures the target file size. See RewriteDataFiles.TARGET_FILE_SIZE_BYTES for more details.
      Parameters:
      targetFileSizeBytes - target file size
    • minFileSizeBytes

      public RewriteDataFiles.Builder minFileSizeBytes(long minFileSizeBytes)
      Configures the min file size considered for rewriting. See SizeBasedFileRewritePlanner.MIN_FILE_SIZE_BYTES for more details.
      Parameters:
      minFileSizeBytes - min file size
    • maxFileSizeBytes

      public RewriteDataFiles.Builder maxFileSizeBytes(long maxFileSizeBytes)
      Configures the max file size considered for rewriting. See SizeBasedFileRewritePlanner.MAX_FILE_SIZE_BYTES for more details.
      Parameters:
      maxFileSizeBytes - max file size
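      The minimum and maximum file sizes together define a size window around the target file size. A minimal sketch of that idea follows, assuming that files smaller than the minimum or larger than the maximum are the ones picked up for rewriting. The class SizeWindowSketch, its methods, and the thresholds used are hypothetical and are not part of the Iceberg API; see SizeBasedFileRewritePlanner for the actual selection rules.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Illustration of a size window; not the Iceberg planner implementation. */
final class SizeWindowSketch {
  private SizeWindowSketch() {}

  /** Assumption: a file is a size-based candidate when it falls outside [minBytes, maxBytes]. */
  static boolean outsideWindow(long fileSize, long minBytes, long maxBytes) {
    return fileSize < minBytes || fileSize > maxBytes;
  }

  /** Returns the sizes that would be rewrite candidates under the assumption above. */
  static List<Long> candidates(List<Long> sizes, long minBytes, long maxBytes) {
    return sizes.stream()
        .filter(size -> outsideWindow(size, minBytes, maxBytes))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // Hypothetical thresholds chosen only for the example.
    long minBytes = 384 * mb;
    long maxBytes = 920 * mb;
    List<Long> sizes = Arrays.asList(8 * mb, 400 * mb, 512 * mb, 1200 * mb);
    // The 8 MB and 1200 MB files fall outside the window and are selected.
    System.out.println(candidates(sizes, minBytes, maxBytes));
  }
}
```

      Files inside the window are left alone by this size check; other options on this builder, such as deleteFileThreshold or rewriteAll, can still cause them to be rewritten.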
    • minInputFiles

      public RewriteDataFiles.Builder minInputFiles(int minInputFiles)
      Configures the minimum number of files after which a rewrite is always initiated. See SizeBasedFileRewritePlanner.MIN_INPUT_FILES for more details.
      Parameters:
      minInputFiles - min file number
    • deleteFileThreshold

      public RewriteDataFiles.Builder deleteFileThreshold(int deleteFileThreshold)
      Configures the minimum number of delete files for a data file after which a rewrite is always initiated. See BinPackRewriteFilePlanner.DELETE_FILE_THRESHOLD for more details.
      Parameters:
      deleteFileThreshold - min delete file number
    • rewriteAll

      public RewriteDataFiles.Builder rewriteAll(boolean rewriteAll)
      Overrides other options and forces rewriting of all provided files.
      Parameters:
      rewriteAll - enables a full rewrite
    • maxFileGroupSizeBytes

      public RewriteDataFiles.Builder maxFileGroupSizeBytes(long maxFileGroupSizeBytes)
      Configures the group size for rewriting. See SizeBasedFileRewritePlanner.MAX_FILE_GROUP_SIZE_BYTES for more details.
      Parameters:
      maxFileGroupSizeBytes - file group size for rewrite
    • maxFileGroupInputFiles

      public RewriteDataFiles.Builder maxFileGroupInputFiles(long maxFileGroupInputFiles)
      Configures the max file count for rewriting. See SizeBasedFileRewritePlanner.MAX_FILE_GROUP_INPUT_FILES for more details.
      Parameters:
      maxFileGroupInputFiles - file count for rewrite
    • maxFilesToRewrite

      public RewriteDataFiles.Builder maxFilesToRewrite(int maxFilesToRewrite)
      Configures max files to rewrite. See BinPackRewriteFilePlanner.MAX_FILES_TO_REWRITE for more details.
      Parameters:
      maxFilesToRewrite - maximum files to rewrite
    • filter

      @Deprecated public RewriteDataFiles.Builder filter(Expression newFilter)
      Deprecated.
      Will be removed in 1.12.0. Use filter(SerializableSupplier) instead.
      A user-provided filter that determines which files are considered by the rewrite strategy.
      Parameters:
      newFilter - the filter expression to apply
      Returns:
      this for method chaining
    • filter

      public RewriteDataFiles.Builder filter(org.apache.flink.util.function.SerializableSupplier<Expression> newFilterSupplier)
      A user-provided supplier of a filter expression that determines which files are considered by the rewrite strategy.

      The supplier is evaluated by the planner on every compaction trigger, allowing a fresh filter to be produced for each compaction run.

      This is particularly useful for time-relative filters. For example, a supplier such as () -> Expressions.greaterThanOrEqual("ts", LocalDateTime.now(ZoneOffset.UTC).minus(Duration.ofDays(3)).toString()) ensures that each compaction rewrites files from the last 3 days relative to the time the compaction is planned, rather than relative to when the job was started.

      Parameters:
      newFilterSupplier - the supplier providing the filter expression to apply
      Returns:
      this for method chaining
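      The difference between a filter value computed once and a supplier evaluated on every trigger can be shown with plain JDK types. This is a minimal sketch, not Iceberg code: the class FilterSupplierSketch, the method cutoffSupplier, and the injected time source are hypothetical and exist only to make the behavior observable. With the real API, the computed cutoff would be wrapped in an expression such as Expressions.greaterThanOrEqual("ts", cutoff) inside the supplier.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Shows why a supplier gives a fresh time-relative cutoff on each evaluation. */
final class FilterSupplierSketch {
  private FilterSupplierSketch() {}

  /** Hypothetical helper: recomputes "now minus lookback" (UTC) every time get() is called. */
  static Supplier<String> cutoffSupplier(Supplier<Instant> now, Duration lookback) {
    return () -> LocalDateTime.ofInstant(now.get(), ZoneOffset.UTC).minus(lookback).toString();
  }

  public static void main(String[] args) {
    // A controllable clock stands in for wall-clock time between compaction triggers.
    AtomicReference<Instant> clock = new AtomicReference<>(Instant.parse("2024-01-10T00:00:00Z"));
    Supplier<String> cutoff = cutoffSupplier(clock::get, Duration.ofDays(3));

    // Computed once, the way a fixed filter expression would be when the job starts.
    String fixedAtStart = cutoff.get();

    // One day later, the next trigger re-evaluates the supplier.
    clock.set(Instant.parse("2024-01-11T00:00:00Z"));
    String atNextTrigger = cutoff.get();

    System.out.println("fixed at job start: " + fixedAtStart);
    System.out.println("at next trigger:    " + atNextTrigger);
  }
}
```

      The value captured at job start stays pinned to the start time, while the supplier moves the 3-day window forward with each evaluation. That is the behavior the supplier-based filter is designed to provide.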
    • branch

      public RewriteDataFiles.Builder branch(String newBranch)
      Sets the branch to compact. When set, the planner reads from the branch's snapshot and commits are made to this branch.
      Parameters:
      newBranch - the branch name
      Returns:
      this for method chaining
    • config

      public RewriteDataFiles.Builder config(RewriteDataFilesConfig rewriteDataFilesConfig)
      Configures the properties for the rewriter.
      Parameters:
      rewriteDataFilesConfig - properties for the rewriter