Class BaseRewriteManifests

java.lang.Object
org.apache.iceberg.BaseRewriteManifests
All Implemented Interfaces:
PendingUpdate<Snapshot>, RewriteManifests, SnapshotUpdate<RewriteManifests>

public class BaseRewriteManifests extends Object implements RewriteManifests
  • Method Details

    • self

      protected RewriteManifests self()
    • operation

      protected String operation()
      A string that describes the action that produced the new snapshot.
      Returns:
      a string operation
    • set

      public RewriteManifests set(String property, String value)
      Description copied from interface: SnapshotUpdate
      Set a summary property in the snapshot produced by this update.
      Specified by:
      set in interface SnapshotUpdate<RewriteManifests>
      Parameters:
      property - a String property name
      value - a String property value
      Returns:
      this for method chaining
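
      The "this for method chaining" contract, together with the protected self() hook listed above, can be illustrated with a small self-typed base class. This is a minimal sketch using only the JDK: FluentUpdate and DemoRewrite are hypothetical stand-ins invented for illustration, not Iceberg classes, and the real implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a self-typed update base class (not Iceberg source code).
abstract class FluentUpdate<ThisT> {
  private final Map<String, String> summary = new LinkedHashMap<>();

  // Subclasses return themselves so inherited setters keep the concrete type.
  protected abstract ThisT self();

  // Records a summary property and returns the concrete update for chaining.
  public ThisT set(String property, String value) {
    summary.put(property, value);
    return self();
  }

  protected Map<String, String> summary() {
    return summary;
  }
}

// Hypothetical concrete update used only to exercise the chaining.
final class DemoRewrite extends FluentUpdate<DemoRewrite> {
  @Override
  protected DemoRewrite self() {
    return this;
  }

  public static void main(String[] args) {
    DemoRewrite update = new DemoRewrite().set("owner", "etl").set("reason", "compaction");
    System.out.println(update.summary()); // prints {owner=etl, reason=compaction}
  }
}
```

      Because set returns ThisT rather than the base type, calls declared on the concrete class can follow it in the same chain.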
    • summary

      protected Map<String,String> summary()
    • clusterBy

      public RewriteManifests clusterBy(Function<DataFile,Object> func)
      Description copied from interface: RewriteManifests
      Groups existing DataFiles by a cluster key produced by a function. The cluster key determines which manifest each data file is associated with. All data files with the same cluster key will be written to the same manifest (unless the manifest file is large and is split into multiple files). Manifests deleted via RewriteManifests.deleteManifest(ManifestFile) or added via RewriteManifests.addManifest(ManifestFile) are ignored during the rewrite process.
      Specified by:
      clusterBy in interface RewriteManifests
      Parameters:
      func - Function used to cluster data files to manifests.
      Returns:
      this for method chaining
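
      The grouping that clusterBy describes can be sketched with plain JDK collections. FileStub is a hypothetical stand-in for DataFile, and cluster is an illustrative helper, not the library's implementation. Each resulting group models the data files that would land in the same manifest, ignoring the size-based splitting mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ClusterBySketch {
  // Hypothetical stand-in for DataFile: only a path and a partition value.
  static final class FileStub {
    final String path;
    final String partition;

    FileStub(String path, String partition) {
      this.path = path;
      this.partition = partition;
    }
  }

  // Groups files by the key the function returns; each group models one target manifest.
  static Map<Object, List<FileStub>> cluster(
      List<FileStub> files, Function<FileStub, Object> func) {
    Map<Object, List<FileStub>> groups = new LinkedHashMap<>();
    for (FileStub file : files) {
      groups.computeIfAbsent(func.apply(file), key -> new ArrayList<>()).add(file);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<FileStub> files = List.of(
        new FileStub("a.parquet", "2024-01-01"),
        new FileStub("b.parquet", "2024-01-02"),
        new FileStub("c.parquet", "2024-01-01"));
    // Cluster by partition value, so files from the same day share a group.
    cluster(files, file -> file.partition)
        .forEach((key, group) -> System.out.println(key + " -> " + group.size() + " file(s)"));
  }
}
```

      With the real API, the function would receive a DataFile and would typically be passed through a chain that starts at Table.rewriteManifests().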
    • rewriteIf

      public RewriteManifests rewriteIf(Predicate<ManifestFile> pred)
      Description copied from interface: RewriteManifests
      Determines which existing ManifestFile for the table should be rewritten. Manifests that do not match the predicate are kept as-is. If this is not called and no predicate is set, then all manifests will be rewritten.
      Specified by:
      rewriteIf in interface RewriteManifests
      Parameters:
      pred - Predicate used to determine which manifests to rewrite. If true then the manifest file will be included for rewrite. If false then the manifest is kept as-is.
      Returns:
      this for method chaining
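
      The selection rule for rewriteIf (matching manifests are rewritten, the rest are kept, and a missing predicate means everything is rewritten) can be sketched as follows. ManifestStub and selectForRewrite are hypothetical names used only for this illustration, and the size threshold is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RewriteIfSketch {
  // Hypothetical stand-in for ManifestFile: a path and a size in bytes.
  static final class ManifestStub {
    final String path;
    final long length;

    ManifestStub(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Returns the manifests selected for rewrite; a null predicate selects all of them.
  static List<ManifestStub> selectForRewrite(
      List<ManifestStub> manifests, Predicate<ManifestStub> pred) {
    List<ManifestStub> selected = new ArrayList<>();
    for (ManifestStub manifest : manifests) {
      if (pred == null || pred.test(manifest)) {
        selected.add(manifest);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<ManifestStub> manifests = List.of(
        new ManifestStub("m1.avro", 512L),
        new ManifestStub("m2.avro", 8_000_000L));
    // Assumed policy for the example: only rewrite manifests smaller than 1 MiB.
    List<ManifestStub> small = selectForRewrite(manifests, m -> m.length < 1024L * 1024L);
    System.out.println(small.size() + " of " + manifests.size() + " selected");
  }
}
```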
    • deleteManifest

      public RewriteManifests deleteManifest(ManifestFile manifest)
      Description copied from interface: RewriteManifests
      Deletes a manifest file from the table.
      Specified by:
      deleteManifest in interface RewriteManifests
      Parameters:
      manifest - a manifest to delete
      Returns:
      this for method chaining
    • addManifest

      public RewriteManifests addManifest(ManifestFile manifest)
      Description copied from interface: RewriteManifests
      Adds a manifest file to the table. The added manifest cannot contain new or deleted files.

      By default, the manifest will be rewritten to ensure all entries have explicit snapshot IDs. In that case, it is always the responsibility of the caller to manage the lifecycle of the original manifest.

      If manifest entries are allowed to inherit the snapshot ID assigned on commit, the manifest should never be deleted manually if the commit succeeds as it will become part of the table metadata and will be cleaned up on expiry. If the manifest gets merged with others while preparing a new snapshot, it will be deleted automatically if this operation is successful. If the commit fails, the manifest will never be deleted and it is up to the caller whether to delete or reuse it.

      Specified by:
      addManifest in interface RewriteManifests
      Parameters:
      manifest - a manifest to add
      Returns:
      this for method chaining
    • apply

      public List<ManifestFile> apply(TableMetadata base, Snapshot snapshot)
      Apply the update's changes to the given metadata and snapshot. Return the new manifest list.
      Parameters:
      base - the base table metadata to apply changes to
      snapshot - snapshot to apply the changes to
      Returns:
      a manifest list for the new snapshot.
    • cleanUncommitted

      protected void cleanUncommitted(Set<ManifestFile> committed)
      Clean up any uncommitted manifests that were created.

      Manifests may not be committed if apply is called more than once because a commit conflict has occurred. Implementations may keep manifests around because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.

      Parameters:
      committed - a set of manifest paths that were actually committed
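
      The cleanup described above amounts to a set difference: everything written across apply attempts, minus what was actually committed. The sketch below models manifests as plain path strings and takes the delete action as a parameter. All names are hypothetical and the real method works with ManifestFile instances.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

final class CleanUncommittedSketch {
  // Deletes every created manifest path that is not in the committed set
  // and returns the paths that were removed.
  static Set<String> cleanUncommitted(
      Set<String> created, Set<String> committed, Consumer<String> delete) {
    Set<String> removed = new LinkedHashSet<>();
    for (String path : created) {
      if (!committed.contains(path)) {
        delete.accept(path);
        removed.add(path);
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    // Two apply attempts: the first wrote m1 and m2, the retry reused m1 and wrote m3.
    Set<String> created = new LinkedHashSet<>(Set.of("m1.avro", "m2.avro", "m3.avro"));
    Set<String> committed = Set.of("m1.avro", "m3.avro");
    Set<String> removed =
        cleanUncommitted(created, committed, path -> System.out.println("delete " + path));
    System.out.println(removed); // prints [m2.avro]
  }
}
```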
    • stageOnly

      public RewriteManifests stageOnly()
      Description copied from interface: SnapshotUpdate
      Called to stage a snapshot in table metadata, but not update the current snapshot id.
      Specified by:
      stageOnly in interface SnapshotUpdate<ThisT>
      Returns:
      this for method chaining
    • scanManifestsWith

      public RewriteManifests scanManifestsWith(ExecutorService executorService)
      Description copied from interface: SnapshotUpdate
      Use a particular executor to scan manifests. If this method is not called, the default worker pool is used.
      Specified by:
      scanManifestsWith in interface SnapshotUpdate<ThisT>
      Parameters:
      executorService - the provided executor
      Returns:
      this for method chaining
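
      As a rough illustration of why a caller might supply an executor, the sketch below fans manifest scans out over a caller-provided ExecutorService. The map of manifest paths to entry counts is a hypothetical stand-in for actually reading manifests, and scanAll is not part of Iceberg.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ScanWithExecutorSketch {
  // Sums entry counts across manifests, running one task per manifest on the given executor.
  static int scanAll(Map<String, Integer> manifests, ExecutorService executor) {
    List<Future<Integer>> pending = new ArrayList<>();
    for (Map.Entry<String, Integer> manifest : manifests.entrySet()) {
      // Stand-in for opening and reading one manifest file.
      Callable<Integer> task = manifest::getValue;
      pending.add(executor.submit(task));
    }
    int total = 0;
    try {
      for (Future<Integer> result : pending) {
        total += result.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while scanning manifests", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Manifest scan failed", e.getCause());
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Integer> manifests = new LinkedHashMap<>();
    manifests.put("m1.avro", 10);
    manifests.put("m2.avro", 32);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      System.out.println(scanAll(manifests, pool)); // prints 42
    } finally {
      pool.shutdown();
    }
  }
}
```

      The caller owns the pool in this sketch, so it is also the caller that shuts it down.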
    • commitMetrics

      protected CommitMetrics commitMetrics()
    • reportWith

      protected RewriteManifests reportWith(MetricsReporter newReporter)
    • targetBranch

      protected void targetBranch(String branch)
      Sets the target branch on which the snapshot producer operation should be performed.
      Parameters:
      branch - the branch to set as the target branch
    • targetBranch

      protected String targetBranch()
    • workerPool

      protected ExecutorService workerPool()
    • deleteWith

      public RewriteManifests deleteWith(Consumer<String> deleteCallback)
      Description copied from interface: SnapshotUpdate
      Set a callback to delete files instead of the table's default.
      Specified by:
      deleteWith in interface SnapshotUpdate<ThisT>
      Parameters:
      deleteCallback - a String consumer used to delete locations.
      Returns:
      this for method chaining
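
      A delete callback of this shape lets a caller intercept deletions, for example to record paths for a later bulk delete. The sketch below is a hypothetical stand-in that shows only the callback swap. DeleteWithSketch and its default behavior are assumptions for illustration, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DeleteWithSketch {
  // Assumed default for the example: just report the path that would be deleted.
  private Consumer<String> deleteFunc = path -> System.out.println("default delete: " + path);

  // Replaces the delete action and returns this for chaining.
  DeleteWithSketch deleteWith(Consumer<String> deleteCallback) {
    this.deleteFunc = deleteCallback;
    return this;
  }

  // Routes every deletion through whichever callback is currently configured.
  void deleteFile(String path) {
    deleteFunc.accept(path);
  }

  public static void main(String[] args) {
    List<String> collected = new ArrayList<>();
    DeleteWithSketch update = new DeleteWithSketch().deleteWith(collected::add);
    update.deleteFile("metadata/m2.avro");
    System.out.println(collected); // prints [metadata/m2.avro]
  }
}
```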
    • validate

      protected void validate(TableMetadata currentMetadata, Snapshot snapshot)
      Validate the current metadata.

      Child operations can override this to add custom validation.

      Parameters:
      currentMetadata - current table metadata to validate
      snapshot - ending snapshot on the lineage which is being validated
    • apply

      public Snapshot apply()
      Description copied from interface: PendingUpdate
      Apply the pending changes and return the uncommitted changes for validation.

      This does not result in a permanent update.

      Specified by:
      apply in interface PendingUpdate<ThisT>
      Returns:
      the uncommitted changes that would be committed by calling PendingUpdate.commit()
    • current

      protected TableMetadata current()
    • refresh

      protected TableMetadata refresh()
    • commit

      public void commit()
      Description copied from interface: PendingUpdate
      Apply the pending changes and commit.

      Changes are committed by calling the underlying table's commit method.

      Once the commit is successful, the updated table will be refreshed.

      Specified by:
      commit in interface PendingUpdate<ThisT>
    • cleanAll

      protected void cleanAll()
    • deleteFile

      protected void deleteFile(String path)
    • manifestListPath

      protected OutputFile manifestListPath()
    • newManifestOutputFile

      protected EncryptedOutputFile newManifestOutputFile()
    • newManifestWriter

      protected ManifestWriter<DataFile> newManifestWriter(PartitionSpec spec)
    • newDeleteManifestWriter

      protected ManifestWriter<DeleteFile> newDeleteManifestWriter(PartitionSpec spec)
    • newRollingManifestWriter

      protected RollingManifestWriter<DataFile> newRollingManifestWriter(PartitionSpec spec)
    • newRollingDeleteManifestWriter

      protected RollingManifestWriter<DeleteFile> newRollingDeleteManifestWriter(PartitionSpec spec)
    • newManifestReader

      protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest)
    • newDeleteManifestReader

      protected ManifestReader<DeleteFile> newDeleteManifestReader(ManifestFile manifest)
    • snapshotId

      protected long snapshotId()
    • canInheritSnapshotId

      protected boolean canInheritSnapshotId()
    • cleanupAfterCommit

      protected boolean cleanupAfterCommit()
    • writeDataManifests

      protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, PartitionSpec spec)
    • writeDataManifests

      protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, Long dataSeq, PartitionSpec spec)
    • writeDeleteManifests

      protected List<ManifestFile> writeDeleteManifests(Collection<DeleteFile> files, PartitionSpec spec)