Interface RewriteManifests

All Superinterfaces:
PendingUpdate<Snapshot>, SnapshotUpdate<RewriteManifests>
All Known Implementing Classes:
BaseRewriteManifests

public interface RewriteManifests extends SnapshotUpdate<RewriteManifests>
API for rewriting manifests for a table.

This API accumulates manifest files, produces a new Snapshot of the table described only by the manifest files that were added, and commits that snapshot as the table's current snapshot.

This API can be used to rewrite matching manifests according to a clustering function as well as to replace specific manifests. Manifests that are deleted or added directly are ignored during the rewrite process. The set of active files in replaced manifests must be the same as in new manifests.
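
The requirement that replaced and new manifests cover the same active files can be illustrated with a small, self-contained sketch. The `Manifest` class, `activeFiles` and `preservesActiveFiles` below are hypothetical stand-ins invented for this example. They are not Iceberg types, and the sketch does not claim to show how the library performs its own validation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ActiveFilesCheck {
  // Hypothetical stand-in for a manifest: a path plus the paths of its active data files.
  static final class Manifest {
    final String path;
    final List<String> activeFiles;

    Manifest(String path, String... activeFiles) {
      this.path = path;
      this.activeFiles = Arrays.asList(activeFiles);
    }
  }

  // Union of the active data files across a group of manifests.
  static Set<String> activeFiles(List<Manifest> manifests) {
    Set<String> files = new HashSet<>();
    for (Manifest manifest : manifests) {
      files.addAll(manifest.activeFiles);
    }
    return files;
  }

  // True when the rewrite neither drops nor introduces an active data file.
  static boolean preservesActiveFiles(List<Manifest> replaced, List<Manifest> added) {
    return activeFiles(replaced).equals(activeFiles(added));
  }

  public static void main(String[] args) {
    List<Manifest> replaced = Arrays.asList(
        new Manifest("m1.avro", "a.parquet", "b.parquet"),
        new Manifest("m2.avro", "c.parquet"));
    List<Manifest> added = Arrays.asList(
        new Manifest("m3.avro", "a.parquet", "b.parquet", "c.parquet"));
    System.out.println(preservesActiveFiles(replaced, added));
  }
}
```

In this sketch, merging two manifests into one passes the check because the combined set of data files is unchanged, while a new manifest that omits a file would fail it.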

When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit.
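
The conflict handling described above follows an optimistic pattern: read the latest state, apply the changes to it, and try to swap it in, retrying if another writer committed first. A minimal sketch of that pattern, assuming the table's current snapshot can be modelled as an `AtomicReference`, is shown here. `RetryingCommit`, `commit` and `maxAttempts` are hypothetical names for illustration and are not part of the Iceberg API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class RetryingCommit {
  // Applies the changes to the latest state and retries when another writer got there first.
  static <S> S commit(AtomicReference<S> current, UnaryOperator<S> changes, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      S base = current.get();           // latest snapshot at the start of this attempt
      S updated = changes.apply(base);  // re-apply the pending changes on top of it
      if (current.compareAndSet(base, updated)) {
        return updated;                 // nobody else committed in between
      }
      // Conflict: the current snapshot moved, so loop and rebase onto the new one.
    }
    throw new IllegalStateException("Commit failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    AtomicReference<String> table = new AtomicReference<>("snapshot-1");
    System.out.println(commit(table, base -> base + "+rewritten-manifests", 3));
  }
}
```

The key property is that the changes are expressed as a function of the base snapshot, so they can be replayed against whichever snapshot is current when the retry happens.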

  • Method Details

    • clusterBy

      RewriteManifests clusterBy(Function<DataFile, Object> func)
      Groups existing DataFiles by a cluster key produced by a function. The cluster key determines which manifest each data file is written to. All data files with the same cluster key will be written to the same manifest (unless the manifest file is large and is split into multiple files). Manifests deleted via deleteManifest(ManifestFile) or added via addManifest(ManifestFile) are ignored during the rewrite process.
      Parameters:
      func - Function used to cluster data files to manifests.
      Returns:
      this for method chaining
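
      The grouping that a cluster key implies can be sketched with plain collections. This is an illustration only: data files are represented by their path strings, the key is assumed to be the leading partition directory of the path, and `ClusterSketch` and `cluster` are hypothetical names rather than Iceberg classes. Splitting of large manifests is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ClusterSketch {
  // Groups files by the key the function returns; each group stands for one output manifest.
  static <F> Map<Object, List<F>> cluster(List<F> files, Function<F, Object> func) {
    Map<Object, List<F>> groups = new LinkedHashMap<>();
    for (F file : files) {
      groups.computeIfAbsent(func.apply(file), key -> new ArrayList<>()).add(file);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "date=2024-01-01/a.parquet",
        "date=2024-01-02/b.parquet",
        "date=2024-01-01/c.parquet");
    // Assumed key for this example: the partition directory that prefixes the path.
    Map<Object, List<String>> groups =
        cluster(files, path -> path.substring(0, path.indexOf('/')));
    groups.forEach((key, group) -> System.out.println(key + " -> " + group));
  }
}
```

      Any object with sensible equals and hashCode behavior can serve as the key in this sketch; files whose keys compare equal end up in the same group.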
    • rewriteIf

      RewriteManifests rewriteIf(Predicate<ManifestFile> predicate)
      Determines which of the table's existing ManifestFiles should be rewritten. Manifests that do not match the predicate are kept as-is. If this method is not called, no predicate is set and all manifests will be rewritten.
      Parameters:
      predicate - Predicate used to determine which manifests to rewrite. If it returns true, the manifest file will be included in the rewrite. If it returns false, the manifest is kept as-is.
      Returns:
      this for method chaining
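
      The selection rule can be sketched as a simple filter in which a missing predicate selects everything. `ManifestInfo`, its `length` field and the size threshold used in `main` are hypothetical and exist only for this example; they do not describe Iceberg's ManifestFile or its implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RewriteSelection {
  // Hypothetical stand-in for a manifest: a path and a size in bytes.
  static final class ManifestInfo {
    final String path;
    final long length;

    ManifestInfo(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Returns the manifests to rewrite; a null predicate means "rewrite all".
  static List<ManifestInfo> toRewrite(List<ManifestInfo> manifests, Predicate<ManifestInfo> predicate) {
    List<ManifestInfo> selected = new ArrayList<>();
    for (ManifestInfo manifest : manifests) {
      if (predicate == null || predicate.test(manifest)) {
        selected.add(manifest);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<ManifestInfo> manifests = Arrays.asList(
        new ManifestInfo("m1.avro", 512L),
        new ManifestInfo("m2.avro", 8_388_608L),
        new ManifestInfo("m3.avro", 2_048L));
    // Assumed policy for this example: only rewrite manifests smaller than 1 MiB.
    for (ManifestInfo manifest : toRewrite(manifests, m -> m.length < 1_048_576L)) {
      System.out.println(manifest.path);
    }
  }
}
```

      Manifests that fail the test are simply left out of the returned list, which corresponds to being kept as-is.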
    • deleteManifest

      RewriteManifests deleteManifest(ManifestFile manifest)
      Deletes a manifest file from the table.
      Parameters:
      manifest - a manifest to delete
      Returns:
      this for method chaining
    • addManifest

      RewriteManifests addManifest(ManifestFile manifest)
      Adds a manifest file to the table. The added manifest cannot contain new or deleted files.

      By default, the manifest will be rewritten to ensure all entries have explicit snapshot IDs. In that case, it is always the responsibility of the caller to manage the lifecycle of the original manifest.

      If manifest entries are allowed to inherit the snapshot ID assigned on commit, the manifest should never be deleted manually if the commit succeeds as it will become part of the table metadata and will be cleaned up on expiry. If the manifest gets merged with others while preparing a new snapshot, it will be deleted automatically if this operation is successful. If the commit fails, the manifest will never be deleted and it is up to the caller whether to delete or reuse it.

      Parameters:
      manifest - a manifest to add
      Returns:
      this for method chaining
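
      The cleanup rules for addManifest can be summarized as a small decision function. This is only a restatement of the paragraphs above in code form; `ManifestCleanup`, `callerMayDeleteOriginal` and its two boolean parameters are hypothetical names and do not exist in the Iceberg API.

```java
final class ManifestCleanup {
  // Returns true when the caller may delete (or reuse) the manifest it passed in.
  static boolean callerMayDeleteOriginal(boolean inheritsSnapshotId, boolean commitSucceeded) {
    if (!inheritsSnapshotId) {
      // Default case: the manifest is rewritten with explicit snapshot IDs,
      // so the caller always manages the lifecycle of the original.
      return true;
    }
    // With snapshot ID inheritance, a committed manifest becomes part of the
    // table metadata and must not be deleted manually. After a failed commit
    // it is left in place and the caller decides what to do with it.
    return !commitSucceeded;
  }

  public static void main(String[] args) {
    System.out.println(callerMayDeleteOriginal(true, true));
    System.out.println(callerMayDeleteOriginal(true, false));
  }
}
```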