Interface RewriteFiles

All Superinterfaces:
PendingUpdate<Snapshot>, SnapshotUpdate<RewriteFiles>

public interface RewriteFiles extends SnapshotUpdate<RewriteFiles>
API for replacing files in a table.

This API accumulates file additions and deletions, produces a new Snapshot of the changes, and commits that snapshot as the table's current snapshot.

When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit. If any of the deleted files are no longer in the latest snapshot when reattempting, the commit will throw a ValidationException.

Note that the new state of the table after each rewrite must be logically equivalent to the original table state.
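
The retry behavior described above can be illustrated with a toy model. This is only a sketch, not Iceberg code: the class RewriteRetrySketch, the method applyRewrite, and the use of plain file names are hypothetical, and IllegalStateException stands in for ValidationException.

```java
import java.util.Set;
import java.util.TreeSet;

public class RewriteRetrySketch {
    // Hypothetical model: re-apply a rewrite to the files of the latest snapshot.
    // Fails if any file the rewrite wants to delete is no longer live.
    static Set<String> applyRewrite(Set<String> latestFiles, Set<String> toDelete, Set<String> toAdd) {
        for (String file : toDelete) {
            if (!latestFiles.contains(file)) {
                // Stand-in for the ValidationException described in the text
                throw new IllegalStateException("Missing file to delete: " + file);
            }
        }
        Set<String> result = new TreeSet<>(latestFiles);
        result.removeAll(toDelete);
        result.addAll(toAdd);
        return result;
    }

    public static void main(String[] args) {
        Set<String> toDelete = Set.of("a.parquet", "b.parquet");
        Set<String> toAdd = Set.of("ab.parquet");

        // A concurrent append added c.parquet: the rewrite still applies cleanly.
        Set<String> afterAppend = Set.of("a.parquet", "b.parquet", "c.parquet");
        System.out.println(applyRewrite(afterAppend, toDelete, toAdd)); // [ab.parquet, c.parquet]

        // A concurrent operation removed b.parquet: the retry is rejected.
        Set<String> afterRemoval = Set.of("a.parquet", "c.parquet");
        try {
            applyRewrite(afterRemoval, toDelete, toAdd);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model a concurrent append does not conflict with the rewrite, while a concurrent removal of a rewritten file does.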

  • Method Details

    • deleteFile

      default RewriteFiles deleteFile(DataFile dataFile)
      Remove a data file from the current table state.

      This rewrite operation may change the size or layout of the data files. When applicable, it is also recommended to discard already deleted records while rewriting data files. However, the set of live data records must never change.

      Parameters:
      dataFile - a rewritten data file
      Returns:
      this for method chaining
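
The invariant that the set of live data records must never change can be sketched with a small check. The model below is hypothetical and simplified: records are unique integer IDs, a file is a list of IDs, and LiveRecordsSketch, liveRecords, and isEquivalent are illustrative names that are not part of the Iceberg API.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LiveRecordsSketch {
    // Live records = every record in the files, minus records already deleted.
    static Set<Integer> liveRecords(List<List<Integer>> files, Set<Integer> deletedIds) {
        Set<Integer> live = new TreeSet<>();
        for (List<Integer> file : files) {
            live.addAll(file);
        }
        live.removeAll(deletedIds);
        return live;
    }

    // A rewrite is acceptable in this model only if the live records are unchanged.
    static boolean isEquivalent(
            List<List<Integer>> before, List<List<Integer>> after, Set<Integer> deletedIds) {
        return liveRecords(before, deletedIds).equals(liveRecords(after, deletedIds));
    }

    public static void main(String[] args) {
        Set<Integer> deleted = Set.of(2);
        List<List<Integer>> before = List.of(List.of(1, 2), List.of(3, 4));

        // Compaction that also drops the already deleted record 2: still equivalent.
        List<List<Integer>> compacted = List.of(List.of(1, 3, 4));
        System.out.println(isEquivalent(before, compacted, deleted)); // true

        // A rewrite that loses live record 4: not equivalent.
        List<List<Integer>> lossy = List.of(List.of(1, 3));
        System.out.println(isEquivalent(before, lossy, deleted)); // false
    }
}
```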
    • deleteFile

      default RewriteFiles deleteFile(DeleteFile deleteFile)
      Remove a delete file from the table state.

      This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.

      Parameters:
      deleteFile - a rewritten delete file
      Returns:
      this for method chaining
    • addFile

      default RewriteFiles addFile(DataFile dataFile)
      Add a new data file.

      This rewrite operation may change the size or layout of the data files. When applicable, it is also recommended to discard already deleted records while rewriting data files. However, the set of live data records must never change.

      Parameters:
      dataFile - a new data file
      Returns:
      this for method chaining
    • addFile

      default RewriteFiles addFile(DeleteFile deleteFile)
      Add a new delete file.

      This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.

      Parameters:
      deleteFile - a new delete file
      Returns:
      this for method chaining
    • addFile

      default RewriteFiles addFile(DeleteFile deleteFile, long dataSequenceNumber)
      Add a new delete file with the given data sequence number.

      This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.

      To ensure equivalence in the set of applicable delete records, the sequence number of the delete file must be the max sequence number of the delete files that it is replacing. Rewriting equality deletes that belong to different sequence numbers is not allowed.

      Parameters:
      deleteFile - a new delete file
      dataSequenceNumber - data sequence number to assign to the file
      Returns:
      this for method chaining
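
The sequence number rule above can be sketched as a small helper that picks the value to pass as dataSequenceNumber. This is an illustration under stated assumptions, not Iceberg code: DeleteSequenceSketch, ReplacedDelete, and sequenceNumberForRewrite are hypothetical names, and a replaced delete file is reduced to its type and its data sequence number.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteSequenceSketch {
    // Hypothetical stand-in for the metadata of a delete file being replaced.
    static final class ReplacedDelete {
        final boolean equalityDelete;
        final long dataSequenceNumber;

        ReplacedDelete(boolean equalityDelete, long dataSequenceNumber) {
            this.equalityDelete = equalityDelete;
            this.dataSequenceNumber = dataSequenceNumber;
        }
    }

    // Returns the max sequence number of the replaced delete files, and rejects
    // equality deletes that belong to different sequence numbers.
    static long sequenceNumberForRewrite(List<ReplacedDelete> replaced) {
        if (replaced.isEmpty()) {
            throw new IllegalArgumentException("No delete files to replace");
        }
        long max = Long.MIN_VALUE;
        Set<Long> equalitySequenceNumbers = new HashSet<>();
        for (ReplacedDelete delete : replaced) {
            max = Math.max(max, delete.dataSequenceNumber);
            if (delete.equalityDelete) {
                equalitySequenceNumbers.add(delete.dataSequenceNumber);
            }
        }
        if (equalitySequenceNumbers.size() > 1) {
            throw new IllegalArgumentException(
                "Cannot rewrite equality deletes that belong to different sequence numbers");
        }
        return max;
    }

    public static void main(String[] args) {
        // Two non-equality delete files at sequence numbers 3 and 5.
        List<ReplacedDelete> positionDeletes =
            List.of(new ReplacedDelete(false, 3L), new ReplacedDelete(false, 5L));
        System.out.println(sequenceNumberForRewrite(positionDeletes)); // 5

        // Two equality delete files at different sequence numbers are rejected.
        List<ReplacedDelete> equalityDeletes =
            List.of(new ReplacedDelete(true, 3L), new ReplacedDelete(true, 5L));
        try {
            sequenceNumberForRewrite(equalityDeletes);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```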
    • dataSequenceNumber

      default RewriteFiles dataSequenceNumber(long sequenceNumber)
      Configure the data sequence number for this rewrite operation. This data sequence number will be used for all new data files that are added in this rewrite. This method is helpful to avoid commit conflicts between data compaction and adding equality deletes.
      Parameters:
      sequenceNumber - a data sequence number
      Returns:
      this for method chaining
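
Why a fixed data sequence number helps can be seen in a toy model. The sketch assumes the applicability rule from the Iceberg table spec, where an equality delete applies only to data files with a strictly smaller data sequence number; DataSequenceSketch, equalityDeleteApplies, and the concrete numbers are hypothetical.

```java
public class DataSequenceSketch {
    // Assumed rule: an equality delete applies to a data file only if the data
    // file's data sequence number is strictly less than the delete's.
    static boolean equalityDeleteApplies(long deleteSequenceNumber, long dataFileSequenceNumber) {
        return dataFileSequenceNumber < deleteSequenceNumber;
    }

    public static void main(String[] args) {
        long compactionStartSeq = 5L;   // sequence number when compaction read its input
        long concurrentDeleteSeq = 6L;  // equality delete committed while compaction ran
        long compactionCommitSeq = 7L;  // sequence number the compaction commit would get

        // If the compacted file took the commit's sequence number, the concurrent
        // delete would no longer apply to it and deleted rows would reappear.
        System.out.println(equalityDeleteApplies(concurrentDeleteSeq, compactionCommitSeq)); // false

        // If the compacted file keeps the starting sequence number, the delete still applies.
        System.out.println(equalityDeleteApplies(concurrentDeleteSeq, compactionStartSeq)); // true
    }
}
```

In this model, passing the starting sequence number to dataSequenceNumber lets the compaction and the concurrent equality delete both commit without changing the set of live records.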
    • rewriteFiles

      @Deprecated default RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd)
      Deprecated.
      since 1.3.0, will be removed in 2.0.0; use deleteFile(DataFile) and addFile(DataFile) instead
      Add a rewrite that replaces one set of data files with another set that contains the same data.
      Parameters:
      filesToDelete - files that will be replaced (deleted); cannot be null or empty
      filesToAdd - files that will be added; cannot be null or empty
      Returns:
      this for method chaining
    • rewriteFiles

      @Deprecated RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber)
      Deprecated.
      since 1.3.0, will be removed in 2.0.0; use dataSequenceNumber(long) with deleteFile(DataFile) and addFile(DataFile) instead
      Add a rewrite that replaces one set of data files with another set that contains the same data. The sequence number provided will be used for all the data files added.
      Parameters:
      filesToDelete - files that will be replaced (deleted); cannot be null or empty
      filesToAdd - files that will be added; cannot be null or empty
      sequenceNumber - sequence number to use for all data files added
      Returns:
      this for method chaining
    • rewriteFiles

      @Deprecated RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace, Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd)
      Deprecated.
      since 1.3.0, will be removed in 2.0.0; use the deleteFile and addFile methods for data files and delete files instead
      Add a rewrite that replaces one set of files with another set that contains the same data.
      Parameters:
      dataFilesToReplace - data files that will be replaced (deleted)
      deleteFilesToReplace - delete files that will be replaced (deleted)
      dataFilesToAdd - data files that will be added
      deleteFilesToAdd - delete files that will be added
      Returns:
      this for method chaining
    • validateFromSnapshot

      RewriteFiles validateFromSnapshot(long snapshotId)
      Set the snapshot ID used in any reads for this operation.

      Validations will check changes after this snapshot ID. If this is not called, all ancestor snapshots through the table's initial snapshot are validated.

      Parameters:
      snapshotId - a snapshot ID
      Returns:
      this for method chaining
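
The validation range can be sketched as a walk over the snapshot ancestry. The model below is hypothetical and is not Iceberg's implementation: ValidationRangeSketch and snapshotsToValidate are illustrative names, snapshots are plain IDs, and the parent map omits the initial snapshot because it has no parent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ValidationRangeSketch {
    // Walk back from the current snapshot and collect the snapshots whose changes
    // are checked. The walk stops at startingSnapshotId (exclusive); with a null
    // starting ID it continues through the table's initial snapshot.
    static List<Long> snapshotsToValidate(
            Map<Long, Long> parents, long currentSnapshotId, Long startingSnapshotId) {
        List<Long> toValidate = new ArrayList<>();
        Long id = currentSnapshotId;
        while (id != null && !id.equals(startingSnapshotId)) {
            toValidate.add(id);
            id = parents.get(id); // null once the initial snapshot has been visited
        }
        return toValidate;
    }

    public static void main(String[] args) {
        // Linear history: 1 <- 2 <- 3 <- 4, where 4 is the current snapshot.
        Map<Long, Long> parents = Map.of(2L, 1L, 3L, 2L, 4L, 3L);

        // Equivalent of validateFromSnapshot(2): only changes after snapshot 2 are checked.
        System.out.println(snapshotsToValidate(parents, 4L, 2L)); // [4, 3]

        // No starting snapshot: every ancestor is checked.
        System.out.println(snapshotsToValidate(parents, 4L, null)); // [4, 3, 2, 1]
    }
}
```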