Interface RewriteFiles
- All Superinterfaces:
  PendingUpdate<Snapshot>, SnapshotUpdate<RewriteFiles>
This API accumulates file additions and deletions, produces a new Snapshot of the changes, and commits that snapshot as the current snapshot.
When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit. If any of the deleted files are no longer in the latest snapshot when reattempting, the commit will throw a ValidationException.
Note that the new state of the table after each rewrite must be logically equivalent to the original table state.
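The commit behavior described above can be sketched with a toy model in which a snapshot is just a set of file paths. `RewriteRetryModel`, `applyToLatest`, and `ValidationFailure` are hypothetical names; `ValidationFailure` only stands in for Iceberg's `ValidationException`, and the real retry logic also handles manifests, sequence numbers, and other metadata not modeled here.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (not Iceberg's implementation): a snapshot is only the set of file paths it contains.
final class RewriteRetryModel {

    // Hypothetical stand-in for org.apache.iceberg.exceptions.ValidationException.
    static final class ValidationFailure extends RuntimeException {
        ValidationFailure(String message) {
            super(message);
        }
    }

    // Re-applies a rewrite to whatever snapshot is latest at commit time.
    // Fails if a file scheduled for deletion has already disappeared (for example,
    // a concurrent commit removed it), because the rewrite could no longer be equivalent.
    static Set<String> applyToLatest(Set<String> latest, Set<String> toDelete, Set<String> toAdd) {
        for (String path : toDelete) {
            if (!latest.contains(path)) {
                throw new ValidationFailure("Missing file to rewrite: " + path);
            }
        }
        Set<String> next = new HashSet<>(latest);
        next.removeAll(toDelete);
        next.addAll(toAdd);
        return next;
    }
}
```

In this model, a file appended concurrently (such as `c.parquet`) survives the retry, while a retry whose deleted file has vanished from the latest snapshot fails instead of committing.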
-
Method Summary
Modifier and Type | Method | Description
default RewriteFiles | addFile(DataFile dataFile) | Add a new data file.
default RewriteFiles | addFile(DeleteFile deleteFile) | Add a new delete file.
default RewriteFiles | addFile(DeleteFile deleteFile, long dataSequenceNumber) | Add a new delete file with the given data sequence number.
default RewriteFiles | dataSequenceNumber(long sequenceNumber) | Configure the data sequence number for this rewrite operation.
default RewriteFiles | deleteFile(DataFile dataFile) | Remove a data file from the current table state.
default RewriteFiles | deleteFile(DeleteFile deleteFile) | Remove a delete file from the table state.
default RewriteFiles | rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd) | Deprecated. Since 1.3.0, will be removed in 2.0.0.
RewriteFiles | rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber) | Deprecated. Since 1.3.0, will be removed in 2.0.0.
RewriteFiles | rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace, Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd) | Deprecated. Since 1.3.0, will be removed in 2.0.0.
RewriteFiles | validateFromSnapshot(long snapshotId) | Set the snapshot ID used in any reads for this operation.
Methods inherited from interface org.apache.iceberg.PendingUpdate
apply, commit, updateEvent
Methods inherited from interface org.apache.iceberg.SnapshotUpdate
deleteWith, scanManifestsWith, set, stageOnly, toBranch
-
Method Details
-
deleteFile
default RewriteFiles deleteFile(DataFile dataFile)
Remove a data file from the current table state. This rewrite operation may change the size or layout of the data files. When applicable, it is also recommended to discard already deleted records while rewriting data files. However, the set of live data records must never change.
- Parameters:
  dataFile - a rewritten data file
- Returns:
  this for method chaining
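A minimal sketch of the "live data records must never change" invariant, under simplifying assumptions: every record has a unique string ID, a data file is a list of record IDs, and the deletes currently applied to the files being removed are a set of deleted IDs that do not carry over to the new files. `LiveRecordCheck` and its methods are hypothetical, not part of Iceberg.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model: file path -> record IDs; deletedIds apply only to the files being removed.
final class LiveRecordCheck {

    // Records still visible in the removed files after applying the deletes.
    static Set<String> liveRecords(Map<String, List<String>> removedFiles, Set<String> deletedIds) {
        Set<String> live = new HashSet<>();
        for (List<String> rows : removedFiles.values()) {
            for (String id : rows) {
                if (!deletedIds.contains(id)) {
                    live.add(id);
                }
            }
        }
        return live;
    }

    // True when the new files hold exactly the live records of the removed files, each once.
    static boolean preservesLiveRecords(Map<String, List<String>> removedFiles,
                                        Set<String> deletedIds,
                                        Map<String, List<String>> addedFiles) {
        Set<String> added = new HashSet<>();
        int addedCount = 0;
        for (List<String> rows : addedFiles.values()) {
            added.addAll(rows);
            addedCount += rows.size();
        }
        Set<String> live = liveRecords(removedFiles, deletedIds);
        return addedCount == live.size() && added.equals(live);
    }
}
```

In this model, a rewrite that drops an already deleted record passes, while one that loses a live record or writes a deleted record back fails.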
-
deleteFile
default RewriteFiles deleteFile(DeleteFile deleteFile)
Remove a delete file from the table state. This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.
- Parameters:
  deleteFile - a rewritten delete file
- Returns:
  this for method chaining
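A sketch of discarding delete records for files that are no longer part of the table state. It models a position delete as a (data file path, row position) pair, mirroring the `file_path` and `pos` columns of Iceberg position delete files; `PosDelete` and `DanglingDeleteCleanup` are hypothetical names, and `record` requires Java 16 or later.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy position delete entry: marks one row position in one data file.
record PosDelete(String dataFilePath, long position) {}

final class DanglingDeleteCleanup {

    // Keeps only entries that still point at a data file in the table state.
    // Entries for removed files cannot apply to any live row, so dropping them
    // leaves the set of applicable delete records unchanged.
    static List<PosDelete> dropDangling(List<PosDelete> entries, Set<String> liveDataFiles) {
        return entries.stream()
                .filter(e -> liveDataFiles.contains(e.dataFilePath()))
                .collect(Collectors.toList());
    }
}
```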
-
addFile
default RewriteFiles addFile(DataFile dataFile)
Add a new data file. This rewrite operation may change the size or layout of the data files. When applicable, it is also recommended to discard already deleted records while rewriting data files. However, the set of live data records must never change.
- Parameters:
  dataFile - a new data file
- Returns:
  this for method chaining
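The recommendation to discard already deleted records can be illustrated with a toy compaction that merges several data files into one and skips rows marked by position deletes. Files are modeled as lists of row values and deletes as sets of zero-based row positions per file path; `CompactionSketch` is a hypothetical name, and a real job would write the output as a new data file and register it with `addFile`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy compaction (not Iceberg code): merges input files and skips deleted positions.
final class CompactionSketch {

    static List<String> compact(Map<String, List<String>> inputFiles,
                                Map<String, Set<Integer>> deletedPositions) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : inputFiles.entrySet()) {
            Set<Integer> deleted = deletedPositions.getOrDefault(file.getKey(), Set.of());
            List<String> rows = file.getValue();
            for (int pos = 0; pos < rows.size(); pos++) {
                if (!deleted.contains(pos)) {
                    output.add(rows.get(pos)); // live row: copy it into the new file
                }
            }
        }
        return output;
    }
}
```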
-
addFile
default RewriteFiles addFile(DeleteFile deleteFile)
Add a new delete file. This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.
- Parameters:
  deleteFile - a new delete file
- Returns:
  this for method chaining
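Changing the layout of delete files without changing their content can be sketched as merging several small position delete files into one. The merged entries are sorted by data file path and then position, the order the Iceberg table spec asks of position delete files; `DeleteEntry` and `DeleteFileMerge` are hypothetical names, and `record` requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy position delete entry: (data file path, row position).
record DeleteEntry(String path, long pos) {}

final class DeleteFileMerge {

    // Sort order for the merged file: data file path first, then row position.
    static final Comparator<DeleteEntry> ORDER =
            Comparator.comparing(DeleteEntry::path).thenComparingLong(DeleteEntry::pos);

    // Combines several delete files into one sorted, de-duplicated list:
    // fewer, larger files, but the same set of delete records.
    static List<DeleteEntry> merge(List<List<DeleteEntry>> deleteFiles) {
        TreeSet<DeleteEntry> merged = new TreeSet<>(ORDER);
        for (List<DeleteEntry> file : deleteFiles) {
            merged.addAll(file);
        }
        return new ArrayList<>(merged);
    }
}
```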
-
addFile
default RewriteFiles addFile(DeleteFile deleteFile, long dataSequenceNumber)
Add a new delete file with the given data sequence number. This rewrite operation may change the size or layout of the delete files. When applicable, it is also recommended to discard delete records for files that are no longer part of the table state. However, the set of applicable delete records must never change.
To ensure equivalence in the set of applicable delete records, the sequence number of the new delete file must be the maximum sequence number of the delete files it replaces. Rewriting equality deletes that belong to different sequence numbers is not allowed.
- Parameters:
  deleteFile - a new delete file
  dataSequenceNumber - data sequence number to assign to the file
- Returns:
  this for method chaining
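The sequence-number rule above can be expressed as a small check that picks the data sequence number for a replacement delete file. `DeleteFileInfo` and `DeleteRewriteSequence` are hypothetical; the check encodes only the two rules stated here (take the maximum, and refuse to combine equality deletes from different sequence numbers) and is not Iceberg's own validation code. `record` requires Java 16 or later.

```java
import java.util.List;

// Toy description of a delete file being replaced.
record DeleteFileInfo(String path, boolean equalityDelete, long dataSequenceNumber) {}

final class DeleteRewriteSequence {

    // Returns the data sequence number for the replacement delete file: the maximum
    // over the replaced files. Rejects equality deletes from different sequence numbers.
    static long replacementSequenceNumber(List<DeleteFileInfo> replaced) {
        if (replaced.isEmpty()) {
            throw new IllegalArgumentException("No delete files to replace");
        }
        long max = Long.MIN_VALUE;
        Long equalitySeq = null; // sequence number shared by the equality deletes seen so far
        for (DeleteFileInfo file : replaced) {
            max = Math.max(max, file.dataSequenceNumber());
            if (file.equalityDelete()) {
                if (equalitySeq == null) {
                    equalitySeq = file.dataSequenceNumber();
                } else if (equalitySeq.longValue() != file.dataSequenceNumber()) {
                    throw new IllegalArgumentException(
                            "Cannot rewrite equality deletes with different sequence numbers");
                }
            }
        }
        return max;
    }
}
```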
-
dataSequenceNumber
default RewriteFiles dataSequenceNumber(long sequenceNumber)
Configure the data sequence number for this rewrite operation. This data sequence number will be used for all new data files that are added in this rewrite. This method is helpful to avoid commit conflicts between data compaction and adding equality deletes.
- Parameters:
  sequenceNumber - a data sequence number
- Returns:
  this for method chaining
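Why a pinned data sequence number helps can be sketched with the Iceberg table spec's rule that an equality delete applies to a data file only when the data file's data sequence number is strictly lower than the delete file's. The scenario assumes a compaction planned at sequence number 10, a concurrent equality delete committed at 11, and the compaction committing at 12; `SequenceNumberConflict` is a hypothetical name, and partition matching and the spec's other conditions are ignored.

```java
// Toy model of equality-delete applicability and compaction sequence numbers.
final class SequenceNumberConflict {

    // Spec rule for equality deletes: they apply only to strictly older data.
    static boolean equalityDeleteApplies(long dataSequenceNumber, long deleteSequenceNumber) {
        return dataSequenceNumber < deleteSequenceNumber;
    }

    // A compaction planned at startingSeq commits at commitSeq while an equality delete
    // was committed in between at concurrentDeleteSeq. Returns true when that delete
    // still applies to the compacted output.
    static boolean compactionKeepsConcurrentDelete(long startingSeq, long commitSeq,
                                                   long concurrentDeleteSeq, boolean pinToStartingSeq) {
        long compactedSeq = pinToStartingSeq ? startingSeq : commitSeq;
        return equalityDeleteApplies(compactedSeq, concurrentDeleteSeq);
    }
}
```

If the compacted files took the commit's sequence number (12), the concurrent delete (11) would no longer apply to them and deleted rows could reappear, which is why such a commit would need to be rejected. Keeping the starting sequence number (10) leaves the delete applicable.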
-
rewriteFiles
@Deprecated default RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd)
Deprecated. Since 1.3.0, will be removed in 2.0.0.
Add a rewrite that replaces one set of data files with another set that contains the same data.
- Parameters:
  filesToDelete - files that will be replaced (deleted); cannot be null or empty
  filesToAdd - files that will be added; cannot be null or empty
- Returns:
  this for method chaining
-
rewriteFiles
@Deprecated RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber)
Deprecated. Since 1.3.0, will be removed in 2.0.0.
Add a rewrite that replaces one set of data files with another set that contains the same data. The sequence number provided will be used for all the data files added.
- Parameters:
  filesToDelete - files that will be replaced (deleted); cannot be null or empty
  filesToAdd - files that will be added; cannot be null or empty
  sequenceNumber - sequence number to use for all data files added
- Returns:
  this for method chaining
-
rewriteFiles
@Deprecated RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> deleteFilesToReplace, Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd)
Deprecated. Since 1.3.0, will be removed in 2.0.0.
Add a rewrite that replaces one set of files with another set that contains the same data.
- Parameters:
  dataFilesToReplace - data files that will be replaced (deleted)
  deleteFilesToReplace - delete files that will be replaced (deleted)
  dataFilesToAdd - data files that will be added
  deleteFilesToAdd - delete files that will be added
- Returns:
  this for method chaining
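The deprecated methods appear to map onto the incremental methods documented above: one `deleteFile` call per file being replaced, one `addFile` call per new file, and, for the variant with a sequence number, a `dataSequenceNumber` call. In Iceberg the builder would normally come from `Table.newRewrite()`; this sketch instead uses a hypothetical `RecordingRewrite` class that only records calls, with files represented by path strings, so it runs without Iceberg on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the fluent rewrite builder: it only records the calls it receives.
final class RecordingRewrite {
    final List<String> ops = new ArrayList<>();
    Long sequenceNumber; // null until dataSequenceNumber is called

    RecordingRewrite deleteFile(String path) { ops.add("delete " + path); return this; }
    RecordingRewrite addFile(String path) { ops.add("add " + path); return this; }
    RecordingRewrite dataSequenceNumber(long seq) { this.sequenceNumber = seq; return this; }

    // Replacement for rewriteFiles(filesToDelete, filesToAdd[, sequenceNumber]):
    // an optional dataSequenceNumber call plus one deleteFile/addFile call per file.
    static RecordingRewrite migrate(RecordingRewrite rewrite, Set<String> filesToDelete,
                                    Set<String> filesToAdd, Long sequenceNumber) {
        if (sequenceNumber != null) {
            rewrite.dataSequenceNumber(sequenceNumber);
        }
        filesToDelete.forEach(rewrite::deleteFile);
        filesToAdd.forEach(rewrite::addFile);
        return rewrite;
    }
}
```

The four-set variant follows the same pattern, with data and delete files passed to the matching `deleteFile` and `addFile` overloads.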
-
validateFromSnapshot
RewriteFiles validateFromSnapshot(long snapshotId)
Set the snapshot ID used in any reads for this operation. Validations will check changes after this snapshot ID. If this is not called, all ancestor snapshots through the table's initial snapshot are validated.
- Parameters:
  snapshotId - a snapshot ID
- Returns:
  this for method chaining
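The validation window can be sketched as a walk over snapshot lineage: start at the current snapshot, follow parent links, and stop just before the snapshot passed to `validateFromSnapshot`, or continue through the table's first snapshot if none was set. Lineage is modeled as a hypothetical child-to-parent ID map in `ValidationRange`; rejecting a starting ID that is not an ancestor is a choice of this sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy lineage model: snapshot ID -> parent ID (the first snapshot has no entry).
final class ValidationRange {

    // Snapshots a validation would inspect, newest first: ancestors of currentId down to,
    // but not including, fromSnapshotId. With fromSnapshotId == null, walks to the first snapshot.
    static List<Long> snapshotsToValidate(Map<Long, Long> parentOf, long currentId, Long fromSnapshotId) {
        List<Long> result = new ArrayList<>();
        Long id = currentId;
        while (id != null && !id.equals(fromSnapshotId)) {
            result.add(id);
            id = parentOf.get(id);
        }
        if (fromSnapshotId != null && id == null) {
            throw new IllegalArgumentException(
                    "Snapshot " + fromSnapshotId + " is not an ancestor of " + currentId);
        }
        return result;
    }
}
```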
-