public interface OverwriteFiles extends SnapshotUpdate<OverwriteFiles>
This API accumulates file additions and produces a new Snapshot of the table by replacing all the deleted files with the set of additions. This operation is used to implement idempotent writes that always replace a section of a table with new data, or update/delete operations that eagerly overwrite files.
Overwrites can be validated. The default validation mode is idempotent, meaning the overwrite is correct and should be committed regardless of other concurrent changes to the table. For example, this can be used to replace all the data for day D with query results. Alternatively, this API can be configured to overwrite certain files with their filtered versions while ensuring that no new data that would need to be filtered has been added.
When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit.
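The commit behavior described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the class `OverwriteRetrySketch`, its methods, and the `beforeFirstSwap` hook are hypothetical names invented for illustration, a snapshot is reduced to an immutable set of file names, and none of this is Iceberg's actual implementation.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model of overwrite-and-retry; not Iceberg's implementation.
final class OverwriteRetrySketch {
    // The current snapshot is modeled as an immutable set of file names.
    private final AtomicReference<Set<String>> current;

    OverwriteRetrySketch(Set<String> initialFiles) {
        this.current = new AtomicReference<>(Set.copyOf(initialFiles));
    }

    Set<String> files() {
        return current.get();
    }

    // Simulates another writer committing an append.
    void concurrentAppend(String file) {
        current.updateAndGet(base -> apply(base, Set.of(), Set.of(file)));
    }

    // Replaces `deleted` with `added`. If the snapshot changed since it was read,
    // the same changes are re-applied to the new latest snapshot and retried.
    // `beforeFirstSwap` is a demo hook that runs once, just before the first swap.
    int commitOverwrite(Set<String> deleted, Set<String> added, Runnable beforeFirstSwap) {
        int attempts = 0;
        while (true) {
            attempts++;
            Set<String> base = current.get();
            Set<String> next = apply(base, deleted, added);
            if (attempts == 1) {
                beforeFirstSwap.run();
            }
            // Succeeds only if no other commit replaced the snapshot in the meantime.
            if (current.compareAndSet(base, next)) {
                return attempts;
            }
        }
    }

    private static Set<String> apply(Set<String> base, Set<String> deleted, Set<String> added) {
        Set<String> result = new TreeSet<>(base);
        result.removeAll(deleted);
        result.addAll(added);
        return Set.copyOf(result);
    }

    public static void main(String[] args) {
        OverwriteRetrySketch table = new OverwriteRetrySketch(Set.of("a.parquet", "b.parquet"));
        int attempts = table.commitOverwrite(
            Set.of("a.parquet"), Set.of("c.parquet"),
            () -> table.concurrentAppend("d.parquet"));
        System.out.println(attempts + " attempts, files: " + new TreeSet<>(table.files()));
    }
}
```

In this model the concurrently appended file survives the overwrite, because the second attempt starts from the snapshot that already contains it.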
Modifier and Type | Method and Description |
---|---|
OverwriteFiles | addFile(DataFile file): Add a DataFile to the table. |
OverwriteFiles | caseSensitive(boolean caseSensitive): Enables or disables case sensitive expression binding for validations that accept expressions. |
OverwriteFiles | deleteFile(DataFile file): Delete a DataFile from the table. |
OverwriteFiles | overwriteByRowFilter(Expression expr): Delete files that match an Expression on data rows from the table. |
OverwriteFiles | validateAddedFilesMatchOverwriteFilter(): Signal that each file added to the table must match the overwrite expression. |
OverwriteFiles | validateFromSnapshot(long snapshotId): Set the snapshot ID used in any reads for this operation. |
OverwriteFiles | validateNoConflictingAppends(Expression conflictDetectionFilter): Enables validation that files added concurrently do not conflict with this commit's operation. |
OverwriteFiles | validateNoConflictingAppends(java.lang.Long readSnapshotId, Expression conflictDetectionFilter): Deprecated. This will be removed in 0.11.0; use validateNoConflictingAppends(Expression) and validateFromSnapshot(long) instead. |
Methods inherited from interface SnapshotUpdate: deleteWith, set, stageOnly
Methods inherited from interface PendingUpdate: apply, commit, updateEvent
OverwriteFiles overwriteByRowFilter(Expression expr)
Delete files that match an Expression on data rows from the table.
A file is selected to be deleted by the expression if it could contain any rows that match the expression (candidate files are selected using an inclusive projection). These candidate files are deleted if all of the rows in the file must match the expression (the partition data matches the expression's strict projection; see Projections.strict(PartitionSpec)). This guarantees that files are deleted if and only if all rows in the file must match the expression.
Files that may contain some rows that match the expression and some rows that do not will result in a ValidationException.
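The selection rule above (inclusive test to find candidates, strict test to confirm deletion, failure for anything in between) can be sketched as follows. This is a simplified model, not Iceberg code: `RowFilterSelectionSketch` and `FileRange` are hypothetical names, a per-file value range stands in for partition data and projections, the filter is fixed to the form `lo <= x <= hi`, and `IllegalStateException` stands in for `ValidationException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: value ranges stand in for partition data and projections.
final class RowFilterSelectionSketch {
    static final class FileRange {
        final String name;
        final int min;
        final int max;

        FileRange(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    // Returns the names of the files deleted by the filter lo <= x <= hi.
    static List<String> filesToDelete(List<FileRange> files, int lo, int hi) {
        List<String> deleted = new ArrayList<>();
        for (FileRange f : files) {
            boolean mayMatch = f.max >= lo && f.min <= hi;  // inclusive test: some row could match
            boolean mustMatch = f.min >= lo && f.max <= hi; // strict test: every row must match
            if (!mayMatch) {
                continue; // not a candidate, so the file is kept
            }
            if (!mustMatch) {
                // Stand-in for ValidationException in this sketch.
                throw new IllegalStateException(
                    "File may contain both matching and non-matching rows: " + f.name);
            }
            deleted.add(f.name);
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<FileRange> files = List.of(
            new FileRange("f1", 1, 5),
            new FileRange("f2", 6, 10),
            new FileRange("f3", 11, 20));
        System.out.println(filesToDelete(files, 6, 10));
    }
}
```

A filter that cuts through a file's range, such as `6 <= x <= 12` against the files in `main`, makes the sketch throw rather than delete part of a file.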
Parameters:
expr - an expression on rows in the table
Throws:
ValidationException - If a file can contain both rows that match and rows that do not
OverwriteFiles addFile(DataFile file)
Add a DataFile to the table.
Parameters:
file - a data file
OverwriteFiles deleteFile(DataFile file)
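Delete a DataFile from the table.
Parameters:
file - a data file
OverwriteFiles validateAddedFilesMatchOverwriteFilter()
Signal that each file added to the table must match the overwrite expression.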
If this method is called, each added file is validated on commit to ensure that it matches the overwrite row filter. This is used to ensure that writes are idempotent: that files cannot be added during a commit that would not be removed if the operation were run a second time.
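To see why this validation keeps writes idempotent, consider a minimal model in which the overwrite filter is "partition day equals D". The class `AddedFilesValidationSketch` and its `overwriteDay` method are hypothetical, the table is reduced to a map from file name to partition day, and `IllegalArgumentException` stands in for the validation failure; this is a sketch of the idea, not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: the table is a map from file name to partition day.
final class AddedFilesValidationSketch {
    private final Map<String, String> files = new LinkedHashMap<>();

    Map<String, String> files() {
        return files;
    }

    // Overwrites all files of `day` with `added` (file name -> partition day).
    void overwriteDay(String day, Map<String, String> added, boolean validateAddedFiles) {
        if (validateAddedFiles) {
            // Every added file must itself match the overwrite filter.
            for (Map.Entry<String, String> e : added.entrySet()) {
                if (!day.equals(e.getValue())) {
                    throw new IllegalArgumentException(
                        "Added file does not match overwrite filter: " + e.getKey());
                }
            }
        }
        files.values().removeIf(day::equals); // delete files matching the filter
        files.putAll(added);
    }

    public static void main(String[] args) {
        AddedFilesValidationSketch table = new AddedFilesValidationSketch();
        // Without validation, a stray file for another day accumulates on every run.
        table.overwriteDay("d1", Map.of("run1-a", "d1", "run1-b", "d2"), false);
        table.overwriteDay("d1", Map.of("run2-a", "d1", "run2-b", "d2"), false);
        System.out.println(table.files().size() + " files after two unvalidated runs");
    }
}
```

With validation enabled, the stray `d2` file is rejected before anything is changed, so a second run always removes exactly what the first run added.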
OverwriteFiles validateFromSnapshot(long snapshotId)
Set the snapshot ID used in any reads for this operation.
Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots through the table's initial snapshot are validated.
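The range of snapshots covered by validation can be pictured as a walk from the current snapshot back through its ancestors, stopping at the "from" snapshot if one was set. The sketch below assumes a hypothetical `parentOf` map (snapshot ID to parent ID, with `null` for the initial snapshot) and a hypothetical helper `snapshotsToValidate`; it models the rule stated above rather than reproducing Iceberg internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of snapshot ancestry; not Iceberg's implementation.
final class ValidationRangeSketch {
    // Returns the snapshot IDs whose changes are validated, newest first.
    // `fromId` may be null, meaning no starting snapshot was set.
    static List<Long> snapshotsToValidate(Map<Long, Long> parentOf, long currentId, Long fromId) {
        List<Long> result = new ArrayList<>();
        Long id = currentId;
        // Stop at the "from" snapshot (exclusive) or after the initial snapshot.
        while (id != null && !id.equals(fromId)) {
            result.add(id);
            id = parentOf.get(id);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Long> parentOf = new HashMap<>();
        parentOf.put(1L, null); // initial snapshot
        parentOf.put(2L, 1L);
        parentOf.put(3L, 2L);
        parentOf.put(4L, 3L);
        System.out.println(snapshotsToValidate(parentOf, 4L, 2L));   // changes after snapshot 2
        System.out.println(snapshotsToValidate(parentOf, 4L, null)); // whole history
    }
}
```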
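Parameters:
snapshotId - a snapshot ID
OverwriteFiles caseSensitive(boolean caseSensitive)
Enables or disables case sensitive expression binding for validations that accept expressions.
Parameters:
caseSensitive - whether expression binding should be case sensitive
OverwriteFiles validateNoConflictingAppends(Expression conflictDetectionFilter)
Enables validation that files added concurrently do not conflict with this commit's operation.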
This method should be called when the table is queried to determine which files to delete/append. If a concurrent operation commits a new file after the data was read and that file might contain rows matching the specified conflict detection filter, the overwrite operation will detect this during retries and fail.
Calling this method with a correct conflict detection filter is required to maintain serializable isolation for eager update/delete operations. Otherwise, the isolation level will be snapshot isolation.
Validation applies to files added to the table since the snapshot passed to validateFromSnapshot(long).
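The conflict check described above can be modeled in a few lines. In this sketch the names `ConflictingAppendsSketch`, `AddedFile`, and `readSequence` are hypothetical; a simple commit sequence number stands in for snapshot ancestry, a partition string stands in for file metadata, and `IllegalStateException` stands in for the validation failure raised during commit.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of conflict detection; not Iceberg's implementation.
final class ConflictingAppendsSketch {
    static final class AddedFile {
        final long sequence;    // commit order; a simplification of snapshot ancestry
        final String partition; // stand-in for the metadata the filter is evaluated against

        AddedFile(long sequence, String partition) {
            this.sequence = sequence;
            this.partition = partition;
        }
    }

    // Fails if any file committed after the read might contain rows matching the filter.
    static void validateNoConflictingAppends(
            List<AddedFile> appended, long readSequence, Predicate<String> conflictDetectionFilter) {
        for (AddedFile f : appended) {
            boolean addedAfterRead = f.sequence > readSequence;
            if (addedAfterRead && conflictDetectionFilter.test(f.partition)) {
                throw new IllegalStateException(
                    "Found conflicting file added in commit " + f.sequence + ": " + f.partition);
            }
        }
    }

    public static void main(String[] args) {
        List<AddedFile> appended = List.of(
            new AddedFile(1, "day=1"),
            new AddedFile(3, "day=2"),
            new AddedFile(4, "day=1"));
        // The table was read at commit 2 and the operation rewrites day=2 only.
        validateNoConflictingAppends(appended, 2, p -> p.equals("day=3"));
        System.out.println("no conflicting appends for day=3");
    }
}
```

If the filter were `day=1` instead, the file added in commit 4 would be reported as a conflict, which is the case that separates serializable from snapshot isolation.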
Parameters:
conflictDetectionFilter - an expression on rows in the table
@Deprecated OverwriteFiles validateNoConflictingAppends(java.lang.Long readSnapshotId, Expression conflictDetectionFilter)
Deprecated. This will be removed in 0.11.0; use validateNoConflictingAppends(Expression) and validateFromSnapshot(long) instead.
This method should be called when the table is queried to determine which files to delete/append. If a concurrent operation commits a new file after the data was read and that file might contain rows matching the specified conflict detection filter, the overwrite operation will detect this during retries and fail.
Calling this method with a correct conflict detection filter is required to maintain serializable isolation for eager update/delete operations. Otherwise, the isolation level will be snapshot isolation.
Parameters:
readSnapshotId - the snapshot ID that was used to read the data, or null if the table was empty
conflictDetectionFilter - an expression on rows in the table