public interface ReplacePartitions extends SnapshotUpdate<ReplacePartitions>
This interface is provided to implement SQL semantics compatible with Hive table operations, but its use is not recommended. Instead, use the overwrite API (OverwriteFiles) to explicitly overwrite data.
The default validation mode is idempotent, meaning the overwrite is considered correct and should be committed regardless of other concurrent changes to the table. Alternatively, this API can be configured to validate that no new data or deletes have been applied since the snapshot that was current when this operation began. This is done by calling validateNoConflictingDeletes() and validateNoConflictingData(), which ensure that no conflicting delete files or data files, respectively, have been written since the snapshot passed to validateFromSnapshot(long).
This API accumulates file additions and produces a new Snapshot of the table by replacing all files in every partition that receives new data with the newly added files. This operation is used to implement dynamic partition replacement.
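The replacement rule described above can be illustrated with a small, self-contained model. This is only a sketch of the semantics, not Iceberg code: the class `PartitionReplaceSketch`, the method `replacePartitions`, and the use of plain maps from partition name to file paths are all hypothetical stand-ins for the table metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of dynamic partition replacement. This is not Iceberg code. */
final class PartitionReplaceSketch {

  /**
   * Builds the file listing of the new snapshot.
   * Both maps go from a partition name to the paths of the files in that partition.
   */
  static Map<String, List<String>> replacePartitions(
      Map<String, List<String>> current, Map<String, List<String>> added) {
    // Start from the current listing; partitions without new data stay as they are.
    Map<String, List<String>> next = new TreeMap<>(current);
    // Every partition that receives new data has all of its old files dropped
    // and replaced by the added files. Partitions that did not exist are created.
    for (Map.Entry<String, List<String>> entry : added.entrySet()) {
      next.put(entry.getKey(), new ArrayList<>(entry.getValue()));
    }
    return next;
  }

  public static void main(String[] args) {
    Map<String, List<String>> current = new TreeMap<>();
    current.put("day=1", Arrays.asList("a.parquet", "b.parquet"));
    current.put("day=2", Arrays.asList("c.parquet"));

    Map<String, List<String>> added = new TreeMap<>();
    added.put("day=2", Arrays.asList("d.parquet"));
    added.put("day=3", Arrays.asList("e.parquet"));

    // day=1 is untouched, day=2 is fully replaced, day=3 is new.
    System.out.println(replacePartitions(current, added));
  }
}
```

Note that the set of replaced partitions is never stated by the caller; it is derived from the added files, which is what makes the replacement "dynamic".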
When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit.
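The retry behavior on commit conflicts can be pictured as an optimistic compare-and-swap loop. The sketch below is a simplified model under stated assumptions, not the library's implementation: `CommitRetrySketch`, its `commit` method, the string-valued "snapshot", and the `maxAttempts` parameter are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Toy model of an optimistic commit with retry. This is not Iceberg code. */
final class CommitRetrySketch {
  /** Hypothetical table pointer: the current snapshot, swapped atomically. */
  private final AtomicReference<String> current;

  CommitRetrySketch(String initialSnapshot) {
    this.current = new AtomicReference<>(initialSnapshot);
  }

  String currentSnapshot() {
    return current.get();
  }

  /**
   * Applies the changes to the latest snapshot and retries on conflict.
   * Returns the number of attempts that were needed.
   */
  int commit(UnaryOperator<String> changes, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String base = current.get();        // refresh to the latest snapshot
      String next = changes.apply(base);  // re-apply the changes on top of it
      // The swap succeeds only if no other commit landed since `base` was read.
      if (current.compareAndSet(base, next)) {
        return attempt;
      }
    }
    throw new IllegalStateException("commit failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    CommitRetrySketch table = new CommitRetrySketch("s0");
    int attempts = table.commit(base -> base + "+replace", 3);
    System.out.println(attempts + " attempt(s), now at " + table.currentSnapshot());
  }
}
```

In this model a conflicting commit is never overwritten; the pending changes are rebuilt against whatever snapshot won the race.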
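|Modifier and Type|Method and Description|
|---|---|
|ReplacePartitions|validateAppendOnly(): Validate that no partitions will be replaced and the operation is append-only.|
|ReplacePartitions|validateFromSnapshot(long snapshotId): Set the snapshot ID used in validations for this operation.|
|ReplacePartitions|validateNoConflictingData(): Enables validation that data added concurrently does not conflict with this commit's operation.|
|ReplacePartitions|validateNoConflictingDeletes(): Enables validation that deletes that happened concurrently do not conflict with this commit's operation.|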
Methods inherited from interface SnapshotUpdate: deleteWith, scanManifestsWith, set, stageOnly, toBranch
ReplacePartitions addFile(DataFile file)
Add a DataFile to the table.
file - a data file
ReplacePartitions validateFromSnapshot(long snapshotId)
All validations will check changes after this snapshot ID. If this is not called, validation will occur from the beginning of the table's history.
This method should be called before this operation is committed. If a concurrent operation committed a data or delta file, or removed a data file, after the given snapshot ID, and that file might contain rows matching a partition marked for deletion, validation will detect this and fail.
snapshotId - a snapshot ID; it should be the snapshot that was current when this operation started to read the table.
ReplacePartitions validateNoConflictingDeletes()
Validating concurrent deletes is required during non-idempotent replace partition operations. This checks whether a concurrent operation deleted data in any of the partitions being overwritten; if so, the replace partition operation must be aborted to avoid undeleting rows that were removed concurrently.
ReplacePartitions validateNoConflictingData()
Validating concurrent data files is required during non-idempotent replace partition operations. This checks whether a concurrent operation inserted data in any of the partitions being overwritten; if so, the replace partition operation must be aborted to avoid removing rows that were added concurrently.
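How the starting snapshot and the two validations interact can be sketched with a toy model. Everything here is an assumption made for illustration: `ConflictValidationSketch`, `Commit`, `Change`, and `hasConflict` are hypothetical names, the history is a plain ordered list, and each commit touches a single partition. A real validation inspects table metadata and also accounts for removed data files, which this model leaves out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of conflict validation for a partition replace. This is not Iceberg code. */
final class ConflictValidationSketch {
  enum Change { ADDED_DATA, ADDED_DELETES }

  /** Hypothetical summary of one committed snapshot: what it did and to which partition. */
  static final class Commit {
    final long snapshotId;
    final String partition;
    final Change change;

    Commit(long snapshotId, String partition, Change change) {
      this.snapshotId = snapshotId;
      this.partition = partition;
      this.change = change;
    }
  }

  /**
   * Checks commits made after fromSnapshotId, in history order (oldest first).
   * A null fromSnapshotId means validation starts at the beginning of the history.
   */
  static boolean hasConflict(
      List<Commit> history, Long fromSnapshotId, Set<String> replacedPartitions,
      boolean checkData, boolean checkDeletes) {
    boolean pastStart = (fromSnapshotId == null);
    for (Commit commit : history) {
      if (!pastStart) {
        // Skip everything up to and including the starting snapshot.
        pastStart = (commit.snapshotId == fromSnapshotId);
        continue;
      }
      if (!replacedPartitions.contains(commit.partition)) {
        continue; // changes to other partitions never conflict
      }
      if (checkData && commit.change == Change.ADDED_DATA) {
        return true; // replacing would remove rows added concurrently
      }
      if (checkDeletes && commit.change == Change.ADDED_DELETES) {
        return true; // replacing could undelete rows removed concurrently
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Commit> history = Arrays.asList(
        new Commit(1L, "day=1", Change.ADDED_DATA),
        new Commit(2L, "day=2", Change.ADDED_DATA),
        new Commit(3L, "day=2", Change.ADDED_DELETES));
    Set<String> replaced = Collections.singleton("day=2");

    // Idempotent default: neither validation is enabled, so nothing conflicts.
    System.out.println(hasConflict(history, null, replaced, false, false));
    // Reading started at snapshot 2; the later delete in day=2 is a conflict.
    System.out.println(hasConflict(history, 2L, replaced, true, true));
  }
}
```

The model also shows why the starting snapshot matters: with no starting snapshot, an enabled check scans the whole history and can flag changes that the operation had already read.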