Interface ReplacePartitions
- All Superinterfaces:
  PendingUpdate<Snapshot>, SnapshotUpdate<ReplacePartitions>
- All Known Implementing Classes:
  BaseReplacePartitions
public interface ReplacePartitions extends SnapshotUpdate<ReplacePartitions>
API for overwriting files in a table by partition.

This is provided to implement SQL compatible with Hive table operations but is not recommended. Instead, use the overwrite API to explicitly overwrite data.

The default validation mode is idempotent, meaning the overwrite is correct and should be committed regardless of other concurrent changes to the table. Alternatively, this API can be configured to validate that no new data or deletes have been applied since the snapshot ID associated with the start of this operation. This can be done by calling validateNoConflictingDeletes() and validateNoConflictingData(), which ensure that no conflicting delete files or data files, respectively, have been written since the snapshot passed to validateFromSnapshot(long).

This API accumulates file additions and produces a new Snapshot of the table by replacing all files in each partition that receives new data with the new additions. This operation is used to implement dynamic partition replacement.

When committing, these changes will be applied to the latest table snapshot. Commit conflicts will be resolved by applying the changes to the new latest snapshot and reattempting the commit.
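The replacement rule can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not Iceberg's implementation: a table state is reduced to a hypothetical map from partition key to data file names, and the `replace` helper stands in for committing a ReplacePartitions operation. Every partition that receives at least one new file loses all of its existing files, while partitions with no new files are left untouched. Because the outcome for the replaced partitions depends only on the added files, re-applying the same operation to its own result changes nothing further, which is one way to see why the default idempotent mode can re-apply the changes to a newer snapshot on a commit conflict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of dynamic partition replacement: partition key -> data file names.
// Illustrative only; not Iceberg's implementation.
final class DynamicPartitionReplace {

    // Returns a new table state in which every partition present in `added`
    // holds exactly the added files; all other partitions are copied unchanged.
    static Map<String, List<String>> replace(Map<String, List<String>> current,
                                             Map<String, List<String>> added) {
        Map<String, List<String>> next = new TreeMap<>();
        current.forEach((partition, files) -> next.put(partition, new ArrayList<>(files)));
        // Existing files in each partition that receives new data are dropped.
        added.forEach((partition, files) -> next.put(partition, new ArrayList<>(files)));
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new TreeMap<>();
        table.put("day=01", List.of("a.parquet"));
        table.put("day=02", List.of("b.parquet", "c.parquet"));

        Map<String, List<String>> added = Map.of(
            "day=02", List.of("d.parquet"),
            "day=03", List.of("e.parquet"));

        Map<String, List<String>> result = replace(table, added);
        System.out.println(result); // {day=01=[a.parquet], day=02=[d.parquet], day=03=[e.parquet]}

        // Re-applying the same operation (for example on a commit retry) changes nothing further.
        System.out.println(replace(result, added).equals(result)); // true
    }
}
```

Note that `day=01` survives because no new file landed in it; that partition-scoped behavior is what distinguishes dynamic replacement from overwriting the whole table.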
Method Summary
Modifier and Type | Method | Description
ReplacePartitions | addFile(DataFile file) | Add a DataFile to the table.
ReplacePartitions | validateAppendOnly() | Validate that no partitions will be replaced and the operation is append-only.
ReplacePartitions | validateFromSnapshot(long snapshotId) | Set the snapshot ID used in validations for this operation.
ReplacePartitions | validateNoConflictingData() | Enables validation that data added concurrently does not conflict with this commit's operation.
ReplacePartitions | validateNoConflictingDeletes() | Enables validation that deletes that happened concurrently do not conflict with this commit's operation.
Methods inherited from interface org.apache.iceberg.PendingUpdate
apply, commit, updateEvent
Methods inherited from interface org.apache.iceberg.SnapshotUpdate
deleteWith, scanManifestsWith, set, stageOnly, toBranch
Method Detail
addFile
ReplacePartitions addFile(DataFile file)
Add a DataFile to the table.
- Parameters:
  file - a data file
- Returns:
  this for method chaining
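Because addFile returns this, additions can be chained, and the set of partitions to replace follows from the files that were added rather than being declared separately. The sketch below mimics that shape with a hypothetical `ToyReplacePartitions` builder and a hypothetical `ToyDataFile` reduced to a (partition, path) pair; both are assumptions made for illustration and are not Iceberg's DataFile or BaseReplacePartitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a data file: only its partition and path are modeled.
final class ToyDataFile {
    final String partition;
    final String path;

    ToyDataFile(String partition, String path) {
        this.partition = partition;
        this.path = path;
    }
}

// Hypothetical builder mirroring the chaining style of ReplacePartitions.addFile.
final class ToyReplacePartitions {
    // Partition -> added file paths, in insertion order.
    private final Map<String, List<String>> added = new LinkedHashMap<>();

    // Accumulates one file and returns this for method chaining.
    ToyReplacePartitions addFile(ToyDataFile file) {
        added.computeIfAbsent(file.partition, k -> new ArrayList<>()).add(file.path);
        return this;
    }

    // The partitions to replace are exactly those that received at least one file.
    Set<String> partitionsToReplace() {
        return Collections.unmodifiableSet(added.keySet());
    }

    // Unmodifiable view of the accumulated additions (inner lists are not copied).
    Map<String, List<String>> addedFiles() {
        return Collections.unmodifiableMap(added);
    }

    public static void main(String[] args) {
        ToyReplacePartitions op = new ToyReplacePartitions()
            .addFile(new ToyDataFile("day=02", "d.parquet"))
            .addFile(new ToyDataFile("day=02", "f.parquet"))
            .addFile(new ToyDataFile("day=03", "e.parquet"));
        System.out.println(op.partitionsToReplace()); // [day=02, day=03]
        System.out.println(op.addedFiles()); // {day=02=[d.parquet, f.parquet], day=03=[e.parquet]}
    }
}
```

Two files added to the same partition both survive the commit; the partition is replaced once, by the full set of files added to it.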
validateAppendOnly
ReplacePartitions validateAppendOnly()
Validate that no partitions will be replaced and the operation is append-only.
- Returns:
  this for method chaining
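One way to read validateAppendOnly is as a guard that turns the replacement into a pure append: if any partition that would be replaced already holds data, the operation fails instead of deleting files. The sketch below models that check over a hypothetical partition-to-files map; the method name and the use of `IllegalStateException` are assumptions for illustration, not how Iceberg reports the failure.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of an append-only check; not Iceberg's implementation.
final class AppendOnlyCheck {

    // Throws if committing would drop any existing file, i.e. if some
    // partition that receives new files already contains data.
    static void validateAppendOnly(Map<String, List<String>> current,
                                   Set<String> partitionsToReplace) {
        for (String partition : partitionsToReplace) {
            List<String> existing = current.getOrDefault(partition, List.of());
            if (!existing.isEmpty()) {
                throw new IllegalStateException(
                    "Cannot append: partition " + partition + " already has files " + existing);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = Map.of("day=01", List.of("a.parquet"));
        validateAppendOnly(table, Set.of("day=03")); // new partition: passes
        try {
            validateAppendOnly(table, Set.of("day=01")); // would replace existing data: fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```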
validateFromSnapshot
ReplacePartitions validateFromSnapshot(long snapshotId)
Set the snapshot ID used in validations for this operation.
All validations will check changes after this snapshot ID. If this is not called, validation will occur from the beginning of the table's history.
This method should be called before this operation is committed. If a concurrent operation committed a data or delta file, or removed a data file, after the given snapshot ID, and that file might contain rows matching a partition marked for deletion, validation will detect this and fail.
- Parameters:
  snapshotId - a snapshot ID; it should be the snapshot that was current when this operation started reading the table.
- Returns:
  this for method chaining
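The effect of the starting snapshot ID can be sketched as a filter over the table's history: only changes committed after that snapshot are examined, and when no starting snapshot is set, every change since the beginning of the history is examined. The `Snapshot`, `Change`, and `Kind` types below are hypothetical simplifications (real snapshots reference manifests rather than a change list), and a `null` starting ID stands in for validateFromSnapshot never having been called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified history model; not Iceberg's Snapshot API.
final class SnapshotHistory {

    enum Kind { DATA_ADDED, DATA_REMOVED, DELETE_FILE_ADDED }

    // One file-level change recorded in a snapshot, tagged with its partition.
    static final class Change {
        final Kind kind;
        final String partition;

        Change(Kind kind, String partition) {
            this.kind = kind;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return kind + "@" + partition;
        }
    }

    static final class Snapshot {
        final long id;
        final List<Change> changes;

        Snapshot(long id, List<Change> changes) {
            this.id = id;
            this.changes = changes;
        }
    }

    // Returns the changes committed strictly after startingSnapshotId, oldest first.
    // A null startingSnapshotId models "validateFromSnapshot was never called":
    // every change since the beginning of the history is returned.
    static List<Change> changesToValidate(List<Snapshot> history, Long startingSnapshotId) {
        List<Change> result = new ArrayList<>();
        boolean collecting = startingSnapshotId == null;
        for (Snapshot snapshot : history) {
            if (collecting) {
                result.addAll(snapshot.changes);
            } else if (snapshot.id == startingSnapshotId) {
                collecting = true; // collect from the next snapshot onward
            }
        }
        if (!collecting) {
            throw new IllegalArgumentException("Unknown snapshot ID: " + startingSnapshotId);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Snapshot> history = List.of(
            new Snapshot(1L, List.of(new Change(Kind.DATA_ADDED, "day=01"))),
            new Snapshot(2L, List.of(new Change(Kind.DATA_ADDED, "day=02"))),
            new Snapshot(3L, List.of(new Change(Kind.DELETE_FILE_ADDED, "day=02"))));

        System.out.println(changesToValidate(history, 2L));          // [DELETE_FILE_ADDED@day=02]
        System.out.println(changesToValidate(history, null).size()); // 3
    }
}
```

Validating from the beginning is always safe but may report conflicts with changes the operation already saw, which is why the parameter should be the snapshot that was read.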
validateNoConflictingDeletes
ReplacePartitions validateNoConflictingDeletes()
Enables validation that deletes that happened concurrently do not conflict with this commit's operation.
Validating concurrent deletes is required during non-idempotent replace partition operations. This will check whether a concurrent operation deleted data in any of the partitions being overwritten; if so, the replace partitions operation must be aborted to avoid undeleting rows that were removed concurrently.
- Returns:
  this for method chaining
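The "undeleting rows" hazard can be made concrete with a row-level toy model. Assume a job reads partition `day=02`, rewrites its rows, and replaces the partition, while a concurrent operation deletes one of those rows and commits first. Without validation, the replacement brings the deleted row back; with a check for concurrent deletes in the replaced partitions, the commit is rejected instead. All names here are hypothetical, and the model tracks which partitions saw concurrent deletes rather than Iceberg's actual delete files and removed data files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy row-level model of the lost-delete hazard; not Iceberg's implementation.
final class ConflictingDeletesDemo {

    // Fails if any partition being replaced was touched by a delete committed
    // after the operation's starting snapshot.
    static void validateNoConflictingDeletes(Set<String> partitionsToReplace,
                                             Set<String> partitionsWithConcurrentDeletes) {
        for (String partition : partitionsToReplace) {
            if (partitionsWithConcurrentDeletes.contains(partition)) {
                throw new IllegalStateException("Concurrent delete in replaced partition " + partition);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new TreeMap<>();
        table.put("day=02", new ArrayList<>(List.of("r1", "r2", "r3")));

        // 1. The replace job reads day=02 and computes its new contents from that read.
        List<String> rewritten = new ArrayList<>();
        for (String row : table.get("day=02")) {
            rewritten.add(row + "'");
        }

        // 2. A concurrent operation deletes r2 and commits first.
        table.get("day=02").remove("r2");
        Set<String> concurrentDeletes = Set.of("day=02");

        // 3a. Without validation, replacing day=02 resurrects r2 (as r2').
        Map<String, List<String>> unchecked = new TreeMap<>(table);
        unchecked.put("day=02", rewritten);
        System.out.println(unchecked); // {day=02=[r1', r2', r3']}

        // 3b. With validation, the commit is rejected instead.
        try {
            validateNoConflictingDeletes(Set.of("day=02"), concurrentDeletes);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```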
validateNoConflictingData
ReplacePartitions validateNoConflictingData()
Enables validation that data added concurrently does not conflict with this commit's operation.
Validating concurrent data files is required during non-idempotent replace partition operations. This will check whether a concurrent operation inserted data in any of the partitions being overwritten; if so, the replace partitions operation must be aborted to avoid removing rows that were added concurrently.
- Returns:
  this for method chaining
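The mirror-image hazard is a lost insert: a concurrent writer appends a row to a partition that is about to be replaced from a stale read, and the replacement silently drops it. In the toy model below (hypothetical names, not Iceberg's implementation), the check is the intersection of the partitions being replaced with the partitions that received data after the starting snapshot; a non-empty result means the commit must fail. Concurrent additions to partitions that are not being replaced do not conflict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the lost-insert hazard; not Iceberg's implementation.
final class ConflictingDataDemo {

    // Returns the replaced partitions that also received data after the
    // operation's starting snapshot; a non-empty result means the commit must fail.
    static Set<String> conflictingPartitions(Set<String> partitionsToReplace,
                                             Set<String> partitionsWithConcurrentAdds) {
        Set<String> conflicts = new TreeSet<>(partitionsToReplace);
        conflicts.retainAll(partitionsWithConcurrentAdds);
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new TreeMap<>();
        table.put("day=02", new ArrayList<>(List.of("r1", "r2")));

        // The replace job reads day=02 and computes new contents from that read.
        List<String> rewritten = new ArrayList<>(table.get("day=02"));

        // A concurrent writer appends r4 to day=02 and r5 to day=03, and commits first.
        table.get("day=02").add("r4");
        table.put("day=03", new ArrayList<>(List.of("r5")));
        Set<String> concurrentAdds = Set.of("day=02", "day=03");

        // Without validation, replacing day=02 silently drops r4.
        Map<String, List<String>> unchecked = new TreeMap<>(table);
        unchecked.put("day=02", rewritten);
        System.out.println(unchecked); // {day=02=[r1, r2], day=03=[r5]}

        // With validation, only day=02 conflicts; day=03 is not being replaced.
        System.out.println(conflictingPartitions(Set.of("day=02"), concurrentAdds)); // [day=02]
    }
}
```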