Class BaseReplacePartitions
- java.lang.Object
-
- org.apache.iceberg.BaseReplacePartitions
-
- All Implemented Interfaces:
PendingUpdate<Snapshot>
,ReplacePartitions
,SnapshotUpdate<ReplacePartitions>
public class BaseReplacePartitions extends java.lang.Object implements ReplacePartitions
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description protected void
add(DataFile file)
Add a data file to the new snapshot.protected void
add(DeleteFile file)
Add a delete file to the new snapshot.protected void
add(DeleteFile file, long dataSequenceNumber)
Add a delete file to the new snapshot.protected void
add(ManifestFile manifest)
Add all files in a manifest to the new snapshot.protected java.util.List<DataFile>
addedDataFiles()
protected org.apache.iceberg.DeleteFileIndex
addedDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent)
Returns matching delete files have been added to the table since a starting snapshot.ReplacePartitions
addFile(DataFile file)
Add aDataFile
to the table.protected boolean
addsDataFiles()
protected boolean
addsDeleteFiles()
Snapshot
apply()
Apply the pending changes and return the uncommitted changes for validation.java.util.List<ManifestFile>
apply(TableMetadata base, Snapshot snapshot)
Apply the update's changes to the given metadata and snapshot.protected boolean
canInheritSnapshotId()
ThisT
caseSensitive(boolean isCaseSensitive)
protected void
cleanAll()
protected void
cleanUncommitted(java.util.Set<ManifestFile> committed)
Clean up any uncommitted manifests that were created.void
commit()
Apply the pending changes and commit.protected CommitMetrics
commitMetrics()
protected TableMetadata
current()
protected PartitionSpec
dataSpec()
protected void
delete(java.lang.CharSequence path)
Add a specific data path to be deleted in the new snapshot.protected void
delete(DataFile file)
Add a specific data file to be deleted in the new snapshot.protected void
delete(DeleteFile file)
Add a specific delete file to be deleted in the new snapshot.protected void
deleteByRowFilter(Expression expr)
Add a filter to match files to delete.protected void
deleteFile(java.lang.String path)
protected boolean
deletesDataFiles()
protected boolean
deletesDeleteFiles()
ThisT
deleteWith(java.util.function.Consumer<java.lang.String> deleteCallback)
Set a callback to delete files instead of the table's default.protected void
dropPartition(int specId, StructLike partition)
Add a partition tuple to drop from the table during the delete phase.protected void
failAnyDelete()
protected void
failMissingDeletePaths()
protected boolean
isCaseSensitive()
protected OutputFile
manifestListPath()
protected ManifestReader<DeleteFile>
newDeleteManifestReader(ManifestFile manifest)
protected ManifestWriter<DeleteFile>
newDeleteManifestWriter(PartitionSpec spec)
protected OutputFile
newManifestOutput()
protected ManifestReader<DataFile>
newManifestReader(ManifestFile manifest)
protected ManifestWriter<DataFile>
newManifestWriter(PartitionSpec spec)
protected RollingManifestWriter<DeleteFile>
newRollingDeleteManifestWriter(PartitionSpec spec)
protected RollingManifestWriter<DataFile>
newRollingManifestWriter(PartitionSpec spec)
protected java.lang.String
operation()
A string that describes the action that produced the new snapshot.protected TableMetadata
refresh()
protected ThisT
reportWith(MetricsReporter newReporter)
protected Expression
rowFilter()
ThisT
scanManifestsWith(java.util.concurrent.ExecutorService executorService)
Use a particular executor to scan manifests.protected ReplacePartitions
self()
ThisT
set(java.lang.String property, java.lang.String value)
Set a summary property in the snapshot produced by this update.protected void
setNewDataFilesDataSequenceNumber(long sequenceNumber)
protected long
snapshotId()
ThisT
stageOnly()
Called to stage a snapshot in table metadata, but not update the current snapshot id.protected java.util.Map<java.lang.String,java.lang.String>
summary()
protected java.lang.String
targetBranch()
protected void
targetBranch(java.lang.String branch)
A setter for the target branch on which snapshot producer operation should be performedBaseReplacePartitions
toBranch(java.lang.String branch)
Perform operations on a particular branchjava.lang.Object
updateEvent()
Generates update event to notify about metadata changesvoid
validate(TableMetadata currentMetadata, Snapshot parent)
Validate the current metadata.protected void
validateAddedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent)
Validates that no files matching a filter have been added to the table since a starting snapshot.protected void
validateAddedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no files matching given partitions have been added to the table since a starting snapshot.ReplacePartitions
validateAppendOnly()
Validate that no partitions will be replaced and the operation is append-only.protected void
validateDataFilesExist(TableMetadata base, java.lang.Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent)
protected void
validateDeletedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
Validates that no files matching a filter have been deleted from the table since a starting snapshot.protected void
validateDeletedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no files matching a filter have been deleted from the table since a starting snapshot.ReplacePartitions
validateFromSnapshot(long newStartingSnapshotId)
Set the snapshot ID used in validations for this operation.ReplacePartitions
validateNoConflictingData()
Enables validation that data added concurrently does not conflict with this commit's operation.ReplacePartitions
validateNoConflictingDeletes()
Enables validation that deletes that happened concurrently do not conflict with this commit's operation.protected void
validateNoNewDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
Validates that no delete files matching a filter have been added to the table since a starting snapshot.protected void
validateNoNewDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no delete files matching a partition set have been added to the table since a starting snapshot.protected void
validateNoNewDeletesForDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected void
validateNoNewDeletesForDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected java.util.concurrent.ExecutorService
workerPool()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.PendingUpdate
apply, commit, updateEvent
-
Methods inherited from interface org.apache.iceberg.SnapshotUpdate
deleteWith, scanManifestsWith, set, stageOnly
-
-
-
-
Method Detail
-
self
protected ReplacePartitions self()
-
operation
protected java.lang.String operation()
A string that describes the action that produced the new snapshot.- Returns:
- a string operation
-
addFile
public ReplacePartitions addFile(DataFile file)
Description copied from interface:ReplacePartitions
Add aDataFile
to the table.- Specified by:
addFile
in interfaceReplacePartitions
- Parameters:
file
- a data file- Returns:
- this for method chaining
-
validateAppendOnly
public ReplacePartitions validateAppendOnly()
Description copied from interface:ReplacePartitions
Validate that no partitions will be replaced and the operation is append-only.- Specified by:
validateAppendOnly
in interfaceReplacePartitions
- Returns:
- this for method chaining
-
validateFromSnapshot
public ReplacePartitions validateFromSnapshot(long newStartingSnapshotId)
Description copied from interface:ReplacePartitions
Set the snapshot ID used in validations for this operation.All validations will check changes after this snapshot ID. If this is not called, validation will occur from the beginning of the table's history.
This method should be called before this operation is committed. If a concurrent operation committed a data or delta file or removed a data file after the given snapshot ID that might contain rows matching a partition marked for deletion, validation will detect this and fail.
- Specified by:
validateFromSnapshot
in interfaceReplacePartitions
- Parameters:
newStartingSnapshotId
- a snapshot ID, it should be set to when this operation started to read the table.- Returns:
- this for method chaining
-
validateNoConflictingDeletes
public ReplacePartitions validateNoConflictingDeletes()
Description copied from interface:ReplacePartitions
Enables validation that deletes that happened concurrently do not conflict with this commit's operation.Validating concurrent deletes is required during non-idempotent replace partition operations. This will check if a concurrent operation deletes data in any of the partitions being overwritten, as the replace partition must be aborted to avoid undeleting rows that were removed concurrently.
- Specified by:
validateNoConflictingDeletes
in interfaceReplacePartitions
- Returns:
- this for method chaining
-
validateNoConflictingData
public ReplacePartitions validateNoConflictingData()
Description copied from interface:ReplacePartitions
Enables validation that data added concurrently does not conflict with this commit's operation.Validating concurrent data files is required during non-idempotent replace partition operations. This will check if a concurrent operation inserts data in any of the partitions being overwritten, as the replace partition must be aborted to avoid removing rows added concurrently.
- Specified by:
validateNoConflictingData
in interfaceReplacePartitions
- Returns:
- this for method chaining
-
toBranch
public BaseReplacePartitions toBranch(java.lang.String branch)
Description copied from interface:SnapshotUpdate
Perform operations on a particular branch- Specified by:
toBranch
in interfaceSnapshotUpdate<ReplacePartitions>
- Parameters:
branch
- which is name of SnapshotRef of type branch.
-
validate
public void validate(TableMetadata currentMetadata, Snapshot parent)
Validate the current metadata.Child operations can override this to add custom validation.
- Parameters:
currentMetadata
- current table metadata to validateparent
- ending snapshot on the lineage which is being validated
-
apply
public java.util.List<ManifestFile> apply(TableMetadata base, Snapshot snapshot)
Apply the update's changes to the given metadata and snapshot. Return the new manifest list.- Parameters:
base
- the base table metadata to apply changes tosnapshot
- snapshot to apply the changes to- Returns:
- a manifest list for the new snapshot.
-
set
public ThisT set(java.lang.String property, java.lang.String value)
Description copied from interface:SnapshotUpdate
Set a summary property in the snapshot produced by this update.- Parameters:
property
- a String property namevalue
- a String property value- Returns:
- this for method chaining
-
caseSensitive
public ThisT caseSensitive(boolean isCaseSensitive)
-
isCaseSensitive
protected boolean isCaseSensitive()
-
dataSpec
protected PartitionSpec dataSpec()
-
rowFilter
protected Expression rowFilter()
-
addedDataFiles
protected java.util.List<DataFile> addedDataFiles()
-
failAnyDelete
protected void failAnyDelete()
-
failMissingDeletePaths
protected void failMissingDeletePaths()
-
deleteByRowFilter
protected void deleteByRowFilter(Expression expr)
Add a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.- Parameters:
expr
- an expression to match rows.
-
dropPartition
protected void dropPartition(int specId, StructLike partition)
Add a partition tuple to drop from the table during the delete phase.
-
delete
protected void delete(DataFile file)
Add a specific data file to be deleted in the new snapshot.
-
delete
protected void delete(DeleteFile file)
Add a specific delete file to be deleted in the new snapshot.
-
delete
protected void delete(java.lang.CharSequence path)
Add a specific data path to be deleted in the new snapshot.
-
deletesDataFiles
protected boolean deletesDataFiles()
-
deletesDeleteFiles
protected boolean deletesDeleteFiles()
-
addsDataFiles
protected boolean addsDataFiles()
-
addsDeleteFiles
protected boolean addsDeleteFiles()
-
add
protected void add(DataFile file)
Add a data file to the new snapshot.
-
add
protected void add(DeleteFile file)
Add a delete file to the new snapshot.
-
add
protected void add(DeleteFile file, long dataSequenceNumber)
Add a delete file to the new snapshot.
-
add
protected void add(ManifestFile manifest)
Add all files in a manifest to the new snapshot.
-
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no files matching given partitions have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a set of partitions to filter new conflicting data filesparent
- ending snapshot on the lineage being validated
-
validateAddedDataFiles
protected void validateAddedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent)
Validates that no files matching a filter have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationconflictDetectionFilter
- an expression used to find new conflicting data files
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFiles
- data files to validate have no new row deletesparent
- ending snapshot on the branch being validated
-
validateNoNewDeletesForDataFiles
protected void validateNoNewDeletesForDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- a data filterdataFiles
- data files to validate have no new row deletesparent
- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
Validates that no delete files matching a filter have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find new conflicting delete filesparent
- ending snapshot on the branch being validated
-
validateNoNewDeleteFiles
protected void validateNoNewDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no delete files matching a partition set have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a partition set used to find new conflicting delete filesparent
- ending snapshot on the branch being validated
-
addedDeleteFiles
protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent)
Returns matching delete files have been added to the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find delete filespartitionSet
- a partition set used to find delete filesparent
- parent snapshot of the branch
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationdataFilter
- an expression used to find deleted data filesparent
- ending snapshot on the branch being validated
-
validateDeletedDataFiles
protected void validateDeletedDataFiles(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
base
- table metadata to validatestartingSnapshotId
- id of the snapshot current at the start of the operationpartitionSet
- a partition set used to find deleted data filesparent
- ending snapshot on the branch being validated
-
setNewDataFilesDataSequenceNumber
protected void setNewDataFilesDataSequenceNumber(long sequenceNumber)
-
validateDataFilesExist
protected void validateDataFilesExist(TableMetadata base, java.lang.Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent)
-
summary
protected java.util.Map<java.lang.String,java.lang.String> summary()
-
updateEvent
public java.lang.Object updateEvent()
Description copied from interface:PendingUpdate
Generates update event to notify about metadata changes- Returns:
- the generated event
-
cleanUncommitted
protected void cleanUncommitted(java.util.Set<ManifestFile> committed)
Clean up any uncommitted manifests that were created.Manifests may not be committed if apply is called more because a commit conflict has occurred. Implementations may keep around manifests because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.
- Parameters:
committed
- a set of manifest paths that were actually committed
-
stageOnly
public ThisT stageOnly()
Description copied from interface:SnapshotUpdate
Called to stage a snapshot in table metadata, but not update the current snapshot id.- Specified by:
stageOnly
in interfaceSnapshotUpdate<ThisT>
- Returns:
- this for method chaining
-
scanManifestsWith
public ThisT scanManifestsWith(java.util.concurrent.ExecutorService executorService)
Description copied from interface:SnapshotUpdate
Use a particular executor to scan manifests. The default worker pool will be used by default.- Specified by:
scanManifestsWith
in interfaceSnapshotUpdate<ThisT>
- Parameters:
executorService
- the provided executor- Returns:
- this for method chaining
-
commitMetrics
protected CommitMetrics commitMetrics()
-
reportWith
protected ThisT reportWith(MetricsReporter newReporter)
-
targetBranch
protected void targetBranch(java.lang.String branch)
A setter for the target branch on which snapshot producer operation should be performed- Parameters:
branch
- to set as target branch
-
targetBranch
protected java.lang.String targetBranch()
-
workerPool
protected java.util.concurrent.ExecutorService workerPool()
-
deleteWith
public ThisT deleteWith(java.util.function.Consumer<java.lang.String> deleteCallback)
Description copied from interface:SnapshotUpdate
Set a callback to delete files instead of the table's default.- Specified by:
deleteWith
in interfaceSnapshotUpdate<ThisT>
- Parameters:
deleteCallback
- a String consumer used to delete locations.- Returns:
- this for method chaining
-
apply
public Snapshot apply()
Description copied from interface:PendingUpdate
Apply the pending changes and return the uncommitted changes for validation.This does not result in a permanent update.
- Specified by:
apply
in interfacePendingUpdate<ThisT>
- Returns:
- the uncommitted changes that would be committed by calling
PendingUpdate.commit()
-
current
protected TableMetadata current()
-
refresh
protected TableMetadata refresh()
-
commit
public void commit()
Description copied from interface:PendingUpdate
Apply the pending changes and commit.Changes are committed by calling the underlying table's commit method.
Once the commit is successful, the updated table will be refreshed.
- Specified by:
commit
in interfacePendingUpdate<ThisT>
-
cleanAll
protected void cleanAll()
-
deleteFile
protected void deleteFile(java.lang.String path)
-
manifestListPath
protected OutputFile manifestListPath()
-
newManifestOutput
protected OutputFile newManifestOutput()
-
newManifestWriter
protected ManifestWriter<DataFile> newManifestWriter(PartitionSpec spec)
-
newDeleteManifestWriter
protected ManifestWriter<DeleteFile> newDeleteManifestWriter(PartitionSpec spec)
-
newRollingManifestWriter
protected RollingManifestWriter<DataFile> newRollingManifestWriter(PartitionSpec spec)
-
newRollingDeleteManifestWriter
protected RollingManifestWriter<DeleteFile> newRollingDeleteManifestWriter(PartitionSpec spec)
-
newManifestReader
protected ManifestReader<DataFile> newManifestReader(ManifestFile manifest)
-
newDeleteManifestReader
protected ManifestReader<DeleteFile> newDeleteManifestReader(ManifestFile manifest)
-
snapshotId
protected long snapshotId()
-
canInheritSnapshotId
protected boolean canInheritSnapshotId()
-
-