Class BaseOverwriteFiles
- All Implemented Interfaces:
- OverwriteFiles,- PendingUpdate<Snapshot>,- SnapshotUpdate<OverwriteFiles>
- 
Constructor SummaryConstructorsModifierConstructorDescriptionprotectedBaseOverwriteFiles(String tableName, TableOperations ops) 
- 
Method SummaryModifier and TypeMethodDescriptionprotected voidAdd a data file to the new snapshot.protected voidadd(DeleteFile file) Add a delete file to the new snapshot.protected voidadd(DeleteFile file, long dataSequenceNumber) Add a delete file to the new snapshot.protected voidadd(ManifestFile manifest) Add all files in a manifest to the new snapshot.protected org.apache.iceberg.DeleteFileIndexaddedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.Add aDataFileto the table.protected booleanprotected booleanapply()Apply the pending changes and return the uncommitted changes for validation.apply(TableMetadata base, Snapshot snapshot) Apply the update's changes to the given metadata and snapshot.protected booleancaseSensitive(boolean isCaseSensitive) protected voidcleanAll()protected voidcleanUncommitted(Set<ManifestFile> committed) Clean up any uncommitted manifests that were created.protected booleanvoidcommit()Apply the pending changes and commit.protected CommitMetricsconflictDetectionFilter(Expression newConflictDetectionFilter) Sets a conflict detection filter used to validate concurrently added data and delete files.protected TableMetadatacurrent()protected PartitionSpecdataSpec()protected voiddelete(CharSequence path) Add a specific data path to be deleted in the new snapshot.protected voidAdd a specific data file to be deleted in the new snapshot.protected voiddelete(DeleteFile file) Add a specific delete file to be deleted in the new snapshot.protected voiddeleteByRowFilter(Expression expr) Add a filter to match files to delete.protected voiddeleteFile(String path) deleteFile(DataFile file) Delete aDataFilefrom the table.deleteFiles(DataFileSet dataFilesToDelete, DeleteFileSet deleteFilesToDelete) Deletes a set of data files from the table with their respective delete files.protected booleanprotected booleandeleteWith(Consumer<String> deleteCallback) Set a callback to delete files instead of the table's default.protected voiddropPartition(int specId, StructLike partition) Add a partition tuple to drop from the table during the delete phase.protected voidprotected voidprotected booleanprotected OutputFileprotected ManifestReader<DeleteFile> newDeleteManifestReader(ManifestFile manifest) protected ManifestWriter<DeleteFile> protected EncryptedOutputFileprotected ManifestReader<DataFile> newManifestReader(ManifestFile manifest) protected ManifestWriter<DataFile> protected RollingManifestWriter<DeleteFile> protected RollingManifestWriter<DataFile> protected StringA string that describes the action that produced the new snapshot.protected TableOperationsops()Delete files that match anExpressionon data rows from the table.protected TableMetadatarefresh()protected OverwriteFilesreportWith(MetricsReporter newReporter) protected ExpressionscanManifestsWith(ExecutorService executorService) Use a particular executor to scan manifests.protected OverwriteFilesself()Set a summary property in the snapshot produced by this update.protected voidsetNewDataFilesDataSequenceNumber(long sequenceNumber) protected longCalled to stage a snapshot in table metadata, but not update the current snapshot id.summary()protected Stringprotected voidtargetBranch(String branch) A setter for the target branch on which snapshot producer operation should be performedPerform operations on a particular branchGenerates update event to notify about metadata changesprotected voidvalidate(TableMetadata base, Snapshot parent) Validate the current metadata.protected voidvalidateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.protected voidvalidateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.protected voidvalidateAddedDVs(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Signal that each file added to the table must match the overwrite expression.protected voidvalidateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) protected voidvalidateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.protected voidvalidateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.validateFromSnapshot(long snapshotId) Set the snapshot ID used in any reads for this operation.protected voidEnables validation that data added concurrently does not conflict with this commit's operation.Enables validation that deletes that happened concurrently do not conflict with this commit's operation.protected voidvalidateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.protected voidvalidateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.protected voidvalidateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected voidvalidateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.protected ExecutorServiceprotected List<ManifestFile> writeDataManifests(Collection<DataFile> files, Long dataSeq, PartitionSpec spec) protected List<ManifestFile> writeDataManifests(Collection<DataFile> files, PartitionSpec spec) protected List<ManifestFile> writeDeleteManifests(Collection<DeleteFile> files, PartitionSpec spec) Methods inherited from class java.lang.Objectclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.iceberg.OverwriteFilescaseSensitiveMethods inherited from interface org.apache.iceberg.PendingUpdateapply, commit, updateEventMethods inherited from interface org.apache.iceberg.SnapshotUpdatedeleteWith, scanManifestsWith, set, stageOnly
- 
Constructor Details- 
BaseOverwriteFiles
 
- 
- 
Method Details- 
self
- 
operationA string that describes the action that produced the new snapshot.- Returns:
- a string operation
 
- 
overwriteByRowFilterDescription copied from interface:OverwriteFilesDelete files that match anExpressionon data rows from the table.A file is selected to be deleted by the expression if it could contain any rows that match the expression (candidate files are selected using an inclusive projection). These candidate files are deleted if all of the rows in the file must match the expression (the partition data matches the expression'sProjections.strict(PartitionSpec)strict projection}). This guarantees that files are deleted if and only if all rows in the file must match the expression.Files that may contain some rows that match the expression and some rows that do not will result in a ValidationException.- Specified by:
- overwriteByRowFilterin interface- OverwriteFiles
- Parameters:
- expr- an expression on rows in the table
- Returns:
- this for method chaining
 
- 
addFileDescription copied from interface:OverwriteFilesAdd aDataFileto the table.- Specified by:
- addFilein interface- OverwriteFiles
- Parameters:
- file- a data file
- Returns:
- this for method chaining
 
- 
deleteFilesDescription copied from interface:OverwriteFilesDeletes a set of data files from the table with their respective delete files.- Specified by:
- deleteFilesin interface- OverwriteFiles
- Parameters:
- dataFilesToDelete- the data files to be deleted from the table
- deleteFilesToDelete- the delete files corresponding to the data files to be deleted from the table
- Returns:
- this for method chaining
 
- 
deleteFileDescription copied from interface:OverwriteFilesDelete aDataFilefrom the table.- Specified by:
- deleteFilein interface- OverwriteFiles
- Parameters:
- file- a data file
- Returns:
- this for method chaining
 
- 
validateAddedFilesMatchOverwriteFilterDescription copied from interface:OverwriteFilesSignal that each file added to the table must match the overwrite expression.If this method is called, each added file is validated on commit to ensure that it matches the overwrite row filter. This is used to ensure that writes are idempotent: that files cannot be added during a commit that would not be removed if the operation were run a second time. - Specified by:
- validateAddedFilesMatchOverwriteFilterin interface- OverwriteFiles
- Returns:
- this for method chaining
 
- 
validateFromSnapshotDescription copied from interface:OverwriteFilesSet the snapshot ID used in any reads for this operation.Validations will check changes after this snapshot ID. If the from snapshot is not set, all ancestor snapshots through the table's initial snapshot are validated. - Specified by:
- validateFromSnapshotin interface- OverwriteFiles
- Parameters:
- snapshotId- a snapshot ID
- Returns:
- this for method chaining
 
- 
conflictDetectionFilterDescription copied from interface:OverwriteFilesSets a conflict detection filter used to validate concurrently added data and delete files.- Specified by:
- conflictDetectionFilterin interface- OverwriteFiles
- Parameters:
- newConflictDetectionFilter- an expression on rows in the table
- Returns:
- this for method chaining
 
- 
validateNoConflictingDataDescription copied from interface:OverwriteFilesEnables validation that data added concurrently does not conflict with this commit's operation.This method should be called while committing non-idempotent overwrite operations. If a concurrent operation commits a new file after the data was read and that file might contain rows matching the specified conflict detection filter, the overwrite operation will detect this and fail. Calling this method with a correct conflict detection filter is required to maintain isolation for non-idempotent overwrite operations. Validation uses the conflict detection filter passed to OverwriteFiles.conflictDetectionFilter(Expression)and applies to operations that happened after the snapshot passed toOverwriteFiles.validateFromSnapshot(long). If the conflict detection filter is not set, any new data added concurrently will fail this overwrite operation.- Specified by:
- validateNoConflictingDatain interface- OverwriteFiles
- Returns:
- this for method chaining
 
- 
validateNoConflictingDeletesDescription copied from interface:OverwriteFilesEnables validation that deletes that happened concurrently do not conflict with this commit's operation.Validating concurrent deletes is required during non-idempotent overwrite operations. If a concurrent operation deletes data in one of the files being overwritten, the overwrite operation must be aborted as it may undelete rows that were removed concurrently. Calling this method with a correct conflict detection filter is required to maintain isolation for non-idempotent overwrite operations. Validation uses the conflict detection filter passed to OverwriteFiles.conflictDetectionFilter(Expression)and applies to operations that happened after the snapshot passed toOverwriteFiles.validateFromSnapshot(long). If the conflict detection filter is not set, this operation will use the row filter provided inOverwriteFiles.overwriteByRowFilter(Expression)to check for new delete files and will ensure there are no conflicting deletes for data files removed viaOverwriteFiles.deleteFile(DataFile).- Specified by:
- validateNoConflictingDeletesin interface- OverwriteFiles
- Returns:
- this for method chaining
 
- 
toBranchDescription copied from interface:SnapshotUpdatePerform operations on a particular branch- Specified by:
- toBranchin interface- SnapshotUpdate<OverwriteFiles>
- Parameters:
- branch- which is name of SnapshotRef of type branch.
 
- 
validateValidate the current metadata.Child operations can override this to add custom validation. - Parameters:
- base- current table metadata to validate
- parent- ending snapshot on the lineage which is being validated
 
- 
setDescription copied from interface:SnapshotUpdateSet a summary property in the snapshot produced by this update.- Parameters:
- property- a String property name
- value- a String property value
- Returns:
- this for method chaining
 
- 
caseSensitive
- 
isCaseSensitiveprotected boolean isCaseSensitive()
- 
dataSpec
- 
rowFilter
- 
addedDataFiles
- 
failAnyDeleteprotected void failAnyDelete()
- 
failMissingDeletePathsprotected void failMissingDeletePaths()
- 
deleteByRowFilterAdd a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.- Parameters:
- expr- an expression to match rows.
 
- 
dropPartitionAdd a partition tuple to drop from the table during the delete phase.
- 
deleteAdd a specific data file to be deleted in the new snapshot.
- 
deleteAdd a specific delete file to be deleted in the new snapshot.
- 
deleteAdd a specific data path to be deleted in the new snapshot.
- 
deletesDataFilesprotected boolean deletesDataFiles()
- 
deletesDeleteFilesprotected boolean deletesDeleteFiles()
- 
addsDataFilesprotected boolean addsDataFiles()
- 
addsDeleteFilesprotected boolean addsDeleteFiles()
- 
addAdd a data file to the new snapshot.
- 
addAdd a delete file to the new snapshot.
- 
addAdd a delete file to the new snapshot.
- 
validateNewDeleteFile
- 
addAdd all files in a manifest to the new snapshot.
- 
validateAddedDataFilesprotected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching given partitions have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- partitionSet- a set of partitions to filter new conflicting data files
- parent- ending snapshot on the lineage being validated
 
- 
validateAddedDataFilesprotected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) Validates that no files matching a filter have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- conflictDetectionFilter- an expression used to find new conflicting data files
 
- 
validateNoNewDeletesForDataFilesprotected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- dataFiles- data files to validate have no new row deletes
- parent- ending snapshot on the branch being validated
 
- 
validateNoNewDeletesForDataFilesprotected void validateNoNewDeletesForDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Iterable<DataFile> dataFiles, Snapshot parent) Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- dataFilter- a data filter
- dataFiles- data files to validate have no new row deletes
- parent- ending snapshot on the branch being validated
 
- 
validateNoNewDeleteFilesprotected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no delete files matching a filter have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- dataFilter- an expression used to find new conflicting delete files
- parent- ending snapshot on the branch being validated
 
- 
validateNoNewDeleteFilesprotected void validateNoNewDeleteFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no delete files matching a partition set have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- partitionSet- a partition set used to find new conflicting delete files
- parent- ending snapshot on the branch being validated
 
- 
addedDeleteFilesprotected org.apache.iceberg.DeleteFileIndex addedDeleteFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent) Returns matching delete files have been added to the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- dataFilter- an expression used to find delete files
- partitionSet- a partition set used to find delete files
- parent- parent snapshot of the branch
 
- 
validateDeletedDataFilesprotected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, Expression dataFilter, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- dataFilter- an expression used to find deleted data files
- parent- ending snapshot on the branch being validated
 
- 
validateDeletedDataFilesprotected void validateDeletedDataFiles(TableMetadata base, Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent) Validates that no files matching a filter have been deleted from the table since a starting snapshot.- Parameters:
- base- table metadata to validate
- startingSnapshotId- id of the snapshot current at the start of the operation
- partitionSet- a partition set used to find deleted data files
- parent- ending snapshot on the branch being validated
 
- 
setNewDataFilesDataSequenceNumberprotected void setNewDataFilesDataSequenceNumber(long sequenceNumber) 
- 
validateDataFilesExistprotected void validateDataFilesExist(TableMetadata base, Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent) 
- 
validateAddedDVsprotected void validateAddedDVs(TableMetadata base, Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent) 
- 
summary
- 
applyApply the update's changes to the given metadata and snapshot. Return the new manifest list.- Parameters:
- base- the base table metadata to apply changes to
- snapshot- snapshot to apply the changes to
- Returns:
- a manifest list for the new snapshot.
 
- 
updateEventDescription copied from interface:PendingUpdateGenerates update event to notify about metadata changes- Returns:
- the generated event
 
- 
cleanUncommittedClean up any uncommitted manifests that were created.Manifests may not be committed if apply is called more because a commit conflict has occurred. Implementations may keep around manifests because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed. - Parameters:
- committed- a set of manifest paths that were actually committed
 
- 
stageOnlyDescription copied from interface:SnapshotUpdateCalled to stage a snapshot in table metadata, but not update the current snapshot id.- Specified by:
- stageOnlyin interface- SnapshotUpdate<ThisT>
- Returns:
- this for method chaining
 
- 
scanManifestsWithDescription copied from interface:SnapshotUpdateUse a particular executor to scan manifests. The default worker pool will be used by default.- Specified by:
- scanManifestsWithin interface- SnapshotUpdate<ThisT>
- Parameters:
- executorService- the provided executor
- Returns:
- this for method chaining
 
- 
ops
- 
commitMetrics
- 
reportWith
- 
targetBranchA setter for the target branch on which snapshot producer operation should be performed- Parameters:
- branch- to set as target branch
 
- 
targetBranch
- 
workerPool
- 
deleteWithDescription copied from interface:SnapshotUpdateSet a callback to delete files instead of the table's default.- Specified by:
- deleteWithin interface- SnapshotUpdate<ThisT>
- Parameters:
- deleteCallback- a String consumer used to delete locations.
- Returns:
- this for method chaining
 
- 
applyDescription copied from interface:PendingUpdateApply the pending changes and return the uncommitted changes for validation.This does not result in a permanent update. - Specified by:
- applyin interface- PendingUpdate<ThisT>
- Returns:
- the uncommitted changes that would be committed by calling PendingUpdate.commit()
 
- 
current
- 
refresh
- 
commitpublic void commit()Description copied from interface:PendingUpdateApply the pending changes and commit.Changes are committed by calling the underlying table's commit method. Once the commit is successful, the updated table will be refreshed. - Specified by:
- commitin interface- PendingUpdate<ThisT>
 
- 
cleanAllprotected void cleanAll()
- 
deleteFile
- 
manifestListPath
- 
newManifestOutputFile
- 
newManifestWriter
- 
newDeleteManifestWriter
- 
newRollingManifestWriter
- 
newRollingDeleteManifestWriter
- 
newManifestReader
- 
newDeleteManifestReader
- 
snapshotIdprotected long snapshotId()
- 
canInheritSnapshotIdprotected boolean canInheritSnapshotId()
- 
cleanupAfterCommitprotected boolean cleanupAfterCommit()
- 
writeDataManifests
- 
writeDataManifestsprotected List<ManifestFile> writeDataManifests(Collection<DataFile> files, Long dataSeq, PartitionSpec spec) 
- 
writeDeleteManifests
 
-