Class BaseReplacePartitions

    • Method Summary

      Modifier and Type Method Description
      protected void add​(DataFile file)
      Add a data file to the new snapshot.
      protected void add​(DeleteFile file)
      Add a delete file to the new snapshot.
      protected void add​(DeleteFile file, long dataSequenceNumber)
      Add a delete file to the new snapshot.
      protected void add​(ManifestFile manifest)
      Add all files in a manifest to the new snapshot.
      protected java.util.List<DataFile> addedDataFiles()  
      protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles​(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, PartitionSet partitionSet, Snapshot parent)
      Returns matching delete files that have been added to the table since a starting snapshot.
      ReplacePartitions addFile​(DataFile file)
      Add a DataFile to the table.
      protected boolean addsDataFiles()  
      protected boolean addsDeleteFiles()  
      Snapshot apply()
      Apply the pending changes and return the uncommitted changes for validation.
      java.util.List<ManifestFile> apply​(TableMetadata base, Snapshot snapshot)
      Apply the update's changes to the given metadata and snapshot.
      protected boolean canInheritSnapshotId()  
      ThisT caseSensitive​(boolean isCaseSensitive)  
      protected void cleanAll()  
      protected void cleanUncommitted​(java.util.Set<ManifestFile> committed)
      Clean up any uncommitted manifests that were created.
      protected boolean cleanupAfterCommit()  
      void commit()
      Apply the pending changes and commit.
      protected CommitMetrics commitMetrics()  
      protected TableMetadata current()  
      protected PartitionSpec dataSpec()  
      protected void delete​(java.lang.CharSequence path)
      Add a specific data path to be deleted in the new snapshot.
      protected void delete​(DataFile file)
      Add a specific data file to be deleted in the new snapshot.
      protected void delete​(DeleteFile file)
      Add a specific delete file to be deleted in the new snapshot.
      protected void deleteByRowFilter​(Expression expr)
      Add a filter to match files to delete.
      protected void deleteFile​(java.lang.String path)  
      protected boolean deletesDataFiles()  
      protected boolean deletesDeleteFiles()  
      ThisT deleteWith​(java.util.function.Consumer<java.lang.String> deleteCallback)
      Set a callback to delete files instead of the table's default.
      protected void dropPartition​(int specId, StructLike partition)
      Add a partition tuple to drop from the table during the delete phase.
      protected void failAnyDelete()  
      protected void failMissingDeletePaths()  
      protected boolean isCaseSensitive()  
      protected OutputFile manifestListPath()  
      protected ManifestReader<DeleteFile> newDeleteManifestReader​(ManifestFile manifest)  
      protected ManifestWriter<DeleteFile> newDeleteManifestWriter​(PartitionSpec spec)  
      protected EncryptedOutputFile newManifestOutputFile()  
      protected ManifestReader<DataFile> newManifestReader​(ManifestFile manifest)  
      protected ManifestWriter<DataFile> newManifestWriter​(PartitionSpec spec)  
      protected RollingManifestWriter<DeleteFile> newRollingDeleteManifestWriter​(PartitionSpec spec)  
      protected RollingManifestWriter<DataFile> newRollingManifestWriter​(PartitionSpec spec)  
      protected java.lang.String operation()
      A string that describes the action that produced the new snapshot.
      protected TableMetadata refresh()  
      protected ThisT reportWith​(MetricsReporter newReporter)  
      protected Expression rowFilter()  
      ThisT scanManifestsWith​(java.util.concurrent.ExecutorService executorService)
      Use a particular executor to scan manifests.
      protected ReplacePartitions self()  
      ThisT set​(java.lang.String property, java.lang.String value)
      Set a summary property in the snapshot produced by this update.
      protected void setNewDataFilesDataSequenceNumber​(long sequenceNumber)  
      protected long snapshotId()  
      ThisT stageOnly()
      Called to stage a snapshot in table metadata, but not update the current snapshot id.
      protected java.util.Map<java.lang.String,​java.lang.String> summary()  
      protected java.lang.String targetBranch()  
      protected void targetBranch​(java.lang.String branch)
      Sets the target branch on which the snapshot producer operation should be performed.
      BaseReplacePartitions toBranch​(java.lang.String branch)
      Perform operations on a particular branch
      java.lang.Object updateEvent()
      Generates update event to notify about metadata changes
      void validate​(TableMetadata currentMetadata, Snapshot parent)
      Validate the current metadata.
      protected void validateAddedDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, Expression conflictDetectionFilter, Snapshot parent)
      Validates that no files matching a filter have been added to the table since a starting snapshot.
      protected void validateAddedDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no files matching given partitions have been added to the table since a starting snapshot.
      ReplacePartitions validateAppendOnly()
      Validate that no partitions will be replaced and the operation is append-only.
      protected void validateDataFilesExist​(TableMetadata base, java.lang.Long startingSnapshotId, CharSequenceSet requiredDataFiles, boolean skipDeletes, Expression conflictDetectionFilter, Snapshot parent)  
      protected void validateDeletedDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
      Validates that no files matching a filter have been deleted from the table since a starting snapshot.
      protected void validateDeletedDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no files matching a partition set have been deleted from the table since a starting snapshot.
      ReplacePartitions validateFromSnapshot​(long newStartingSnapshotId)
      Set the snapshot ID used in validations for this operation.
      ReplacePartitions validateNoConflictingData()
      Enables validation that data added concurrently does not conflict with this commit's operation.
      ReplacePartitions validateNoConflictingDeletes()
      Enables validation that deletes that happened concurrently do not conflict with this commit's operation.
      protected void validateNoNewDeleteFiles​(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, Snapshot parent)
      Validates that no delete files matching a filter have been added to the table since a starting snapshot.
      protected void validateNoNewDeleteFiles​(TableMetadata base, java.lang.Long startingSnapshotId, PartitionSet partitionSet, Snapshot parent)
      Validates that no delete files matching a partition set have been added to the table since a starting snapshot.
      protected void validateNoNewDeletesForDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
      Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
      protected void validateNoNewDeletesForDataFiles​(TableMetadata base, java.lang.Long startingSnapshotId, Expression dataFilter, java.lang.Iterable<DataFile> dataFiles, Snapshot parent)
      Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
      protected java.util.concurrent.ExecutorService workerPool()  
      protected java.util.List<ManifestFile> writeDataManifests​(java.util.Collection<DataFile> files, java.lang.Long dataSeq, PartitionSpec spec)  
      protected java.util.List<ManifestFile> writeDataManifests​(java.util.Collection<DataFile> files, PartitionSpec spec)  
      protected java.util.List<ManifestFile> writeDeleteManifests​(java.util.Collection<DeleteFile> files, PartitionSpec spec)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • operation

        protected java.lang.String operation()
        A string that describes the action that produced the new snapshot.
        Returns:
        a string operation
      • validateFromSnapshot

        public ReplacePartitions validateFromSnapshot​(long newStartingSnapshotId)
        Description copied from interface: ReplacePartitions
        Set the snapshot ID used in validations for this operation.

        All validations will check changes after this snapshot ID. If this is not called, validation will occur from the beginning of the table's history.

        This method should be called before this operation is committed. If a concurrent operation committed a data or delta file or removed a data file after the given snapshot ID that might contain rows matching a partition marked for deletion, validation will detect this and fail.

        Specified by:
        validateFromSnapshot in interface ReplacePartitions
        Parameters:
        newStartingSnapshotId - a snapshot ID; it should be the snapshot that was current when this operation started to read the table.
        Returns:
        this for method chaining
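The validation window described above can be pictured as a walk back through snapshot ancestry. The following is a minimal sketch under simplifying assumptions, not Iceberg's implementation: `ValidationWindowSketch`, `snapshotsToValidate`, and the `parentOf` map are hypothetical names, and snapshots are reduced to bare IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ValidationWindowSketch {
  // parentOf maps a snapshot ID to its parent's ID; the first snapshot has no entry.
  // Returns the snapshots whose changes would be checked, newest first.
  static List<Long> snapshotsToValidate(
      Map<Long, Long> parentOf, Long startingSnapshotId, Long headId) {
    List<Long> toCheck = new ArrayList<>();
    Long current = headId;
    // Stop at the starting snapshot; a null start means the whole history is checked.
    while (current != null && !current.equals(startingSnapshotId)) {
      toCheck.add(current);
      current = parentOf.get(current);
    }
    return toCheck;
  }

  public static void main(String[] args) {
    // Hypothetical lineage 1 <- 2 <- 3 <- 4.
    Map<Long, Long> parentOf = Map.of(2L, 1L, 3L, 2L, 4L, 3L);
    System.out.println(snapshotsToValidate(parentOf, 2L, 4L));
    System.out.println(snapshotsToValidate(parentOf, null, 4L));
  }
}
```

Passing the snapshot that was current when the operation started reading keeps the window small; leaving it unset widens the check to the table's full history.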
      • validateNoConflictingDeletes

        public ReplacePartitions validateNoConflictingDeletes()
        Description copied from interface: ReplacePartitions
        Enables validation that deletes that happened concurrently do not conflict with this commit's operation.

        Validating concurrent deletes is required during non-idempotent replace partition operations. This will check if a concurrent operation deletes data in any of the partitions being overwritten, as the replace partition must be aborted to avoid undeleting rows that were removed concurrently.

        Specified by:
        validateNoConflictingDeletes in interface ReplacePartitions
        Returns:
        this for method chaining
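To see why a concurrent delete should abort the replace, consider the toy model below. It is only an illustration: a partition is reduced to a set of row strings, and `UndeleteSketch` and its methods are hypothetical names that do not exist in Iceberg.

```java
import java.util.HashSet;
import java.util.Set;

class UndeleteSketch {
  // Rewrites a partition from a stale read, e.g. a job that re-emits the rows it saw.
  static Set<String> replacementFrom(Set<String> staleRead) {
    return new HashSet<>(staleRead);
  }

  // True if the replacement would bring back a row that was read earlier
  // but is no longer present in the live partition.
  static boolean wouldUndelete(
      Set<String> livePartition, Set<String> replacement, Set<String> staleRead) {
    for (String row : replacement) {
      if (staleRead.contains(row) && !livePartition.contains(row)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Set<String> staleRead = Set.of("row-a", "row-b");
    // A concurrent operation removed row-b after the read.
    Set<String> live = Set.of("row-a");
    System.out.println(wouldUndelete(live, replacementFrom(staleRead), staleRead));
  }
}
```

In this model the replacement silently restores `row-b`, which is the outcome the validation is meant to prevent.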
      • validateNoConflictingData

        public ReplacePartitions validateNoConflictingData()
        Description copied from interface: ReplacePartitions
        Enables validation that data added concurrently does not conflict with this commit's operation.

        Validating concurrent data files is required during non-idempotent replace partition operations. This will check if a concurrent operation inserts data in any of the partitions being overwritten, as the replace partition must be aborted to avoid removing rows added concurrently.

        Specified by:
        validateNoConflictingData in interface ReplacePartitions
        Returns:
        this for method chaining
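The check described above amounts to an overlap test between the partitions being overwritten and the partitions that received new files concurrently. A minimal sketch, assuming partitions can be represented as strings; `PartitionConflictSketch` and its methods are hypothetical and are not the library's API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionConflictSketch {
  // Returns the partitions touched both by this replace and by concurrently added files.
  static Set<String> conflicts(Set<String> replaced, List<String> concurrentlyAdded) {
    Set<String> overlap = new HashSet<>(concurrentlyAdded);
    overlap.retainAll(replaced);
    return overlap;
  }

  // Fails the operation when any overlap is found.
  static void validateNoConflictingData(Set<String> replaced, List<String> concurrentlyAdded) {
    Set<String> overlap = conflicts(replaced, concurrentlyAdded);
    if (!overlap.isEmpty()) {
      throw new IllegalStateException("Conflicting files added in partitions: " + overlap);
    }
  }

  public static void main(String[] args) {
    Set<String> replaced = Set.of("day=1", "day=2");
    System.out.println(conflicts(replaced, List.of("day=2", "day=3")));
    validateNoConflictingData(replaced, List.of("day=3")); // no overlap, no exception
  }
}
```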
      • validate

        public void validate​(TableMetadata currentMetadata,
                             Snapshot parent)
        Validate the current metadata.

        Child operations can override this to add custom validation.

        Parameters:
        currentMetadata - current table metadata to validate
        parent - ending snapshot on the lineage which is being validated
      • apply

        public java.util.List<ManifestFile> apply​(TableMetadata base,
                                                  Snapshot snapshot)
        Apply the update's changes to the given metadata and snapshot. Return the new manifest list.
        Parameters:
        base - the base table metadata to apply changes to
        snapshot - snapshot to apply the changes to
        Returns:
        a manifest list for the new snapshot.
      • set

        public ThisT set​(java.lang.String property,
                         java.lang.String value)
        Description copied from interface: SnapshotUpdate
        Set a summary property in the snapshot produced by this update.
        Parameters:
        property - a String property name
        value - a String property value
        Returns:
        this for method chaining
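The `ThisT` return type together with `self()` suggests a self-typed fluent pattern, where inherited setters return the concrete subclass so that chaining can continue. The sketch below shows that pattern in isolation; `FluentUpdateSketch` and `ReplaceSketch` are hypothetical stand-ins, not the real class hierarchy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

abstract class FluentUpdateSketch<ThisT> {
  private final Map<String, String> summary = new LinkedHashMap<>();

  // Subclasses return themselves so inherited methods can keep the concrete type.
  protected abstract ThisT self();

  public ThisT set(String property, String value) {
    summary.put(property, value);
    return self();
  }

  Map<String, String> summary() {
    return summary;
  }
}

class ReplaceSketch extends FluentUpdateSketch<ReplaceSketch> {
  private boolean appendOnly = false;

  @Override
  protected ReplaceSketch self() {
    return this;
  }

  ReplaceSketch validateAppendOnly() {
    this.appendOnly = true;
    return this;
  }

  boolean appendOnly() {
    return appendOnly;
  }

  public static void main(String[] args) {
    // set() is declared on the base class, yet the chain continues with a subclass method.
    ReplaceSketch update = new ReplaceSketch().set("owner", "etl").validateAppendOnly();
    System.out.println(update.summary() + " appendOnly=" + update.appendOnly());
  }
}
```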
      • caseSensitive

        public ThisT caseSensitive​(boolean isCaseSensitive)
      • isCaseSensitive

        protected boolean isCaseSensitive()
      • addedDataFiles

        protected java.util.List<DataFile> addedDataFiles()
      • failAnyDelete

        protected void failAnyDelete()
      • failMissingDeletePaths

        protected void failMissingDeletePaths()
      • deleteByRowFilter

        protected void deleteByRowFilter​(Expression expr)
        Add a filter to match files to delete. A file will be deleted if all of the rows it contains match this or any other filter passed to this method.
        Parameters:
        expr - an expression to match rows.
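The rule stated above, that a file is deleted only if all of its rows match at least one registered filter, can be sketched as follows. This is a simplified model that inspects rows directly; how the library actually evaluates the expression is not shown here, and `RowFilterDeleteSketch` with its integer rows is a hypothetical construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class RowFilterDeleteSketch {
  private final List<Predicate<Integer>> filters = new ArrayList<>();

  // Each call adds another filter; filters are combined with OR.
  void deleteByRowFilter(Predicate<Integer> filter) {
    filters.add(filter);
  }

  private boolean rowMatches(int row) {
    return filters.stream().anyMatch(f -> f.test(row));
  }

  // A file qualifies only when every row matches at least one filter.
  boolean shouldDelete(List<Integer> fileRows) {
    return !filters.isEmpty() && fileRows.stream().allMatch(this::rowMatches);
  }

  public static void main(String[] args) {
    RowFilterDeleteSketch sketch = new RowFilterDeleteSketch();
    sketch.deleteByRowFilter(x -> x < 10);
    sketch.deleteByRowFilter(x -> x > 100);
    System.out.println(sketch.shouldDelete(List.of(1, 200)));
    System.out.println(sketch.shouldDelete(List.of(1, 50)));
  }
}
```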
      • dropPartition

        protected void dropPartition​(int specId,
                                     StructLike partition)
        Add a partition tuple to drop from the table during the delete phase.
      • delete

        protected void delete​(DataFile file)
        Add a specific data file to be deleted in the new snapshot.
      • delete

        protected void delete​(DeleteFile file)
        Add a specific delete file to be deleted in the new snapshot.
      • delete

        protected void delete​(java.lang.CharSequence path)
        Add a specific data path to be deleted in the new snapshot.
      • deletesDataFiles

        protected boolean deletesDataFiles()
      • deletesDeleteFiles

        protected boolean deletesDeleteFiles()
      • addsDataFiles

        protected boolean addsDataFiles()
      • addsDeleteFiles

        protected boolean addsDeleteFiles()
      • add

        protected void add​(DataFile file)
        Add a data file to the new snapshot.
      • add

        protected void add​(DeleteFile file)
        Add a delete file to the new snapshot.
      • add

        protected void add​(DeleteFile file,
                           long dataSequenceNumber)
        Add a delete file to the new snapshot.
      • add

        protected void add​(ManifestFile manifest)
        Add all files in a manifest to the new snapshot.
      • validateAddedDataFiles

        protected void validateAddedDataFiles​(TableMetadata base,
                                              java.lang.Long startingSnapshotId,
                                              PartitionSet partitionSet,
                                              Snapshot parent)
        Validates that no files matching given partitions have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        partitionSet - a set of partitions to filter new conflicting data files
        parent - ending snapshot on the lineage being validated
      • validateAddedDataFiles

        protected void validateAddedDataFiles​(TableMetadata base,
                                              java.lang.Long startingSnapshotId,
                                              Expression conflictDetectionFilter,
                                              Snapshot parent)
        Validates that no files matching a filter have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        conflictDetectionFilter - an expression used to find new conflicting data files
        parent - ending snapshot on the lineage being validated
      • validateNoNewDeletesForDataFiles

        protected void validateNoNewDeletesForDataFiles​(TableMetadata base,
                                                        java.lang.Long startingSnapshotId,
                                                        java.lang.Iterable<DataFile> dataFiles,
                                                        Snapshot parent)
        Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        dataFiles - data files to validate have no new row deletes
        parent - ending snapshot on the branch being validated
      • validateNoNewDeletesForDataFiles

        protected void validateNoNewDeletesForDataFiles​(TableMetadata base,
                                                        java.lang.Long startingSnapshotId,
                                                        Expression dataFilter,
                                                        java.lang.Iterable<DataFile> dataFiles,
                                                        Snapshot parent)
        Validates that no new delete files that must be applied to the given data files have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        dataFilter - a data filter
        dataFiles - data files to validate have no new row deletes
        parent - ending snapshot on the branch being validated
      • validateNoNewDeleteFiles

        protected void validateNoNewDeleteFiles​(TableMetadata base,
                                                java.lang.Long startingSnapshotId,
                                                Expression dataFilter,
                                                Snapshot parent)
        Validates that no delete files matching a filter have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        dataFilter - an expression used to find new conflicting delete files
        parent - ending snapshot on the branch being validated
      • validateNoNewDeleteFiles

        protected void validateNoNewDeleteFiles​(TableMetadata base,
                                                java.lang.Long startingSnapshotId,
                                                PartitionSet partitionSet,
                                                Snapshot parent)
        Validates that no delete files matching a partition set have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        partitionSet - a partition set used to find new conflicting delete files
        parent - ending snapshot on the branch being validated
      • addedDeleteFiles

        protected org.apache.iceberg.DeleteFileIndex addedDeleteFiles​(TableMetadata base,
                                                                      java.lang.Long startingSnapshotId,
                                                                      Expression dataFilter,
                                                                      PartitionSet partitionSet,
                                                                      Snapshot parent)
        Returns matching delete files that have been added to the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        dataFilter - an expression used to find delete files
        partitionSet - a partition set used to find delete files
        parent - parent snapshot of the branch
      • validateDeletedDataFiles

        protected void validateDeletedDataFiles​(TableMetadata base,
                                                java.lang.Long startingSnapshotId,
                                                Expression dataFilter,
                                                Snapshot parent)
        Validates that no files matching a filter have been deleted from the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        dataFilter - an expression used to find deleted data files
        parent - ending snapshot on the branch being validated
      • validateDeletedDataFiles

        protected void validateDeletedDataFiles​(TableMetadata base,
                                                java.lang.Long startingSnapshotId,
                                                PartitionSet partitionSet,
                                                Snapshot parent)
        Validates that no files matching a partition set have been deleted from the table since a starting snapshot.
        Parameters:
        base - table metadata to validate
        startingSnapshotId - id of the snapshot current at the start of the operation
        partitionSet - a partition set used to find deleted data files
        parent - ending snapshot on the branch being validated
      • setNewDataFilesDataSequenceNumber

        protected void setNewDataFilesDataSequenceNumber​(long sequenceNumber)
      • validateDataFilesExist

        protected void validateDataFilesExist​(TableMetadata base,
                                              java.lang.Long startingSnapshotId,
                                              CharSequenceSet requiredDataFiles,
                                              boolean skipDeletes,
                                              Expression conflictDetectionFilter,
                                              Snapshot parent)
      • summary

        protected java.util.Map<java.lang.String,​java.lang.String> summary()
      • updateEvent

        public java.lang.Object updateEvent()
        Description copied from interface: PendingUpdate
        Generates update event to notify about metadata changes
        Returns:
        the generated event
      • cleanUncommitted

        protected void cleanUncommitted​(java.util.Set<ManifestFile> committed)
        Clean up any uncommitted manifests that were created.

        Manifests may not be committed if apply is called more than once because a commit conflict has occurred. Implementations may keep manifests around because the same changes will be made by both apply calls. This method instructs the implementation to clean up those manifests and passes the paths of the manifests that were actually committed.

        Parameters:
        committed - a set of manifest paths that were actually committed
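The cleanup contract above reduces to a set difference: everything written across apply attempts, minus what was actually committed. A minimal sketch with manifests reduced to path strings; `ManifestCleanupSketch` and `recordWritten` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ManifestCleanupSketch {
  // Manifests written across all apply attempts, including retries after conflicts.
  private final Set<String> written = new LinkedHashSet<>();

  void recordWritten(String path) {
    written.add(path);
  }

  // Returns the manifests that are safe to delete: written but never committed.
  Set<String> cleanUncommitted(Set<String> committed) {
    Set<String> orphaned = new LinkedHashSet<>(written);
    orphaned.removeAll(committed);
    // Keep tracking only the manifests that made it into the table.
    written.retainAll(committed);
    return orphaned;
  }

  public static void main(String[] args) {
    ManifestCleanupSketch sketch = new ManifestCleanupSketch();
    sketch.recordWritten("m1.avro"); // first attempt
    sketch.recordWritten("m2.avro"); // retry after a conflict
    System.out.println(sketch.cleanUncommitted(Set.of("m2.avro")));
  }
}
```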
      • stageOnly

        public ThisT stageOnly()
        Description copied from interface: SnapshotUpdate
        Called to stage a snapshot in table metadata, but not update the current snapshot id.
        Specified by:
        stageOnly in interface SnapshotUpdate<ThisT>
        Returns:
        this for method chaining
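Staging can be pictured as recording a snapshot without moving the table's current pointer. The toy model below illustrates that distinction only; `StagedCommitSketch` and its fields are hypothetical and do not reflect how table metadata is really stored.

```java
import java.util.ArrayList;
import java.util.List;

class StagedCommitSketch {
  final List<Long> snapshots = new ArrayList<>();
  Long currentSnapshotId = null;

  // Always records the snapshot; only a non-staged commit moves the current pointer.
  void commit(long newSnapshotId, boolean stageOnly) {
    snapshots.add(newSnapshotId);
    if (!stageOnly) {
      currentSnapshotId = newSnapshotId;
    }
  }

  public static void main(String[] args) {
    StagedCommitSketch table = new StagedCommitSketch();
    table.commit(1L, false);
    table.commit(2L, true); // staged: present in metadata, not current
    System.out.println(table.snapshots + " current=" + table.currentSnapshotId);
  }
}
```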
      • scanManifestsWith

        public ThisT scanManifestsWith​(java.util.concurrent.ExecutorService executorService)
        Description copied from interface: SnapshotUpdate
        Use a particular executor to scan manifests. If none is set, the default worker pool is used.
        Specified by:
        scanManifestsWith in interface SnapshotUpdate<ThisT>
        Parameters:
        executorService - the provided executor
        Returns:
        this for method chaining
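Supplying an executor lets the caller control how much parallelism manifest scanning uses. The sketch below shows the general idea with the standard `ExecutorService` API; `ManifestScanSketch` is hypothetical, and each manifest is modelled as a plain list of file paths rather than a real manifest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManifestScanSketch {
  // Counts entries across manifests, scanning each one on the supplied executor.
  static int countEntries(List<List<String>> manifests, ExecutorService executor) {
    List<Future<Integer>> pending = new ArrayList<>();
    for (List<String> manifest : manifests) {
      Callable<Integer> task = manifest::size;
      pending.add(executor.submit(task));
    }
    int total = 0;
    try {
      for (Future<Integer> result : pending) {
        total += result.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while scanning", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Scan task failed", e.getCause());
    }
    return total;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      System.out.println(countEntries(List.of(List.of("a", "b"), List.of("c")), pool));
    } finally {
      pool.shutdown();
    }
  }
}
```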
      • targetBranch

        protected void targetBranch​(java.lang.String branch)
        Sets the target branch on which the snapshot producer operation should be performed.
        Parameters:
        branch - to set as target branch
      • targetBranch

        protected java.lang.String targetBranch()
      • workerPool

        protected java.util.concurrent.ExecutorService workerPool()
      • deleteWith

        public ThisT deleteWith​(java.util.function.Consumer<java.lang.String> deleteCallback)
        Description copied from interface: SnapshotUpdate
        Set a callback to delete files instead of the table's default.
        Specified by:
        deleteWith in interface SnapshotUpdate<ThisT>
        Parameters:
        deleteCallback - a String consumer used to delete locations.
        Returns:
        this for method chaining
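A delete callback is a `Consumer<String>` that receives each location to remove, which makes it easy to redirect or record deletions. The following sketch models that substitution; `DeleteCallbackSketch`, `cleanUp`, and the `defaultDeleted` list are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DeleteCallbackSketch {
  // Stands in for the table's default delete behavior.
  final List<String> defaultDeleted = new ArrayList<>();
  private Consumer<String> deleteFunc = defaultDeleted::add;

  // Replaces the default delete behavior with the caller's callback.
  DeleteCallbackSketch deleteWith(Consumer<String> deleteCallback) {
    this.deleteFunc = deleteCallback;
    return this;
  }

  // Hands every path to whichever delete function is active.
  void cleanUp(List<String> paths) {
    paths.forEach(deleteFunc);
  }

  public static void main(String[] args) {
    List<String> captured = new ArrayList<>();
    DeleteCallbackSketch sketch = new DeleteCallbackSketch().deleteWith(captured::add);
    sketch.cleanUp(List.of("s3://bucket/m1.avro", "s3://bucket/m2.avro"));
    System.out.println("callback saw " + captured + ", default saw " + sketch.defaultDeleted);
  }
}
```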
      • apply

        public Snapshot apply()
        Description copied from interface: PendingUpdate
        Apply the pending changes and return the uncommitted changes for validation.

        This does not result in a permanent update.

        Specified by:
        apply in interface PendingUpdate<ThisT>
        Returns:
        the uncommitted changes that would be committed by calling PendingUpdate.commit()
      • commit

        public void commit()
        Description copied from interface: PendingUpdate
        Apply the pending changes and commit.

        Changes are committed by calling the underlying table's commit method.

        Once the commit is successful, the updated table will be refreshed.

        Specified by:
        commit in interface PendingUpdate<ThisT>
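The difference between `apply()` and `commit()` is that the former only computes the result, while the latter also makes it permanent. A minimal sketch of that contract, assuming table state can be modelled as a list of file paths; `PendingUpdateSketch` is a hypothetical class, not the real `PendingUpdate` interface.

```java
import java.util.ArrayList;
import java.util.List;

class PendingUpdateSketch {
  private final List<String> tableFiles; // committed table state (mutable, caller-owned)
  private final List<String> pendingAdds = new ArrayList<>();

  PendingUpdateSketch(List<String> tableFiles) {
    this.tableFiles = tableFiles;
  }

  PendingUpdateSketch addFile(String path) {
    pendingAdds.add(path);
    return this;
  }

  // Computes the would-be result without touching the table.
  List<String> apply() {
    List<String> result = new ArrayList<>(tableFiles);
    result.addAll(pendingAdds);
    return result;
  }

  // Applies the pending changes and then makes them permanent.
  void commit() {
    List<String> result = apply();
    tableFiles.clear();
    tableFiles.addAll(result);
    pendingAdds.clear();
  }

  public static void main(String[] args) {
    List<String> table = new ArrayList<>(List.of("a.parquet"));
    PendingUpdateSketch update = new PendingUpdateSketch(table).addFile("b.parquet");
    System.out.println("apply: " + update.apply() + ", table: " + table);
    update.commit();
    System.out.println("after commit: " + table);
  }
}
```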
      • cleanAll

        protected void cleanAll()
      • deleteFile

        protected void deleteFile​(java.lang.String path)
      • manifestListPath

        protected OutputFile manifestListPath()
      • snapshotId

        protected long snapshotId()
      • canInheritSnapshotId

        protected boolean canInheritSnapshotId()
      • cleanupAfterCommit

        protected boolean cleanupAfterCommit()
      • writeDataManifests

        protected java.util.List<ManifestFile> writeDataManifests​(java.util.Collection<DataFile> files,
                                                                  java.lang.Long dataSeq,
                                                                  PartitionSpec spec)