Class ExpireSnapshotsAction

  • All Implemented Interfaces:
    Action<ExpireSnapshotsAction, ExpireSnapshotsActionResult>

    @Deprecated
    public class ExpireSnapshotsAction
    extends java.lang.Object
    implements Action<ExpireSnapshotsAction, ExpireSnapshotsActionResult>
    Deprecated.
    since 0.12.0, will be removed in 0.13.0; use BaseExpireSnapshotsSparkAction instead.
    An action which performs the same operation as ExpireSnapshots but uses Spark to determine the delta in files between the pre- and post-expiration table metadata. All of the restrictions of RemoveSnapshots also apply to this action.

    This implementation uses the metadata tables of the table being expired to list all manifest and data files. The listing is loaded into a DataFrame, which is anti-joined with the same listing read after the expiration. This operation requires a shuffle, so parallelism can be controlled through spark.sql.shuffle.partitions. The expiration itself is done locally using a direct call to RemoveSnapshots. The snapshot expiration is fully committed before any deletes are issued. Deletes are still performed locally after retrieving the results from the Spark executors.
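
The anti-join step described above can be pictured with plain Java collections. This is only a sketch of the idea, not the Spark implementation: the class name ExpiredFileDelta, the method antiJoin, and the file names are hypothetical, and in-memory lists stand in for the DataFrames read from the metadata tables.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: plain Java collections stand in for the Spark DataFrames.
final class ExpiredFileDelta {

  // Returns every path listed before expiration that is no longer listed afterwards,
  // i.e. a left anti-join of "before" against "after" keyed on the file path.
  static Set<String> antiJoin(List<String> before, List<String> after) {
    Set<String> stillReferenced = new HashSet<>(after);
    Set<String> orphaned = new LinkedHashSet<>(); // keeps the order of the "before" listing
    for (String path : before) {
      if (!stillReferenced.contains(path)) {
        orphaned.add(path);
      }
    }
    return orphaned;
  }

  public static void main(String[] args) {
    // Hypothetical listings of manifest and data files.
    List<String> before = List.of("m1.avro", "a.parquet", "b.parquet");
    List<String> after = List.of("m2.avro", "b.parquet");
    System.out.println(antiJoin(before, after)); // prints [m1.avro, a.parquet]
  }
}
```

Files that appear only in the post-expiration listing (m2.avro here) are ignored, since only files that stopped being referenced are candidates for deletion.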

    • Method Detail

      • streamDeleteResults

        public ExpireSnapshotsAction streamDeleteResults(boolean stream)
        Deprecated.
        By default, all files to delete are brought to the driver at once, which may be an issue with very long file lists. Set this to true to use toLocalIterator if you run into memory issues when collecting the list of files to be deleted.
        Parameters:
        stream - whether to use toLocalIterator to stream results instead of collect.
        Returns:
        this for method chaining
      • executeDeleteWith

        public ExpireSnapshotsAction executeDeleteWith(java.util.concurrent.ExecutorService executorService)
        Deprecated.
        An executor service used when deleting files. Only used during the local delete phase of this Spark action. Similar to ExpireSnapshots.executeDeleteWith(ExecutorService).
        Parameters:
        executorService - the service to use
        Returns:
        this for method chaining
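
As a rough picture of what a local delete phase backed by an executor service might look like, the sketch below fans one delete call per path out to a thread pool and waits for all of them. It assumes nothing about how this action uses the pool internally: the class ParallelDeleteSketch, the method deleteAll, and the paths are hypothetical, and the delete function only records paths instead of touching storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Illustrative model only: shows one way a caller-supplied pool can parallelize deletes.
final class ParallelDeleteSketch {

  // Submits one delete task per path to the pool and blocks until every task has finished.
  static void deleteAll(List<String> paths, Consumer<String> deleteFunc, ExecutorService pool) {
    List<Future<?>> pending = new ArrayList<>();
    for (String path : paths) {
      pending.add(pool.submit(() -> deleteFunc.accept(path)));
    }
    for (Future<?> task : pending) {
      try {
        task.get(); // surfaces any failure raised inside the delete task
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for deletes", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("Delete task failed", e.getCause());
      }
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    Set<String> deleted = ConcurrentHashMap.newKeySet(); // thread-safe record of "deleted" paths
    try {
      deleteAll(List.of("a.parquet", "b.parquet"), deleted::add, pool);
    } finally {
      pool.shutdown(); // the caller owns the pool and is responsible for shutting it down
    }
    System.out.println(deleted.size()); // prints 2
  }
}
```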
      • expireOlderThan

        public ExpireSnapshotsAction expireOlderThan(long timestampMillis)
        Deprecated.
        Expire all snapshots older than a given timestamp. Identical to ExpireSnapshots.expireOlderThan(long).
        Parameters:
        timestampMillis - all snapshots before this time will be expired
        Returns:
        this for method chaining
      • retainLast

        public ExpireSnapshotsAction retainLast(int numSnapshots)
        Deprecated.
        Retain at least the given number of snapshots when expiring. Identical to ExpireSnapshots.retainLast(int).
        Parameters:
        numSnapshots - number of snapshots to leave
        Returns:
        this for method chaining
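
How expireOlderThan and retainLast might interact can be sketched with a simplified model. The assumptions here are not taken from this page: the snapshot history is linear, a snapshot counts as "older" when its timestamp is strictly below the cutoff, and the most recent numSnapshots entries are always kept. The class ExpirySelectionSketch and the method selectExpired are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: snapshots are represented by their commit timestamps.
final class ExpirySelectionSketch {

  // timestampsAscending: commit times (ms) of a linear snapshot history, oldest first.
  // Returns the timestamps that would be expired under this simplified model.
  static List<Long> selectExpired(
      List<Long> timestampsAscending, long olderThanMillis, int retainLast) {
    // The newest retainLast snapshots are protected regardless of their age.
    int firstRetainedIndex = Math.max(0, timestampsAscending.size() - retainLast);
    List<Long> expired = new ArrayList<>();
    for (int i = 0; i < firstRetainedIndex; i++) {
      long ts = timestampsAscending.get(i);
      if (ts < olderThanMillis) { // assumption: strictly older than the cutoff
        expired.add(ts);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    List<Long> history = List.of(100L, 200L, 300L, 400L);
    // 300 is older than the cutoff but survives because the last two snapshots are retained.
    System.out.println(selectExpired(history, 350L, 2)); // prints [100, 200]
  }
}
```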
      • deleteWith

        public ExpireSnapshotsAction deleteWith(java.util.function.Consumer<java.lang.String> newDeleteFunc)
        Deprecated.
        The Consumer used on files which have been determined to be expired. By default uses a filesystem delete. Identical to ExpireSnapshots.deleteWith(Consumer).
        Parameters:
        newDeleteFunc - Consumer which takes a path and deletes it
        Returns:
        this for method chaining
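
Because deleteWith accepts any Consumer of path strings, one possible use is a dry run that records what would be deleted. The class below is a hypothetical helper, not part of the library; it is synchronized on the assumption that the consumer may be invoked from several threads when an executor service is configured.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: a delete function that records paths instead of removing them.
final class RecordingDeleteFunc implements Consumer<String> {
  private final List<String> seen = Collections.synchronizedList(new ArrayList<>());

  @Override
  public void accept(String path) {
    seen.add(path); // record the path; nothing is deleted
  }

  // Returns a snapshot copy of the paths recorded so far, in the order they were received.
  List<String> paths() {
    synchronized (seen) {
      return new ArrayList<>(seen);
    }
  }
}
```

An instance could be passed as the newDeleteFunc argument and inspected through paths() afterwards. Note that the snapshot expiration itself is still committed to the table; only the file removal is skipped.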
      • expire

        public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> expire()
        Deprecated.
        Expires snapshots and commits the changes to the table, returning a Dataset of files to delete.

        This does not delete data files. To delete data files, run execute().

        This may be called before or after execute() is called to return the expired file list.

        Returns:
        a Dataset of files that are no longer referenced by the table