Class SnapshotUtil

java.lang.Object
org.apache.iceberg.util.SnapshotUtil

public class SnapshotUtil extends Object
  • Method Details

    • isAncestorOf

      public static boolean isAncestorOf(Table table, long snapshotId, long ancestorSnapshotId)
      Returns whether ancestorSnapshotId is an ancestor of snapshotId.
    • isAncestorOf

      public static boolean isAncestorOf(long snapshotId, long ancestorSnapshotId, Function<Long,Snapshot> lookup)
      Returns whether ancestorSnapshotId is an ancestor of snapshotId using the given lookup function.
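
      The lookup-based ancestry check can be pictured as a walk along parent pointers. The following is a minimal sketch of that idea, not the library's code: Snap is a hypothetical stand-in for Snapshot, and the sketch assumes the walk starts at the snapshot itself, so a snapshot counts as its own ancestor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class AncestorCheckSketch {
  /** Hypothetical stand-in for a snapshot: only an ID and a nullable parent ID. */
  static final class Snap {
    final long id;
    final Long parentId; // null for a root snapshot

    Snap(long id, Long parentId) {
      this.id = id;
      this.parentId = parentId;
    }
  }

  /** Walks parent pointers from snapshotId and reports whether ancestorId is reached. */
  static boolean isAncestorOf(long snapshotId, long ancestorId, Function<Long, Snap> lookup) {
    Snap current = lookup.apply(snapshotId);
    while (current != null) {
      if (current.id == ancestorId) {
        return true;
      }
      // Stop at a root, or when the parent can no longer be looked up.
      current = current.parentId == null ? null : lookup.apply(current.parentId);
    }
    return false;
  }

  /** Builds a small linear history 1 <- 2 <- 3 plus an unrelated snapshot 9. */
  static Map<Long, Snap> sampleHistory() {
    Map<Long, Snap> byId = new HashMap<>();
    byId.put(1L, new Snap(1L, null));
    byId.put(2L, new Snap(2L, 1L));
    byId.put(3L, new Snap(3L, 2L));
    byId.put(9L, new Snap(9L, null));
    return byId;
  }
}
```

      With the sample history, passing the map's get method as the lookup lets the check run without any table object, which is the practical reason to offer a lookup-function overload.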
    • isAncestorOf

      public static boolean isAncestorOf(Table table, long ancestorSnapshotId)
      Returns whether ancestorSnapshotId is an ancestor of the table's current state.
    • isParentAncestorOf

      public static boolean isParentAncestorOf(Table table, long snapshotId, long ancestorParentSnapshotId)
      Returns whether some ancestor of snapshotId has a parent ID that matches ancestorParentSnapshotId.
    • currentAncestors

      public static Iterable<Snapshot> currentAncestors(Table table)
      Returns an iterable that traverses the table's snapshots from the current to the last known ancestor.
      Parameters:
      table - a Table
      Returns:
      an iterable from the table's current snapshot to its last known ancestor
    • currentAncestorIds

      public static List<Long> currentAncestorIds(Table table)
      Return the snapshot IDs for the ancestors of the current table state.

      Ancestor IDs are ordered by commit time, descending. The first ID is the current snapshot, followed by its parent, and so on.

      Parameters:
      table - a Table
      Returns:
      a list of snapshot IDs of the known ancestor snapshots, including the current ID
    • oldestAncestor

      public static Snapshot oldestAncestor(Table table)
      Traverses the history of the table's current snapshot and finds the oldest Snapshot.
      Returns:
      null if there is no current snapshot in the table, else the oldest Snapshot.
    • oldestAncestorOf

      public static Snapshot oldestAncestorOf(Table table, long snapshotId)
    • oldestAncestorOf

      public static Snapshot oldestAncestorOf(long snapshotId, Function<Long,Snapshot> lookup)
      Traverses the history and finds the oldest ancestor of the specified snapshot.

      The oldest ancestor is defined as the ancestor snapshot whose parent is null or has been expired. If the specified snapshot has no parent, or its parent has been expired, the specified snapshot itself is returned.

      Parameters:
      snapshotId - the ID of the snapshot whose oldest ancestor is to be found
      lookup - lookup function from snapshot ID to snapshot
      Returns:
      null if there is no current snapshot in the table, else the oldest Snapshot.
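
      The definition of "oldest ancestor" above can be illustrated with a short walk that remembers the last snapshot it could still look up. This is a sketch under assumptions: Snap is a hypothetical stand-in for Snapshot, an expired parent is modelled as a lookup that returns null, and returning null for an unknown starting ID is a choice of the sketch.

```java
import java.util.function.Function;

final class OldestAncestorSketch {
  /** Hypothetical stand-in for a snapshot: only an ID and a nullable parent ID. */
  static final class Snap {
    final long id;
    final Long parentId; // null for a root snapshot

    Snap(long id, Long parentId) {
      this.id = id;
      this.parentId = parentId;
    }
  }

  /** Returns the last snapshot reachable from snapshotId whose parent is null or not found. */
  static Snap oldestAncestorOf(long snapshotId, Function<Long, Snap> lookup) {
    Snap oldest = null;
    Snap current = lookup.apply(snapshotId);
    while (current != null) {
      oldest = current;
      // A null lookup result models a parent that has been expired.
      current = current.parentId == null ? null : lookup.apply(current.parentId);
    }
    return oldest;
  }
}
```

      For a history in which snapshot 1 has been expired and only 2 (parent 1) and 3 (parent 2) remain, the walk from 3 ends at 2, and the walk from 2 returns 2 itself.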
    • ancestorsOf

      public static Iterable<Snapshot> ancestorsOf(long snapshotId, Function<Long,Snapshot> lookup)
    • oldestAncestorAfter

      public static Snapshot oldestAncestorAfter(Table table, long timestampMillis)
      Traverses the history of the table's current snapshot and finds the oldest snapshot that was committed at or after the given time.
      Parameters:
      table - a table
      timestampMillis - a timestamp in milliseconds
      Returns:
      the oldest snapshot committed at or after the given timestamp, or null if the current snapshot is older than the timestamp
      Throws:
      IllegalStateException - if the first ancestor after the given time can't be determined
    • snapshotIdsBetween

      public static List<Long> snapshotIdsBetween(Table table, long fromSnapshotId, long toSnapshotId)
      Returns the list of snapshot IDs in the range (fromSnapshotId, toSnapshotId].

      This method assumes that fromSnapshotId is an ancestor of toSnapshotId.
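
      The half-open range (fromSnapshotId, toSnapshotId] and the ancestry assumption can be sketched as follows. The parent map and the newest-first ordering are assumptions of this illustration, not statements about the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IdsBetweenSketch {
  /**
   * Collects IDs walking back from toId until fromId is reached.
   * toId is included, fromId is excluded. parentById maps a snapshot ID to
   * its parent ID (a missing entry or null value means "no parent").
   */
  static List<Long> idsBetween(long fromId, long toId, Map<Long, Long> parentById) {
    List<Long> ids = new ArrayList<>();
    Long current = toId;
    while (current != null && current != fromId) {
      ids.add(current);
      current = parentById.get(current);
    }
    return ids;
  }
}
```

      If fromId is not actually an ancestor of toId, this walk never meets the stop condition and runs to the root, which shows why the ancestry assumption matters.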

    • ancestorIdsBetween

      public static Iterable<Long> ancestorIdsBetween(long latestSnapshotId, Long oldestSnapshotId, Function<Long,Snapshot> lookup)
    • ancestorsBetween

      public static Iterable<Snapshot> ancestorsBetween(Table table, long latestSnapshotId, Long oldestSnapshotId)
    • ancestorsBetween

      public static Iterable<Snapshot> ancestorsBetween(long latestSnapshotId, Long oldestSnapshotId, Function<Long,Snapshot> lookup)
    • ancestorIds

      public static List<Long> ancestorIds(Snapshot snapshot, Function<Long,Snapshot> lookup)
    • newFiles

      public static List<DataFile> newFiles(Long baseSnapshotId, long latestSnapshotId, Function<Long,Snapshot> lookup, FileIO io)
    • snapshotAfter

      public static Snapshot snapshotAfter(Table table, long snapshotId)
      Traverses the history of the table's current snapshot and finds the snapshot with the given snapshot id as its parent.
      Returns:
      the snapshot for which the given snapshot is the parent
      Throws:
      IllegalArgumentException - when the given snapshotId is not found in the table
      IllegalStateException - when the given snapshotId is not an ancestor of the current table state
    • snapshotIdAsOfTime

      public static long snapshotIdAsOfTime(Table table, long timestampMillis)
      Returns the ID of the most recent snapshot for the table as of the timestamp.
      Parameters:
      table - a Table
      timestampMillis - the timestamp in millis since the Unix epoch
      Returns:
      the snapshot ID
      Throws:
      IllegalArgumentException - when no snapshot is found in the table older than the timestamp
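
      An "as of" lookup of this kind amounts to scanning the history for the last entry that is not newer than the timestamp. The sketch below assumes a hypothetical Entry type, a history list in commit order, and an inclusive comparison; none of these details are confirmed by the text above.

```java
import java.util.List;

final class AsOfTimeSketch {
  /** Hypothetical history entry: a snapshot ID and the time it became current. */
  static final class Entry {
    final long snapshotId;
    final long timestampMillis;

    Entry(long snapshotId, long timestampMillis) {
      this.snapshotId = snapshotId;
      this.timestampMillis = timestampMillis;
    }
  }

  /** Returns the ID of the last entry at or before timestampMillis, or throws if none exists. */
  static long snapshotIdAsOf(List<Entry> history, long timestampMillis) {
    Long match = null;
    for (Entry entry : history) {
      // Assumes entries are in commit order, so later matches replace earlier ones.
      if (entry.timestampMillis <= timestampMillis) {
        match = entry.snapshotId;
      }
    }
    if (match == null) {
      throw new IllegalArgumentException("No snapshot at or before " + timestampMillis);
    }
    return match;
  }
}
```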
    • nullableSnapshotIdAsOfTime

      public static Long nullableSnapshotIdAsOfTime(Table table, long timestampMillis)
    • schemaFor

      public static Schema schemaFor(Table table, long snapshotId)
      Returns the schema of the table for the specified snapshot.
      Parameters:
      table - a Table
      snapshotId - the ID of the snapshot
      Returns:
      the schema
    • schemaFor

      public static Schema schemaFor(Table table, Long snapshotId, Long timestampMillis)
      Convenience method for returning the schema of the table for a snapshot, when we have a snapshot id or a timestamp. Only one of them should be specified (non-null), or an IllegalArgumentException is thrown.
      Parameters:
      table - a Table
      snapshotId - the ID of the snapshot
      timestampMillis - the timestamp in millis since the Unix epoch
      Returns:
      the schema
      Throws:
      IllegalArgumentException - if both snapshotId and timestampMillis are non-null
    • schemaFor

      public static Schema schemaFor(Table table, String ref)
      Return the schema of the snapshot at a given ref.

      If the ref does not exist or the ref is a branch, the table schema is returned because it will be the schema when the new branch is created. If the ref is a tag, then the snapshot schema is returned.

      Parameters:
      table - a Table
      ref - ref name of the table (nullable)
      Returns:
      schema of the specific snapshot at the given ref
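
      The branch-versus-tag rule described above reduces to a small decision. The sketch below models it with hypothetical types (Ref, RefType) and integer schema IDs; falling back to the table schema when a tag's snapshot has no recorded schema is an assumption of the sketch.

```java
import java.util.Map;

final class RefSchemaSketch {
  enum RefType { BRANCH, TAG }

  /** Hypothetical stand-in for a named reference to a snapshot. */
  static final class Ref {
    final RefType type;
    final long snapshotId;

    Ref(RefType type, long snapshotId) {
      this.type = type;
      this.snapshotId = snapshotId;
    }
  }

  /**
   * Tags resolve to the schema recorded for their snapshot; branches,
   * unknown refs, and a null ref name resolve to the table's schema.
   */
  static int schemaIdFor(
      String refName, int tableSchemaId, Map<String, Ref> refs, Map<Long, Integer> schemaIdBySnapshot) {
    Ref ref = refName == null ? null : refs.get(refName);
    if (ref == null || ref.type == RefType.BRANCH) {
      return tableSchemaId;
    }
    // Assumption: use the table schema if the tagged snapshot has no recorded schema.
    return schemaIdBySnapshot.getOrDefault(ref.snapshotId, tableSchemaId);
  }
}
```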
    • schemaFor

      public static Schema schemaFor(TableMetadata metadata, String ref)
      Return the schema of the snapshot at a given ref.

      If the ref does not exist or the ref is a branch, the table schema is returned because it will be the schema when the new branch is created. If the ref is a tag, then the snapshot schema is returned.

      Parameters:
      metadata - a TableMetadata
      ref - ref name of the table (nullable)
      Returns:
      schema of the specific snapshot at the given ref
    • latestSnapshot

      public static Snapshot latestSnapshot(Table table, String branch)
      Fetch the snapshot at the head of the given branch in the given table.

      This method calls Table.currentSnapshot() instead of using branch API Table.snapshot(String) for the main branch so that existing code still goes through the old code path to ensure backwards compatibility.

      Parameters:
      table - a Table
      branch - branch name of the table (nullable)
      Returns:
      the latest snapshot for the given branch
    • latestSnapshot

      public static Snapshot latestSnapshot(TableMetadata metadata, String branch)
      Fetch the snapshot at the head of the given branch in the given table.

      This method calls TableMetadata.currentSnapshot() instead of using branch API TableMetadata.ref(String) for the main branch so that existing code still goes through the old code path to ensure backwards compatibility.

      If the branch does not exist, the table's latest snapshot is returned because it will be the head snapshot when the new branch is created.

      Parameters:
      metadata - a TableMetadata
      branch - branch name of the table metadata (nullable)
      Returns:
      the latest snapshot for the given branch
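
      The fallback behaviour for the metadata-based overload can be sketched as a simple resolution function. The names below are hypothetical, snapshot IDs stand in for Snapshot objects, and treating a null branch the same as "main" is an assumption drawn from the parameter being nullable.

```java
import java.util.Map;

final class BranchHeadSketch {
  /**
   * Resolves the head snapshot ID for a branch. The main branch (or a null
   * name) uses the current snapshot directly; a branch that does not exist
   * also falls back to the current snapshot.
   */
  static Long headSnapshotId(String branch, Long currentSnapshotId, Map<String, Long> branchHeads) {
    if (branch == null || branch.equals("main")) {
      return currentSnapshotId;
    }
    Long head = branchHeads.get(branch);
    return head != null ? head : currentSnapshotId;
  }
}
```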