Class ChangelogIterator

java.lang.Object
org.apache.iceberg.spark.ChangelogIterator
All Implemented Interfaces:
Iterator<org.apache.spark.sql.Row>
Direct Known Subclasses:
ComputeUpdateIterator, RemoveNetCarryoverIterator

public abstract class ChangelogIterator extends Object implements Iterator<org.apache.spark.sql.Row>
An iterator that transforms rows from changelog tables within a single Spark task.
  • Field Details

    • DELETE

      protected static final String DELETE
    • INSERT

      protected static final String INSERT
    • UPDATE_BEFORE

      protected static final String UPDATE_BEFORE
    • UPDATE_AFTER

      protected static final String UPDATE_AFTER
  • Constructor Details

    • ChangelogIterator

      protected ChangelogIterator(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
  • Method Details

    • changeTypeIndex

      protected int changeTypeIndex()
    • rowType

      protected org.apache.spark.sql.types.StructType rowType()
    • changeType

      protected String changeType(org.apache.spark.sql.Row row)
    • rowIterator

      protected Iterator<org.apache.spark.sql.Row> rowIterator()
    • computeUpdates

      public static Iterator<org.apache.spark.sql.Row> computeUpdates(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, String[] identifierFields)
      Creates an iterator composing RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows
      Parameters:
      rowIterator - the iterator of rows from a changelog table
      rowType - the schema of the rows
      identifierFields - the names of the identifier columns, which determine if rows are the same
      Returns:
      a new iterator instance
    • removeCarryovers

      public static Iterator<org.apache.spark.sql.Row> removeCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
      Creates an iterator that removes carry-over rows from a changelog table.
      Parameters:
      rowIterator - the iterator of rows from a changelog table
      rowType - the schema of the rows
      Returns:
      a new iterator instance
    • removeNetCarryovers

      public static Iterator<org.apache.spark.sql.Row> removeNetCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
    • isSameRecord

      protected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)
    • isDifferentValue

      protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)
    • generateIndicesToIdentifySameRow

      protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount, Set<Integer> metadataColumnIndices)