Class ChangelogIterator

  • All Implemented Interfaces:
    java.util.Iterator<org.apache.spark.sql.Row>
    Direct Known Subclasses:
    ComputeUpdateIterator, RemoveNetCarryoverIterator

    public abstract class ChangelogIterator
    extends java.lang.Object
    implements java.util.Iterator<org.apache.spark.sql.Row>
    An iterator that transforms rows from changelog tables within a single Spark task.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static java.lang.String DELETE  
      protected static java.lang.String INSERT  
      protected static java.lang.String UPDATE_AFTER  
      protected static java.lang.String UPDATE_BEFORE  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
    • Method Summary

      Modifier and Type Method Description
      protected java.lang.String changeType(org.apache.spark.sql.Row row)
      protected int changeTypeIndex()
      static java.util.Iterator<org.apache.spark.sql.Row> computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)
      Creates an iterator that composes RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows.
      protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount, java.util.Set<java.lang.Integer> metadataColumnIndices)
      protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)
      protected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)
      static java.util.Iterator<org.apache.spark.sql.Row> removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
      Creates an iterator that removes carry-over rows from a changelog table.
      static java.util.Iterator<org.apache.spark.sql.Row> removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
      protected java.util.Iterator<org.apache.spark.sql.Row> rowIterator()  
      protected org.apache.spark.sql.types.StructType rowType()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Iterator

        forEachRemaining, hasNext, next, remove
    • Field Detail

      • DELETE

        protected static final java.lang.String DELETE
      • INSERT

        protected static final java.lang.String INSERT
      • UPDATE_BEFORE

        protected static final java.lang.String UPDATE_BEFORE
      • UPDATE_AFTER

        protected static final java.lang.String UPDATE_AFTER
    • Constructor Detail

      • ChangelogIterator

        protected ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                    org.apache.spark.sql.types.StructType rowType)
    • Method Detail

      • changeTypeIndex

        protected int changeTypeIndex()
      • rowType

        protected org.apache.spark.sql.types.StructType rowType()
      • changeType

        protected java.lang.String changeType(org.apache.spark.sql.Row row)
      • rowIterator

        protected java.util.Iterator<org.apache.spark.sql.Row> rowIterator()
      • computeUpdates

        public static java.util.Iterator<org.apache.spark.sql.Row> computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                                  org.apache.spark.sql.types.StructType rowType,
                                                                                  java.lang.String[] identifierFields)
        Creates an iterator that composes RemoveCarryoverIterator and ComputeUpdateIterator to remove carry-over rows and compute update rows.
        Parameters:
        rowIterator - the iterator of rows from a changelog table
        rowType - the schema of the rows
        identifierFields - the names of the identifier columns, which determine if rows are the same
        Returns:
        a new iterator instance
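        The update-computing step can be illustrated without Spark. The following is a simplified sketch under stated assumptions, not the library's implementation: ComputeUpdatesSketch, ChangeRow, and sameIdentifier are hypothetical names; identifier columns are passed as indices rather than field names; the change-type strings are assumed to equal the constant names listed in the Field Summary; carry-over rows are assumed to be removed already; and rows sharing an identifier are assumed to be adjacent, with the DELETE row first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; not part of the Iceberg or Spark APIs.
final class ComputeUpdatesSketch {

  // Stand-in for a changelog row: a change type plus the data column values.
  static final class ChangeRow {
    final String changeType;
    final List<Object> data;

    ChangeRow(String changeType, List<Object> data) {
      this.changeType = changeType;
      this.data = data;
    }
  }

  // True when both rows hold equal values at every identifier column index.
  static boolean sameIdentifier(ChangeRow a, ChangeRow b, int[] identifierIndices) {
    for (int idx : identifierIndices) {
      if (!Objects.equals(a.data.get(idx), b.data.get(idx))) {
        return false;
      }
    }
    return true;
  }

  // Relabels an adjacent DELETE/INSERT pair with the same identifier as
  // UPDATE_BEFORE/UPDATE_AFTER; every other row passes through unchanged.
  static List<ChangeRow> computeUpdates(List<ChangeRow> rows, int[] identifierIndices) {
    List<ChangeRow> result = new ArrayList<>();
    int i = 0;
    while (i < rows.size()) {
      ChangeRow current = rows.get(i);
      ChangeRow next = i + 1 < rows.size() ? rows.get(i + 1) : null;
      if (next != null
          && "DELETE".equals(current.changeType)
          && "INSERT".equals(next.changeType)
          && sameIdentifier(current, next, identifierIndices)) {
        result.add(new ChangeRow("UPDATE_BEFORE", current.data));
        result.add(new ChangeRow("UPDATE_AFTER", next.data));
        i += 2; // both rows of the pair are consumed
      } else {
        result.add(current);
        i++;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ChangeRow> rows = Arrays.asList(
        new ChangeRow("DELETE", Arrays.<Object>asList(1, "a")),
        new ChangeRow("INSERT", Arrays.<Object>asList(1, "b")),
        new ChangeRow("INSERT", Arrays.<Object>asList(2, "c")));
    // Column 0 plays the role of the identifier field.
    for (ChangeRow row : computeUpdates(rows, new int[] {0})) {
      System.out.println(row.changeType + " " + row.data);
    }
  }
}
```

        In this sketch the first two rows share identifier value 1, so they are relabeled as an update pair, while the row with identifier 2 stays an INSERT.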
      • removeCarryovers

        public static java.util.Iterator<org.apache.spark.sql.Row> removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                                    org.apache.spark.sql.types.StructType rowType)
        Creates an iterator that removes carry-over rows from a changelog table.
        Parameters:
        rowIterator - the iterator of rows from a changelog table
        rowType - the schema of the rows
        Returns:
        a new iterator instance
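        A carry-over is generally understood as a DELETE row and an INSERT row with identical data values, which appear when a copy-on-write operation rewrites a file and leaves the row itself unchanged. The sketch below shows the idea on plain Java lists. It is a simplified illustration under stated assumptions, not the library's implementation: CarryoverSketch and ChangeRow are hypothetical names, the change-type strings are assumed to equal the constant names listed in the Field Summary, and the input is assumed to be sorted so that each carry-over pair is adjacent with the DELETE row first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg or Spark APIs.
final class CarryoverSketch {

  // Stand-in for a changelog row: a change type plus the data column values.
  static final class ChangeRow {
    final String changeType;
    final List<Object> data;

    ChangeRow(String changeType, Object... data) {
      this.changeType = changeType;
      this.data = Arrays.asList(data);
    }
  }

  // Drops each adjacent DELETE/INSERT pair whose data values are identical.
  static List<ChangeRow> removeCarryovers(List<ChangeRow> rows) {
    List<ChangeRow> result = new ArrayList<>();
    int i = 0;
    while (i < rows.size()) {
      ChangeRow current = rows.get(i);
      ChangeRow next = i + 1 < rows.size() ? rows.get(i + 1) : null;
      if (next != null
          && "DELETE".equals(current.changeType)
          && "INSERT".equals(next.changeType)
          && current.data.equals(next.data)) {
        i += 2; // carry-over pair: skip both rows
      } else {
        result.add(current);
        i++;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ChangeRow> rows = Arrays.asList(
        new ChangeRow("DELETE", 1, "a"),
        new ChangeRow("INSERT", 1, "a"),
        new ChangeRow("DELETE", 2, "b"),
        new ChangeRow("INSERT", 3, "c"));
    for (ChangeRow row : removeCarryovers(rows)) {
      System.out.println(row.changeType + " " + row.data);
    }
  }
}
```

        Only the first two rows form a carry-over pair here; the remaining DELETE and INSERT differ in their data values and are kept.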
      • removeNetCarryovers

        public static java.util.Iterator<org.apache.spark.sql.Row> removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                                       org.apache.spark.sql.types.StructType rowType)
      • isSameRecord

        protected boolean isSameRecord(org.apache.spark.sql.Row currentRow,
                                       org.apache.spark.sql.Row nextRow,
                                       int[] indicesToIdentifySameRow)
      • isDifferentValue

        protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow,
                                           org.apache.spark.sql.Row nextRow,
                                           int idx)
      • generateIndicesToIdentifySameRow

        protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount,
                                                                java.util.Set<java.lang.Integer> metadataColumnIndices)
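        This page gives only the signature for this method. One plausible reading, inferred from the method and parameter names alone, is that it returns every column index that is not a metadata column, so that two rows can be compared on their data columns. The sketch below illustrates that reading; IdentifyIndicesSketch and nonMetadataIndices are hypothetical names, and the actual behavior may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.IntStream;

// Hypothetical illustration only; not part of the Iceberg or Spark APIs.
final class IdentifyIndicesSketch {

  // Returns, in ascending order, the indices in [0, totalColumnCount)
  // that are not listed as metadata column indices.
  static int[] nonMetadataIndices(int totalColumnCount, Set<Integer> metadataColumnIndices) {
    return IntStream.range(0, totalColumnCount)
        .filter(i -> !metadataColumnIndices.contains(i))
        .toArray();
  }

  public static void main(String[] args) {
    // Assume five columns where the last two hold metadata.
    Set<Integer> metadata = new HashSet<>(Arrays.asList(3, 4));
    System.out.println(Arrays.toString(nonMetadataIndices(5, metadata)));
  }
}
```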