public class ChangelogIterator
extends java.lang.Object
implements java.util.Iterator<org.apache.spark.sql.Row>
It removes the carry-over rows. Carry-over rows are the result of a removal and insertion of the same row within an operation because of the copy-on-write mechanism. For example, given a file which contains row1 (id=1, data='a') and row2 (id=2, data='b'). A copy-on-write delete of row2 would require erasing this file and preserving row1 in a new file. The change-log table would report this as (id=1, data='a', op='DELETE') and (id=1, data='a', op='INSERT'), despite it not being an actual change to the table. The iterator finds the carry-over rows and removes them from the result.
This iterator also finds delete/insert rows which represent an update, and converts them into update records. For example, these two rows
will be marked as update-rows:
Modifier and Type | Method and Description |
---|---|
static java.util.Iterator<org.apache.spark.sql.Row> |
create(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
org.apache.spark.sql.types.StructType rowType,
java.lang.String[] identifierFields)
Creates an iterator for records of a changelog table.
|
boolean |
hasNext() |
org.apache.spark.sql.Row |
next() |
public static java.util.Iterator<org.apache.spark.sql.Row> create(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)
rowIterator
- the iterator of rows from a changelog tablerowType
- the schema of the rowsidentifierFields
- the names of the identifier columns, which determine if rows are the
sameChangelogIterator
instance concatenated with the null-removal iteratorpublic boolean hasNext()
hasNext
in interface java.util.Iterator<org.apache.spark.sql.Row>
public org.apache.spark.sql.Row next()
next
in interface java.util.Iterator<org.apache.spark.sql.Row>