public abstract class ChangelogIterator
extends java.lang.Object
implements java.util.Iterator<org.apache.spark.sql.Row>
Modifier and Type | Field and Description |
---|---|
protected static java.lang.String |
DELETE |
protected static java.lang.String |
INSERT |
protected static java.lang.String |
UPDATE_AFTER |
protected static java.lang.String |
UPDATE_BEFORE |
Modifier | Constructor and Description |
---|---|
protected |
ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
org.apache.spark.sql.types.StructType rowType) |
Modifier and Type | Method and Description |
---|---|
protected java.lang.String |
changeType(org.apache.spark.sql.Row row) |
protected int |
changeTypeIndex() |
static java.util.Iterator<org.apache.spark.sql.Row> |
computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
org.apache.spark.sql.types.StructType rowType,
java.lang.String[] identifierFields)
Creates an iterator composing
RemoveCarryoverIterator and ComputeUpdateIterator
to remove carry-over rows and compute update rows |
protected static int[] |
generateIndicesToIdentifySameRow(int totalColumnCount,
java.util.Set<java.lang.Integer> metadataColumnIndices) |
protected boolean |
isDifferentValue(org.apache.spark.sql.Row currentRow,
org.apache.spark.sql.Row nextRow,
int idx) |
protected boolean |
isSameRecord(org.apache.spark.sql.Row currentRow,
org.apache.spark.sql.Row nextRow,
int[] indicesToIdentifySameRow) |
static java.util.Iterator<org.apache.spark.sql.Row> |
removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
org.apache.spark.sql.types.StructType rowType)
Creates an iterator that removes carry-over rows from a changelog table.
|
static java.util.Iterator<org.apache.spark.sql.Row> |
removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
org.apache.spark.sql.types.StructType rowType) |
protected java.util.Iterator<org.apache.spark.sql.Row> |
rowIterator() |
protected org.apache.spark.sql.types.StructType |
rowType() |
protected static final java.lang.String DELETE
protected static final java.lang.String INSERT
protected static final java.lang.String UPDATE_BEFORE
protected static final java.lang.String UPDATE_AFTER
protected ChangelogIterator(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
protected int changeTypeIndex()
protected org.apache.spark.sql.types.StructType rowType()
protected java.lang.String changeType(org.apache.spark.sql.Row row)
protected java.util.Iterator<org.apache.spark.sql.Row> rowIterator()
public static java.util.Iterator<org.apache.spark.sql.Row> computeUpdates(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)
RemoveCarryoverIterator
and ComputeUpdateIterator
to remove carry-over rows and compute update rowsrowIterator
- the iterator of rows from a changelog tablerowType
- the schema of the rowsidentifierFields
- the names of the identifier columns, which determine if rows are the
samepublic static java.util.Iterator<org.apache.spark.sql.Row> removeCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
rowIterator
- the iterator of rows from a changelog tablerowType
- the schema of the rowspublic static java.util.Iterator<org.apache.spark.sql.Row> removeNetCarryovers(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
protected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow)
protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx)
protected static int[] generateIndicesToIdentifySameRow(int totalColumnCount, java.util.Set<java.lang.Integer> metadataColumnIndices)