Package org.apache.iceberg.spark
Class ChangelogIterator
java.lang.Object
org.apache.iceberg.spark.ChangelogIterator
- All Implemented Interfaces:
Iterator<org.apache.spark.sql.Row>
- Direct Known Subclasses:
ComputeUpdateIterator
,RemoveNetCarryoverIterator
public abstract class ChangelogIterator
extends Object
implements Iterator<org.apache.spark.sql.Row>
An iterator that transforms rows from changelog tables within a single Spark task.
-
Field Summary
-
Constructor Summary
ModifierConstructorDescriptionprotected
ChangelogIterator
(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) -
Method Summary
Modifier and TypeMethodDescriptionprotected String
changeType
(org.apache.spark.sql.Row row) protected int
static Iterator
<org.apache.spark.sql.Row> computeUpdates
(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, String[] identifierFields) Creates an iterator composingRemoveCarryoverIterator
andComputeUpdateIterator
to remove carry-over rows and compute update rowsprotected static int[]
generateIndicesToIdentifySameRow
(int totalColumnCount, Set<Integer> metadataColumnIndices) protected boolean
isDifferentValue
(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx) protected boolean
isSameRecord
(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow) static Iterator
<org.apache.spark.sql.Row> removeCarryovers
(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) Creates an iterator that removes carry-over rows from a changelog table.static Iterator
<org.apache.spark.sql.Row> removeNetCarryovers
(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) protected Iterator
<org.apache.spark.sql.Row> protected org.apache.spark.sql.types.StructType
rowType()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator
forEachRemaining, hasNext, next, remove
-
Field Details
-
DELETE
-
INSERT
-
UPDATE_BEFORE
-
UPDATE_AFTER
-
-
Constructor Details
-
ChangelogIterator
protected ChangelogIterator(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType)
-
-
Method Details
-
changeTypeIndex
protected int changeTypeIndex() -
rowType
protected org.apache.spark.sql.types.StructType rowType() -
changeType
-
rowIterator
-
computeUpdates
public static Iterator<org.apache.spark.sql.Row> computeUpdates(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, String[] identifierFields) Creates an iterator composingRemoveCarryoverIterator
andComputeUpdateIterator
to remove carry-over rows and compute update rows- Parameters:
rowIterator
- the iterator of rows from a changelog tablerowType
- the schema of the rowsidentifierFields
- the names of the identifier columns, which determine if rows are the same- Returns:
- a new iterator instance
-
removeCarryovers
public static Iterator<org.apache.spark.sql.Row> removeCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) Creates an iterator that removes carry-over rows from a changelog table.- Parameters:
rowIterator
- the iterator of rows from a changelog tablerowType
- the schema of the rows- Returns:
- a new iterator instance
-
removeNetCarryovers
-
isSameRecord
protected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow) -
isDifferentValue
protected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx) -
generateIndicesToIdentifySameRow
-