Package org.apache.iceberg.spark
Class ChangelogIterator
java.lang.Object
org.apache.iceberg.spark.ChangelogIterator
- All Implemented Interfaces:
- Iterator<org.apache.spark.sql.Row>
- Direct Known Subclasses:
- ComputeUpdateIterator,- RemoveNetCarryoverIterator
public abstract class ChangelogIterator
extends Object
implements Iterator<org.apache.spark.sql.Row>
An iterator that transforms rows from changelog tables within a single Spark task.
- 
Field SummaryFields
- 
Constructor SummaryConstructorsModifierConstructorDescriptionprotectedChangelogIterator(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) 
- 
Method SummaryModifier and TypeMethodDescriptionprotected StringchangeType(org.apache.spark.sql.Row row) protected intstatic Iterator<org.apache.spark.sql.Row> computeUpdates(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, String[] identifierFields) Creates an iterator composingRemoveCarryoverIteratorandComputeUpdateIteratorto remove carry-over rows and compute update rowsprotected static int[]generateIndicesToIdentifySameRow(int totalColumnCount, Set<Integer> metadataColumnIndices) protected booleanisDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx) protected booleanisSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow) static Iterator<org.apache.spark.sql.Row> removeCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) Creates an iterator that removes carry-over rows from a changelog table.static Iterator<org.apache.spark.sql.Row> removeNetCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) protected Iterator<org.apache.spark.sql.Row> protected org.apache.spark.sql.types.StructTyperowType()Methods inherited from class java.lang.Objectclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface java.util.IteratorforEachRemaining, hasNext, next, remove
- 
Field Details- 
DELETE
- 
INSERT
- 
UPDATE_BEFORE
- 
UPDATE_AFTER
 
- 
- 
Constructor Details- 
ChangelogIteratorprotected ChangelogIterator(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) 
 
- 
- 
Method Details- 
changeTypeIndexprotected int changeTypeIndex()
- 
rowTypeprotected org.apache.spark.sql.types.StructType rowType()
- 
changeType
- 
rowIterator
- 
computeUpdatespublic static Iterator<org.apache.spark.sql.Row> computeUpdates(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, String[] identifierFields) Creates an iterator composingRemoveCarryoverIteratorandComputeUpdateIteratorto remove carry-over rows and compute update rows- Parameters:
- rowIterator- the iterator of rows from a changelog table
- rowType- the schema of the rows
- identifierFields- the names of the identifier columns, which determine if rows are the same
- Returns:
- a new iterator instance
 
- 
removeCarryoverspublic static Iterator<org.apache.spark.sql.Row> removeCarryovers(Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType) Creates an iterator that removes carry-over rows from a changelog table.- Parameters:
- rowIterator- the iterator of rows from a changelog table
- rowType- the schema of the rows
- Returns:
- a new iterator instance
 
- 
removeNetCarryovers
- 
isSameRecordprotected boolean isSameRecord(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int[] indicesToIdentifySameRow) 
- 
isDifferentValueprotected boolean isDifferentValue(org.apache.spark.sql.Row currentRow, org.apache.spark.sql.Row nextRow, int idx) 
- 
generateIndicesToIdentifySameRow
 
-