ChangelogIterator

Skip navigation links

Prev Class
Next Class

All Classes

Summary:
Nested |
Field |
Constr |
Method

Detail:
Field |
Constr |
Method

java.lang.Object
- org.apache.iceberg.spark.ChangelogIterator

All Implemented Interfaces:

java.util.Iterator<org.apache.spark.sql.Row>
```
public class ChangelogIterator
extends java.lang.Object
implements java.util.Iterator<org.apache.spark.sql.Row>
```
An iterator that transforms rows from changelog tables within a single Spark task. It assumes that rows are sorted by identifier columns and change type.
It removes the carry-over rows. Carry-over rows are the result of a removal and insertion of the same row within an operation because of the copy-on-write mechanism. For example, given a file which contains row1 (id=1, data='a') and row2 (id=2, data='b'). A copy-on-write delete of row2 would require erasing this file and preserving row1 in a new file. The change-log table would report this as (id=1, data='a', op='DELETE') and (id=1, data='a', op='INSERT'), despite it not being an actual change to the table. The iterator finds the carry-over rows and removes them from the result.
This iterator also finds delete/insert rows which represent an update, and converts them into update records. For example, these two rows
- (id=1, data='a', op='DELETE')
- (id=1, data='b', op='INSERT')
will be marked as update-rows:
- (id=1, data='a', op='UPDATE_BEFORE')
- (id=1, data='b', op='UPDATE_AFTER')

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`static java.util.Iterator<org.apache.spark.sql.Row>`	`create(java.util.Iterator<org.apache.spark.sql.Row> rowIterator, org.apache.spark.sql.types.StructType rowType, java.lang.String[] identifierFields)` Creates an iterator for records of a changelog table.
`boolean`	`hasNext()`
`org.apache.spark.sql.Row`	`next()`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.util.Iterator
forEachRemaining, remove

- Method Detail
  - create
```
public static java.util.Iterator<org.apache.spark.sql.Row> create(java.util.Iterator<org.apache.spark.sql.Row> rowIterator,
                                                                  org.apache.spark.sql.types.StructType rowType,
                                                                  java.lang.String[] identifierFields)
```
    Creates an iterator for records of a changelog table.
    
    Parameters:
    
    rowIterator - the iterator of rows from a changelog table
    
    rowType - the schema of the rows
    
    identifierFields - the names of the identifier columns, which determine if rows are the same
    
    Returns:
    
    a new ChangelogIterator instance concatenated with the null-removal iterator
  - hasNext
```
public boolean hasNext()
```
    Specified by:
    
    hasNext in interface java.util.Iterator<org.apache.spark.sql.Row>
  - next
```
public org.apache.spark.sql.Row next()
```
    Specified by:
    
    next in interface java.util.Iterator<org.apache.spark.sql.Row>

Skip navigation links

Prev Class
Next Class

All Classes

Summary:
Nested |
Field |
Constr |
Method

Detail:
Field |
Constr |
Method