java.lang.Object

org.apache.iceberg.flink.sink.IcebergSink.Builder

Enclosing class: IcebergSink

public static class IcebergSink.Builder extends Object

Method Summary

Modifier and Type

Method

Description

org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData>

append()

Append the Iceberg sink operators to write records to the Iceberg table.

IcebergSink.Builder

distributionMode(DistributionMode mode)

Configure the write DistributionMode that the IcebergSink will use.

IcebergSink.Builder

equalityFieldColumns(List<String> columns)

Configure the equality field columns for an Iceberg table that accepts CDC or UPSERT events.

IcebergSink.Builder

flinkConf(org.apache.flink.configuration.ReadableConfig config)

IcebergSink.Builder

overwrite(boolean newOverwrite)

IcebergSink.Builder

set(String property, String value)

Set the write properties for IcebergSink.

IcebergSink.Builder

setAll(Map<String,String> properties)

Set the write properties for IcebergSink.

IcebergSink.Builder

setSnapshotProperty(String property, String value)

IcebergSink.Builder

snapshotProperties(Map<String,String> properties)

IcebergSink.Builder

table(Table newTable)

This Iceberg SerializableTable instance is used to initialize the IcebergStreamWriter, which writes all records into DataFiles and emits them to the downstream operator.

IcebergSink.Builder

tableLoader(TableLoader newTableLoader)

The table loader is used to load tables lazily in IcebergCommitter. It is needed because Table is not serializable, so the table loaded through Builder#table cannot be used directly in the remote task manager.

IcebergSink.Builder

tableSchema(org.apache.flink.table.api.TableSchema newTableSchema)

IcebergSink.Builder

toBranch(String branch)

IcebergSink.Builder

uidSuffix(String newSuffix)

Set the uid suffix for IcebergSink operators.

IcebergSink.Builder

upsert(boolean enabled)

All INSERT/UPDATE_AFTER events from the input stream will be transformed to UPSERT events, which means the sink will DELETE the old records and then INSERT the new records.

IcebergSink.Builder

writeParallelism(int newWriteParallelism)

Configure the write parallelism of the Iceberg stream writer.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- table
  
  public IcebergSink.Builder table(Table newTable)
  
  This Iceberg SerializableTable instance is used to initialize the IcebergStreamWriter, which writes all records into DataFiles and emits them to the downstream operator. Providing a table avoids loading the table separately in each task.
  
  Parameters:
  
  newTable - the loaded Iceberg table instance.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
- tableLoader
  
  public IcebergSink.Builder tableLoader(TableLoader newTableLoader)
  
  The table loader is used to load tables lazily in IcebergCommitter. It is needed because Table is not serializable, so the table loaded through Builder#table cannot be used directly in the remote task manager.
  
  Parameters:
  
  newTableLoader - used to load the Iceberg table inside tasks.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
- set
  
  public IcebergSink.Builder set(String property, String value)
  
  Set a single write property for IcebergSink. The supported properties are listed in FlinkWriteOptions.
- setAll
  
  public IcebergSink.Builder setAll(Map<String,String> properties)
  
  Set the write properties for IcebergSink. The supported properties are listed in FlinkWriteOptions.
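Both set and setAll take plain string keys and values. The sketch below only assembles such a map in plain Java, without the Iceberg or Flink libraries. The keys `write-format` and `target-file-size-bytes` are assumed here to match option names in FlinkWriteOptions and should be verified against the version in use; `WritePropertiesSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of Iceberg or Flink.
final class WritePropertiesSketch {

  // Assembles write properties as string key/value pairs, the shape that
  // set(String, String) and setAll(Map<String,String>) accept.
  static Map<String, String> writeProperties() {
    Map<String, String> props = new LinkedHashMap<>();
    // Assumed option names; verify them against FlinkWriteOptions.
    props.put("write-format", "parquet");
    // Numeric values are passed as strings: 128 MiB expressed in bytes.
    props.put("target-file-size-bytes", String.valueOf(128L * 1024 * 1024));
    return props;
  }

  public static void main(String[] args) {
    Map<String, String> props = writeProperties();
    // With a builder in scope, builder.setAll(props) would apply the whole map,
    // and builder.set(key, value) would apply a single entry.
    props.forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

Keeping the properties in one map makes it easy to share the same settings between several sinks.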
- tableSchema
  
  public IcebergSink.Builder tableSchema(org.apache.flink.table.api.TableSchema newTableSchema)
- overwrite
  
  public IcebergSink.Builder overwrite(boolean newOverwrite)
- flinkConf
  
  public IcebergSink.Builder flinkConf(org.apache.flink.configuration.ReadableConfig config)
- distributionMode
  
  public IcebergSink.Builder distributionMode(DistributionMode mode)
  
  Configure the write DistributionMode that the IcebergSink will use. Currently, Flink supports DistributionMode.NONE and DistributionMode.HASH.
  
  Parameters:
  
  mode - specifies the write distribution mode.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
- writeParallelism
  
  public IcebergSink.Builder writeParallelism(int newWriteParallelism)
  
  Configure the write parallelism of the Iceberg stream writer.
  
  Parameters:
  
  newWriteParallelism - the number of parallel Iceberg stream writers.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
- upsert
  
  public IcebergSink.Builder upsert(boolean enabled)
  
  All INSERT/UPDATE_AFTER events from the input stream will be transformed to UPSERT events, which means the sink will DELETE the old records and then INSERT the new records. In a partitioned table, the partition fields should be a subset of the equality fields; otherwise the old row located in partition-A cannot be deleted by the new row located in partition-B.
  
  Parameters:
  
  enabled - indicates whether all INSERT/UPDATE_AFTER events should be transformed to UPSERT.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
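The partition constraint on upsert can be illustrated with a small simulation. This is a plain-Java sketch, not Iceberg code: it assumes that the DELETE half of an upsert only reaches rows in the partition the new row is written to, as the description above implies. `UpsertPartitionSketch`, the `id` equality field, and the `region` partition field are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation; none of these types belong to Iceberg or Flink.
final class UpsertPartitionSketch {

  // Partition value -> (equality key -> payload) for rows stored in that partition.
  private final Map<String, Map<String, String>> partitions = new TreeMap<>();
  // True when the partition field ("region") is also one of the equality fields.
  private final boolean partitionFieldInKey;

  UpsertPartitionSketch(boolean partitionFieldInKey) {
    this.partitionFieldInKey = partitionFieldInKey;
  }

  // Builds the equality key of a row from the configured equality fields.
  private String key(String id, String region) {
    return partitionFieldInKey ? id + "|" + region : id;
  }

  // UPSERT = DELETE the old row, then INSERT the new one.
  // Assumption: the DELETE only reaches the partition the new row is written to.
  void upsert(String id, String region, String payload) {
    Map<String, String> rows = partitions.computeIfAbsent(region, r -> new HashMap<>());
    rows.remove(key(id, region));
    rows.put(key(id, region), payload);
  }

  // Counts live rows across all partitions that carry the same equality key.
  int liveRowsForKey(String id, String region) {
    String k = key(id, region);
    int count = 0;
    for (Map<String, String> rows : partitions.values()) {
      if (rows.containsKey(k)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Equality fields = [id]; the row moves from partition A to partition B.
    UpsertPartitionSketch idOnly = new UpsertPartitionSketch(false);
    idOnly.upsert("1", "A", "v1");
    idOnly.upsert("1", "B", "v2");
    System.out.println(idOnly.liveRowsForKey("1", "B")); // 2: the old row in A survives

    // Equality fields = [id, region]; the two rows are distinct keys.
    UpsertPartitionSketch idAndRegion = new UpsertPartitionSketch(true);
    idAndRegion.upsert("1", "A", "v1");
    idAndRegion.upsert("1", "B", "v2");
    System.out.println(idAndRegion.liveRowsForKey("1", "B")); // 1
  }
}
```

In the first case the key `1` ends up with two live rows, which is the stale-row situation the description warns about. In the second case the partition field is part of the key, so each key lives in exactly one partition.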
- equalityFieldColumns
  
  public IcebergSink.Builder equalityFieldColumns(List<String> columns)
  
  Configure the equality field columns for an Iceberg table that accepts CDC or UPSERT events.
  
  Parameters:
  
  columns - defines the Iceberg table's key.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
- uidSuffix
  
  public IcebergSink.Builder uidSuffix(String newSuffix)
  
  Set the uid suffix for IcebergSink operators. Note that IcebergSink internally consists of multiple operators (such as the writer, committer, and aggregator). The actual operator uid is formed with this suffix appended, for example "Sink Committer: $uidSuffix".
  Flink auto-generates operator uids if they are not set explicitly. Setting a uid for every operator before deploying to production is a recommended best practice. Flink provides the option pipeline.auto-generate-uids=false to disable auto-generation and force every operator uid to be set explicitly.
  Be careful when setting this for an existing job, because it changes the operator uid from an auto-generated value to the new one. When the change is deployed from a checkpoint, Flink cannot restore the previous IcebergSink operator state (more specifically, the committer operator state), so --allowNonRestoredState is needed to ignore the previous sink state. During restore, the IcebergSink state is used to check whether the last commit actually succeeded. Using --allowNonRestoredState can therefore lead to data loss if the Iceberg commit failed in the last completed checkpoint.
  
  Parameters:
  
  newSuffix - suffix for the Flink sink operator uid and name.
  
  Returns:
  
  IcebergSink.Builder to connect the Iceberg table.
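As a rough illustration of the naming described above, the following plain-Java sketch composes a uid from an operator name and a suffix. It follows only the single pattern quoted in the description ("Sink Committer: $uidSuffix"); the real names of the other operators are not stated here, and `UidSuffixSketch`, `operatorUid`, and the suffix `orders-sink` are hypothetical.

```java
// Hypothetical helper; not part of Iceberg or Flink.
final class UidSuffixSketch {

  // Follows the one pattern quoted in the description: "<operator name>: <uidSuffix>".
  static String operatorUid(String operatorName, String uidSuffix) {
    return operatorName + ": " + uidSuffix;
  }

  public static void main(String[] args) {
    // "orders-sink" is an arbitrary example suffix.
    System.out.println(operatorUid("Sink Committer", "orders-sink"));
    // prints "Sink Committer: orders-sink"
  }
}
```

Because the uid is derived from a fixed suffix rather than auto-generated, it stays the same across deployments as long as the suffix does not change.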
- snapshotProperties
  
  public IcebergSink.Builder snapshotProperties(Map<String,String> properties)
- setSnapshotProperty
  
  public IcebergSink.Builder setSnapshotProperty(String property, String value)
- toBranch
  
  public IcebergSink.Builder toBranch(String branch)
- append
  
  public org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData> append()
  
  Append the Iceberg sink operators to write records to the Iceberg table.
  
  Returns:
  
  The DataStreamSink for the sink.
