java.lang.Object

org.apache.iceberg.flink.sink.dynamic.DynamicIcebergSink.Builder<T>

Enclosing class: DynamicIcebergSink

public static class DynamicIcebergSink.Builder<T> extends Object

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.iceberg.flink.sink.dynamic.DynamicRecordInternal> | append() | Append the Iceberg sink operators to write records to the Iceberg table. |
| DynamicIcebergSink.Builder<T> | cacheMaxSize(int maxSize) | Maximum size of the caches used in Dynamic Sink for table data and serializers. |
| DynamicIcebergSink.Builder<T> | cacheRefreshMs(long refreshMs) | Maximum interval between cache item renewals, in milliseconds. |
| DynamicIcebergSink.Builder<T> | caseSensitive(boolean newCaseSensitive) | Set whether schema field name matching should be case-sensitive. |
| DynamicIcebergSink.Builder<T> | catalogLoader(CatalogLoader newCatalogLoader) | The catalog loader is used to load tables lazily in DynamicCommitter. The loader is needed because Table is not serializable, so the table loaded via Builder#table cannot be used directly in the remote task manager. |
| DynamicIcebergSink.Builder<T> | dropUnusedColumns(boolean newDropUnusedColumns) | Dropping columns is disabled by default to prevent issues with late or out-of-order data, as removed fields cannot be easily restored without data loss. |
| DynamicIcebergSink.Builder<T> | flinkConf(org.apache.flink.configuration.ReadableConfig config) | |
| DynamicIcebergSink.Builder<T> | forInput(org.apache.flink.streaming.api.datastream.DataStream<T> inputStream) | |
| DynamicIcebergSink.Builder<T> | generator(DynamicRecordGenerator<T> inputGenerator) | |
| DynamicIcebergSink.Builder<T> | immediateTableUpdate(boolean newImmediateUpdate) | |
| DynamicIcebergSink.Builder<T> | inputSchemasPerTableCacheMaxSize(int inputSchemasPerTableCacheMaxSize) | Maximum number of input Schema objects to cache per Iceberg table. |
| DynamicIcebergSink.Builder<T> | overwrite(boolean newOverwrite) | |
| DynamicIcebergSink.Builder<T> | set(String property, String value) | Set a write property for IcebergSink. |
| DynamicIcebergSink.Builder<T> | setAll(Map<String,String> properties) | Set the write properties for IcebergSink. |
| DynamicIcebergSink.Builder<T> | setSnapshotProperty(String property, String value) | |
| DynamicIcebergSink.Builder<T> | snapshotProperties(Map<String,String> properties) | |
| DynamicIcebergSink.Builder<T> | tableCreator(TableCreator tableCreationFunction) | Logic to create a table. |
| DynamicIcebergSink.Builder<T> | toBranch(String branch) | |
| DynamicIcebergSink.Builder<T> | uidPrefix(String newPrefix) | Set the uid prefix for IcebergSink operators. |
| DynamicIcebergSink.Builder<T> | writeParallelism(int newWriteParallelism) | Configure the write parallelism of the Iceberg stream writer. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- forInput
  
  public DynamicIcebergSink.Builder<T> forInput(org.apache.flink.streaming.api.datastream.DataStream<T> inputStream)
- generator
  
  public DynamicIcebergSink.Builder<T> generator(DynamicRecordGenerator<T> inputGenerator)
- catalogLoader
  
  public DynamicIcebergSink.Builder<T> catalogLoader(CatalogLoader newCatalogLoader)
  
  The catalog loader is used to load tables lazily in DynamicCommitter. The loader is needed because Table is not serializable, so the table loaded via Builder#table cannot be used directly in the remote task manager.
  
  Parameters:
  
  newCatalogLoader - loader used to load Iceberg tables inside tasks.
  
  Returns:
  
  DynamicIcebergSink.Builder to connect the Iceberg table.
- set
  
  public DynamicIcebergSink.Builder<T> set(String property, String value)
  
  Set a write property for IcebergSink. The supported properties are listed in FlinkWriteOptions.
- setAll
  
  public DynamicIcebergSink.Builder<T> setAll(Map<String,String> properties)
  
  Set the write properties for IcebergSink. The supported properties are listed in FlinkWriteOptions.
- overwrite
  
  public DynamicIcebergSink.Builder<T> overwrite(boolean newOverwrite)
- flinkConf
  
  public DynamicIcebergSink.Builder<T> flinkConf(org.apache.flink.configuration.ReadableConfig config)
- tableCreator
  
  public DynamicIcebergSink.Builder<T> tableCreator(TableCreator tableCreationFunction)
  
  Logic to create a table. Allows setting custom table properties/location on a per-table basis.
- writeParallelism
  
  public DynamicIcebergSink.Builder<T> writeParallelism(int newWriteParallelism)
  
  Configure the write parallelism of the Iceberg stream writer.
  
  Parameters:
  
  newWriteParallelism - the number of parallel Iceberg stream writers.
  
  Returns:
  
  DynamicIcebergSink.Builder to connect the Iceberg table.
- uidPrefix
  
  public DynamicIcebergSink.Builder<T> uidPrefix(String newPrefix)
  
  Set the uid prefix for IcebergSink operators. IcebergSink internally consists of multiple operators (such as writer, committer, and aggregator). The actual operator uid is the prefix with a suffix appended, for example "uidPrefix-writer".
  If provided, this prefix is also applied to operator names.
  Flink auto-generates operator uids if they are not set explicitly. It is a recommended best practice to set a uid for every operator before deploying to production. Flink has an option, pipeline.auto-generate-uids=false, that disables auto-generation and forces every operator uid to be set explicitly.
  Be careful when setting this for an existing job, because it changes the operator uid from an auto-generated one to the new value. When the change is deployed from a checkpoint, Flink cannot restore the previous IcebergSink operator state (more specifically, the committer operator state). You need to pass --allowNonRestoredState to ignore the previous sink state. During restore, the IcebergSink state is used to check whether the last commit actually succeeded. Using --allowNonRestoredState can therefore lead to data loss if the Iceberg commit failed in the last completed checkpoint.
  
  Parameters:
  
  newPrefix - prefix for the Flink sink operator uids and names
  
  Returns:
  
  DynamicIcebergSink.Builder to connect the Iceberg table.
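
To make the naming scheme in the uidPrefix description concrete, the sketch below derives operator uids from a prefix. Only the "-writer" suffix comes from the description. The helper class UidNaming, the other role names passed to it, and the sample prefix are hypothetical, and the real sink may use different suffixes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Iceberg or Flink. */
final class UidNaming {
  private UidNaming() {}

  /** Joins prefix and operator role in the "uidPrefix-writer" style from the description. */
  static String operatorUid(String uidPrefix, String role) {
    return uidPrefix + "-" + role;
  }

  /** Derives one uid per role, in the order the roles are given. */
  static List<String> operatorUids(String uidPrefix, List<String> roles) {
    List<String> uids = new ArrayList<>();
    for (String role : roles) {
      uids.add(operatorUid(uidPrefix, role));
    }
    return uids;
  }
}
```

Because every derived uid changes when the prefix changes, picking a prefix once and keeping it stable across deployments is what allows operator state to be matched on restore.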
- snapshotProperties
  
  public DynamicIcebergSink.Builder<T> snapshotProperties(Map<String,String> properties)
- setSnapshotProperty
  
  public DynamicIcebergSink.Builder<T> setSnapshotProperty(String property, String value)
- toBranch
  
  public DynamicIcebergSink.Builder<T> toBranch(String branch)
- immediateTableUpdate
  
  public DynamicIcebergSink.Builder<T> immediateTableUpdate(boolean newImmediateUpdate)
- dropUnusedColumns
  
  public DynamicIcebergSink.Builder<T> dropUnusedColumns(boolean newDropUnusedColumns)
  
  Dropping columns is disabled by default to prevent issues with late or out-of-order data, as removed fields cannot be easily restored without data loss.
  You can opt in to allow dropping columns. Once a column has been dropped, it is technically still possible to write data to that column because Iceberg maintains all past table schemas. However, regular queries won't be able to reference the column. If the field were to reappear as part of a new schema, an entirely new column would be added that has nothing in common with the old column apart from its name, i.e. queries for the new column will never return data of the old column.
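
The reappearing-column behaviour described for dropUnusedColumns follows from Iceberg tracking columns by field ID rather than by name. The toy model below is not Iceberg code; ToySchema and its methods are hypothetical and only illustrate why a re-added column is unrelated to the dropped one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of ID-based column tracking; hypothetical, not Iceberg's Schema class. */
final class ToySchema {
  private final Map<String, Integer> idByName = new LinkedHashMap<>();
  private int lastAssignedId = 0;

  /** Adds a column under a fresh ID; IDs are never reused. */
  int addColumn(String name) {
    if (idByName.containsKey(name)) {
      throw new IllegalArgumentException("Column already exists: " + name);
    }
    int id = ++lastAssignedId;
    idByName.put(name, id);
    return id;
  }

  /** Removes the column from the current schema; its old ID stays retired. */
  void dropColumn(String name) {
    if (idByName.remove(name) == null) {
      throw new IllegalArgumentException("No such column: " + name);
    }
  }

  /** Returns the field ID of a current column, or null if it is not in the schema. */
  Integer idOf(String name) {
    return idByName.get(name);
  }
}
```

In this model, adding "email", dropping it, and adding "email" again yields two different IDs, so data written under the first ID is not visible through the second.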
- cacheMaxSize
  
  public DynamicIcebergSink.Builder<T> cacheMaxSize(int maxSize)
  
  Maximum size of the caches used in Dynamic Sink for table data and serializers.
- cacheRefreshMs
  
  public DynamicIcebergSink.Builder<T> cacheRefreshMs(long refreshMs)
  
  Maximum interval between cache item renewals, in milliseconds.
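
As a rough illustration of how a size bound (cacheMaxSize) and a refresh interval (cacheRefreshMs) can interact, here is a minimal self-contained cache. It is a sketch under stated assumptions, not the sink's implementation: ToyRefreshingCache, the injected clock, and the least-recently-used eviction policy are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Toy cache; hypothetical. Least-recently-used eviction is an assumption. */
final class ToyRefreshingCache<K, V> {
  /** A cached value together with the time it was loaded. */
  private static final class Timed<T> {
    final T value;
    final long loadedAtMs;

    Timed(T value, long loadedAtMs) {
      this.value = value;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final long refreshMs;
  private final LongSupplier clockMs;
  private final Function<K, V> loader;
  private final Map<K, Timed<V>> entries;

  ToyRefreshingCache(int maxSize, long refreshMs, LongSupplier clockMs, Function<K, V> loader) {
    this.refreshMs = refreshMs;
    this.clockMs = clockMs;
    this.loader = loader;
    // Access-ordered map that evicts the least recently used entry beyond maxSize.
    this.entries = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
        return size() > maxSize;
      }
    };
  }

  /** Returns the cached value, reloading it when missing or older than refreshMs. */
  V get(K key) {
    long now = clockMs.getAsLong();
    Timed<V> cached = entries.get(key);
    if (cached == null || now - cached.loadedAtMs >= refreshMs) {
      cached = new Timed<>(loader.apply(key), now);
      entries.put(key, cached);
    }
    return cached.value;
  }

  int size() {
    return entries.size();
  }
}
```

A smaller refresh interval makes external table changes visible sooner at the cost of more reloads; a larger maximum size trades memory for fewer reloads.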
- inputSchemasPerTableCacheMaxSize
  
  public DynamicIcebergSink.Builder<T> inputSchemasPerTableCacheMaxSize(int inputSchemasPerTableCacheMaxSize)
  
  Maximum number of input Schema objects to cache per Iceberg table. The cache improves Dynamic Sink performance by storing Schema comparison results.
- caseSensitive
  
  public DynamicIcebergSink.Builder<T> caseSensitive(boolean newCaseSensitive)
  
  Set whether schema field name matching should be case-sensitive. By default, field names are matched case-sensitively.
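
The difference between the two matching modes can be shown with a small standalone lookup. This is only an illustration of case-sensitive versus case-insensitive name matching; FieldMatching and indexOfField are hypothetical names and do not reflect how the sink resolves fields internally.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of Iceberg. */
final class FieldMatching {
  private FieldMatching() {}

  /** Returns the index of the first table field matching inputField, or -1 if none matches. */
  static int indexOfField(List<String> tableFields, String inputField, boolean caseSensitive) {
    for (int i = 0; i < tableFields.size(); i++) {
      String candidate = tableFields.get(i);
      boolean matches =
          caseSensitive ? candidate.equals(inputField) : candidate.equalsIgnoreCase(inputField);
      if (matches) {
        return i;
      }
    }
    return -1;
  }
}
```

With table fields "id" and "UserName", an input field "username" matches nothing in case-sensitive mode and matches "UserName" in case-insensitive mode.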
- append
  
  public org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.iceberg.flink.sink.dynamic.DynamicRecordInternal> append()
  
  Append the Iceberg sink operators to write records to the Iceberg table.
  The topology splits records by distribution mode:
  
  - Forward records (null distributionMode) go through a forward edge to a chained writer, avoiding any data shuffle.
  - Shuffle records (non-null distributionMode) go through the standard Sink2 pipeline with hash/round-robin distribution.
  
  Both writers feed into a single shared pre-commit aggregator and committer, ensuring atomic commits across both paths.
  
  Returns:
  
  The DataStreamSink for the sink.
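
The split by distribution mode described for append() can be pictured with a toy routing model. It is a sketch only: ToyRouting, ToyRecord, and the string-valued distribution mode are hypothetical stand-ins, and the real topology is built from Flink operators rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy routing model; hypothetical, not the sink's actual implementation. */
final class ToyRouting {
  private ToyRouting() {}

  /** Stand-in for a record carrying an optional distribution mode. */
  static final class ToyRecord {
    final String payload;
    final String distributionMode; // null selects the forward path

    ToyRecord(String payload, String distributionMode) {
      this.payload = payload;
      this.distributionMode = distributionMode;
    }
  }

  /** Records with a null distribution mode: routed to the chained writer without a shuffle. */
  static List<ToyRecord> forwardPath(List<ToyRecord> records) {
    List<ToyRecord> result = new ArrayList<>();
    for (ToyRecord rec : records) {
      if (rec.distributionMode == null) {
        result.add(rec);
      }
    }
    return result;
  }

  /** Records with a non-null distribution mode: routed through the shuffling pipeline. */
  static List<ToyRecord> shufflePath(List<ToyRecord> records) {
    List<ToyRecord> result = new ArrayList<>();
    for (ToyRecord rec : records) {
      if (rec.distributionMode != null) {
        result.add(rec);
      }
    }
    return result;
  }
}
```

Every record lands on exactly one of the two paths, which is why a single shared committer downstream can cover both.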
