Class IcebergSink.Builder
- java.lang.Object
-
- org.apache.iceberg.flink.sink.IcebergSink.Builder
-
- Enclosing class:
- IcebergSink
public static class IcebergSink.Builder extends java.lang.Object
-
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData>append()Append the iceberg sink operators to write records to iceberg table.IcebergSink.BuilderdistributionMode(DistributionMode mode)Configure the writeDistributionModethat the IcebergSink will use.IcebergSink.BuilderequalityFieldColumns(java.util.List<java.lang.String> columns)Configuring the equality field columns for iceberg table that accept CDC or UPSERT events.IcebergSink.BuilderflinkConf(org.apache.flink.configuration.ReadableConfig config)IcebergSink.Builderoverwrite(boolean newOverwrite)IcebergSink.Builderset(java.lang.String property, java.lang.String value)Set the write properties for IcebergSink.IcebergSink.BuildersetAll(java.util.Map<java.lang.String,java.lang.String> properties)Set the write properties for IcebergSink.IcebergSink.BuildersetSnapshotProperty(java.lang.String property, java.lang.String value)IcebergSink.BuildersnapshotProperties(java.util.Map<java.lang.String,java.lang.String> properties)IcebergSink.Buildertable(Table newTable)This icebergSerializableTableinstance is used for initializingIcebergStreamWriterwhich will write all the records intoDataFiles and emit them to downstream operator.IcebergSink.BuildertableLoader(TableLoader newTableLoader)The table loader is used for loading tables inIcebergCommitterlazily, we need this loader becauseTableis not serializable and could not just use the loaded table from Builder#table in the remote task manager.IcebergSink.BuildertableSchema(org.apache.flink.table.api.TableSchema newTableSchema)IcebergSink.BuildertoBranch(java.lang.String branch)IcebergSink.BuilderuidSuffix(java.lang.String newSuffix)Set the uid suffix for IcebergSink operators.IcebergSink.Builderupsert(boolean enabled)All INSERT/UPDATE_AFTER events from input stream will be transformed to UPSERT events, which means it will DELETE the old records and then INSERT the new records.IcebergSink.BuilderwriteParallelism(int newWriteParallelism)Configuring the write parallel number for iceberg stream writer.
-
-
-
Method Detail
-
table
public IcebergSink.Builder table(Table newTable)
This icebergSerializableTableinstance is used for initializingIcebergStreamWriterwhich will write all the records intoDataFiles and emit them to downstream operator. Providing a table would avoid so many table loading from each separate task.- Parameters:
newTable- the loaded iceberg table instance.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
tableLoader
public IcebergSink.Builder tableLoader(TableLoader newTableLoader)
The table loader is used for loading tables inIcebergCommitterlazily, we need this loader becauseTableis not serializable and could not just use the loaded table from Builder#table in the remote task manager.- Parameters:
newTableLoader- to load iceberg table inside tasks.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
set
public IcebergSink.Builder set(java.lang.String property, java.lang.String value)
Set the write properties for IcebergSink. View the supported properties inFlinkWriteOptions
-
setAll
public IcebergSink.Builder setAll(java.util.Map<java.lang.String,java.lang.String> properties)
Set the write properties for IcebergSink. View the supported properties inFlinkWriteOptions
-
tableSchema
public IcebergSink.Builder tableSchema(org.apache.flink.table.api.TableSchema newTableSchema)
-
overwrite
public IcebergSink.Builder overwrite(boolean newOverwrite)
-
flinkConf
public IcebergSink.Builder flinkConf(org.apache.flink.configuration.ReadableConfig config)
-
distributionMode
public IcebergSink.Builder distributionMode(DistributionMode mode)
Configure the writeDistributionModethat the IcebergSink will use. Currently, flink supportDistributionMode.NONEandDistributionMode.HASH.- Parameters:
mode- to specify the write distribution mode.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
writeParallelism
public IcebergSink.Builder writeParallelism(int newWriteParallelism)
Configuring the write parallel number for iceberg stream writer.- Parameters:
newWriteParallelism- the number of parallel iceberg stream writer.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
upsert
public IcebergSink.Builder upsert(boolean enabled)
All INSERT/UPDATE_AFTER events from input stream will be transformed to UPSERT events, which means it will DELETE the old records and then INSERT the new records. In partitioned table, the partition fields should be a subset of equality fields, otherwise the old row that located in partition-A could not be deleted by the new row that located in partition-B.- Parameters:
enabled- indicate whether it should transform all INSERT/UPDATE_AFTER events to UPSERT.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
equalityFieldColumns
public IcebergSink.Builder equalityFieldColumns(java.util.List<java.lang.String> columns)
Configuring the equality field columns for iceberg table that accept CDC or UPSERT events.- Parameters:
columns- defines the iceberg table's key.- Returns:
IcebergSink.Builderto connect the iceberg table.
-
uidSuffix
public IcebergSink.Builder uidSuffix(java.lang.String newSuffix)
Set the uid suffix for IcebergSink operators. Note that IcebergSink internally consists of multiple operators (like writer, committer, aggregator). Actual operator uid will be appended with a suffix like "Sink Committer: $uidSuffix".Flink auto generates operator uid if not set explicitly. It is a recommended best-practice to set uid for all operators before deploying to production. Flink has an option to
pipeline.auto-generate-uid=falseto disable auto-generation and force explicit setting of all operator uid.Be careful with setting this for an existing job, because now we are changing the operator uid from an auto-generated one to this new value. When deploying the change with a checkpoint, Flink won't be able to restore the previous IcebergSink operator state (more specifically the committer operator state). You need to use
--allowNonRestoredStateto ignore the previous sink state. During restore IcebergSink state is used to check if last commit was actually successful or not.--allowNonRestoredStatecan lead to data loss if the Iceberg commit failed in the last completed checkpoint.- Parameters:
newSuffix- suffix for Flink sink operator uid and name- Returns:
IcebergSink.Builderto connect the iceberg table.
-
snapshotProperties
public IcebergSink.Builder snapshotProperties(java.util.Map<java.lang.String,java.lang.String> properties)
-
setSnapshotProperty
public IcebergSink.Builder setSnapshotProperty(java.lang.String property, java.lang.String value)
-
toBranch
public IcebergSink.Builder toBranch(java.lang.String branch)
-
append
public org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData> append()
Append the iceberg sink operators to write records to iceberg table.- Returns:
DataStreamSinkfor sink.
-
-