Class FlinkSink.Builder

  • Enclosing class:
    FlinkSink

    public static class FlinkSink.Builder
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      org.apache.flink.streaming.api.datastream.DataStreamSink<java.lang.Void> append()
      Append the Iceberg sink operators that write records to the Iceberg table.
      org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData> build()
      Deprecated.
      This will be removed in 0.14.0; use append() instead, whose returned DataStreamSink has a more accurate data type.
      FlinkSink.Builder distributionMode​(DistributionMode mode)
      Configure the write DistributionMode that the Flink sink will use.
      FlinkSink.Builder equalityFieldColumns​(java.util.List<java.lang.String> columns)
      Configure the equality field columns for an Iceberg table that accepts CDC or UPSERT events.
      FlinkSink.Builder overwrite​(boolean newOverwrite)  
      FlinkSink.Builder table​(Table newTable)
      This Iceberg Table instance is used to initialize the IcebergStreamWriter, which writes records into DataFiles and emits them to the downstream operator.
      FlinkSink.Builder tableLoader​(TableLoader newTableLoader)
      The table loader is used to load the table lazily inside IcebergFilesCommitter; this loader is needed because Table is not serializable, so the remote task manager cannot simply reuse the table loaded via Builder#table.
      FlinkSink.Builder tableSchema​(org.apache.flink.table.api.TableSchema newTableSchema)  
      FlinkSink.Builder uidPrefix​(java.lang.String newPrefix)
      Set the uid prefix for FlinkSink operators.
      FlinkSink.Builder upsert​(boolean enabled)
      All INSERT/UPDATE_AFTER events from the input stream are transformed into UPSERT events, each of which DELETEs the old record and then INSERTs the new one.
      FlinkSink.Builder writeParallelism​(int newWriteParallelism)
      Configure the parallelism of the Iceberg stream writers.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • table

        public FlinkSink.Builder table​(Table newTable)
        This Iceberg Table instance is used to initialize the IcebergStreamWriter, which writes records into DataFiles and emits them to the downstream operator. Providing a table avoids repeated table loading in each separate task.
        Parameters:
        newTable - the loaded Iceberg table instance.
        Returns:
        this FlinkSink.Builder, for method chaining.
      • tableLoader

        public FlinkSink.Builder tableLoader​(TableLoader newTableLoader)
        The table loader is used to load the table lazily inside IcebergFilesCommitter. This loader is needed because Table is not serializable, so the remote task manager cannot simply reuse the table loaded via Builder#table.
        Parameters:
        newTableLoader - used to load the Iceberg table inside tasks.
        Returns:
        this FlinkSink.Builder, for method chaining.
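        For illustration, a minimal sketch of passing a serializable table loader to the builder; the Hadoop table path and the input stream are hypothetical placeholders:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class TableLoaderSketch {
  static void wire(DataStream<RowData> rowDataStream) {
    // TableLoader is serializable, so each remote task can
    // lazily open its own Table instance from this loader.
    TableLoader loader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/tbl");

    FlinkSink.forRowData(rowDataStream)
        .tableLoader(loader)
        .append();
  }
}
```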
      • tableSchema

        public FlinkSink.Builder tableSchema​(org.apache.flink.table.api.TableSchema newTableSchema)
      • writeParallelism

        public FlinkSink.Builder writeParallelism​(int newWriteParallelism)
        Configure the parallelism of the Iceberg stream writers.
        Parameters:
        newWriteParallelism - the number of parallel Iceberg stream writers.
        Returns:
        this FlinkSink.Builder, for method chaining.
      • upsert

        public FlinkSink.Builder upsert​(boolean enabled)
        All INSERT/UPDATE_AFTER events from the input stream are transformed into UPSERT events, each of which DELETEs the old record and then INSERTs the new one. For a partitioned table, the partition fields should be a subset of the equality fields; otherwise an old row located in partition A cannot be deleted by a new row located in partition B.
        Parameters:
        enabled - whether to transform all INSERT/UPDATE_AFTER events to UPSERT.
        Returns:
        this FlinkSink.Builder, for method chaining.
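        A sketch of an upsert-mode sink under the constraint above; the key column "id", the CDC stream, and the loader are hypothetical:

```java
import java.util.Collections;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class UpsertSketch {
  static void wire(DataStream<RowData> cdcStream, TableLoader loader) {
    FlinkSink.forRowData(cdcStream)
        .tableLoader(loader)
        // Every INSERT/UPDATE_AFTER becomes a DELETE of the old record
        // followed by an INSERT of the new one, keyed on "id".
        .upsert(true)
        // If the table is partitioned, its partition fields must be a
        // subset of these equality fields.
        .equalityFieldColumns(Collections.singletonList("id"))
        .append();
  }
}
```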
      • equalityFieldColumns

        public FlinkSink.Builder equalityFieldColumns​(java.util.List<java.lang.String> columns)
        Configure the equality field columns for an Iceberg table that accepts CDC or UPSERT events.
        Parameters:
        columns - the columns that define the Iceberg table's key.
        Returns:
        this FlinkSink.Builder, for method chaining.
      • uidPrefix

        public FlinkSink.Builder uidPrefix​(java.lang.String newPrefix)
        Set the uid prefix for the FlinkSink operators. Note that FlinkSink internally consists of multiple operators (writer, committer, dummy sink, etc.). The actual operator uid is the prefix plus a suffix, e.g. "uidPrefix-writer".

        If provided, this prefix is also applied to the operator names.

        Flink auto-generates an operator uid if it is not set explicitly. It is recommended best practice to set the uid for all operators before deploying to production. Setting pipeline.auto-generate-uid=false disables auto-generation and forces explicit setting of every operator uid.

        Be careful when setting this on an existing job, because it changes the operator uid from an auto-generated value to this new value. When deploying the change with a checkpoint, Flink will not be able to restore the previous sink operator state (more specifically, the committer operator state), and you need to pass --allowNonRestoredState to ignore the previous sink state. During restore, the Flink sink state is used to check whether the last commit actually succeeded, so --allowNonRestoredState can lead to data loss if the Iceberg commit failed in the last completed checkpoint.
        Parameters:
        newPrefix - prefix for the Flink sink operator uid and name
        Returns:
        this FlinkSink.Builder, for method chaining.
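        A sketch of setting a stable uid prefix for a new job; the prefix, stream, and loader names are hypothetical:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class UidPrefixSketch {
  static void wire(DataStream<RowData> stream, TableLoader loader) {
    FlinkSink.forRowData(stream)
        .tableLoader(loader)
        // Operators get stable uids such as "iceberg-orders-sink-writer",
        // which keeps committer state restorable across job upgrades.
        .uidPrefix("iceberg-orders-sink")
        .append();
  }
}
```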
      • build

        @Deprecated
        public org.apache.flink.streaming.api.datastream.DataStreamSink<org.apache.flink.table.data.RowData> build()
        Deprecated.
        This will be removed in 0.14.0; use append() instead, whose returned DataStreamSink has a more accurate data type.
        Append the Iceberg sink operators that write records to the Iceberg table.
        Returns:
        the appended DataStreamSink.
      • append

        public org.apache.flink.streaming.api.datastream.DataStreamSink<java.lang.Void> append()
        Append the Iceberg sink operators that write records to the Iceberg table.
        Returns:
        the appended DataStreamSink.
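        Putting the builder together, a hedged end-to-end sketch; the table path, parallelism, and input stream are placeholder assumptions:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSink;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.DistributionMode;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class AppendSketch {
  static DataStreamSink<Void> wire(DataStream<RowData> input) {
    TableLoader loader = TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/tbl");
    return FlinkSink.forRowData(input)
        .tableLoader(loader)
        .overwrite(false)                        // append rather than replace existing data
        .distributionMode(DistributionMode.HASH) // shuffle by partition key before writing
        .writeParallelism(4)                     // four parallel Iceberg stream writers
        .append();                               // terminal call: wires writer + committer operators
  }
}
```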