From 28213d39ca68361092649bacf0628ce6f3c87241 Mon Sep 17 00:00:00 2001 From: WanYixian Date: Wed, 28 May 2025 14:30:25 +0800 Subject: [PATCH 1/2] add exactly-once --- integrations/destinations/apache-iceberg.mdx | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/integrations/destinations/apache-iceberg.mdx b/integrations/destinations/apache-iceberg.mdx index bb34e1ee..8d79833c 100644 --- a/integrations/destinations/apache-iceberg.mdx +++ b/integrations/destinations/apache-iceberg.mdx @@ -257,6 +257,19 @@ WITH ( Currently, RisingWave only supports Iceberg tables in format v2. +## Exactly-once delivery + +Exactly-once semantics ensures that each record is processed and written exactly once, even in the event of system failures or retries. + +For Iceberg sinks, this is achieved through a two-phase commit protocol: + +- **Pre-commit phase**: Incoming data is staged and validated. This step ensures that all necessary data are correctly prepared and consistent before committing. + +- **Commit phase**: Once all pre-commit steps succeed, the data is atomically committed to the Iceberg table. This guarantees that the operation is either fully applied or not applied at all. + +If a failure occurs before the commit, the system can safely retry without introducing duplicates, as Iceberg's commits are **idempotent**. This ensures every record hits the Iceberg sink exactly once, regardless of restarts or failures, maintaining data integrity and consistency across the pipeline. + + ## Examples This section includes several examples that you can use if you want to quickly experiment with sinking data to Iceberg. From a61567636b6dbeaa8331732794c0b3e73af0d026 Mon Sep 17 00:00:00 2001 From: WanYixian Date: Thu, 5 Jun 2025 11:15:27 +0800 Subject: [PATCH 2/2] remove implementation details and add note --- integrations/destinations/apache-iceberg.mdx | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/integrations/destinations/apache-iceberg.mdx b/integrations/destinations/apache-iceberg.mdx index 8d79833c..c541867b 100644 --- a/integrations/destinations/apache-iceberg.mdx +++ b/integrations/destinations/apache-iceberg.mdx @@ -259,16 +259,11 @@ Currently, RisingWave only supports Iceberg tables in format v2. ## Exactly-once delivery -Exactly-once semantics ensures that each record is processed and written exactly once, even in the event of system failures or retries. +RisingWave provides exactly-once delivery semantics for Iceberg sinks. This semantics guarantees that each data event is processed **once and only once**, even in the presence of failures such as retries or restarts. This level of delivery assurance is essential in scenarios where duplicate records can lead to incorrect analytics or data corruption in downstream systems. -For Iceberg sinks, this is achieved through a two-phase commit protocol: - -- **Pre-commit phase**: Incoming data is staged and validated. This step ensures that all necessary data are correctly prepared and consistent before committing. - -- **Commit phase**: Once all pre-commit steps succeed, the data is atomically committed to the Iceberg table. This guarantees that the operation is either fully applied or not applied at all. - -If a failure occurs before the commit, the system can safely retry without introducing duplicates, as Iceberg's commits are **idempotent**. This ensures every record hits the Iceberg sink exactly once, regardless of restarts or failures, maintaining data integrity and consistency across the pipeline. +Exactly-once delivery is achieved through a two-phase commit protocol involving a pre-commit phase and a commit phase. Iceberg’s commit operations are idempotent, which allows RisingWave to safely retry failed transactions without introducing duplicates. +By default, exactly-once semantics is disabled. To enable it for an Iceberg sink, include `is_exactly_once = 'true'` in the `WITH` clause of the sink definition. Note that enabling this option introduces additional coordination overhead due to metadata pre-commit, which may impact sink performance in high-throughput workloads. ## Examples