propagate scope in async failures #3950

igormq · 2025-06-07T09:38:13Z

Fix trace context loss in async Kafka error handling

This PR addresses an issue where the trace context is lost when handling Kafka message failures asynchronously.

Problem

When async returns are enabled and a consumer failure occurs, the trace context from the original message is not propagated. This leads to each step of the retry/DLT flow starting a new trace instead of continuing the original one.

Example (current behavior):
• Producer → trace 1
• Consumer → trace 1, fails → message goes to retry topic
• Retry listener → trace 2, fails → message goes to DLT topic
• DLT listener → trace 3

This breaks end-to-end traceability, as each listener receives a new trace ID.

Root cause

The issue stems from the handleAsyncFailure method, which runs in a different thread but does not propagate the original Observation (trace) context associated with the failed record.
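
To illustrate the root cause, here is a minimal sketch of how a thread-bound context disappears once work moves to another thread. It uses only the JDK: `CURRENT_TRACE` and `traceSeenOnOtherThread` are hypothetical stand-ins for the tracer's current-observation slot, not spring-kafka or Micrometer APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TraceLossSketch {

	// Hypothetical stand-in for the thread-bound "current trace" slot.
	static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

	// Puts a trace in scope on the calling thread, then reads the slot from a worker thread.
	static String traceSeenOnOtherThread(String traceId) {
		ExecutorService worker = Executors.newSingleThreadExecutor();
		try {
			CURRENT_TRACE.set(traceId); // the calling thread has the trace in scope
			// The worker thread has its own, empty slot, so nothing is propagated.
			return CompletableFuture.supplyAsync(CURRENT_TRACE::get, worker).join();
		}
		finally {
			CURRENT_TRACE.remove();
			worker.shutdown();
		}
	}

	public static void main(String[] args) {
		System.out.println(traceSeenOnOtherThread("trace-1")); // prints "null"
	}
}
```

Any code on the worker thread that starts a span at this point has no parent to attach to, which is consistent with the new trace IDs seen in the retry and DLT listeners.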

Fix

Ensure that the observation context is correctly propagated when handling async failures. This preserves the trace ID across retry and DLT flows.

🔧 Tested using version 3.3.6 so I could build and validate the JAR in a real-world project.

...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java

artembilan

I'd like to see the fix issued against main.
And please follow the DCO requirements.

igormq · 2025-06-10T10:59:40Z

I'd like to see the fix issued against main. And please follow the DCO requirements.

done!

...ng-kafka/src/main/java/org/springframework/kafka/listener/KafkaMessageListenerContainer.java

...rc/main/java/org/springframework/kafka/listener/adapter/MessagingMessageListenerAdapter.java

spring-kafka/src/test/java/org/springframework/kafka/support/micrometer/ObservationTests.java

artembilan · 2025-06-11T14:46:14Z

OK. The logic is like this:

The MessagingMessageListenerAdapter does this in its handleResult() on the completableFutureResult.whenComplete:

				else {
					Throwable cause = t instanceof CompletionException ? t.getCause() : t;
					observation.error(cause);
					asyncFailure(request, acknowledgment, consumer, cause, source);
				}

That asyncFailure() calls the mentioned asyncRetryCallback with an implementation in the KafkaMessageListenerContainer like this:

		private void callbackForAsyncFailure(ConsumerRecord<K, V> cRecord, RuntimeException ex) {
			this.failedRecords.addLast(new FailedRecordTuple<>(cRecord, ex));
		}

We can probably propagate that observation from handleResult() down to asyncFailure() and open a scope there.
Then callbackForAsyncFailure() can get access to the current observation and store it in the FailedRecordTuple.
This way, KafkaMessageListenerContainer.handleAsyncFailure() would be able to restore the observation from the tuple around the invokeErrorHandlerBySingleRecord() call.

Not sure if that is a goal of your solution.
WDYT?
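
The flow described above can be sketched with plain JDK types. This is a simplified illustration, not the actual spring-kafka code: the method names mirror the ones mentioned in the comment, but `FailedRecord`, `CURRENT_TRACE`, `invokeErrorHandler` and `demo` are hypothetical, and a string in a `ThreadLocal` stands in for the observation and its scope.

```java
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedDeque;

public class FailedRecordContextSketch {

	// Hypothetical stand-in for the thread-bound current observation.
	static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

	// Hypothetical tuple: the failed record, its exception and the captured context.
	static final class FailedRecord {
		final String payload;
		final RuntimeException ex;
		final String trace;

		FailedRecord(String payload, RuntimeException ex, String trace) {
			this.payload = payload;
			this.ex = ex;
			this.trace = trace;
		}
	}

	static final Deque<FailedRecord> FAILED_RECORDS = new ConcurrentLinkedDeque<>();

	// Runs on the async completion thread while the context is still in scope.
	static void callbackForAsyncFailure(String payload, RuntimeException ex) {
		FAILED_RECORDS.addLast(new FailedRecord(payload, ex, CURRENT_TRACE.get()));
	}

	// Runs later on the consumer thread: restore the captured context around the error handler.
	static String handleAsyncFailure() {
		FailedRecord failed = FAILED_RECORDS.pollFirst();
		if (failed == null) {
			return null;
		}
		CURRENT_TRACE.set(failed.trace);
		try {
			return invokeErrorHandler(failed);
		}
		finally {
			CURRENT_TRACE.remove(); // close the "scope" so nothing leaks to the next record
		}
	}

	// Hypothetical error handler: whatever it publishes sees the restored context.
	static String invokeErrorHandler(FailedRecord failed) {
		return "retry " + failed.payload + " in " + CURRENT_TRACE.get();
	}

	static String demo() {
		// Simulate the listener's future completing exceptionally on another thread.
		CompletableFuture.runAsync(() -> {
			CURRENT_TRACE.set("trace-1");
			try {
				callbackForAsyncFailure("order-42", new IllegalStateException("boom"));
			}
			finally {
				CURRENT_TRACE.remove();
			}
		}).join();
		return handleAsyncFailure();
	}

	public static void main(String[] args) {
		System.out.println(demo()); // prints "retry order-42 in trace-1"
	}
}
```

The point of carrying the context inside the tuple is that the queue crosses a thread boundary: the consumer thread cannot read the completion thread's slot, so the value has to travel with the failed record.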

igormq · 2025-06-11T16:33:51Z

@artembilan, does what I did make sense?

@artembilan thank you very much for the feedback. This is exactly what I was trying to achieve! I made the changes accordingly.

Signed-off-by: Igor Macedo Quintanilha <[email protected]>

igormq changed the base branch from main to 3.3.x June 7, 2025 09:38

igormq commented Jun 7, 2025

artembilan requested changes Jun 9, 2025

igormq force-pushed the propagate-scope-in-async-failures branch from bfb8f6d to b4be8a3 Compare June 10, 2025 10:56

igormq changed the base branch from 3.3.x to main June 10, 2025 10:57

igormq force-pushed the propagate-scope-in-async-failures branch from b4be8a3 to 73aeaaf Compare June 10, 2025 10:59

igormq requested a review from artembilan June 10, 2025 10:59

artembilan requested changes Jun 10, 2025

igormq commented Jun 10, 2025

artembilan requested changes Jun 10, 2025

igormq force-pushed the propagate-scope-in-async-failures branch 5 times, most recently from 2fea4fd to e604802 Compare June 11, 2025 11:19

igormq requested a review from artembilan June 11, 2025 11:38

artembilan requested changes Jun 11, 2025

igormq force-pushed the propagate-scope-in-async-failures branch from e604802 to 03ad634 Compare June 11, 2025 16:30

artembilan requested changes Jun 11, 2025

igormq force-pushed the propagate-scope-in-async-failures branch 2 times, most recently from addcdae to 583a1e7 Compare June 12, 2025 10:33

propagate scope in async failures

3e9fed6

Signed-off-by: Igor Macedo Quintanilha <[email protected]>

igormq force-pushed the propagate-scope-in-async-failures branch from 583a1e7 to 3e9fed6 Compare June 12, 2025 10:54

igormq requested a review from artembilan June 12, 2025 19:46
