
propagate scope in async failures #3950


Open
wants to merge 1 commit into
base: main
from


@igormq igormq commented Jun 7, 2025

Fix trace context loss in async Kafka error handling

This PR addresses an issue where the trace context is lost when handling Kafka message failures asynchronously.

Problem

When async returns are enabled and a consumer failure occurs, the trace context from the original message is not propagated. This leads to each step of the retry/DLT flow starting a new trace instead of continuing the original one.

Example (current behavior):
• Producer → trace 1
• Consumer → trace 1, fails → message goes to retry topic
• Retry listener → trace 2, fails → message goes to DLT topic
• DLT listener → trace 3

This breaks end-to-end traceability, as each listener receives a new trace ID.

Root cause

The issue stems from the handleAsyncFailure method, which runs on a different thread but does not propagate the original Observation (trace) context associated with the failed record.
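To make the thread hop concrete: tracing libraries such as Micrometer keep the current observation in thread-local state, so code that runs on another thread does not see it unless it is handed over explicitly. The plain-JDK sketch below assumes a hypothetical `ThreadLocal<String>` named `CURRENT_TRACE` in place of the real observation registry. It illustrates the failure mode only and is not Spring Kafka code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// CURRENT_TRACE is a hypothetical stand-in for a registry's thread-local "current observation".
final class LostContextDemo {

    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    // Returns the trace that a failure callback sees when it runs on another thread.
    static String traceSeenByFailureCallback() {
        ExecutorService asyncPool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE.set("trace-1"); // the listener thread is inside trace-1
            return CompletableFuture
                    .failedFuture(new IllegalStateException("listener failed"))
                    // handleAsync runs the callback on asyncPool, never on this thread
                    .handleAsync((result, ex) -> CURRENT_TRACE.get(), asyncPool)
                    .join();
        }
        finally {
            CURRENT_TRACE.remove();
            asyncPool.shutdown();
        }
    }
}
```

The calling thread still sees `trace-1`, but the callback gets `null`. This is the same kind of gap that lets each retry or DLT hop start a new trace.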

Fix

Ensure that the observation context is correctly propagated when handling async failures. This preserves the trace ID across retry and DLT flows.
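The usual remedy is capture-and-restore: read the context on the thread that owns it, hand it to the async side, and make it current there only for the duration of the work. The minimal sketch below uses the same assumption (a hypothetical `CURRENT_TRACE`) plus a hand-rolled `Scope` interface. In Micrometer, the analogous pieces would be the observation itself and the `Observation.Scope` returned by `openScope()`. This is an illustration of the technique, not the PR's actual change.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Capture-and-restore sketch. CURRENT_TRACE is a hypothetical stand-in for the registry's
// current observation, and Scope mimics an observation scope that must be closed.
final class PropagatedContextDemo {

    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    interface Scope extends AutoCloseable {
        @Override
        void close(); // narrowed: closing a scope throws no checked exception
    }

    // Makes traceId current on this thread until the returned scope is closed.
    static Scope openScope(String traceId) {
        String previous = CURRENT_TRACE.get();
        CURRENT_TRACE.set(traceId);
        return () -> {
            if (previous == null) {
                CURRENT_TRACE.remove();
            }
            else {
                CURRENT_TRACE.set(previous);
            }
        };
    }

    // Returns {trace inside the restored scope, trace after the scope is closed},
    // both observed on the async thread.
    static String[] traceSeenByFailureCallback() {
        ExecutorService asyncPool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE.set("trace-1");
            String captured = CURRENT_TRACE.get(); // capture on the original thread
            return CompletableFuture
                    .failedFuture(new IllegalStateException("listener failed"))
                    .handleAsync((result, ex) -> {
                        String inside;
                        try (Scope scope = openScope(captured)) {
                            inside = CURRENT_TRACE.get(); // error handling would run here
                        }
                        return new String[] { inside, CURRENT_TRACE.get() };
                    }, asyncPool)
                    .join();
        }
        finally {
            CURRENT_TRACE.remove();
            asyncPool.shutdown();
        }
    }
}
```

Restoring the previous value on close, rather than always clearing it, keeps nested scopes well-behaved and leaves pooled threads clean after the work finishes.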

🔧 Tested using version 3.3.6 so I could build and validate the JAR in a real-world project.

@igormq igormq changed the base branch from main to 3.3.x June 7, 2025 09:38
@artembilan artembilan (Member) left a comment


I'd like to see the fix issued against main.
And please follow the DCO requirements.

@igormq igormq force-pushed the propagate-scope-in-async-failures branch from bfb8f6d to b4be8a3 Compare June 10, 2025 10:56
@igormq igormq changed the base branch from 3.3.x to main June 10, 2025 10:57
@igormq igormq force-pushed the propagate-scope-in-async-failures branch from b4be8a3 to 73aeaaf Compare June 10, 2025 10:59
@igormq igormq (Author) commented Jun 10, 2025

> I'd like to see the fix issued against main. And please follow the DCO requirements.

Done!

@igormq igormq requested a review from artembilan June 10, 2025 10:59
@igormq igormq force-pushed the propagate-scope-in-async-failures branch 5 times, most recently from 2fea4fd to e604802 Compare June 11, 2025 11:19
@igormq igormq requested a review from artembilan June 11, 2025 11:38
@artembilan artembilan (Member) commented

OK. The logic is like this:

  1. The MessagingMessageListenerAdapter does this in its handleResult(), in the completableFutureResult.whenComplete() callback:
				else {
					// failure branch: unwrap the CompletionException and record the error on the observation
					Throwable cause = t instanceof CompletionException ? t.getCause() : t;
					observation.error(cause);
					asyncFailure(request, acknowledgment, consumer, cause, source);
				}
  2. That asyncFailure() calls the mentioned asyncRetryCallback, which KafkaMessageListenerContainer implements like this:
		private void callbackForAsyncFailure(ConsumerRecord<K, V> cRecord, RuntimeException ex) {
			this.failedRecords.addLast(new FailedRecordTuple<>(cRecord, ex));
		}
  3. We could probably propagate that observation from handleResult() down to asyncFailure() and open a scope there.
  4. Then, in callbackForAsyncFailure(), get the current observation and store it in that FailedRecordTuple.
  5. This way, KafkaMessageListenerContainer.handleAsyncFailure() would be able to restore the observation from the tuple and use it for the invokeErrorHandlerBySingleRecord() call.

Not sure if that is the goal of your solution.
WDYT?
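The proposed flow can be modeled end to end in plain Java. In this sketch, which is assumed for illustration only, `CURRENT_TRACE` stands in for the current observation and the hypothetical `SimpleRecord` for `ConsumerRecord`. The method names are borrowed from the comment above, but their bodies are simplifications rather than Spring Kafka's implementation. The failure is reported on a worker thread with the listener's trace in scope, the container callback stores that trace in the tuple, and `handleAsyncFailure()` restores it around the error-handler call on the draining thread.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentLinkedDeque;

// Plain-JDK model of the proposed flow. CURRENT_TRACE, SimpleRecord and all method bodies
// are hypothetical simplifications; only the method names echo the review comment.
final class AsyncFailureFlowSketch {

    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    record SimpleRecord(String topic, String value) { }

    // The tuple also remembers the trace that was current when the failure was reported.
    record FailedRecordTuple(SimpleRecord cRecord, RuntimeException ex, String trace) { }

    // Filled from the async completion thread, drained on the consumer thread.
    final Deque<FailedRecordTuple> failedRecords = new ConcurrentLinkedDeque<>();

    // What the error handler observed; only touched on the draining thread.
    final List<String> errorHandlerLog = new ArrayList<>();

    // Models the whenComplete() failure branch: unwrap the cause, then report the
    // failure with the listener's trace made current ("open scope there").
    void onAsyncCompletion(SimpleRecord cRecord, Throwable t, String listenerTrace) {
        Throwable cause = t instanceof CompletionException ? t.getCause() : t;
        RuntimeException ex = cause instanceof RuntimeException re ? re : new RuntimeException(cause);
        String previous = CURRENT_TRACE.get();
        CURRENT_TRACE.set(listenerTrace);
        try {
            callbackForAsyncFailure(cRecord, ex);
        }
        finally {
            restore(previous);
        }
    }

    // Container callback: capture whatever trace is current into the tuple.
    void callbackForAsyncFailure(SimpleRecord cRecord, RuntimeException ex) {
        this.failedRecords.addLast(new FailedRecordTuple(cRecord, ex, CURRENT_TRACE.get()));
    }

    // Consumer thread: restore each tuple's trace around the error-handler call.
    void handleAsyncFailure() {
        FailedRecordTuple tuple;
        while ((tuple = this.failedRecords.pollFirst()) != null) {
            String previous = CURRENT_TRACE.get();
            CURRENT_TRACE.set(tuple.trace());
            try {
                invokeErrorHandlerBySingleRecord(tuple);
            }
            finally {
                restore(previous);
            }
        }
    }

    // A real error handler would publish to a retry topic or DLT; this one only logs.
    void invokeErrorHandlerBySingleRecord(FailedRecordTuple tuple) {
        this.errorHandlerLog.add(CURRENT_TRACE.get() + " " + tuple.ex().getMessage());
    }

    private static void restore(String previous) {
        if (previous == null) {
            CURRENT_TRACE.remove();
        }
        else {
            CURRENT_TRACE.set(previous);
        }
    }

    // Fails one record on another thread, then drains failures on the calling thread.
    static List<String> demo() {
        AsyncFailureFlowSketch container = new AsyncFailureFlowSketch();
        SimpleRecord cRecord = new SimpleRecord("orders", "payload");
        CompletableFuture.runAsync(() -> container.onAsyncCompletion(cRecord,
                new CompletionException(new IllegalStateException("boom")), "trace-1")).join();
        container.handleAsyncFailure(); // the calling thread has no trace of its own
        return container.errorHandlerLog;
    }
}
```

Running `demo()` yields a single entry, `trace-1 boom`: the error handler sees the listener's trace and the unwrapped cause, and the draining thread is left with no current trace afterwards.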

@igormq igormq force-pushed the propagate-scope-in-async-failures branch from e604802 to 03ad634 Compare June 11, 2025 16:30
@igormq igormq (Author) commented Jun 11, 2025

@artembilan, does what I did make sense?


@artembilan, thank you very much for the feedback. This is exactly what I was trying to achieve! I've made the changes accordingly.

@igormq igormq force-pushed the propagate-scope-in-async-failures branch 2 times, most recently from addcdae to 583a1e7 Compare June 12, 2025 10:33
Signed-off-by: Igor Macedo Quintanilha <[email protected]>
@igormq igormq force-pushed the propagate-scope-in-async-failures branch from 583a1e7 to 3e9fed6 Compare June 12, 2025 10:54
@igormq igormq requested a review from artembilan June 12, 2025 19:46
Labels
None yet
Projects
None yet
2 participants