Avoid unnecessary partition seeks on successful record recovery #2195

@v-chernyshev

Description

I've reordered the sections for this request because, in my opinion, it is easier to understand the desired behaviour when the limitations of the current approach are explained first.

Current Behavior

Let's consider a situation where max.poll.records records (500 by default) are available for consumption in a non-transactional record listener container, but the very first record cannot be processed immediately due to, e.g., an issue with an external data provider. Here is what happens in the code when, at the very least, non-blocking retries are configured:

  • KafkaMessageListenerContainer.ListenerConsumer.doInvokeRecordListener is eventually invoked with record pointing to the first record and iterator holding the remaining 499.
  • this.invokeOnMessage fails with an exception.
  • An error handler is installed, so this.invokeErrorHandler is called.
  • this.commonErrorHandler.remainingRecords() returns true, so both the failed record and all the remaining records from the iterator are drained into a temporary list. this.commonErrorHandler.handleRemaining delegates to DefaultErrorHandler.handleRemaining, which performs SeekUtils.seekOrRecover.
  • An attempt is made to recover the failed record.
  • No matter whether the recovery operation succeeds or not, the rest of the batch is rewound in seekPartitions.

Unfortunately, this unconditional rewind operation leads to enormous spikes in network I/O as pretty much the same records are requested from Kafka over and over again. To be precise, with a local Kafka broker and a test application that always triggers a recovery I could see spikes exceeding 2000 Mbps in iptraf.
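The drain-and-seek sequence above can be condensed into a small self-contained sketch (all names and shapes here are illustrative stand-ins, not the actual container internals):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the current behaviour: the failed record plus
// everything left in the batch is drained into a temporary list, recovery is
// attempted on the head, and every drained record is rewound regardless of
// the recovery outcome.
class CurrentBehaviorSketch {

    /** Returns the records that will be re-fetched on the next poll. */
    static List<String> drainAndSeek(String failed, List<String> remaining,
                                     boolean recoverySucceeded) {
        List<String> drained = new ArrayList<>();
        drained.add(failed);
        drained.addAll(remaining);
        // Recovery of the head record would happen here (DLT publish etc.),
        // but the seek below covers all drained records whether or not it
        // succeeded, so the whole batch comes back over the network again.
        return drained;
    }
}
```

Because the result does not depend on the recovery outcome, a batch of 500 with one recoverable failure still re-fetches all 500 records.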

Expected Behavior

I believe that it is not necessary to always rewind the remaining 499 records if the recovery is successful. This is what may happen with the failed record when non-blocking retries are enabled:

  • If the record cannot be processed due to a fatal exception (e.g. ClassCastException) then it is sent straight to the DLT.
  • If the exception is not fatal, then the record is sent into the next retry destination.

Both these cases allow the next iterator record to be processed immediately, as if nothing had happened. This very same record will be at the front of the batch during the next poll invocation anyway! The only special case I can think of is KafkaBackoffException, which absolutely requires rewinding all the offsets so that the same batch is consumed again when the affected partitions are resumed.
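A minimal sketch of the proposed rule, assuming a hypothetical decision helper (neither this method nor its parameters exist in Spring Kafka; it only encodes the cases above):

```java
// Hypothetical helper: rewind for a back-off exception, or when recovery
// failed, but not after a successful recovery.
class ProposedSeekDecision {

    static boolean shouldSeekRemaining(boolean recoverySucceeded,
                                       boolean isBackoffException) {
        // KafkaBackoffException must always rewind so that the same batch is
        // consumed again once the paused partitions are resumed.
        if (isBackoffException) {
            return true;
        }
        // Otherwise seek only when the recoverer could not handle the record.
        return !recoverySucceeded;
    }
}
```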

Context

The production rollout of our service that uses the non-blocking retries feature provided by Spring Kafka triggered a number of network I/O alarms, so we decided to find the root cause. This feature request is the result of the investigation :)

Implementing a workaround for this issue is possible but, admittedly, quite tricky as invokeErrorHandler is a private API method. These are the steps that may be taken:

  • Implement a custom CommonErrorHandler that:
    • Exposes an API that allows remainingRecords to return either true or false.
    • Overrides handleRecord in a way that delegates to handleRemaining of the default error handler with a singleton list as the second argument.
  • Implement a custom RecordInterceptor with an overridden failure method that uses the above API to make remainingRecords return true only if SeekUtils.isBackoffException is true.
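The two pieces could cooperate roughly as in this self-contained sketch. ErrorHandler here is a stand-in for Spring Kafka's CommonErrorHandler, and both Custom* classes are the hypothetical ones from the steps above; the real CustomErrorHandler would additionally delegate handleRecord / handleRemaining to the wrapped default error handler:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the workaround's moving parts; all names are illustrative.
class WorkaroundSketch {

    interface ErrorHandler {            // stand-in for CommonErrorHandler
        boolean remainingRecords();
    }

    static class CustomErrorHandler implements ErrorHandler {
        // Exposed API: lets the interceptor decide, per failure, whether the
        // container should treat the rest of the batch as "remaining".
        private final AtomicBoolean seekRequested = new AtomicBoolean(false);

        void requestSeek(boolean seek) {
            this.seekRequested.set(seek);
        }

        @Override
        public boolean remainingRecords() {
            return this.seekRequested.get();
        }
    }

    static class CustomRecordInterceptor {
        private final CustomErrorHandler errorHandler;

        CustomRecordInterceptor(CustomErrorHandler errorHandler) {
            this.errorHandler = errorHandler;
        }

        // Mirrors RecordInterceptor.failure: only a back-off exception should
        // force the rewind (SeekUtils.isBackoffException in the real code,
        // approximated here by a class-name check).
        void failure(Exception ex) {
            this.errorHandler.requestSeek(
                    "KafkaBackoffException".equals(ex.getClass().getSimpleName()));
        }
    }
}
```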

ListenerContainerFactoryConfigurer.setContainerCustomizer may then be used to tie things together, e.g.:

@Bean(name = RetryTopicInternalBeanNames.LISTENER_CONTAINER_FACTORY_CONFIGURER_NAME)
public ListenerContainerFactoryConfigurer listenerContainerFactoryConfigurer(
        KafkaConsumerBackoffManager kafkaConsumerBackoffManager,
        DeadLetterPublishingRecovererFactory deadLetterPublishingRecovererFactory,
        @Qualifier(RetryTopicInternalBeanNames.INTERNAL_BACKOFF_CLOCK_BEAN_NAME) Clock clock) {
    final var configurer = new ListenerContainerFactoryConfigurer(...);

    configurer.setContainerCustomizer(container -> {
        final var customErrorHandler = new CustomErrorHandler(container.getCommonErrorHandler());
        container.setRecordInterceptor(new CustomRecordInterceptor<>(customErrorHandler));
        container.setCommonErrorHandler(customErrorHandler);
    });

    return configurer;
}
