Add ability to start reading from a custom offset in KafkaItemReader #737

Closed
wants to merge 1 commit into from

callard71

Hi,

The problem is that the reader overrides the fetch offsets even when we don't want to rely on the state saved in the execution context.

By removing the initialization of the partition offsets map, the consumer relies on the offsets stored in the broker.

If the topic has not been read yet, the "auto.offset.reset" configuration can now be used to apply the corresponding behavior.
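
For illustration, here is a minimal sketch of consumer properties for a reader that relies on the offsets committed in Kafka. The property keys are standard Kafka consumer settings; the class name, the method name, and any server or group values passed in are hypothetical. Deserializer settings are omitted for brevity.

```java
import java.util.Properties;

final class StoredOffsetConsumerConfig {

    // Builds consumer properties for a reader that starts from the offsets
    // committed for its consumer group (hypothetical helper, not part of Spring Batch).
    static Properties consumerProperties(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Only applies when the group has no committed offset for a partition:
        // "earliest" starts at the beginning of the partition, "latest" at the end.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }
}
```

With "earliest", a topic that the group has never read is consumed from the beginning; once offsets have been committed, the committed position is used instead.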

@fmbenhassine
Contributor

fmbenhassine commented Jul 7, 2020

Hi @callard71 ,

Thank you for this PR. Two tests are failing with this change set (testReadFromSinglePartition and testReadFromMultiplePartitions) unless I add consumerProperties.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); to the consumer properties, which means this is a breaking change.

The code in this PR initializes the partition offsets to an empty map so that the reader relies on the offsets stored in Kafka. That is a good point, but it also removes the default initialization that makes the reader start from the beginning. We need to keep the default behaviour (read from the beginning) while offering the possibility to start from a given offset (a custom one or the one stored in Kafka). We can do this by providing a setter for partitionOffsets:

/**
 * Setter for partition offsets. This mapping tells the reader the offset to start
 * reading from in each partition. This is optional, defaults to starting from
 * offset 0 in each partition. Passing an empty map makes the reader start
 * from the offset stored in Kafka for the consumer group ID.
 * 
 * @param partitionOffsets mapping of starting offset in each partition
 */
public void setPartitionOffsets(Map<TopicPartition, Long> partitionOffsets) {
	this.partitionOffsets = partitionOffsets;
}

This behaviour is consistent with all readers that inherit from AbstractItemCountingItemStreamItemReader (i.e. flat file, XML, JSON, JDBC, JPA, etc.), where the default is to read from the beginning of the data source, with the ability to start reading from a given position thanks to setCurrentItemCount.

For consistency, I suggest making the Kafka item reader configurable in the same way, with the same default. The initialization code could be updated to something like this:

--this.partitionOffsets = new HashMap<>();
--for (TopicPartition topicPartition : this.topicPartitions) {
--	this.partitionOffsets.put(topicPartition, 0L);
--}
++if (this.partitionOffsets == null) {
++	this.partitionOffsets = new HashMap<>();
++	for (TopicPartition topicPartition : this.topicPartitions) {
++		this.partitionOffsets.put(topicPartition, 0L);
++	}
++}
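
To make the effect of the null check concrete, here is a small self-contained sketch of the same defaulting logic. It is not the reader's actual code: the class and method names are hypothetical, and the type parameter P stands in for Kafka's TopicPartition so the example compiles without the Kafka client library.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class PartitionOffsetDefaults {

    // P stands in for org.apache.kafka.common.TopicPartition.
    // Fills in offset 0 for every partition only when no mapping was configured.
    static <P> Map<P, Long> initialize(Map<P, Long> configured, Collection<P> partitions) {
        if (configured != null) {
            // A custom mapping, including an empty one, is kept as is.
            return configured;
        }
        Map<P, Long> defaults = new HashMap<>();
        for (P partition : partitions) {
            defaults.put(partition, 0L);
        }
        return defaults;
    }
}
```

Passing null yields offset 0 for each partition, an empty map stays empty (so no explicit start offset is set and the offsets stored in Kafka apply), and any other mapping is used unchanged.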

This change is backward compatible and supports the following:

  • Starting from a custom offset if needed (see this test)
  • Starting from the offset stored in Kafka by passing an empty mapping (as suggested in this PR, see this test)
  • Reading from the beginning by default
  • On restart, giving precedence to the value from the execution context (consistent with other readers, see javadoc)
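
The restart precedence described in the last point could be sketched roughly as follows. This is an assumption-laden illustration, not the reader's implementation: the class and method names are hypothetical, P again stands in for TopicPartition, and the sketch does not address whether the reader resumes at the saved offset or at the one after it.

```java
import java.util.HashMap;
import java.util.Map;

final class StartOffsetResolver {

    // 'configured' is the mapping after default initialization.
    // 'saved' is what a previous execution stored, or null on a first run.
    static <P> Map<P, Long> resolve(Map<P, Long> configured, Map<P, Long> saved) {
        Map<P, Long> result = new HashMap<>(configured);
        if (saved != null) {
            // Offsets saved by a previous execution override the configured ones.
            result.putAll(saved);
        }
        return result;
    }
}
```

On a first run the configured (or default) offsets are used unchanged; on a restart, any partition with a saved offset starts from the saved value.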

What do you think? If you agree, please update the PR accordingly and it should be good to merge. Otherwise, let me know and I can apply the change myself.

@fmbenhassine added the "status: waiting-for-reporter" label (Issues for which we are waiting for feedback from the reporter) on Jul 7, 2020
@fmbenhassine
Contributor

Hi @callard71 ,

Did you get a chance to review my previous comment? If you agree on the changes, I can take care of updating the code accordingly.

Looking forward to your feedback.

@callard71
Author

Hi,

Good catch. Yes, I agree. Since I no longer have my environment set up, I will let you make the small changes.

Thanks again!
Christian

@fmbenhassine
Contributor

Hi @callard71 ,

Thank you for your feedback. I applied the changes as discussed in 15a393b.

@fmbenhassine removed the "status: waiting-for-reporter" label (Issues for which we are waiting for feedback from the reporter) on Aug 10, 2020
@fmbenhassine changed the title from "Ability to rely on the offset saved in kafka" to "Add ability to start reading from a custom offset in KafkaItemReader" on Aug 10, 2020
2 participants