Closed
Description
Looking at the _send_upstream() and create_message_set() methods and running some tests, I found that every message in a batch is given the key of the last message in the batch. This leads to the following behaviour:
>>> p = KeyedProducer(c, async=True, batch_send=True)
>>> for i in range(3):
...     p.send_messages("test", "KEY%s" % i, "Data%s" % i)
And in Kafka you get the following data:
/kafka-console-consumer.sh --topic test --zookeeper zk --property print.key=true
KEY2 Data0
KEY2 Data1
KEY2 Data2
This probably makes sense for compressed batches, where some arbitrary key is needed for the wrapper message itself, but that doesn't mean we should create the inner messages with wrong keys. I assume this can be harmful for Kafka's log compaction feature, because compaction actively uses message keys to find duplicates and remove them after a while, so it could delete the wrong messages.
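To make the report concrete, here is a minimal self-contained sketch of the suspected pattern and the obvious fix. The function names and the Message tuple are hypothetical stand-ins, not the actual kafka-python internals:

```python
from collections import namedtuple

# Hypothetical stand-in for kafka-python's internal message type.
Message = namedtuple("Message", ["key", "value"])

def create_message_set_buggy(pairs):
    """Buggy pattern: a single key variable leaks out of the loop,
    so every message in the batch gets the key of the LAST pair."""
    key = None
    for key, _ in pairs:
        pass  # key ends up holding the last pair's key
    return [Message(key, value) for _, value in pairs]

def create_message_set_fixed(pairs):
    """Fixed pattern: each message keeps its own key."""
    return [Message(key, value) for key, value in pairs]

pairs = [("KEY%d" % i, "Data%d" % i) for i in range(3)]

print([m.key for m in create_message_set_buggy(pairs)])
# -> ['KEY2', 'KEY2', 'KEY2']   (the behaviour observed above)
print([m.key for m in create_message_set_fixed(pairs)])
# -> ['KEY0', 'KEY1', 'KEY2']
```

The buggy variant reproduces exactly the console-consumer output shown above: three distinct values all stored under KEY2.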
Can somebody explain this or point me in the right direction?