
Kafka message keys and asynchronous batching #325

@vshlapakov

Description

Looking at the _send_upstream() and create_message_set() methods and doing some tests, I found that all messages in a batch are sent with the same key, taken from the last message of the batch. This leads to the following behaviour:

>>> p = KeyedProducer(c, async=True, batch_send=True)
>>> for i in range(3):
...     p.send_messages("test", "KEY%s" % i, "Data%s" % i)

And in Kafka you get the following data:

/kafka-console-consumer.sh --topic test --zookeeper zk --property print.key=true
KEY2 Data0
KEY2 Data1
KEY2 Data2
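
In other words, the batch seems to be built roughly like this (just a sketch of the observed vs. expected behaviour, not the actual library code):

queued = [("KEY0", "Data0"), ("KEY1", "Data1"), ("KEY2", "Data2")]

# Observed: the key of the last queued message is applied to every message in the batch
last_key = queued[-1][0]
observed = [(last_key, value) for _, value in queued]
# [('KEY2', 'Data0'), ('KEY2', 'Data1'), ('KEY2', 'Data2')]

# Expected: each message keeps the key it was sent with
expected = [(key, value) for key, value in queued]
# [('KEY0', 'Data0'), ('KEY1', 'Data1'), ('KEY2', 'Data2')]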

This probably makes sense for compressed batches, where some arbitrary key is needed for the compressed wrapper message itself, but that doesn't mean we should produce the inner messages with wrong keys. I assume this can be harmful for Kafka's log compaction feature, because compaction actively uses message keys to find duplicates and remove them after a while, so it could end up deleting the wrong messages.
Can somebody explain this or point me in the right direction?
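
For now, a possible workaround (assuming the non-batched path attaches each key correctly) is to keep the producer synchronous and skip batching:

>>> p = KeyedProducer(c)  # synchronous send, no async batching
>>> for i in range(3):
...     p.send_messages("test", "KEY%s" % i, "Data%s" % i)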
