
Kafka message keys and asynchronous batching #325

@vshlapakov

Description

Looking at the _send_upstream() and create_message_set() methods and doing some tests, I found that all messages in a batch are sent with the same key, taken from the last message of the batch. This leads to the following behaviour:

>>> p = KeyedProducer(c, async=True, batch_send=True)
>>> for i in range(3):
...     p.send_messages("test", "KEY%s" % i, "Data%s" % i)

And in Kafka you get the following data:

/kafka-console-consumer.sh --topic test --zookeeper zk --property print.key=true
KEY2 Data0
KEY2 Data1
KEY2 Data2
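
In other words, the batch seems to be built roughly like this (just a sketch of the observed vs. expected behaviour, not the actual library code):

queued = [("KEY0", "Data0"), ("KEY1", "Data1"), ("KEY2", "Data2")]

# Observed: the key of the last queued message is applied to every message in the batch
last_key = queued[-1][0]
observed = [(last_key, value) for _, value in queued]
# [('KEY2', 'Data0'), ('KEY2', 'Data1'), ('KEY2', 'Data2')]

# Expected: each message keeps the key it was sent with
expected = [(key, value) for key, value in queued]
# [('KEY0', 'Data0'), ('KEY1', 'Data1'), ('KEY2', 'Data2')]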

This probably makes sense for compressed batches, where some arbitrary key is needed for the compressed wrapper message itself, but that doesn't mean we should produce the inner messages with wrong keys. I assume this can be harmful for Kafka's log compaction feature, because compaction actively uses message keys to find duplicates and remove them after a while, so it could end up deleting the wrong messages.
Can somebody explain this or point me in the right direction?
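
For now, a possible workaround (assuming the non-batched path attaches each key correctly) is to keep the producer synchronous and skip batching:

>>> p = KeyedProducer(c)  # synchronous send, no async batching
>>> for i in range(3):
...     p.send_messages("test", "KEY%s" % i, "Data%s" % i)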
