infinite loop in ensure_coordinator_ready when coordinator is unknown #2373

@pe55a

We're facing an issue with the Kafka poll functionality. In particular, we suspect that the culprit is the ensure_coordinator_ready function called by _coordinator.poll().

We're using Robot Framework, so unfortunately we can't collect many logs, but we see these messages printed in an infinite loop:

10:36:19.658 INFO <BrokerConnection node_id=**** host=**** [IPv4 ('', )]>: connecting to **** [('', ) IPv4]
10:36:19.765 INFO <BrokerConnection node_id= host= [IPv4 ('', )]>: Connection complete.
10:36:19.886 ERROR <BrokerConnection node_id= host= [IPv4 ('', )]>: socket disconnected
10:36:19.900 INFO <BrokerConnection node_id= host= [IPv4 ('****', ****)]>: Closing connection. KafkaConnectionError: socket disconnected
10:36:19.905 ERROR Error sending GroupCoordinatorRequest_v0 to node **** [KafkaConnectionError: socket disconnected]

After checking the kafka-python code, we noticed that the function here
https://github.com/dpkp/kafka-python/blob/master/kafka/coordinator/base.py#L241C9-L241C33
has no exit point from its while loop and offers no way to pass a timeout parameter.

Can this be improved/fixed?
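
To illustrate what we mean by an exit point, here is a rough sketch of a retry loop bounded by a deadline. It is not kafka-python code: wait_until_ready, CoordinatorTimeout, is_ready and try_once are hypothetical names made up for this example, and we are only assuming the real loop could be restructured along these lines.

```python
import time


class CoordinatorTimeout(Exception):
    """Hypothetical error: the coordinator was still unknown at the deadline."""


def wait_until_ready(is_ready, try_once, timeout_s=None, backoff_s=0.1):
    """Call try_once() until is_ready() returns True or timeout_s elapses.

    is_ready and try_once are caller-supplied callables (placeholders for
    "coordinator is known" and "send one lookup request").
    timeout_s=None keeps the block-forever behaviour.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while not is_ready():
        try_once()
        if is_ready():
            break
        if deadline is None:
            time.sleep(backoff_s)
            continue
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # This is the exit point that the unbounded loop lacks.
            raise CoordinatorTimeout(
                "coordinator not ready after %.3f s" % timeout_s
            )
        # Never sleep past the deadline.
        time.sleep(min(backoff_s, remaining))
```

With something like this, a caller could catch CoordinatorTimeout and fail the test run instead of hanging, while passing timeout_s=None would preserve today's behaviour.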
