Unable to bootstrap reliably from multi-A DNS record #1227

Closed
@zarnovican

I'm setting up clients to connect to one static DNS name, which resolves to all bootstrap brokers.

$ host kafka.example.com
kafka.example.com has address 10.97.159.7
kafka.example.com has address 10.86.166.163
kafka.example.com has address 10.91.152.180

This works fine when all brokers are up. When one of the Kafka brokers is down, connections to its IP are refused (the port is closed), and clients then fail to connect to Kafka at random, roughly 1/3 of the time. I would expect the client library to try all of the IPs and fail to bootstrap only if every one of them fails (or times out).
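To make the expected behaviour concrete, here is a minimal sketch using only the standard `socket` module, not kafka-python's actual connection code. The function name `connect_first_reachable` and its `resolve` parameter are hypothetical, introduced only for illustration. It walks every address the name resolves to and raises only when all of them have failed.

```python
import socket


def connect_first_reachable(host, port, timeout=2.0, resolve=socket.getaddrinfo):
    """Try every address `host` resolves to; raise only if all of them fail.

    `resolve` defaults to socket.getaddrinfo and is injectable for testing.
    """
    last_error = None
    for family, socktype, proto, _canon, sockaddr in resolve(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock  # first address that accepts the connection wins
        except OSError as exc:
            # Refused or timed out: remember the error, try the next address.
            sock.close()
            last_error = exc
    # Either every address failed or the name resolved to nothing.
    raise last_error or OSError("no addresses found for %s" % host)
```

With three A records and one broker down, a call such as `connect_first_reachable("kafka.example.com", 9092)` would skip the refused address and return a socket connected to one of the other two.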

Can you confirm whether this is a bug or a feature? If it is a feature and it is up to the user to handle retries during bootstrap, could you point me to a working example? Or is listing multiple hostnames the only way to get HA bootstrap?

Looking at the BrokerConnection class, I can see there is code that fetches all IPs into _gai and uses _gai_index to iterate over them. Yet bootstrap still fails on the first failed IP (I don't understand the code well enough to say why).
This is further complicated by the fact that the same code is used both to connect to the "bootstrap" brokers and to keep broker connections up for the producer/consumer.
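One possible client-side workaround, sketched here as an assumption rather than a confirmed fix, is to resolve the name up front and pass each address as its own bootstrap entry. The helper name `expand_bootstrap` and its `resolve` parameter are hypothetical. The sketch is restricted to IPv4 to keep the `ip:port` formatting simple.

```python
import socket


def expand_bootstrap(host, port, resolve=socket.getaddrinfo):
    """Return one 'ip:port' string per distinct IPv4 address of `host`."""
    entries = []
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, socket.AF_INET, socket.SOCK_STREAM):
        entry = "%s:%d" % (sockaddr[0], port)
        if entry not in entries:  # drop duplicate records, keep resolver order
            entries.append(entry)
    return entries


# Hypothetical usage, assuming the client tries each bootstrap entry in turn:
#   servers = expand_bootstrap("kafka.example.com", 9092)
#   client = KafkaClient(bootstrap_servers=servers)
```

The list is a snapshot taken at startup, so it will not pick up later changes to the DNS record.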

Debug output from failed bootstrap

2017-09-21 18:07:06,870 INFO kafka.client Bootstrapping cluster metadata from [('kafka-test.nrgmntr.com', 9092, 0)]
2017-09-21 18:07:06,870 DEBUG kafka.client Attempting to bootstrap via node at kafka-test.nrgmntr.com:9092
2017-09-21 18:07:06,870 DEBUG kafka.conn <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/kafka-test.nrgmntr.com port=9092>: creating new socket
2017-09-21 18:07:06,878 DEBUG kafka.conn <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/54.78.71.60 port=9092>: setting socket option (6, 1, 1)
2017-09-21 18:07:06,878 INFO kafka.conn <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/54.78.71.60 port=9092>: connecting to 54.78.71.60:9092
2017-09-21 18:07:06,939 ERROR kafka.conn Connect attempt to <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/54.78.71.60 port=9092> returned error 111. Disconnecting.
2017-09-21 18:07:06,939 INFO kafka.conn <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/54.78.71.60 port=9092>: Closing connection. ConnectionError: 111
2017-09-21 18:07:06,940 DEBUG kafka.conn <BrokerConnection node_id=bootstrap host=kafka-test.nrgmntr.com/54.78.71.60 port=9092>: reconnect backoff 0.0432616152865 after 1 failures
2017-09-21 18:07:06,940 ERROR kafka.client Unable to bootstrap from [('kafka-test.nrgmntr.com', 9092, 0)]
2017-09-21 18:07:06,940 ERROR root NoBrokersAvailable
Traceback (most recent call last):
  File "/home/zarnovic/git/emans/ansible/roles/kafka/files/bin/kafka_partitions.py", line 257, in <module>
    sys.exit(main())
  File "/home/zarnovic/git/emans/ansible/roles/kafka/files/bin/kafka_partitions.py", line 237, in main
    partitions = fetch_partitions(args)
  File "/home/zarnovic/git/emans/ansible/roles/kafka/files/bin/kafka_partitions.py", line 55, in fetch_partitions
    client = KafkaClient(bootstrap_servers=args.bootstrap.split(','))
  File "/home/zarnovic/git/kafka-python/kafka/client_async.py", line 221, in __init__
    self.config['api_version'] = self.check_version(timeout=check_timeout)
  File "/home/zarnovic/git/kafka-python/kafka/client_async.py", line 826, in check_version
    raise Errors.NoBrokersAvailable()
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
