Getting Max retries exceeded with url, sporadically #1766

jbalonso · 2016-02-01T22:28:07Z

Despite appearances, this matter is distinct from #823.

I have a collection of automated jobs that invoke aws ec2 and aws s3 commands nightly on three hosts on EC2. Sporadically (once every three days or so), these jobs fail with the following error:

HTTPSConnectionPool(host='ec2.us-east-1.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

In #823 this error seems to come from user error, namely an invalid region, but in this case I have a configuration that usually works. In fact, I "solved" this by simply having my jobs retry. Timeline:

03:12:39 EST: failure
03:12:48 EST: failure
03:13:13 EST: success
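
For anyone wanting to reproduce the workaround, here is a minimal sketch of a retry wrapper. It is not the script used in these jobs: the function name `run_with_retries`, the attempt count, and the fixed 10-second pause are all assumptions chosen for illustration. The `run` and `sleep` parameters exist only so the loop can be exercised without calling the real CLI.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=10.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns the result object of the last run. `attempts` and `delay`
    are illustrative defaults, not values taken from the original jobs.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            break
        if attempt < attempts:
            sleep(delay)  # fixed pause before the next try
    return result


# Example use in a nightly job (command shown is only an example):
#   outcome = run_with_retries(["aws", "ec2", "describe-instances"])
#   raise SystemExit(outcome.returncode)
```

A fixed pause is enough for a failure that clears within a minute, as in the timeline above; a job that fails for longer would still exit non-zero after the last attempt.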

Additional note: I am using IAM roles for my EC2 instances.

--debug output: unavailable, difficult to reproduce; will look into creating a trap
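
One possible shape for such a trap, as a rough sketch: always run the command with `--debug`, capture stderr, and write it to disk only when the command fails. The helper name `run_and_trap_debug` and the log path are hypothetical. Note that this captures everything the command writes to stderr and discards it on success, which may not suit every job.

```python
import subprocess


def run_and_trap_debug(cmd, log_path, run=subprocess.run):
    """Run cmd with --debug appended; keep the debug output only on failure.

    `log_path` is a hypothetical location; the captured stderr is
    appended there when the exit status is non-zero.
    """
    result = run(cmd + ["--debug"], stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(result.stderr or "")
    return result.returncode


# Example (path and command are placeholders):
#   run_and_trap_debug(["aws", "ec2", "describe-instances"],
#                      "/var/tmp/aws-debug-trap.log")
```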

Version string: aws-cli/1.2.9 Python/3.4.3 Linux/3.13.0-32-generic (CLI is installed via the apt package for Ubuntu Trusty)

I confess that this might be an issue with DNS load balancing for the API (or DNS resolution in my VPC), but I feel compelled to create this ticket somewhere.
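
To separate a DNS problem from a CLI problem, a small probe run alongside the jobs could record whether the endpoint name resolves at the moment of failure. The sketch below uses only the standard `socket.getaddrinfo` call; the function name `probe_dns` and its return shape are assumptions for illustration.

```python
import socket
import time


def probe_dns(host, port=443, resolver=socket.getaddrinfo, clock=time.time):
    """Try to resolve host once; return (timestamp, ok, detail).

    On failure, detail carries the gaierror code and message, which can
    be compared with the "[Errno -2] Name or service not known" above.
    """
    stamp = clock()
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return stamp, False, f"gaierror {exc.errno}: {exc.strerror}"
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved address.
    addresses = sorted({info[4][0] for info in infos})
    return stamp, True, ", ".join(addresses)


# Example: probe_dns("ec2.us-east-1.amazonaws.com")
```

If the probe fails at the same timestamps as the jobs, the resolver in the VPC is the more likely culprit than the CLI itself.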


jamesls · 2016-02-25T21:56:04Z

The debug logs would be really helpful here. The CLI should already be retrying this up to 5 times before giving up:

$ aws ec2 describe-instances --region does-not-exist --debug 2>&1 | egrep 'Error|sleeping'
2016-02-25 13:52:54,995 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:54,998 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.676930879677 seconds
2016-02-25 13:52:55,682 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:55,683 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.0292999609202 seconds
2016-02-25 13:52:55,720 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:55,721 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.444361662946 seconds
2016-02-25 13:52:56,169 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:56,170 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 6.77161567732 seconds
2016-02-25 13:53:02,945 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"

In this case, there's not much more we can do, other than possibly giving an option to bump up the max number of retries.
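
To gauge what a higher retry limit would cost in waiting time, the sleep values in the log above (under 1 s, under 2 s, under 4 s, under 8 s) are consistent with a random base in [0, 1) multiplied by a power of two. The sketch below models that pattern; it is inferred from the log output, not copied from the CLI source, and the function names are made up for illustration.

```python
import random


def backoff_delay(attempt, growth_factor=2, rng=random.random):
    """Delay before retry number `attempt` (1-based).

    Assumed model: a random base in [0, 1) times growth_factor**(attempt - 1).
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return rng() * (growth_factor ** (attempt - 1))


def delay_ceilings(max_retries, growth_factor=2):
    """Upper bound of each delay under the same assumed model."""
    return [growth_factor ** (n - 1) for n in range(1, max_retries + 1)]


# Four sleeps between five attempts, as in the log above:
#   delay_ceilings(4) -> [1, 2, 4, 8]
```

Under this model each extra retry doubles the worst-case wait, so a larger limit helps most with outages lasting seconds rather than minutes.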

Let us know if you're still seeing the issue, and whether or not you've been able to get any debug logs.


jamesls added the question and closing-soon labels on Feb 25, 2016 (closing-soon: "This issue will automatically close in 4 days unless further comments are made.")

jamesls closed this as completed Mar 7, 2016

diehlaws added the guidance label ("Question that needs advice or information.") and removed the question label on Jan 4, 2019

thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022

fix: Handle global value for Function CodeUri in Package (aws#1766)

0c50871

Fixes aws#1706 by recognizing the CodeUri global value in the package command.
