Getting Max retries exceeded with url, sporadically #1766

Closed
jbalonso opened this issue Feb 1, 2016 · 1 comment
Labels
closing-soon This issue will automatically close in 4 days unless further comments are made. guidance Question that needs advice or information.

Comments

@jbalonso

jbalonso commented Feb 1, 2016

Despite appearances, this matter is distinct from #823.

I have a collection of automated jobs that invoke aws ec2 and aws s3 commands nightly on three hosts on EC2. Sporadically (once every three days or so), these jobs fail with the following error:

HTTPSConnectionPool(host='ec2.us-east-1.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)

In #823, this error seems to stem from user error in the form of an invalid region, but in this case I have a configuration that usually works. In fact, I "solved" this by simply having my jobs retry. Timeline:

  • 03:12:39 EST: failure
  • 03:12:48 EST: failure
  • 03:13:13 EST: success
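
For illustration, the job-level retry described above could look roughly like the following. This is a minimal sketch, not the actual job code: the function name `run_with_retries`, the attempt count, and the delay are all assumptions, and the `runner` and `sleep` parameters exist only so the loop can be exercised without calling the real CLI.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=10.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code.

    Hypothetical helper: returns the result of the last attempt,
    whether it succeeded or not. attempts must be at least 1.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return result
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts:
            sleep(delay)
    return result
```

A nightly job would call something like `run_with_retries(["aws", "ec2", "describe-instances"])` and inspect `returncode` on the result.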

Additional note: I am using IAM roles for my EC2 instances.

--debug output: unavailable, difficult to reproduce; will look into creating a trap

Version string: aws-cli/1.2.9 Python/3.4.3 Linux/3.13.0-32-generic (CLI is installed via the apt package for Ubuntu Trusty)

I confess that this might be an issue with DNS load balancing for the API (or DNS resolution in my VPC), but I feel compelled to create this ticket somewhere.
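
One way to build the kind of trap mentioned above, and to separate a DNS problem from a CLI problem, would be to resolve the endpoint directly whenever a job fails and log the outcome. The following is only a sketch under that assumption: `check_dns` is a hypothetical helper, and the `resolver` parameter is there so it can be tested without network access.

```python
import socket


def check_dns(host, port=443, resolver=socket.getaddrinfo):
    """Try to resolve host; return (ok, detail).

    Hypothetical diagnostic helper. On success, detail is a
    comma-separated list of resolved addresses; on failure it is
    the text of the socket.gaierror.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)
```

Calling `check_dns("ec2.us-east-1.amazonaws.com")` right after a failed job and writing the result to the job log would show whether name resolution was failing on the host at that moment.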

@jamesls
Member

jamesls commented Feb 25, 2016

The debug logs would be really helpful here. The CLI should already be retrying this up to 5 times before giving up:

$ aws ec2 describe-instances --region does-not-exist --debug 2>&1 | egrep 'Error|sleeping'
2016-02-25 13:52:54,995 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:54,998 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.676930879677 seconds
2016-02-25 13:52:55,682 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:55,683 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.0292999609202 seconds
2016-02-25 13:52:55,720 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:55,721 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 0.444361662946 seconds
2016-02-25 13:52:56,169 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"
2016-02-25 13:52:56,170 - MainThread - botocore.endpoint - DEBUG - Response received to retry, sleeping for 6.77161567732 seconds
2016-02-25 13:53:02,945 - MainThread - botocore.endpoint - DEBUG - ConnectionError received when sending HTTP request.
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', gaierror(8, 'nodename nor servname provided, or not known'))
EndpointConnectionError: Could not connect to the endpoint URL: "https://ec2.does-not-exist.amazonaws.com/"

In this case, there's not much more we can do, other than possibly giving an option to bump up the max number of retries.
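
To make the effect of a higher retry limit concrete, here is a rough sketch of randomized exponential backoff. It is not taken from the CLI or botocore source: the function names, the growth factor of 2, and the formula are assumptions, chosen only because they are consistent with the sleep values in the log above (each one falls below 1, 2, 4, and 8 seconds respectively).

```python
import random


def backoff_delay(attempt, growth_factor=2.0, rng=random.random):
    """Delay in seconds before the retry that follows a failed attempt.

    Assumed formula: a random base in [0, 1) scaled by
    growth_factor ** (attempt - 1), where attempt starts at 1.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return rng() * growth_factor ** (attempt - 1)


def retry_schedule(max_attempts=5, growth_factor=2.0, rng=random.random):
    """Sleep durations between max_attempts tries (one fewer than tries)."""
    return [backoff_delay(n, growth_factor, rng)
            for n in range(1, max_attempts)]
```

Under these assumptions, five attempts spend at most 1 + 2 + 4 + 8 = 15 seconds sleeping, which would not cover the roughly 34 seconds between the first failure and the success in the reported timeline. Each additional attempt doubles the upper bound of the last sleep.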

Let us know if you're still seeing the issue, and whether or not you've been able to get any debug logs.

@jamesls jamesls added question closing-soon This issue will automatically close in 4 days unless further comments are made. labels Feb 25, 2016
@jamesls jamesls closed this as completed Mar 7, 2016
@diehlaws diehlaws added guidance Question that needs advice or information. and removed question labels Jan 4, 2019
thoward-godaddy pushed a commit to thoward-godaddy/aws-cli that referenced this issue Feb 12, 2022
Fixes aws#1706 by recognizing the CodeUri global value in the package command.