Skip to content

Conversation

feiskyer
Copy link
Member

@feiskyer feiskyer commented Jul 1, 2019

In go-autorest SDK https://github.com/Azure/go-autorest/blob/master/autorest/sender.go#L242,
if ARM returns http.StatusTooManyRequests, the sender doesn't increase the retry attempt count,
hence the Azure clients will keep retrying forever until it gets a status code other than 429.

// don't count a 429 against the number of attempts
// so that we continue to retry until it succeeds
if resp == nil || resp.StatusCode != http.StatusTooManyRequests {
    attempt++
}

So this PR explicitly remove http.StatusTooManyRequests from autorest.StatusCodesForRetry. To reduce the API throttling issues, it also adds caches for vmss instances.

Fix #2124.

/cc @andyzhangx @nilo19

@k8s-ci-robot
Copy link
Contributor

@feiskyer: GitHub didn't allow me to request PR reviews from the following users: nilo19.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

In go-autorest SDK https://github.com/Azure/go-autorest/blob/master/autorest/sender.go#L242,
if ARM returns http.StatusTooManyRequests, the sender doesn't increase the retry attempt count,
hence the Azure clients will keep retrying forever until it gets a status code other than 429.

  			// don't count a 429 against the number of attempts
  			// so that we continue to retry until it succeeds
  			if resp == nil || resp.StatusCode != http.StatusTooManyRequests {
  				attempt++
  			}

So this PR explicitly remove http.StatusTooManyRequests from autorest.StatusCodesForRetry. To reduce the API throttling issues, it also adds caches for vmss instances.

Fix #2124.

/cc @andyzhangx @nilo19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from andyzhangx July 1, 2019 00:54
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 1, 2019
@andyzhangx
Copy link
Member

@feiskyer In azure cloud provider, we would still retry after hit 409, since it's exponential backoff, is it safe in azure cloud provider code?

		// HTTP 4xx (except 412) or 5xx suggests we should retry.
		if 399 < resp.StatusCode && resp.StatusCode < 600 {
			return true
		}
``

@feiskyer
Copy link
Member Author

feiskyer commented Jul 1, 2019

@feiskyer In azure cloud provider, we would still retry after hit 409, since it's exponential backoff, is it safe in azure cloud provider code?

Yep, it's safe from SDK's client side. CA doesn't need backoff retry since it would run again later. The issue is from go-autorest internal implementations, we should also add some fixes in the Kubernetes cloud provider.

By the way, the error code here is 429, not 409.

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 2, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 3, 2019
@feiskyer
Copy link
Member Author

feiskyer commented Jul 3, 2019

Rebased to solve the conflicts

@feiskyer
Copy link
Member Author

feiskyer commented Jul 4, 2019

@andyzhangx @losipiuk Could you help to approve the changes?

Copy link
Member

@andyzhangx andyzhangx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 4, 2019
@losipiuk
Copy link
Contributor

losipiuk commented Jul 4, 2019

/lgtm
/approve
Thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: losipiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 4, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4a9583b into kubernetes:master Jul 4, 2019
@feiskyer feiskyer deleted the fix-stuck branch July 5, 2019 01:25
k8s-ci-robot added a commit that referenced this pull request Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

autoscaler v1.14.2 is not working with Azure VMSS and Kubernetes 1.14.1

4 participants