
Conversation

@feiskyer (Member) commented Jul 1, 2019

This PR allows scaling multiple Azure vmss synchronously by delaying the vmss capacity updates in different goroutines.

To make it work, the upcoming nodes from getUpcomingNodeInfos() are also given distinct names, so that multiple nodes won't be merged into one node in filterOutSchedulableByPacking().

Partially fix #2044 (similar node issues are tracked at #2094)
Fix #1984

/cc @andyzhangx @nilo19
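
For illustration only, here is a minimal Go sketch of the delayed-update idea described above; the ScaleSet type, its fields, and the updateVMSSCapacity helper are simplified stand-ins rather than the actual cluster-autoscaler code:

package main

import (
	"fmt"
	"sync"
	"time"
)

// ScaleSet is a simplified stand-in for the Azure scale set wrapper in
// cluster-autoscaler; only the fields needed for this sketch are included.
type ScaleSet struct {
	name    string
	mutex   sync.Mutex
	curSize int64
}

// SetScaleSetSize records the new target size right away and pushes the
// capacity update to the cloud API from a separate goroutine, so that scale-up
// calls for different scale sets do not serialize on the slow API round trip.
func (s *ScaleSet) SetScaleSetSize(size int64) {
	s.mutex.Lock()
	s.curSize = size
	s.mutex.Unlock()

	go s.updateVMSSCapacity(size)
}

// updateVMSSCapacity stands in for the real PUT on the scale set capacity.
func (s *ScaleSet) updateVMSSCapacity(size int64) {
	time.Sleep(100 * time.Millisecond) // simulated API latency
	fmt.Printf("scale set %s resized to %d\n", s.name, size)
}

func main() {
	pools := []*ScaleSet{{name: "vmss-a"}, {name: "vmss-b"}}
	for i, p := range pools {
		p.SetScaleSetSize(int64(3 + i)) // both updates proceed in parallel
	}
	time.Sleep(200 * time.Millisecond) // crude wait for the goroutines in this toy example
}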

@k8s-ci-robot (Contributor)

@feiskyer: GitHub didn't allow me to request PR reviews from the following users: nilo19.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from andyzhangx July 1, 2019 01:15
@k8s-ci-robot added the cncf-cla: yes label (the PR's author has signed the CNCF CLA) and the size/M label (changes 30-99 lines, ignoring generated files) Jul 1, 2019
@andyzhangx (Member) left a comment

The PR title should be "Allow scaling multiple Azure vmss simultaneously"; actually you are using an asynchronous way.

upcomingNodes = append(upcomingNodes, nodeTemplate.Node())
// Ensure new nodes have different names because nodeName is used as a map key.
node := nodeTemplate.Node().DeepCopy()
node.Name = fmt.Sprintf("%s-%d", node.Name, rand.Int63())
Member

will this affect non-Azure nodes as well?

@feiskyer (Member Author)

Yep, it would affect non-Azure nodes, but I think it should be fixed for all cloud providers.

@losipiuk @mwielgus Could you help to take a look at this?

@losipiuk (Contributor) commented Jul 2, 2019

It feels like a serious bug. Thanks for spotting and fixing it.
One note here: could you please use a counter (increased atomically using AddUint64) instead of a random value?
Also, could we mutate the UID too? We should probably extract a helper function for that, buildNodeForNodeTemplate?

cc: @vivekbagade. Vivek, could you please scan the code and see whether there are any more places where we use the node name as a dictionary key?
We could also add a sanity check in filterOutSchedulableUsingPacking to detect the situation when it gets a list of nodes with repeating names.
Also, we should probably use the UID (not the name) as the map key.

Contributor

We use the node name as a dictionary key in a few places. Need to check whether they could be causing any issues. Even if they are not, we should probably change this to avoid future issues.

Contributor

This is because of the CreateNodeNameToInfoMap func, which could potentially mask a few nodes.
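
For illustration, a self-contained sketch (not the actual scheduler code) of how a name-keyed map collapses identically named upcoming nodes:

package main

import "fmt"

// nodeInfo is a toy stand-in for the scheduler's NodeInfo; only the name matters here.
type nodeInfo struct{ name string }

// buildNodeNameToInfoMap mirrors the shape of CreateNodeNameToInfoMap: entries
// that share a name overwrite each other, so several upcoming nodes cloned from
// the same template collapse into a single map entry.
func buildNodeNameToInfoMap(nodes []*nodeInfo) map[string]*nodeInfo {
	m := make(map[string]*nodeInfo, len(nodes))
	for _, n := range nodes {
		m[n.name] = n
	}
	return m
}

func main() {
	upcoming := []*nodeInfo{{"template-node"}, {"template-node"}, {"template-node"}}
	fmt.Println(len(buildNodeNameToInfoMap(upcoming))) // 1: two of the three nodes are masked
}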

@feiskyer (Member Author)

Thanks for the suggestion, will update the PR.

> cc: @vivekbagade. Vivek, could you please scan the code and see whether there are any more places where we use the node name as a dictionary key?
> We could also add a sanity check in filterOutSchedulableUsingPacking to detect the situation when it gets a list of nodes with repeating names.
> Also, we should probably use the UID (not the name) as the map key.

That looks good. We can do the check and optimization after the code scan.

@feiskyer (Member Author)

> Could you please use a counter (increased atomically using AddUint64) instead of a random value?

Actually, the index would be OK. The node name here is only used for this single filterOutSchedulableUsingPacking() step.
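
For illustration, a self-contained sketch of the index-based naming; the helper name matches the buildNodeForNodeTemplate that appears later in the diff, but the signature and the UID handling here are simplified assumptions, not the PR's exact code:

package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// buildNodeForNodeTemplate sketches the index-based approach: copy the
// template node and give each upcoming node a unique name (and, per the
// review suggestion above, a unique UID) so name-keyed maps keep them apart.
// The actual helper in the PR takes a *schedulernodeinfo.NodeInfo; a plain
// *apiv1.Node is used here to keep the sketch self-contained.
func buildNodeForNodeTemplate(template *apiv1.Node, index int) *apiv1.Node {
	node := template.DeepCopy()
	node.Name = fmt.Sprintf("%s-%d", node.Name, index)
	node.UID = types.UID(fmt.Sprintf("%s-%d", node.UID, index)) // assumption: the UID is made unique too
	return node
}

func main() {
	template := &apiv1.Node{ObjectMeta: metav1.ObjectMeta{Name: "template-node", UID: "template-uid"}}
	for i := 0; i < 3; i++ {
		fmt.Println(buildNodeForNodeTemplate(template, i).Name) // template-node-0, template-node-1, template-node-2
	}
}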

@feiskyer (Member Author)

@losipiuk Updated, PTAL

// Invalidate the vmss size cache, so that it will be refetched from the API.
scaleSet.mutex.Lock()
defer scaleSet.mutex.Unlock()
scaleSet.lastRefresh = time.Now().Add(-1 * 15 * time.Second)
Member

Can you explain why we are setting lastRefresh to 15 seconds before Now()? Also, why not use Sub()? https://golang.org/pkg/time/#Time.Sub

@feiskyer (Member Author)

The 15 seconds comes from L122; I renamed it to vmssSizeRefreshPeriod in PR #2151 but forgot to change it here. Let me change this to vmssSizeRefreshPeriod as well, for clarity.

Sub is a different use case: it accepts a time.Time parameter, not a time.Duration.
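
For illustration, a small Go snippet showing the Add-versus-Sub distinction and why back-dating lastRefresh forces a refresh; vmssSizeRefreshPeriod here is a local constant, not the actual cluster-autoscaler value:

package main

import (
	"fmt"
	"time"
)

// A stand-in for the refresh interval; the real constant lives in the
// cluster-autoscaler Azure provider.
const vmssSizeRefreshPeriod = 15 * time.Second

func main() {
	now := time.Now()

	// time.Time.Add takes a Duration, so a negative duration moves the
	// timestamp into the past. Setting lastRefresh this way makes the cache
	// look expired, forcing the next read to hit the API.
	lastRefresh := now.Add(-vmssSizeRefreshPeriod)
	fmt.Println(lastRefresh.Add(vmssSizeRefreshPeriod).After(now)) // false: cache considered stale

	// time.Time.Sub takes another Time and returns the Duration between them,
	// so it answers a different question ("how long since?").
	fmt.Println(now.Sub(lastRefresh)) // 15s
}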

@andyzhangx (Member) left a comment

/lgtm
LGTM on the Azure part

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jul 2, 2019
return found && oldest.Add(unschedulablePodWithGpuTimeBuffer).After(currentTime)
}

func buildNodeForNodeTemplate(nodeTemplate *schedulernodeinfo.NodeInfo, index int) *apiv1.Node {
Contributor

Looks good.
@vivekbagade is doing some testing on this (thanks!).
I will LGTM when we are done with that.

@CecileRobertMichon (Member) left a comment

lgtm

@vivekbagade (Contributor)

@losipiuk My testing is done. Works as expected. LGTM from my side.

@losipiuk (Contributor) commented Jul 3, 2019

/lgtm
/approve

@losipiuk (Contributor) commented Jul 3, 2019

Thanks!

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: losipiuk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jul 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit a342a76 into kubernetes:master Jul 3, 2019
@feiskyer feiskyer deleted the bulk-scale-up branch July 3, 2019 12:26
@feiskyer (Member Author) commented Jul 3, 2019

@losipiuk Thanks

Successfully merging this pull request may close these issues:
Azure CA doesn't scale multiple agent pools in parallel
bulk scale-up in azure creates only one node per iteration sometimes