Skip to content

Conversation

CecileRobertMichon
Copy link
Member

@CecileRobertMichon CecileRobertMichon commented Jun 5, 2019

Related to #2044

Currently, the cluster-autoscaler never puts two different agent pools in the scale-up plan when used with Azure clusters deployed with aks-engine or AKS. Each node has an "agentpool" label that identifies its node pool. This PR adds the "agentpool" label to the ignored labels of IsNodeInfoSimilar in order to allow two similar node groups with different "agentpool" labels to be detected.

Also added documentation on using --balance-similar-node-groups in the Azure README.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 5, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: piosz

If they are not already assigned, you can assign the PR to them by writing /assign @piosz in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 5, 2019
To allow scaling similar node pools simultaneously, or when using separate node groups per zone and to keep nodes balanced across zones, use the `--balance-similar-node-groups` flag. Add it to the `command` section to enable it:

```yaml
- --balance-similar-node-groups=true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@feiskyer do you know if this flag defaults to true?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value is false.

@MaciekPytel
Copy link
Contributor

The whole point of making node group comparison a 'processor' (ie. hiding it behind an interface) was that such cloudprovider specific changes can be easily isolated. Please check this commit, which shows code addressing the exact same problem for GKE: 34a4262#diff-83aae18ad286ee688abea380ae132efa (the commit removes the relevant code, since GKE is now supported from fork and there was no reason to keep GKE specific logic in main repo, but the code still illustrates our preferred pattern).

@feiskyer
Copy link
Member

feiskyer commented Jun 5, 2019

@CecileRobertMichon Could you update the PR according to 34a4262#diff-83aae18ad286ee688abea380ae132efa?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 6, 2019
processors.PodListProcessor = core.NewFilterOutSchedulablePodListProcessor()
if autoscalingOptions.CloudProviderName == azure.ProviderName {
processors.NodeGroupSetProcessor = &nodegroupset.BalancingNodeGroupSetProcessor{
Comparator: nodegroupset.IsAzureNodeInfoSimilar}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this overrides the node comparator for Azure

return true
}

func compareLabels(nodes []*schedulernodeinfo.NodeInfo, ignoredLabels map[string]bool) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored this into a function

// somewhat arbitrary, but generally we check if resources provided by both nodes
// are similar enough to likely be the same type of machine and if the set of labels
// is the same (except for a pre-defined set of labels like hostname or zone).
func IsNodeInfoSimilar(n1, n2 *schedulernodeinfo.NodeInfo) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a thin wrapper around the existing function that provides the default set of ignored labels for all cloud providers.

@CecileRobertMichon
Copy link
Member Author

@MaciekPytel thanks for the suggestion. I've updated the PR, let me know what you think. I wasn't able to just reuse the implementation for GKE (although I did reuse the same pattern for overriding the node comparator) because the requirement here is slightly different: the GKE implementation allows two nodes to be "similar" if they have the same node pool label (regardless of their other characteristics). What we're trying to do here for Azure is to allow two node groups to be "similar" if even if their node pool label is _ different_, given that everything else matches. The reason for doing this is to allow for agent pools that have the same configuration to be considered as similar even if their "agentpool" label differs (since they are in different pools). @feiskyer thoughts?

Copy link
Member

@feiskyer feiskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2019
@feiskyer
Copy link
Member

@MaciekPytel @losipiuk Could you help to take a look at this PR again?

@CecileRobertMichon CecileRobertMichon changed the title fix: ignore agentpool label when looking for similar node groups fix: ignore agentpool label when looking for similar node groups with Azure provider Jun 12, 2019
@CecileRobertMichon
Copy link
Member Author

@MaciekPytel @losipiuk please take a look.

@losipiuk
Copy link
Contributor

Makes sense but we cannot explicitly call Azure CP from main as we have build tags which allow building without Azure code at all. Could you think of way to extend CP interface to hide such CP specific customization behind it? Sth like CP.setupProcessors(originalProcessors). This would allow to replace/wrap/extend processors set as needed by CP.

@feiskyer
Copy link
Member

feiskyer commented Jul 8, 2019

#2171 is working on #2094 (comment), so close this and prefer #2171.

/close

@k8s-ci-robot
Copy link
Contributor

@feiskyer: Closed this PR.

In response to this:

#2171 is working on #2094 (comment), so close this and prefer #2171.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants