Sanitize taints before scheduling DSs on template node infos #5659
if err != nil {
	return false, "", err
}
sanitizedNodeInfo, err := utils.SanitizeNodeInfo(nodeInfo, id, ignoredTaints, unwantedTaints)
This isn't really part of your changes, but the extra SanitizeNodeInfo call makes it more visible: why is the logic above different from GetNodeInfoFromTemplate()? Doesn't it do exactly the same thing, just with a different nodeInfo as the source?
I understand this may be out of scope for this PR, so feel free to skip it, but a small refactor to unify those code paths might be worthwhile.
That's a valid point, but it is not 100% true. We treat DS pods slightly differently depending on where the template comes from. For existing nodes, we take the DS pods already scheduled on them and fill in any missing DS pods if the --force-ds option is enabled. For templates built from the instance template, we always pack all the DS pods.
I think the long-term solution is to always set --force-ds to true and unify both code paths. For now, though, I'm worried that unifying them would obscure the different underlying behaviour.
Branch updated from 5bce9f2 to aac5184.
/assign
	MaxPodEvictionTime time.Duration
	// IgnoredTaints is a list of taints to ignore when considering a node template for scheduling.
	IgnoredTaints []string
	// StatusTaints is a list of taints to remove when creating a node template for scheduling.
WDYT about: "StatusTaints is a list of taints CA considers to reflect transient node status that should be removed when creating a node template for scheduling. Both IgnoredTaints and StatusTaints are expected to be temporary; the only difference is that IgnoredTaints are expected to appear during node startup."?
I'd like the comment to better explain why we have this, rather than how it is going to be used.
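The distinction proposed in this suggestion can be sketched in Go. TaintOptions, TemplateTaints and StillStartingUp are hypothetical names, and Taint is a simplified stand-in for the Kubernetes core/v1 type. The sketch assumes only what the comment states: both lists are stripped from node templates, and only IgnoredTaints are associated with node startup.

```go
package main

import "fmt"

// Taint is a simplified stand-in for the Kubernetes core/v1 Taint type.
type Taint struct {
	Key    string
	Effect string
}

// TaintOptions is a hypothetical struct holding the two option fields.
type TaintOptions struct {
	// IgnoredTaints are transient taints expected to appear during node startup.
	IgnoredTaints []string
	// StatusTaints are transient taints reflecting node status.
	StatusTaints []string
}

// containsKey reports whether key is present in keys.
func containsKey(keys []string, key string) bool {
	for _, k := range keys {
		if k == key {
			return true
		}
	}
	return false
}

// TemplateTaints (hypothetical) drops both ignored and status taints, since
// neither is expected to persist on a freshly created node.
func (o TaintOptions) TemplateTaints(taints []Taint) []Taint {
	var kept []Taint
	for _, t := range taints {
		if containsKey(o.IgnoredTaints, t.Key) || containsKey(o.StatusTaints, t.Key) {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

// StillStartingUp (hypothetical) reports whether a node carries a startup
// (ignored) taint. Status taints deliberately do not count.
func (o TaintOptions) StillStartingUp(taints []Taint) bool {
	for _, t := range taints {
		if containsKey(o.IgnoredTaints, t.Key) {
			return true
		}
	}
	return false
}

func main() {
	opts := TaintOptions{
		IgnoredTaints: []string{"example.com/startup"},
		StatusTaints:  []string{"example.com/degraded"},
	}
	taints := []Taint{
		{Key: "example.com/startup", Effect: "NoSchedule"},
		{Key: "example.com/degraded", Effect: "NoSchedule"},
		{Key: "dedicated", Effect: "NoSchedule"},
	}
	fmt.Println(opts.TemplateTaints(taints))      // [{dedicated NoSchedule}]
	fmt.Println(opts.StillStartingUp(taints[1:])) // false
	fmt.Println(opts.StillStartingUp(taints[:1])) // true
}
```

The taint keys in main are made-up examples. Permanent taints such as dedicated survive in the template, so scale-up simulations still respect them.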
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: BigDarkClown, x13n
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
NodeInfoProvider returns node infos representing each node group for scale-up. If there is at least one node in the node group, then it is used as a template. If there are none, the node info is based on the node group instance template.
In both cases we want to pack the node infos with all the expected DaemonSet pods. That way CA won't perform a scale-up that leaves some DS pods pending. All DS pods should be added to the node info if it is generated from the instance template, or if it is generated from an existing node and the --force-ds flag is enabled.
The issue is that node sanitisation, which includes removing unwanted taints, is performed after the DS pods are added. That means some DS pods may be left out of the node info if, for example, the template node has:
- an ignore taint, or
- a deletion taint,
since DS pods that do not tolerate these taints cannot be scheduled on it.
This change fixes it by performing an additional sanitisation before trying to schedule DS pods. It also adds an unwantedTaints option, which allows defining taints that are removed from templates but not treated as ignored taints.