KEP-5325: HPA - Improve pod selection accuracy across workload types #5331
Conversation
Skipping CI for Draft Pull Request.
This is a work in progress - looking for feedback and opinions.
Some low-hanging changes
I focused mostly on the design details, including the API surface. But once this gets opted into the release I'll look carefully at the remaining bits, including the PRR portion of the doc.
@@ -0,0 +1,879 @@
<!--
Nit: you're missing keps/prod-readiness/sig-autoscaling/5325.yaml based on https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/template/nnnn.yaml; feel free to add my name as the PRR approver for alpha.
Also, make sure to get a lead opt-in for #5325 into the 1.34 release.
## Summary

The Horizontal Pod Autoscaler (HPA) has a critical limitation in its pod selection mechanism: it collects metrics from all pods that match the target workload's label selector, regardless of whether those pods are actually managed by the target workload. This can lead to incorrect scaling decisions when unrelated pods (such as Jobs, CronJobs, or other Deployments) happen to share the same labels.
Nit: please make sure to break lines, since this makes the reviews easier to read. This applies to the entire document.
We propose adding a new field to the HPA specification called `strictPodSelection` that allows users to specify how pods should be selected for metric collection:
* If set to true, only pods that are actually owned by the target workload (through owner references) are selected.
* If not set, or set to false, the current default behavior applies.
Based on https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md, specifically this part:

> Think twice about bool fields. Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansions by describing the policy options explicitly as a string type alias (e.g. TerminationMessagePolicy).

When we were introducing a new mechanism to PDBs we went with `UnhealthyPodEvictionPolicy`. You'll notice that pattern is also used heavily across the apps/v1 types.
In your case that would mean introducing something like `PodSelectionType` with two initial values: `Selector` or `Labels` as the default (and backward compatible), and `OwnerRefs`. This will ensure you can easily extend the mechanism in the future.
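For illustration, a minimal Go sketch of the string-alias pattern described above, using the suggested (non-final) names `PodSelectionType`, `Labels`, and `OwnerRefs`:

```go
// Sketch only: shows the string-alias convention, not the final API.
package v2

// PodSelectionType describes how the HPA picks the pods it collects metrics from.
type PodSelectionType string

const (
	// LabelsPodSelection keeps the current behavior: every pod matching the
	// target workload's label selector is considered. Intended as the default.
	LabelsPodSelection PodSelectionType = "Labels"

	// OwnerRefsPodSelection additionally requires that the pod's ownership
	// chain leads back to the HPA's scaleTargetRef.
	OwnerRefsPodSelection PodSelectionType = "OwnerRefs"
)
```

Adding a third value later is then an additive change rather than a bool-to-enum migration.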
TIL, thank you for the reference! :)
- a search in the Kubernetes bug triage tool (https://storage.googleapis.com/k8s-triage/index.html)
-->

- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)
You'll want to list integration tests you're planning to add.
## Proposal

We propose adding a new field to the HPA specification called `podSelectionStrategy` that allows users to specify how pods should be selected for metric collection:
Would `selectionStrategy` be a shorter, more concise name for what we're trying to achieve?
Agree.
WIP here: kubernetes/kubernetes#132018
metrics:
- my_feature_metric
Suggested change: replace
metrics:
- my_feature_metric
with
metrics: []
How necessary is it to add a metric? Looking at other similar features, I found an example that added a metric counting the number of times the new API field is used. Is that necessary?
(I guess I'm asking @soltysh that question)
We propose adding a new field to the HPA specification called `selectionStrategy` that allows users to specify how pods should be selected for metric collection (see the API sketch after this list):
* If set to `LabelSelector` (default): uses the current behavior of selecting all pods that match the target workload's label selector.
* If set to `OwnerReferences`: only selects pods that are owned by the target workload through owner references.
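As a hedged illustration of how the field might surface on the HPA spec (the defaulting behavior and Go names here are assumptions, not settled by the KEP text):

```go
// Sketch against k8s.io/api/autoscaling/v2; names follow the discussion above
// and are not final.
package v2

// SelectionStrategy selects which pods back the HPA's metric calculation.
type SelectionStrategy string

const (
	// LabelSelectorSelectionStrategy is today's behavior: all pods matching
	// the target workload's label selector are considered.
	LabelSelectorSelectionStrategy SelectionStrategy = "LabelSelector"
	// OwnerReferencesSelectionStrategy keeps only pods whose ownership chain
	// reaches the scaleTargetRef.
	OwnerReferencesSelectionStrategy SelectionStrategy = "OwnerReferences"
)

// HorizontalPodAutoscalerSpec shown with only the new field; existing fields
// (scaleTargetRef, minReplicas, maxReplicas, metrics, behavior) are elided.
type HorizontalPodAutoscalerSpec struct {
	// selectionStrategy is optional; a nil value is treated as LabelSelector
	// so that existing HPA objects keep their current behavior.
	// +optional
	SelectionStrategy *SelectionStrategy `json:"selectionStrategy,omitempty"`
}
```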
Super nit - perhaps `OwnerReference` is better here, since the HPA would only use one reference?
Yeah makes sense
Nitty counterpoint: Sometimes there are multiple hops to get from the Pod to the targetRef. So even though Pod -> ReplicaSet and ReplicaSet -> Deployment is 1 OwnerRef each, in total it's 2 OwnerRefs.
I'm personally not strongly opinionated on this and can go either direction.
* `OwnerReferences`:
  * Further filters the label-selected pods
  * Only keeps pods that are owned by the target workload through owner references
  * Follows the ownership chain (e.g., Deployment → ReplicaSet → Pods)
How high will it go? Will it stop when the owner matches the `targetRef` in the HPA?
I think the correct direction is down (since we start from the `targetRef` and go down to the pods it owns), so we just get the pods by looking at their `ownerReference`.
`OwnerReferencesFilter` (see the sketch below):
* Validates pod ownership through the reference chain
* Only includes pods that are owned by the target workload
* Handles different workload types (Deployments, StatefulSets, etc.)
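A minimal sketch of that chain walk, assuming a hypothetical `getOwner` lookup (none of these names come from the actual controller code): starting from each label-selected pod, it follows controller `ownerReferences` upward until it reaches the `scaleTargetRef` or runs out of owners.

```go
// Illustrative sketch only; not the real HPA controller implementation.
package podselection

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownerKey identifies one hop in the ownership chain.
type ownerKey struct {
	APIVersion, Kind, Name string
}

// getOwner is a hypothetical lookup returning the ownerReferences of the named
// intermediate object (e.g. a ReplicaSet) in the given namespace.
type getOwner func(ref ownerKey, namespace string) ([]metav1.OwnerReference, error)

// controllerRef returns the single controller reference among refs, if any.
func controllerRef(refs []metav1.OwnerReference) *metav1.OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// ownedByTarget reports whether the pod's controller chain reaches target,
// e.g. Pod -> ReplicaSet -> Deployment.
func ownedByTarget(pod *corev1.Pod, target ownerKey, lookup getOwner) (bool, error) {
	refs := pod.OwnerReferences
	// Bound the walk so a malformed or cyclic chain cannot loop forever.
	for depth := 0; depth < 10; depth++ {
		ctrl := controllerRef(refs)
		if ctrl == nil {
			return false, nil
		}
		cur := ownerKey{APIVersion: ctrl.APIVersion, Kind: ctrl.Kind, Name: ctrl.Name}
		if cur == target {
			return true, nil
		}
		var err error
		if refs, err = lookup(cur, pod.Namespace); err != nil {
			return false, err
		}
	}
	return false, nil
}
```

Only the controller reference is followed at each hop, which covers the usual Deployment → ReplicaSet → Pod shape as well as StatefulSets and ReplicaSets targeted directly.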
As it goes up the chain, the HPA Controller will need to fetch controllers; will we cache these too?
Otherwise you may be fetching the same controller objects every 15s for every pod :(
(In VPA we have a whole bunch of heuristics here too to reduce load on API Server: https://github.com/kubernetes/autoscaler/blob/9220470d9f5154eefc82f53f39de4efb76befe77/vertical-pod-autoscaler/pkg/target/controller_fetcher/controller_fetcher.go#L175)
Yeah I forgot about this. You're right - I originally implemented the fetcher to work from the top down, starting at the Deployment and traversing down to the Pod. I agree we should reverse this approach: start from the Pod and cache intermediate controllers along the way to avoid repeatedly fetching the same objects every 15s.
Currently we don't do it for every pod, but I agree it can be better (https://github.com/kubernetes/kubernetes/pull/132018/files#diff-1d52b688f694faeedda046d7d6fb75c3a43ad69511db4195c370ece15657a434R127-R164 )
Modified as follows: kubernetes/kubernetes@387fbbb
In my opinion, the cache felt a bit excessive. It does reduce some API calls, but is it really worth it?
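For what it's worth, a small sketch of the memoization being weighed here (purely illustrative, reusing the hypothetical `ownerKey`/`getOwner` types from the earlier sketch): a per-reconcile cache means pods that share an intermediate ReplicaSet trigger one lookup per sync instead of one per pod.

```go
// Sketch only; builds on the hypothetical types from the previous snippet.
package podselection

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// cachedLookup memoizes owner lookups for the duration of a single HPA
// reconcile, so many pods behind one ReplicaSet cause a single fetch per sync
// instead of one fetch per pod every 15s.
type cachedLookup struct {
	fetch getOwner
	seen  map[ownerKey][]metav1.OwnerReference // scoped to one HPA's namespace
}

func newCachedLookup(fetch getOwner) *cachedLookup {
	return &cachedLookup{fetch: fetch, seen: map[ownerKey][]metav1.OwnerReference{}}
}

// get has the same shape as getOwner, so it can be passed to ownedByTarget.
func (c *cachedLookup) get(ref ownerKey, namespace string) ([]metav1.OwnerReference, error) {
	if refs, ok := c.seen[ref]; ok {
		return refs, nil
	}
	refs, err := c.fetch(ref, namespace)
	if err != nil {
		return nil, err
	}
	c.seen[ref] = refs
	return refs, nil
}
```

Whether this is worth the extra code, versus leaning on informer caches, is exactly the trade-off being debated above.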
One additional comment here - do you know if there is other prior work anywhere else in Kubernetes for this problem? If I understand correctly, this label-overlapping use case is actually quite prevalent throughout the ecosystem - if there is prior work, it would be great to tackle it in a consistent way.
I don't believe this is a general problem. The thing with the HPA is that it has a targetRef, and not a selector.
As a user I read this as "The HPA will use the target's Pods and Metrics to scale my workload". But it's actually saying "The HPA will scale this workload, and it will take the label selector from this workload to find Pods/Metrics". A Service is different: it just takes a selector.
So as a user I'm more likely to worry about overlapping workloads with the Service than I am with an HPA.
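To make that distinction concrete, a small sketch using the published API types (the object and label names are made up):

```go
package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
)

func main() {
	// An HPA names a workload object; the pod selector is derived from it.
	hpa := autoscalingv2.HorizontalPodAutoscalerSpec{
		ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
			APIVersion: "apps/v1",
			Kind:       "Deployment",
			Name:       "web", // hypothetical workload name
		},
		MaxReplicas: 10,
	}

	// A Service names labels directly, so overlap is an explicit user choice.
	svc := corev1.ServiceSpec{
		Selector: map[string]string{"app": "web"},
	}

	fmt.Println(hpa.ScaleTargetRef.Name, svc.Selector)
}
```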
/sig autoscaling