KEP-5325: HPA - Improve pod selection accuracy across workload types #5331
Conversation
Skipping CI for Draft Pull Request.
This is a work in progress - looking for feedback and opinions.
Some low-hanging changes
I focused mostly on the design details, including the API surface. But once this gets opted into the release I'll look carefully at the remaining bits, including the PRR portion of the doc.
@@ -0,0 +1,879 @@
<!--
Nit: you're missing keps/prod-readiness/sig-autoscaling/5325.yaml based on https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/template/nnnn.yaml; feel free to add my name as the PRR approver for alpha.
Also, make sure to get a lead opt-in for #5325 into the 1.34 release.
## Summary

The Horizontal Pod Autoscaler (HPA) has a critical limitation in its pod selection mechanism: it collects metrics from all pods that match the target workload's label selector, regardless of whether those pods are actually managed by the target workload. This can lead to incorrect scaling decisions when unrelated pods (such as Jobs, CronJobs, or other Deployments) happen to share the same labels.
Nit: please make sure to break lines, since this makes the reviews easier to read. This applies to the entire document.
We propose adding a new field to the HPA specification called `strictPodSelection` that allows users to specify how pods should be selected for metric collection:
* If set to true, only pods that are actually owned by the target workload (through owner references) are selected.
* If not set, or set to false, the current default behavior applies.
Based on https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md, specifically this part:

> Think twice about bool fields. Many ideas start as boolean but eventually trend towards a small set of mutually exclusive options. Plan for future expansions by describing the policy options explicitly as a string type alias (e.g. TerminationMessagePolicy).

When we were introducing a new mechanism to PDBs we went with `UnhealthyPodEvictionPolicy`. You'll notice that pattern is also used heavily across the apps/v1 types.
In your case that would mean introducing something like `PodSelectionType` with two initial values: `Selector` or `Labels` as the default (and backward compatible), and `OwnerRefs`. This will ensure you can easily extend the mechanism in the future.
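For illustration, a minimal Go sketch of the string-alias pattern described above, using the suggested (non-final) names `PodSelectionType`, `Labels`, and `OwnerRefs`:

```go
// Sketch only: shows the string-alias convention, not the final API.
package v2

// PodSelectionType describes how the HPA picks the pods it collects metrics from.
type PodSelectionType string

const (
	// LabelsPodSelection keeps the current behavior: every pod matching the
	// target workload's label selector is considered. Intended as the default.
	LabelsPodSelection PodSelectionType = "Labels"

	// OwnerRefsPodSelection additionally requires that the pod's ownership
	// chain leads back to the HPA's scaleTargetRef.
	OwnerRefsPodSelection PodSelectionType = "OwnerRefs"
)
```

Adding a third value later is then an additive change rather than a bool-to-enum migration.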
TIL, thank you for the reference! :)
- a search in the Kubernetes bug triage tool (https://storage.googleapis.com/k8s-triage/index.html)
-->

- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)
You'll want to list integration tests you're planning to add.
## Proposal

We propose adding a new field to the HPA specification called `podSelectionStrategy` that allows users to specify how pods should be selected for metric collection:
Would `selectionStrategy` be a shorter, more concise name for what we're trying to achieve?
Agree.
WIP here: kubernetes/kubernetes#132018
metrics:
- my_feature_metric
Suggested change: replace
metrics:
- my_feature_metric
with
metrics: []
How necessary is it to add a metric? Looking at other similar features, I found an example that added a metric counting the number of times the new API field is used. Is that necessary?
(I guess I'm asking @soltysh that question)
We propose adding a new field to the HPA specification called `selectionStrategy` that allows users to specify how pods should be selected for metric collection (see the API sketch after this list):
* If set to `LabelSelector` (default): uses the current behavior of selecting all pods that match the target workload's label selector.
* If set to `OwnerReferences`: only selects pods that are owned by the target workload through owner references.
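As a hedged illustration of how the field might surface on the HPA spec (the defaulting behavior and Go names here are assumptions, not settled by the KEP text):

```go
// Sketch against k8s.io/api/autoscaling/v2; names follow the discussion above
// and are not final.
package v2

// SelectionStrategy selects which pods back the HPA's metric calculation.
type SelectionStrategy string

const (
	// LabelSelectorSelectionStrategy is today's behavior: all pods matching
	// the target workload's label selector are considered.
	LabelSelectorSelectionStrategy SelectionStrategy = "LabelSelector"
	// OwnerReferencesSelectionStrategy keeps only pods whose ownership chain
	// reaches the scaleTargetRef.
	OwnerReferencesSelectionStrategy SelectionStrategy = "OwnerReferences"
)

// HorizontalPodAutoscalerSpec shown with only the new field; existing fields
// (scaleTargetRef, minReplicas, maxReplicas, metrics, behavior) are elided.
type HorizontalPodAutoscalerSpec struct {
	// selectionStrategy is optional; a nil value is treated as LabelSelector
	// so that existing HPA objects keep their current behavior.
	// +optional
	SelectionStrategy *SelectionStrategy `json:"selectionStrategy,omitempty"`
}
```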
Super nit - perhaps `OwnerReference` is better here, since the HPA would only use one reference?
Yeah makes sense
Nitty counterpoint: Sometimes there are multiple hops to get from the Pod to the targetRef. So even though Pod -> ReplicaSet and ReplicaSet -> Deployment is 1 OwnerRef each, in total it's 2 OwnerRefs.
I'm personally not strongly opinionated on this and can go either direction.
* `OwnerReferences`:
  * Further filters the label-selected pods
  * Only keeps pods that are owned by the target workload through owner references
  * Follows the ownership chain (e.g., Deployment → ReplicaSet → Pods)
How high will it go? Will it stop when the owner matches the `targetRef` in the HPA?
I think the correct direction is down (since we start from the `targetRef` and go down to the pods it owns), so we just get the pods by looking at their `ownerReference`.
`OwnerReferencesFilter` (see the sketch below):
* Validates pod ownership through the reference chain
* Only includes pods that are owned by the target workload
* Handles different workload types (Deployments, StatefulSets, etc.)
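A minimal sketch of that chain walk, assuming a hypothetical `getOwner` lookup (none of these names come from the actual controller code): starting from each label-selected pod, it follows controller `ownerReferences` upward until it reaches the `scaleTargetRef` or runs out of owners.

```go
// Illustrative sketch only; not the real HPA controller implementation.
package podselection

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownerKey identifies one hop in the ownership chain.
type ownerKey struct {
	APIVersion, Kind, Name string
}

// getOwner is a hypothetical lookup returning the ownerReferences of the named
// intermediate object (e.g. a ReplicaSet) in the given namespace.
type getOwner func(ref ownerKey, namespace string) ([]metav1.OwnerReference, error)

// controllerRef returns the single controller reference among refs, if any.
func controllerRef(refs []metav1.OwnerReference) *metav1.OwnerReference {
	for i := range refs {
		if refs[i].Controller != nil && *refs[i].Controller {
			return &refs[i]
		}
	}
	return nil
}

// ownedByTarget reports whether the pod's controller chain reaches target,
// e.g. Pod -> ReplicaSet -> Deployment.
func ownedByTarget(pod *corev1.Pod, target ownerKey, lookup getOwner) (bool, error) {
	refs := pod.OwnerReferences
	// Bound the walk so a malformed or cyclic chain cannot loop forever.
	for depth := 0; depth < 10; depth++ {
		ctrl := controllerRef(refs)
		if ctrl == nil {
			return false, nil
		}
		cur := ownerKey{APIVersion: ctrl.APIVersion, Kind: ctrl.Kind, Name: ctrl.Name}
		if cur == target {
			return true, nil
		}
		var err error
		if refs, err = lookup(cur, pod.Namespace); err != nil {
			return false, err
		}
	}
	return false, nil
}
```

Only the controller reference is followed at each hop, which covers the usual Deployment → ReplicaSet → Pod shape as well as StatefulSets and ReplicaSets targeted directly.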
As it goes up the chain, the HPA Controller will need to fetch controllers; will we cache these too?
Otherwise you may be fetching the same controller objects every 15s for every pod :(
(In VPA we have a whole bunch of heuristics here too to reduce load on API Server: https://github.com/kubernetes/autoscaler/blob/9220470d9f5154eefc82f53f39de4efb76befe77/vertical-pod-autoscaler/pkg/target/controller_fetcher/controller_fetcher.go#L175)
Yeah I forgot about this. You're right - I originally implemented the fetcher to work from the top down, starting at the Deployment and traversing down to the Pod. I agree we should reverse this approach: start from the Pod and cache intermediate controllers along the way to avoid repeatedly fetching the same objects every 15s.
Currently we don't do it for every pod, but I agree it can be better (https://github.com/kubernetes/kubernetes/pull/132018/files#diff-1d52b688f694faeedda046d7d6fb75c3a43ad69511db4195c370ece15657a434R127-R164 )
Modified as follows: kubernetes/kubernetes@387fbbb
In my opinion, the cache felt a bit excessive. It does reduce some API calls, but is it really worth it?
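For what it's worth, a small sketch of the memoization being weighed here (purely illustrative, reusing the hypothetical `ownerKey`/`getOwner` types from the earlier sketch): a per-reconcile cache means pods that share an intermediate ReplicaSet trigger one lookup per sync instead of one per pod.

```go
// Sketch only; builds on the hypothetical types from the previous snippet.
package podselection

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// cachedLookup memoizes owner lookups for the duration of a single HPA
// reconcile, so many pods behind one ReplicaSet cause a single fetch per sync
// instead of one fetch per pod every 15s.
type cachedLookup struct {
	fetch getOwner
	seen  map[ownerKey][]metav1.OwnerReference // scoped to one HPA's namespace
}

func newCachedLookup(fetch getOwner) *cachedLookup {
	return &cachedLookup{fetch: fetch, seen: map[ownerKey][]metav1.OwnerReference{}}
}

// get has the same shape as getOwner, so it can be passed to ownedByTarget.
func (c *cachedLookup) get(ref ownerKey, namespace string) ([]metav1.OwnerReference, error) {
	if refs, ok := c.seen[ref]; ok {
		return refs, nil
	}
	refs, err := c.fetch(ref, namespace)
	if err != nil {
		return nil, err
	}
	c.seen[ref] = refs
	return refs, nil
}
```

Whether this is worth the extra code, versus leaning on informer caches, is exactly the trade-off being debated above.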
One additional comment here - do you know if there is other prior work anywhere else in Kubernetes for this problem? If I understand correctly, this label-overlapping use case is actually quite prevalent throughout the ecosystem - if there is prior work, it would be great to tackle it in a consistent way.
I don't believe this is a general problem. The thing with the HPA is that it has a targetRef, and not a selector.
As a user I read this as "The HPA will use the target's Pods and Metrics to scale my workload". But it's actually saying "The HPA will scale this workload, and it will take the label selector from this workload to find Pods/Metrics". A Service is different: it just takes a selector.
So as a user I'm more likely to worry about overlapping workloads with the Service than I am with an HPA.
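To make that distinction concrete, a small sketch using the published API types (the object and label names are made up):

```go
package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
)

func main() {
	// An HPA names a workload object; the pod selector is derived from it.
	hpa := autoscalingv2.HorizontalPodAutoscalerSpec{
		ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
			APIVersion: "apps/v1",
			Kind:       "Deployment",
			Name:       "web", // hypothetical workload name
		},
		MaxReplicas: 10,
	}

	// A Service names labels directly, so overlap is an explicit user choice.
	svc := corev1.ServiceSpec{
		Selector: map[string]string{"app": "web"},
	}

	fmt.Println(hpa.ScaleTargetRef.Name, svc.Selector)
}
```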
/sig autoscaling