AUTOSCALE-335,AUTOSCALE-336: 1.34.0 upstream rebase #386

joelsmith · 2025-10-02T21:37:15Z

Started out with this command:

rebasebot --source https://github.com/kubernetes/autoscaler:cluster-autoscaler-release-1.34 \
          --dest openshift/kubernetes-autoscaler:main \
          --rebase joelsmith/autoscaler:rebase-bot-main \
          --tag-policy=strict \
          --github-user-token <(yaml2json ~/.config/hub | jq -r '."github.com"[0].oauth_token') \
          --dry-run

After it was done, I removed the cherry-picks and manually cherry-picked the set of patches it identified so that I could manually resolve merge conflicts.

I squashed "Remove OWNERS automation preamble" into "configure repository for OpenShift releases"

Most of the cherry-picks required minimal or no changes, but "Fix unstructured taint parsing in Cluster API provider" required substantial changes because upstream PR kubernetes#8536 refactored much of the Cluster API provider's test framework.

fix(VPA): Do not update webhook CA when registerWebhook is disabled

Signed-off-by: Yuriy Losev <[email protected]>

[VPA] Use factory start to fill caches instead of separate informers

…t-success OCI provider: Avoid interpreting HTTP 404 as success on delete

…-cloud-endpoint-reloving fix bug 8168 GetEndpoint resolving fail

…e-terminate-by-default feat: cordon node before terminate by default

This change adds debug logs at level 5 to aid in triaging failed node balancing. The logs help determine why two node groups are not considered similar. They can be quite noisy, so the logging level has been set to 5 by default.

AEP-7862: Decouple Startup CPU Boost from VPA modes - updates

* add h4d pricing * fix go fmt * revert gofmt on other files

cluster-autoscaler: add logging for failed node balancing

./hack/update-deps.sh v1.34.0-alpha.1 v1.34.0-alpha.1 https://github.com/kubernetes/kubernetes.git

Following kubernetes#7195

hack/update-codegen.sh

This reverts commit 897989f.

As discussed in sig-autoscaling meeting on 2025-06-30, this is to try follow a similar pattern to the KEP process by getting a tech lead's buy in before merging an AEP.

… values in real apis not necessary

CA: remove azure UT cases

…s-approvers-for-aeps Give sig-autoscaling-leads approval of the AEP directory

chore: bump golangci lint to v2

…ode-groups-from-balancing Filter out non-existing node-groups before scale-up balancing

Signed-off-by: bo.jiang <[email protected]>

Fix capacity buffers injector order in pod list processor

… registered nodes

…test-in-docker` `make test-in-docker` was changed to disable the printf analyzer, but `make test-unit` wasn't for some reason. The current master isn't compatible with the printf analyzer, so `make test-unit` fails on master without this change.

…erry-pick-8552-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Allow atomic scale down of partially healthy node groups

TestNodeLoadFromExistingTaints creates a currentTime variable set to time.Now(), and a bunch of test objects with time values offset from that variable. This is all standard practice, but then the test iterates over test cases, calls t.Parallel(), and overwrites currentTime with time.Now() again. This makes go test -race fail, because multiple goroutines are writing currentTime at once. It also doesn't seem to make sense in the context of the test, because the other test objects are still offset from the original value. Removing the second write to currentTime seems to be the correct fix here. Also renamed one import because it collided with a local variable name used throughout this test file.
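The fix described above comes down to writing the shared timestamp once and only reading it afterwards. The following is a minimal sketch of that pattern, not the actual test code: `agesInSeconds` is a hypothetical helper, and plain goroutines stand in for subtests that call t.Parallel().

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// agesInSeconds captures the reference time once, then lets concurrent
// workers (standing in for parallel subtests) read it without writing it.
func agesInSeconds(offsetsSec []int) []int {
	currentTime := time.Now() // written exactly once, before any goroutine starts
	results := make([]int, len(offsetsSec))

	var wg sync.WaitGroup
	for i, sec := range offsetsSec {
		wg.Add(1)
		go func(i, sec int) {
			defer wg.Done()
			// Test objects are offset from the single reference time.
			created := currentTime.Add(-time.Duration(sec) * time.Second)
			// The racy pattern would reassign currentTime = time.Now() here,
			// which is a concurrent write that `go test -race` reports.
			results[i] = int(currentTime.Sub(created) / time.Second)
		}(i, sec)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(agesInSeconds([]int{60, 3600}))
}
```

Because each worker writes only its own slot in `results` and never touches `currentTime`, the offsets stay consistent with the objects built from the original value.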

…erry-pick-8584-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Change `make test-unit` to have the same go test parameters as `make test-in-docker`

…erry-pick-8588-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Fix a race condition in TestNodeLoadFromExistingTaints

The DRA scheduler plugin is enabled by default since 1.34. We have to hack it to be disabled if the CA DRA logic is disabled via the flag. Without this, the DRA scheduler plugin is enabled but not set up properly, and panics.

…erry-pick-8598-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Fix DRA enablement logic

openshift-ci-robot · 2025-10-02T21:37:20Z

@joelsmith: This pull request references AUTOSCALE-335 which is a valid jira issue.

In response to this:

Started out with this command:
rebasebot --source https://github.com/kubernetes/autoscaler:cluster-autoscaler-release-1.34 \
         --dest openshift/kubernetes-autoscaler:main \
         --rebase joelsmith/autoscaler:rebase-bot-main \
         --tag-policy=strict \
         --github-user-token <(yaml2json ~/.config/hub | jq -r '."github.com"[0].oauth_token') \
         --dry-run
After it was done, I removed the cherry-picks and manually cherry-picked the set of patches it identified so that I could manually resolve merge conflicts.

I squashed "Remove OWNERS automation preamble" into "configure repository for OpenShift releases"

Most of the cherry-picks required minimal or no changes, but "Fix unstructured taint parsing in Cluster API provider" required substantial changes due to the upstream PR kubernetes#8536 which refactored a lot of the cluster API test framework.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2025-10-02T21:38:32Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign maxcao13 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This change carries files and modifications that are used by OpenShift release infrastructure and related files.

* spec file
* dockerfiles
  * vertical-pod-autoscaler/Dockerfile.rhel
  * vertical-pod-autoscaler/Dockerfile.openshift
  * images/cluster-autoscaler/Dockerfile
  * images/cluster-autoscaler/Dockerfile.rhel
* hack scripts (ci and build related)
* Makefile
* JUnit tools
* update gitignore
* update/remove OWNERS files
* ci-operator config yaml
* remove gitignore file from vertical-pod-autoscaler (allow vendor addition)
* add Snyk file to exclude vendor directories and problematic cloud providers on scan

Add vendor folders

* cluster-autoscaler
* balancer
* vertical-pod-autoscaler
* vertical-pod-autoscaler/e2e

for i in cluster-autoscaler balancer vertical-pod-autoscaler vertical-pod-autoscaler/e2e; do pushd $i; go mod tidy; go mod vendor; popd; done

…otation The delete annotation upstream has a different format, but is now inferred dynamically from the API group. If we update this in MAO to use the new format, we can drop this old key

This change re-adds the machine api support for labels and taints on node groups. The code was removed upstream as it is openshift specific, see this pull request[0]. It also adds in the functionality of the upstream override annotation for labels and taints[1] to support https://issues.redhat.com/browse/MIXEDARCH-259

[0]: kubernetes#5249
[1]: kubernetes#5382

The upstream annotations for the scale-from-zero capacity resources are slightly different from the OpenShift implementation. The largest difference is the addition of a GPU type annotation. OpenShift does not yet use this annotation, so this patch should be carried until the machineset controllers for the various providers on OpenShift have been modified to use the new annotations. Another important change is the modification of the memory annotation: previously, OpenShift expected this value to be a count of memory in mebibytes. The conversion function and tests have been modified to allow continued OpenShift operation. This change can be dropped when the annotations in OpenShift have been updated; progress on this effort can be followed at https://issues.redhat.com/browse/OCPCLOUD-944
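To illustrate the memory handling described above, here is a minimal sketch of converting an annotation value given as a count of mebibytes into bytes. The function name `parseMemoryMiB` and its error messages are hypothetical; the carried patch's actual conversion function may look quite different.

```go
package main

import (
	"fmt"
	"strconv"
)

// bytesPerMiB is the number of bytes in one mebibyte.
const bytesPerMiB int64 = 1024 * 1024

// parseMemoryMiB (hypothetical name) interprets an annotation value as a
// plain integer count of mebibytes and returns the equivalent byte count.
func parseMemoryMiB(value string) (int64, error) {
	mib, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("memory annotation %q is not an integer MiB count: %w", value, err)
	}
	if mib < 0 {
		return 0, fmt.Errorf("memory annotation %q must not be negative", value)
	}
	return mib * bytesPerMiB, nil
}

func main() {
	bytes, err := parseMemoryMiB("4096")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bytes) // 4096 MiB expressed in bytes
}
```

A value with a unit suffix, such as "4Gi", is rejected by this sketch because it only accepts a bare integer.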

openshift-ci-robot · 2025-10-02T21:53:39Z

@joelsmith: This pull request references AUTOSCALE-335 which is a valid jira issue.


…ider This change corrects the behavior for parsing taints from the unstructured scalable resource. This is required on OpenShift as our implementation is slightly different from the upstream.

Also: * Add unit tests for upstream annotations * Update unit tests using upstream annotations new values

openshift-ci-robot · 2025-10-03T05:18:44Z

@joelsmith: This pull request references AUTOSCALE-335 which is a valid jira issue.

This pull request references AUTOSCALE-336 which is a valid jira issue.


openshift-ci · 2025-10-03T08:04:59Z

@joelsmith: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/okd-scos-e2e-aws-ovn	`d67a63f`	link	false	`/test okd-scos-e2e-aws-ovn`
ci/prow/unit	`d67a63f`	link	true	`/test unit`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

vflaux and others added 30 commits June 20, 2025 16:04

do not update webhook CA when registerWebhook is disabled

19b6295

Merge pull request kubernetes#8249 from vflaux/fix_8248

ffd18e3

fix(VPA): Do not update webhook CA when registerWebhook is disabled

Use factory start to fill caches instead of separate informers

a3fb18b

Signed-off-by: Yuriy Losev <[email protected]>

Add factory start to the test

fb1dddb

Signed-off-by: Yuriy Losev <[email protected]>

Error and exit on failure

1332499

Signed-off-by: Yuriy Losev <[email protected]>

Merge pull request kubernetes#8259 from yalosev/fix/factory-start

dd40212

[VPA] Use factory start to fill caches instead of separate informers

Merge pull request kubernetes#8201 from jlamillan/jlamillan/404-is-no…

b167235

…t-success OCI provider: Avoid interpreting HTTP 404 as success on delete

Merge pull request kubernetes#8169 from maximrub/fix/bug-8168-alibaba…

c509bb2

…-cloud-endpoint-reloving fix bug 8168 GetEndpoint resolving fail

Merge pull request kubernetes#8183 from MenD32/feat/cordon-node-befor…

31caf5b

…e-terminate-by-default feat: cordon node before terminate by default

add logging for failed node balancing

771b9ee

This change adds debug logs at level 5 to aid in triaging failed node balancing. The logs help determine why two node groups are not considered similar. They can be quite noisy, so the logging level has been set to 5 by default.

Decouple Startup CPU Boost from VPA modes

ecb4297

azure: Add volumeattachments read to ClusterRole for examples

c942ff3

azure: Make it easier to compare examples

20a59a9

Merge pull request kubernetes#8175 from laoj2/aep-7862-fixes-to-api

af75d6e

AEP-7862: Decouple Startup CPU Boost from VPA modes - updates

add h4d pricing (kubernetes#8205)

8e0d47c

* add h4d pricing * fix go fmt * revert gofmt on other files

Merge pull request kubernetes#8266 from elmiko/add-more-balance-logging

77e3f57

cluster-autoscaler: add logging for failed node balancing

Export fake pods definition to a dedicated module

2814dca

update-deps.sh

d5c1e15

./hack/update-deps.sh v1.34.0-alpha.1 v1.34.0-alpha.1 https://github.com/kubernetes/kubernetes.git

tmp: make apis/ a package

897989f

Following kubernetes#7195

update-codegen.sh

353b446

hack/update-codegen.sh

Revert "tmp: make apis/ a package"

4560f69

This reverts commit 897989f.

Give sig-autoscaling-leads approval of the AEP directory

5149494

As discussed in sig-autoscaling meeting on 2025-06-30, this is to try follow a similar pattern to the KEP process by getting a tech lead's buy in before merging an AEP.

removing UT cases where changes in azure api may affect results, fake…

7a1e49a

… values in real apis not necessary

Merge pull request kubernetes#8280 from MaximilianoUribe/master

4f177c9

CA: remove azure UT cases

Merge pull request kubernetes#8277 from adrianmoisey/add-tech-leads-a…

5b25b56

…s-approvers-for-aeps Give sig-autoscaling-leads approval of the AEP directory

Merge pull request kubernetes#8203 from jklaw90/julian/golangci-lint-v2

ffe6219

chore: bump golangci lint to v2

Filterout non-existing node-groups before scale-up balancing

792fba7

Merge pull request kubernetes#8289 from pmendelski/exclude-injected-n…

ce81a6a

…ode-groups-from-balancing Filter out non-existing node-groups before scale-up balancing

Revert filter out non-existing node-groups before scale-up balancing

7912e2d

ci: Add Dependabot for GitHub Actions and update action versions

2ae7495

Signed-off-by: bo.jiang <[email protected]>

k8s-ci-robot and others added 10 commits September 26, 2025 06:12

Merge pull request kubernetes#8578 from BigDarkClown/ca-1.34-fix

dd28ada

Fix capacity buffers injector order in pod list processor

Allow atomic scale down if number of candidates is equal to number of…

364fa5e

… registered nodes

Merge pull request kubernetes#8589 from k8s-infra-cherrypick-robot/ch…

3fbdbfb

…erry-pick-8552-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Allow atomic scale down of partially healthy node groups

Merge pull request kubernetes#8593 from k8s-infra-cherrypick-robot/ch…

148327d

…erry-pick-8584-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Change `make test-unit` to have the same go test parameters as `make test-in-docker`

Merge pull request kubernetes#8594 from k8s-infra-cherrypick-robot/ch…

ae7925c

…erry-pick-8588-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Fix a race condition in TestNodeLoadFromExistingTaints

Fix DRA enablement logic

776d80f

The DRA scheduler plugin is enabled by default since 1.34. We have to hack it to be disabled if the CA DRA logic is disabled via the flag. Without this, the DRA scheduler plugin is enabled but not set up properly, and panics.
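The idea in this commit message can be sketched as filtering a default plugin list by the autoscaler's DRA setting. This is an illustration only: `enabledPlugins`, the `draEnabled` parameter, and the plugin name constant are assumptions, and the real change works against the scheduler framework's configuration rather than a plain slice of strings.

```go
package main

import "fmt"

// draPluginName is assumed here to be the scheduler plugin's name.
const draPluginName = "DynamicResources"

// enabledPlugins (hypothetical helper) returns the default plugin list,
// dropping the DRA plugin when the autoscaler's DRA logic is switched off,
// so a plugin that was never set up is not left enabled.
func enabledPlugins(defaults []string, draEnabled bool) []string {
	out := make([]string, 0, len(defaults))
	for _, name := range defaults {
		if name == draPluginName && !draEnabled {
			continue // enabled by default, but unusable without setup
		}
		out = append(out, name)
	}
	return out
}

func main() {
	defaults := []string{"NodeResourcesFit", draPluginName}
	fmt.Println(enabledPlugins(defaults, false))
	fmt.Println(enabledPlugins(defaults, true))
}
```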

Merge pull request kubernetes#8599 from k8s-infra-cherrypick-robot/ch…

f4bec36

…erry-pick-8598-to-cluster-autoscaler-release-1.34 [cluster-autoscaler-release-1.34] Fix DRA enablement logic

merge upstream/cluster-autoscaler-release-1.34 into main

938bf71

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 2, 2025

openshift-ci bot requested review from elmiko and maxcao13 October 2, 2025 21:38

maxcao13 mentioned this pull request Oct 2, 2025

WIP: AUTOSCALE-335: AUTOSCALE-336: rebase on upstream 1.34.0 release #385

Closed

joelsmith and others added 7 commits October 2, 2025 15:46

UPSTREAM: <carry>: Rename FailureMessage to ErrorMessage

160e9d4

UPSTREAM: <carry>: Handle old Machine API specific machine delete ann…

0c765a5

…otation The delete annotation upstream has a different format, but is now inferred dynamically from the API group. If we update this in MAO to use the new format, we can drop this old key

UPSTREAM: <carry>: Have VPA ignore phantom containers named "POD"

eced936

JoelSpeed and others added 2 commits October 2, 2025 23:11

UPSTREAM: <carry>: Fix unstructured taint parsing in Cluster API prov…

0969b8d

…ider This change corrects the behavior for parsing taints from the unstructured scalable resource. This is required on OpenShift as our implementation is slightly different from the upstream.
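For readers unfamiliar with what parsing taints from an unstructured resource involves, here is a rough sketch using plain nested maps. The field path spec.template.spec.taints, the `Taint` struct, and both function names are assumptions made for illustration; the sketch does not show how the OpenShift implementation differs from upstream.

```go
package main

import "fmt"

// Taint is a simplified stand-in for a Kubernetes taint.
type Taint struct {
	Key, Value, Effect string
}

// nestedSlice walks obj along path and returns the slice found there, if any.
func nestedSlice(obj map[string]interface{}, path ...string) ([]interface{}, bool) {
	var cur interface{} = obj
	for _, p := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[p]
		if !ok {
			return nil, false
		}
	}
	s, ok := cur.([]interface{})
	return s, ok
}

// parseTaints reads taints from an assumed field path, skipping malformed
// entries instead of panicking on an unexpected type.
func parseTaints(obj map[string]interface{}) []Taint {
	raw, ok := nestedSlice(obj, "spec", "template", "spec", "taints")
	if !ok {
		return nil
	}
	taints := make([]Taint, 0, len(raw))
	for _, item := range raw {
		m, ok := item.(map[string]interface{})
		if !ok {
			continue
		}
		key, _ := m["key"].(string)
		value, _ := m["value"].(string)
		effect, _ := m["effect"].(string)
		if key == "" {
			continue // a taint without a key is not usable
		}
		taints = append(taints, Taint{Key: key, Value: value, Effect: effect})
	}
	return taints
}

func main() {
	obj := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"spec": map[string]interface{}{
					"taints": []interface{}{
						map[string]interface{}{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"},
					},
				},
			},
		},
	}
	fmt.Printf("%+v\n", parseTaints(obj))
}
```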

UPSTREAM: <carry>: Update to prefer upstream annotations if present

d67a63f

Also: * Add unit tests for upstream annotations * Update unit tests using upstream annotations new values
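Preferring an upstream annotation when it is present amounts to an ordered key lookup. The sketch below shows that lookup under stated assumptions: `lookupPreferred` is a hypothetical helper and both annotation keys are assumed for illustration, not taken from this PR. Units may differ between the two annotations, so a caller would still need to convert according to which key matched.

```go
package main

import "fmt"

// Both keys are assumptions for illustration.
const (
	upstreamMemoryKey  = "capacity.cluster-autoscaler.kubernetes.io/memory"
	openshiftMemoryKey = "machine.openshift.io/memoryMb"
)

// lookupPreferred (hypothetical helper) returns the value stored under the
// first key that is present, together with the key that matched.
func lookupPreferred(annotations map[string]string, keys ...string) (value, key string, ok bool) {
	for _, k := range keys {
		if v, found := annotations[k]; found {
			return v, k, true
		}
	}
	return "", "", false
}

func main() {
	annotations := map[string]string{
		upstreamMemoryKey:  "16Gi",
		openshiftMemoryKey: "16384",
	}
	// The upstream key is listed first, so it wins when both are set.
	value, key, ok := lookupPreferred(annotations, upstreamMemoryKey, openshiftMemoryKey)
	fmt.Println(value, key, ok)
}
```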

joelsmith force-pushed the rebase branch from 015a319 to d67a63f Compare October 3, 2025 05:12

joelsmith changed the title ~~AUTOSCALE-335: AUTOSCALE-336: 1.34.0 upstream rebase~~ AUTOSCALE-335,AUTOSCALE-336: 1.34.0 upstream rebase Oct 3, 2025
