yalosev
Contributor

@yalosev yalosev commented Jun 20, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The VPA controllers initialize several informers, and targetSelectorFetcher and controllerFetcher both use the same ones. Starting them one by one therefore tries to start an informer that is already running and whose cache is already synchronized, which logs a warning.
Instead, we can use the factory that created these informers (it was made for this purpose) to start them all and wait for every cache to sync.
The caches are then populated without any warning messages:

I0621 01:27:41.171294   84656 reflector.go:430] "Caches populated" type="*v1.CronJob" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.171294   84656 reflector.go:430] "Caches populated" type="*v1.Job" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.171385   84656 reflector.go:430] "Caches populated" type="*v1.LimitRange" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.266254   84656 reflector.go:430] "Caches populated" type="*v1.StatefulSet" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.267590   84656 reflector.go:430] "Caches populated" type="*v1.ReplicationController" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.268791   84656 reflector.go:430] "Caches populated" type="*v1.DaemonSet" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.270992   84656 reflector.go:430] "Caches populated" type="*v1.ReplicaSet" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"
I0621 01:27:41.273464   84656 reflector.go:430] "Caches populated" type="*v1.Deployment" reflector="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:285"

Potentially, we could start the factory elsewhere, for example inside the RunOnce function of the updater.
However, that would move it too far from the initialization logic, which might lead to bugs.
So I decided to place the factory start function closer to the NewSharedInformerFactory call, right after all informers for this factory are initialized.
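
To illustrate why starting through the factory avoids the duplicate start, here is a minimal sketch that uses only the standard library. toyFactory and toyInformer are hypothetical stand-ins, not client-go types; they only model the idea that a shared factory hands out one informer per kind and starts each of them at most once.

```go
package main

import "fmt"

// toyInformer is a hypothetical stand-in for a shared informer.
// It only counts how many times Run was called.
type toyInformer struct {
	kind string
	runs int
}

func (i *toyInformer) Run() { i.runs++ }

// toyFactory is a hypothetical, simplified model of a shared informer
// factory: one informer per kind, each started at most once.
// It is not safe for concurrent use.
type toyFactory struct {
	informers map[string]*toyInformer
	started   map[string]bool
}

func newToyFactory() *toyFactory {
	return &toyFactory{
		informers: map[string]*toyInformer{},
		started:   map[string]bool{},
	}
}

// InformerFor returns the shared informer for kind, creating it on first use.
func (f *toyFactory) InformerFor(kind string) *toyInformer {
	if inf, ok := f.informers[kind]; ok {
		return inf
	}
	inf := &toyInformer{kind: kind}
	f.informers[kind] = inf
	return inf
}

// Start runs every registered informer that has not been started yet.
func (f *toyFactory) Start() {
	for kind, inf := range f.informers {
		if !f.started[kind] {
			inf.Run()
			f.started[kind] = true
		}
	}
}

func main() {
	f := newToyFactory()
	// Two fetchers asking for the same kind receive the same informer.
	a := f.InformerFor("Deployment")
	b := f.InformerFor("Deployment")
	f.InformerFor("CronJob")

	f.Start()
	f.Start() // second call is a no-op: nothing is run twice

	fmt.Println(a == b, a.runs) // prints "true 1"
}
```

In this model the callers never call Run themselves, so an informer shared by two fetchers cannot be started twice.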

Which issue(s) this PR fixes:

Fixes #8256

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/vertical-pod-autoscaler needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 20, 2025
@k8s-ci-robot
Contributor

Hi @yalosev. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 20, 2025
@yalosev yalosev marked this pull request as draft June 20, 2025 21:39
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2025
@yalosev yalosev marked this pull request as ready for review June 20, 2025 22:10
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2025
@k8s-ci-robot k8s-ci-robot requested a review from jbartosik June 20, 2025 22:10
@adrianmoisey
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 21, 2025
@adrianmoisey
Member

Thank you!
/lgtm
/assign @omerap12

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 22, 2025
Member

@omerap12 omerap12 left a comment

Thanks for this. Some comments from me:

Comment on lines 124 to 125
if !synced {
klog.V(0).InfoS("Initial sync failed", "kind", informerType)
Member

@omerap12 omerap12 Jun 23, 2025

When an informer sync fails, the controller doesn't have a complete view of the cluster state, which could lead to incorrect behavior. So we should use error-level logging instead of info, for example:

klog.ErrorS(nil, "Could not sync cache for "+string(kind))

and exit when the sync fails, to avoid operating with incomplete state.
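
A minimal sketch of such a check, assuming the result has the map[reflect.Type]bool shape returned by factory.WaitForCacheSync. checkSynced and sampleResult are hypothetical names, string and int merely stand in for real informer types, and the sketch uses only the standard library instead of klog.

```go
package main

import (
	"fmt"
	"os"
	"reflect"
	"sort"
	"strings"
)

// checkSynced returns an error naming every type whose cache did not sync,
// or nil when all caches synced. (Hypothetical helper, not part of the PR.)
func checkSynced(synced map[reflect.Type]bool) error {
	var failed []string
	for kind, ok := range synced {
		if !ok {
			failed = append(failed, kind.String())
		}
	}
	if len(failed) == 0 {
		return nil
	}
	sort.Strings(failed) // map iteration order is random; keep the message stable
	return fmt.Errorf("could not sync cache for: %s", strings.Join(failed, ", "))
}

// sampleResult fakes a sync result; string and int are placeholders for
// informer object types such as *v1.Deployment.
func sampleResult() map[reflect.Type]bool {
	return map[reflect.Type]bool{
		reflect.TypeOf(""): true,
		reflect.TypeOf(0):  false,
	}
}

func main() {
	if err := checkSynced(sampleResult()); err != nil {
		// A real component would log this at error level and then exit.
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Collecting all failed types before returning reports every unsynced cache in one message instead of stopping at the first.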

Member

This applies to the code of all components.

Contributor Author

Totally agree with you. Fixed.
It was just a copy-paste from the existing code.
I thought it was odd, but assumed there might be a reason behind it.

Member

Yeah, not sure why some components treated this issue differently. Anyway, looks good now :)

Member

I should have commented before this was merged, but this does change the behaviour of the app.
If someone had previously restricted the VPA to only the resources they care about (for example, maybe they denied it access to CronJobs), it would (I think) still have started before, but now it will fail to start.
Is that enough of a concern to maintain backwards compatibility?

Member

Member

Oh interesting, right, it seems fine to leave as-is then.

defer close(stopCh)
factory.Start(stopCh)
informerMap := factory.WaitForCacheSync(stopCh)
for informerType, synced := range informerMap {
Member

For better readability, let's keep consistent variable naming:

for kind, informer := range informersMap {

Member

This applies to the code of all components.

Contributor Author

done

Signed-off-by: Yuriy Losev <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2025
@yalosev yalosev requested a review from omerap12 June 23, 2025 07:09
Member

@omerap12 omerap12 left a comment

Thanks for this!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omerap12, yalosev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2025
@omerap12
Member

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot merged commit dd40212 into kubernetes:master Jun 23, 2025
7 checks passed
@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 23, 2025
@yalosev yalosev deleted the fix/factory-start branch June 23, 2025 11:18

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Development

Successfully merging this pull request may close these issues.

[VPA] SharedIndexInformer in the updater started multiple times

4 participants