Skip to content

🐛 Set recommended leaderelection settings #1663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

thetechnick
Copy link
Contributor

@thetechnick thetechnick commented Jan 29, 2025

Description

Extensive e2e tests revealed that operator-controller and catalogd might run into leader election timeouts during cluster bootstrap, causing sporadic alerts being generated.

This commit uses recommended settings for leaderelection:

LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This will increase potential down-time of our components to 163s in the worst case (up from 17s). (LeaseDuration + RetryPeriod)

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@thetechnick thetechnick requested a review from a team as a code owner January 29, 2025 15:01
Copy link

netlify bot commented Jan 29, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 92cece5
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/679a79cecd73790008a545f2
😎 Deploy Preview https://deploy-preview-1663--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@thetechnick
Copy link
Contributor Author

Cross-ref catalogd PR:
operator-framework/catalogd#519

Copy link

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.

Project coverage is 67.37%. Comparing base (158d974) to head (92cece5).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
catalogd/cmd/catalogd/main.go 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1663      +/-   ##
==========================================
- Coverage   67.42%   67.37%   -0.05%     
==========================================
  Files          55       55              
  Lines        4632     4644      +12     
==========================================
+ Hits         3123     3129       +6     
- Misses       1284     1290       +6     
  Partials      225      225              
Flag Coverage Δ
e2e 53.29% <100.00%> (-0.01%) ⬇️
unit 54.30% <0.00%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/CONVENTIONS.md#handling-kube-apiserver-disruption
LeaseDuration: ptr.To(137 * time.Second),
RenewDeadline: ptr.To(107 * time.Second),
RetryPeriod: ptr.To(26 * time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

camilamacedo86
camilamacedo86 previously approved these changes Jan 29, 2025
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with those changes 👍

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
@thetechnick thetechnick added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Jan 29, 2025
@thetechnick thetechnick force-pushed the recommended-leaderelection-settings branch from 63e2507 to 3983002 Compare January 29, 2025 16:57
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
@LalatenduMohanty
Copy link
Member

@thetechnick Looks like more than 5000+ files changes for the PR, seems like a rebase gone wrong or something similar.

@camilamacedo86
Copy link
Contributor

camilamacedo86 commented Jan 29, 2025

Oh @thetechnick

I think you just overview.
This one is for upstream we should not add the vendor
The vendor is only for downstream,
Sorry :-(

@@ -42,6 +42,7 @@ import (
_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/klog/v2"
"k8s.io/klog/v2/textlogger"
"k8s.io/utils/ptr"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to try and avoid pulling in a new dependency just for a small helper that returns a pointer to value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a new dependency:

k8s.io/utils v0.0.0-20241210054802-24370beab758

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that 5000k lines vendor tricked me :)

@joelanford
Copy link
Member

@thetechnick can you remove vendor? (we don't vendor upstream)

@tmshort
Copy link
Contributor

tmshort commented Jan 29, 2025

+1 on removing vendor

Extensive e2e tests revealed that operator-controller might run into
leader election timeouts during cluster bootstrap, causing sporadic
alerts being generated.

This commit uses recommended settings for leaderelection
LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This will increase potential down-time of catalogd to 163s in
the worst case (up from 17s). (LeaseDuration + RetryPeriod)
@tmshort tmshort force-pushed the recommended-leaderelection-settings branch from 3983002 to 92cece5 Compare January 29, 2025 18:56
Copy link
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tmshort tmshort added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@camilamacedo86 camilamacedo86 added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@dtfranz dtfranz added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@dtfranz dtfranz added this pull request to the merge queue Jan 30, 2025
Merged via the queue into operator-framework:main with commit 037b9e2 Jan 30, 2025
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants