🐛 Set recommended leaderelection settings #1663

thetechnick · 2025-01-29T15:01:45Z

Description

Extensive e2e tests revealed that operator-controller and catalogd might run into leader election timeouts during cluster bootstrap, causing sporadic alerts to be generated.

This commit applies the recommended leader election settings:

LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This increases the potential worst-case downtime of our components to 163s (LeaseDuration + RetryPeriod = 137s + 26s), up from 17s (15s + 2s).

Reviewer Checklist

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

netlify · 2025-01-29T15:02:02Z

✅ Deploy Preview for olmv1 ready!

Name	Link
🔨 Latest commit	`92cece5`
🔍 Latest deploy log	https://app.netlify.com/sites/olmv1/deploys/679a79cecd73790008a545f2
😎 Deploy Preview	https://deploy-preview-1663--olmv1.netlify.app

thetechnick · 2025-01-29T15:02:16Z

Cross-ref catalogd PR:
operator-framework/catalogd#519

codecov · 2025-01-29T15:16:54Z

Codecov Report

Attention: Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.

Project coverage is 67.37%. Comparing base (158d974) to head (92cece5).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
catalogd/cmd/catalogd/main.go	0.00%	8 Missing ⚠️

@@            Coverage Diff             @@
##             main    #1663      +/-   ##
==========================================
- Coverage   67.42%   67.37%   -0.05%     
==========================================
  Files          55       55              
  Lines        4632     4644      +12     
==========================================
+ Hits         3123     3129       +6     
- Misses       1284     1290       +6     
  Partials      225      225

Flag	Coverage Δ
e2e	`53.29% <100.00%> (-0.01%)`	⬇️
unit	`54.30% <0.00%> (-0.15%)`	⬇️

camilamacedo86 · 2025-01-29T15:16:53Z

cmd/operator-controller/main.go

+		// https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/CONVENTIONS.md#handling-kube-apiserver-disruption
+		LeaseDuration: ptr.To(137 * time.Second),
+		RenewDeadline: ptr.To(107 * time.Second),
+		RetryPeriod:   ptr.To(26 * time.Second),


🎉

I think we need to do the same for catalogd: https://github.com/operator-framework/operator-controller/blob/main/catalogd/cmd/catalogd/main.go

camilamacedo86

I am OK with those changes 👍

/lgtm

LalatenduMohanty · 2025-01-29T17:02:53Z

@thetechnick It looks like this PR changes more than 5000 files; it seems like a rebase went wrong or something similar.

camilamacedo86 · 2025-01-29T17:13:01Z

Oh @thetechnick

I think you just overlooked it.
This one is for upstream, so we should not add the vendor directory.
Vendoring is only for downstream.
Sorry :-(

azych · 2025-01-29T17:46:26Z

catalogd/cmd/catalogd/main.go

@@ -42,6 +42,7 @@ import (
 	_ "k8s.io/client-go/plugin/pkg/client/auth"
 	"k8s.io/klog/v2"
 	"k8s.io/klog/v2/textlogger"
+	"k8s.io/utils/ptr"


Would it make sense to avoid pulling in a new dependency just for a small helper that returns a pointer to a value?

This isn't a new dependency:

operator-controller/go.mod

Line 38 in f055efc

k8s.io/utils v0.0.0-20241210054802-24370beab758

That 5000k-line vendor directory tricked me :)

joelanford · 2025-01-29T18:00:11Z

@thetechnick can you remove vendor? (we don't vendor upstream)

tmshort · 2025-01-29T18:21:18Z

+1 on removing vendor

Extensive e2e tests revealed that operator-controller might run into leader election timeouts during cluster bootstrap, causing sporadic alerts to be generated.

This commit uses the recommended leader election settings:

LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This will increase the potential downtime of catalogd to 163s in the worst case (up from 17s). (LeaseDuration + RetryPeriod)

camilamacedo86

/lgtm

LalatenduMohanty

/lgtm

thetechnick requested a review from a team as a code owner January 29, 2025 15:01

camilamacedo86 reviewed Jan 29, 2025

camilamacedo86 mentioned this pull request Jan 29, 2025

🐛 Set recommended leaderelection settings operator-framework/catalogd#519

Closed

thetechnick force-pushed the recommended-leaderelection-settings branch from 0618e88 to 63e2507 January 29, 2025 16:17

thetechnick mentioned this pull request Jan 29, 2025

OCPBUGS-48765: leaderelection settings openshift/operator-framework-operator-controller#249

Merged

camilamacedo86 previously approved these changes Jan 29, 2025

openshift-ci bot assigned camilamacedo86 Jan 29, 2025

openshift-ci bot added the lgtm label Jan 29, 2025

thetechnick added the acknowledge-critical-fixes-only label Jan 29, 2025

thetechnick dismissed camilamacedo86’s stale review via 3983002 January 29, 2025 16:57

thetechnick force-pushed the recommended-leaderelection-settings branch from 63e2507 to 3983002 January 29, 2025 16:57

openshift-ci bot removed the lgtm label Jan 29, 2025

erdii mentioned this pull request Jan 29, 2025

fix: Set recommended leaderelection settings package-operator/package-operator#1730

Merged

azych reviewed Jan 29, 2025

tmshort force-pushed the recommended-leaderelection-settings branch from 3983002 to 92cece5 Jan 29, 2025

camilamacedo86 approved these changes Jan 29, 2025

openshift-ci bot added the lgtm label Jan 29, 2025

LalatenduMohanty approved these changes Jan 29, 2025

openshift-ci bot assigned LalatenduMohanty Jan 29, 2025

azych approved these changes Jan 29, 2025

tmshort added this pull request to the merge queue Jan 29, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025

camilamacedo86 added this pull request to the merge queue Jan 29, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025

dtfranz added this pull request to the merge queue Jan 29, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025

dtfranz added this pull request to the merge queue Jan 30, 2025

Merged via the queue into operator-framework:main with commit 037b9e2 Jan 30, 2025
21 of 23 checks passed

joelanford mentioned this pull request Jan 31, 2025

Release leader election lease on shutdown #1687

Closed
