fix: haproxy: drain connections when disabling endpoints #668

christf · 2025-08-26T16:32:05Z

Sending `set server` with state `maint` stops HAProxy from sending traffic to the endpoint, which causes traffic to be dropped. This change instead instructs HAProxy to gracefully drain the endpoint while sending new connections to the other ready endpoints.

See [1] for more information on the difference between `drain` and `maint`.

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
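For readers less familiar with the runtime API, the Go sketch below shows the command this change switches between. It assumes the syntax `set server <backend>/<server> state [ ready | drain | maint ]` from the section cited in [1]. The helper names, the backend/server names and the socket path are hypothetical and are not taken from the router code.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// ServerState holds one of the values accepted by the HAProxy runtime API
// command "set server <backend>/<server> state [ ready | drain | maint ]".
type ServerState string

const (
	StateReady ServerState = "ready" // normal operation
	StateDrain ServerState = "drain" // no new connections; existing ones continue
	StateMaint ServerState = "maint" // server taken out of service
)

// setServerStateCmd builds the newline-terminated runtime API command.
func setServerStateCmd(backend, server string, state ServerState) string {
	return fmt.Sprintf("set server %s/%s state %s\n", backend, server, state)
}

// sendRuntimeCommand writes one command to the HAProxy stats socket and returns
// the reply. In non-interactive mode HAProxy closes the connection after
// answering, so reading until EOF collects the whole response.
func sendRuntimeCommand(socketPath, cmd string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	// Bound the whole exchange so a stuck socket cannot block forever.
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	var reply strings.Builder
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		reply.WriteString(scanner.Text())
		reply.WriteByte('\n')
	}
	return reply.String(), scanner.Err()
}

func main() {
	// "be_app" and "srv1" are hypothetical backend/server names.
	fmt.Print(setServerStateCmd("be_app", "srv1", StateDrain))
	// Against a live HAProxy (the socket path is an assumption):
	// reply, err := sendRuntimeCommand("/path/to/haproxy.sock",
	//	setServerStateCmd("be_app", "srv1", StateDrain))
}
```

The only difference between the old and new behavior is the final word of the command: `maint` versus `drain`.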


openshift-ci · 2025-08-26T16:33:22Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign frobware for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2025-08-26T16:33:25Z

Hi @christf. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

alebedev87 · 2025-08-27T12:05:18Z

pkg/router/template/configmanager/haproxy/backend.go

 func (b *Backend) DisableServer(name string) error {
 	log.V(4).Info("disabling server with maint state", "server", name)
-	return b.UpdateServerState(name, BackendServerStateMaint)
+	return b.UpdateServerState(name, BackendServerStateDrain)


1) Setting the server state is currently not used in any shipped OpenShift product. It is part of the Dynamic Configuration Manager feature, which is still in Tech Preview. 2) The router watches endpoints and reacts to changes; DisableServer is used for deleted endpoints. That is, the corresponding pods no longer exist, so the server should be disabled.
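A minimal sketch of point (2), assuming the router compares the previous and current endpoint lists and calls `DisableServer` for every endpoint that disappeared. Only the `DisableServer` name comes from the diff; `serverDisabler`, `removedEndpoints`, `reconcile` and `recorder` are hypothetical and do not reflect the router's actual reconciliation code.

```go
package main

import (
	"fmt"
	"sort"
)

// serverDisabler is a hypothetical stand-in for the router's Backend type;
// only the DisableServer method name is taken from the diff.
type serverDisabler interface {
	DisableServer(name string) error
}

// removedEndpoints returns the names present in prev but missing from curr,
// sorted so the result is deterministic.
func removedEndpoints(prev, curr []string) []string {
	current := make(map[string]struct{}, len(curr))
	for _, name := range curr {
		current[name] = struct{}{}
	}
	var removed []string
	for _, name := range prev {
		if _, ok := current[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

// reconcile disables every server whose endpoint disappeared.
func reconcile(b serverDisabler, prev, curr []string) error {
	for _, name := range removedEndpoints(prev, curr) {
		if err := b.DisableServer(name); err != nil {
			return fmt.Errorf("disabling %s: %w", name, err)
		}
	}
	return nil
}

// recorder only remembers which servers were disabled, for demonstration.
type recorder struct{ disabled []string }

func (r *recorder) DisableServer(name string) error {
	r.disabled = append(r.disabled, name)
	return nil
}

func main() {
	r := &recorder{}
	_ = reconcile(r, []string{"pod-a", "pod-b", "pod-c"}, []string{"pod-a", "pod-c"})
	fmt.Println(r.disabled) // [pod-b]
}
```

Whether the disabled server is put into `maint` or `drain` is decided entirely inside `DisableServer`, which is why the one-line diff is enough to change the behavior.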

christf · 2025-08-27 (edited)

Thank you for the feedback! I am aware of (1); I am raising this PR so that this Tech Preview feature can eventually be used.

Can we dig a bit into (2) please?
As I understand it, DisableServer runs when the router (like kube-proxy) is notified that an endpoint is being removed. As per kubernetes/kubernetes#106476, that notification happens at around the same time as the pod is asked to terminate. The pods are therefore still able to serve requests, and they need to keep doing so until they have finished all in-flight requests. During this window the router must stop sending new requests to these pods while keeping the existing connections to them open.
If `maint` is used, all in-flight connections are broken. `drain` keeps them alive until either end closes the connection: either the client is done, or the pod receives SIGKILL when its termination grace period expires, so this is already bounded by a timeout.
The goal of this change is to support rolling deployments without losing a single request.
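The difference described above can be illustrated with a deliberately simplified Go model. It encodes the semantics exactly as stated in this comment (`maint` breaks in-flight connections; `drain` keeps them while new connections go to other ready servers). It is not HAProxy's implementation, and all types and names are hypothetical.

```go
package main

import "fmt"

// State is a toy version of the HAProxy server states discussed here.
type State int

const (
	Ready State = iota
	Drain
	Maint
)

// server is a toy model that tracks only its state and open connections.
type server struct {
	name  string
	state State
	conns int
}

// setState applies the semantics as described in the comment above:
// maint breaks in-flight connections, drain leaves them untouched.
func (s *server) setState(st State) {
	s.state = st
	if st == Maint {
		s.conns = 0 // in-flight connections are broken
	}
}

// pick returns the first server allowed to take a new connection, or nil.
func pick(servers []*server) *server {
	for _, s := range servers {
		if s.state == Ready {
			return s
		}
	}
	return nil
}

func main() {
	old := &server{name: "old-pod", state: Ready, conns: 3}
	next := &server{name: "new-pod", state: Ready}
	pool := []*server{old, next}

	old.setState(Drain)
	if s := pick(pool); s != nil {
		s.conns++ // the new connection lands on the ready server
	}
	fmt.Println(old.conns, next.conns) // 3 1
}
```

With `Drain`, the terminating pod keeps its three in-flight connections while new traffic moves to the other pod; switching the same call to `Maint` would zero them in this model.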

One piece is still missing to make this perfect: a way to delay the SIGTERM to the pod until the endpoint has been drained. But that is another can of worms.

alebedev87 · 2025-08-28

The reasoning about the second point seems valid. Let me check our test coverage for this use case.

christf · 2025-09-09

How can we move this forward?

alebedev87 · 2025-09-17T14:35:12Z

/assign

candita · 2025-09-24T14:53:31Z

/label ok-to-test

candita · 2025-09-24T14:53:59Z

/ok-to-test

openshift-ci · 2025-09-24T14:55:56Z

@candita: Can not set label ok-to-test: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers]

In response to this:

/label ok-to-test


openshift-ci · 2025-09-26T23:16:55Z

@christf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/e2e-metal-ipi-ovn-router | `1723e77` | false | `/test e2e-metal-ipi-ovn-router` |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | `1723e77` | false | `/test e2e-metal-ipi-ovn-ipv6` |
| ci/prow/e2e-metal-ipi-ovn-dualstack | `1723e77` | false | `/test e2e-metal-ipi-ovn-dualstack` |
| ci/prow/okd-scos-e2e-aws-ovn | `1723e77` | false | `/test okd-scos-e2e-aws-ovn` |
| ci/prow/e2e-aws-serial | `1723e77` | true | `/test e2e-aws-serial` |
| ci/prow/e2e-agnostic | `1723e77` | true | `/test e2e-agnostic` |


openshift-ci bot requested review from alebedev87 and knobunc August 26, 2025 16:33

openshift-ci bot added the `needs-ok-to-test` label (indicates a PR that requires an org member to verify it is safe to test) Aug 26, 2025

alebedev87 reviewed Aug 27, 2025


openshift-ci bot assigned alebedev87 Sep 17, 2025

openshift-ci bot added the `ok-to-test` label (indicates a non-member PR verified by an org member that is safe to test) and removed the `needs-ok-to-test` label Sep 24, 2025
