
@christf christf commented Aug 26, 2025

Sending "set server <backend>/<server> state maint" stops all traffic to an endpoint, which causes traffic to be dropped. This change instead instructs HAProxy to gracefully drain the endpoint while sending new connections to other ready endpoints.

See [1] for further information on the difference between drain and maint.

[1] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/latest/management/#section-9.3.-set-server
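For readers unfamiliar with the HAProxy runtime API, here is a minimal Go sketch of what the change amounts to on the wire, assuming HAProxy exposes a stats socket at a hypothetical path (/var/lib/haproxy/run/haproxy.sock here). It builds the documented "set server <backend>/<server> state <ready|drain|maint>" command and writes it to the socket. The backend and server names and the helpers setServerStateCmd and sendRuntimeCommand are illustrative only; they are not part of the router code.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// ServerState mirrors the three states accepted by HAProxy's
// "set server <backend>/<server> state" runtime API command.
type ServerState string

const (
	StateReady ServerState = "ready"
	StateDrain ServerState = "drain"
	StateMaint ServerState = "maint"
)

// setServerStateCmd builds one newline-terminated runtime API command.
func setServerStateCmd(backend, server string, state ServerState) (string, error) {
	switch state {
	case StateReady, StateDrain, StateMaint:
		return fmt.Sprintf("set server %s/%s state %s\n", backend, server, state), nil
	default:
		return "", fmt.Errorf("unsupported server state %q", state)
	}
}

// sendRuntimeCommand writes cmd to the stats socket and returns the reply.
// In non-interactive mode HAProxy closes the connection after answering,
// so the reply is read until EOF, bounded by a deadline.
func sendRuntimeCommand(socketPath, cmd string) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	var reply strings.Builder
	scanner := bufio.NewScanner(conn)
	for scanner.Scan() {
		reply.WriteString(scanner.Text())
		reply.WriteByte('\n')
	}
	return reply.String(), scanner.Err()
}

func main() {
	// Hypothetical backend/server names and socket path.
	cmd, err := setServerStateCmd("be_example", "srv1", StateDrain)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(cmd)
	reply, err := sendRuntimeCommand("/var/lib/haproxy/run/haproxy.sock", cmd)
	if err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Print(reply)
}
```

Swapping StateDrain for StateMaint in main produces the command the router sent before this change.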

@openshift-ci openshift-ci bot requested review from alebedev87 and knobunc August 26, 2025 16:33
Contributor

openshift-ci bot commented Aug 26, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign frobware for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 26, 2025
Contributor

openshift-ci bot commented Aug 26, 2025

Hi @christf. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

```diff
 func (b *Backend) DisableServer(name string) error {
 	log.V(4).Info("disabling server with maint state", "server", name)
-	return b.UpdateServerState(name, BackendServerStateMaint)
+	return b.UpdateServerState(name, BackendServerStateDrain)
```
Contributor


1) Currently, set server state is not used in any shipped OpenShift product. It is part of the Dynamic Configuration Manager feature, which is still in Tech Preview.
2) The router watches endpoints and reacts to changes; DisableServer is used for deleted endpoints. That is, the corresponding pods no longer exist, so the server should be disabled.
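Point 2) can be illustrated with a small, hypothetical Go sketch: compare the previous and current endpoint lists and invoke a DisableServer-style callback for each endpoint that disappeared. The helpers removedEndpoints and reconcile and the sample addresses are assumptions made for illustration; the router's actual endpoint watcher is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// removedEndpoints returns the endpoints present in prev but absent from
// next, sorted so the result is deterministic.
func removedEndpoints(prev, next []string) []string {
	inNext := make(map[string]struct{}, len(next))
	for _, ep := range next {
		inNext[ep] = struct{}{}
	}
	var gone []string
	for _, ep := range prev {
		if _, ok := inNext[ep]; !ok {
			gone = append(gone, ep)
		}
	}
	sort.Strings(gone)
	return gone
}

// reconcile calls disable for every endpoint that disappeared and stops at
// the first error. disable stands in for a DisableServer-style call on the
// backend (hypothetical wiring).
func reconcile(prev, next []string, disable func(string) error) error {
	for _, ep := range removedEndpoints(prev, next) {
		if err := disable(ep); err != nil {
			return fmt.Errorf("disable %s: %w", ep, err)
		}
	}
	return nil
}

func main() {
	prev := []string{"10.0.0.5:8080", "10.0.0.6:8080", "10.0.0.7:8080"}
	next := []string{"10.0.0.5:8080", "10.0.0.7:8080"}
	_ = reconcile(prev, next, func(ep string) error {
		fmt.Println("disabling", ep)
		return nil
	})
}
```

Running it prints a single line for 10.0.0.6:8080, the only address that left the endpoint list.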

Author

@christf christf Aug 27, 2025


Thank you for the feedback! I am aware of (1), and I am raising this PR so that we can eventually use this Tech Preview feature.

Can we dig a bit into (2) please?
As I understand it, DisableServer runs when kube-proxy is notified that an endpoint is to be removed. As per kubernetes/kubernetes#106476, the notification to remove an endpoint happens at around the same time as the pod is asked to terminate. So the pods are still fully able to serve requests, and they need to keep doing so until they have handled all in-flight requests. During this time, the router must ensure that no new requests are sent to these pods, while retaining the active connections to the pods that are about to be terminated.
If "maint" is used, all in-flight connections are broken. "drain" keeps them alive until either end closes the connection (either the client is done, or the pod is SIGKILLed, which is already governed by a timeout).
The goal of this change is to support rolling deployments without losing a single request.
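The behavior argued for here can be modeled in a few lines of Go. This is a toy model of the semantics as described in this thread (maint breaks in-flight connections; drain keeps them while new connections go to ready servers), not HAProxy's implementation. It also ignores persistence, which HAProxy's drain state still honors for new persistent connections. The State and Server types and the pick helper are hypothetical.

```go
package main

import "fmt"

// State is a simplified stand-in for an HAProxy server state.
type State int

const (
	Ready State = iota
	Drain
	Maint
)

// Server tracks a state and a count of in-flight connections.
type Server struct {
	Name   string
	State  State
	Active int
}

// setState changes the state and returns how many in-flight connections
// were dropped. Following the discussion, maint is modeled as breaking
// them, and drain as leaving them open until either end closes them.
func (s *Server) setState(st State) int {
	s.State = st
	if st != Maint {
		return 0
	}
	dropped := s.Active
	s.Active = 0
	return dropped
}

// pick returns the first Ready server for a new, non-persistent connection,
// or nil if none is ready. Real balancing algorithms are omitted.
func pick(servers []*Server) *Server {
	for _, s := range servers {
		if s.State == Ready {
			return s
		}
	}
	return nil
}

func main() {
	podA := &Server{Name: "pod-a", State: Ready, Active: 3}
	podB := &Server{Name: "pod-b", State: Ready}
	servers := []*Server{podA, podB}

	dropped := podA.setState(Drain)
	fmt.Println("drain: dropped", dropped, "still active", podA.Active)
	fmt.Println("new connection goes to", pick(servers).Name)

	dropped = podA.setState(Maint)
	fmt.Println("maint: dropped", dropped, "still active", podA.Active)
}
```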

One piece is still missing to make this perfect: finding a way to delay the SIGTERM to the pod until the endpoint has been drained. But that is another can of worms.

Contributor


The reasoning about the second point seems to be valid. Let me try to check our test coverage for this use case.

Author


How can we move this forward?

@alebedev87
Contributor

/assign

@candita
Contributor

candita commented Sep 24, 2025

/label ok-to-test

@candita
Contributor

candita commented Sep 24, 2025

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 24, 2025
Contributor

openshift-ci bot commented Sep 24, 2025

@candita: Can not set label ok-to-test: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers]

In response to this:

/label ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

openshift-ci bot commented Sep 26, 2025

@christf: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-metal-ipi-ovn-router | 1723e77 | link | false | /test e2e-metal-ipi-ovn-router |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | 1723e77 | link | false | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-metal-ipi-ovn-dualstack | 1723e77 | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/okd-scos-e2e-aws-ovn | 1723e77 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-serial | 1723e77 | link | true | /test e2e-aws-serial |
| ci/prow/e2e-agnostic | 1723e77 | link | true | /test e2e-agnostic |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

ok-to-test Indicates a non-member PR verified by an org member that is safe to test.


3 participants