Skip to content

Urgent Bug: webhook timeouts occur whenever node is rebooted #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
AceHack opened this issue Apr 9, 2020 · 8 comments
Closed

Urgent Bug: webhook timeouts occur whenever node is rebooted #12

AceHack opened this issue Apr 9, 2020 · 8 comments
Assignees

Comments

@AceHack
Copy link
Contributor

AceHack commented Apr 9, 2020

Describe the bug
Any webhook that uses http2 causes issues in EKS (and other Kubernetes distributions) when a node is rebooted. This is because of a known issue. See references:

kubernetes/kubernetes#80313
istio/istio#13840 (comment)
golang/go#31643

Here is a short rundown of the problem:

  • Golang has a http2 bug where it does not detect "bad" connections
  • Kubernetes has that same bug since it uses golang
  • When you create a Kubernetes webhook that uses http2 this bug can happen
  • When the webhook pod dies because a Kubernetes node dies the problem occurs and our cluster has issues for about 15 minutes
  • Since we run on spot instances this problem happens almost every time a spot instance dies (Which is often)
  • Amazon claims this will be fixed in EKS 1.17, we are on EKS 1.15
  • From past experience it will be likely 6 months before amazon gets EKS 1.17 out

Expected behavior
Node reboots not to affect webhook behavior.

To Reproduce
Install any http2 webhook like the ones from knative.
Reboot the node the webhook pod is running on.

Knative release version
0.13.x

Additional context
This issue is extremely urgent as it's causing us constant issues.

Please provide a way to disable http2 and only use http1 in the webhooks. Istio does this with the following:

apiVersion: install.istio.io/v1alpha2
kind: IstioControlPlane
spec:
  autoInjection:
    components:
      injector:
        k8s:
          env:
            - name: GODEBUG
              value: http2server=0
@evankanderson
Copy link
Member

It looks like it should be possible to update the Knative webhook pods to set that environment variable, which is part of the Go HTTP libray:

https://golang.org/pkg/net/http/

Starting with Go 1.6, the http package has transparent support for the HTTP/2 protocol when using HTTPS. Programs that must disable HTTP/2 can do so by setting Transport.TLSNextProto (for clients) or Server.TLSNextProto (for servers) to a non-nil, empty map. Alternatively, the following GODEBUG environment variables are currently supported:

GODEBUG=http2client=0  # disable HTTP/2 client support
GODEBUG=http2server=0  # disable HTTP/2 server support
GODEBUG=http2debug=1   # enable verbose HTTP/2 debug logs
GODEBUG=http2debug=2   # ... even more verbose, with frame dumps

The GODEBUG variables are not covered by Go's API compatibility promise. Please report any issues before disabling HTTP/2 support: https://golang.org/s/http2bug

@tcnghia
Copy link

tcnghia commented Apr 9, 2020

We can fix this at HEAD and cherry-pick to 0.13 as well.

@tcnghia
Copy link

tcnghia commented Apr 9, 2020

@evankanderson I'm going to assign this to myself, unless you are already working on it? Feel free to reassign!

/assign

@tcnghia
Copy link

tcnghia commented Apr 9, 2020

It looks like the environment variable has been supported for a while, probably all you need to do is to change the webhook deployment to set that environment variable.

@tcnghia
Copy link

tcnghia commented Apr 9, 2020

@AceHack can you please try adding the following to your webhook deployment to fix your urgent issue with node restart?

        - name: METRICS_DOMAIN
          value: knative.dev/serving

        # PATCH to disable http2 on webhook
        - name: GO_DEBUG
          value: http2server=0

@AceHack
Copy link
Contributor Author

AceHack commented Apr 16, 2020

I'm not sure how to get the operator to set these environment variables. I don't see anywhere to set this in the CRDs.

@AceHack
Copy link
Contributor Author

AceHack commented Apr 21, 2020

I was able to fix it with the following commands:

kubectl set env -n knative-serving deployment/webhook GODEBUG=http2server=0
kubectl set env -n knative-serving deployment/istio-webhook GODEBUG=http2server=0

kubectl set env -n knative-eventing deployment/eventing-webhook GODEBUG=http2server=0

The operator does not overwrite these values

@AceHack AceHack closed this as completed Apr 21, 2020
@nak3
Copy link
Contributor

nak3 commented Apr 27, 2020

Hi @tcnghia sorry for bother you, but will you fix this at HEAD and cherry-pick to 0.13, 0.14?

matzew pushed a commit to matzew/knative-operator that referenced this issue Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants