Description
What version of Go are you using (go version
)?
go version go1.15.4 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env
)?
go env
Output
GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/home/dionysius/.cache/go-build" GOENV="/home/dionysius/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOINSECURE="" GOMODCACHE="/home/dionysius/go/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/dionysius/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/lib/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build977072123=/tmp/go-build -gno-record-gcc-switches"
What did you do?
So, let's begin
High-Level Problem Description
I'm creating reverse proxy with Caddy Server and my own plugin. I use http/2 and Server Push. Sometimes requests hang forever. Here is screenshot from Chrome DevTools:
Low-Level Problem Description
So, I started to debug this situation. I found that my code execution stuck at (http.responseWriter).Write()
, which is an instance of http2responseWriter.
With help of pprof
I found that lockup happens in two functions: http2serverConn.writeHeaders and http2serverConn.writeDataFromHandler - endless waiting of data from done
channel.
Here is an illustration from pprof
:
Next I built go
from source with adding some debug messages and start to dive deeper.
I found a problem with frames are sent to output. At this line N
frames were pushed: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4692. After push
-function scheduleFrameWrite
-function is called. I watched into it and found that it often exit here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4817. And only M
(M
< N
) frames were popped from queue here: https://github.com/golang/go/blob/go1.15.4/src/net/http/h2_bundle.go#L4837
Pushed Frames
2020/11/12 12:12:35 push writeFrame for streamID=0 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=26 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=26 size=15169 2020/11/12 12:12:35 push writeFrame for streamID=28 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=26 size=0 2020/11/12 12:12:35 push writeFrame for streamID=28 size=5566 2020/11/12 12:12:35 push writeFrame for streamID=30 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=28 size=0 2020/11/12 12:12:35 push writeFrame for streamID=30 size=77162 2020/11/12 12:12:35 push writeFrame for streamID=32 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=32 size=354 2020/11/12 12:12:35 push writeFrame for streamID=30 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=34 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=34 size=483 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=38 size=0 2020/11/12 12:12:35 push writeFrame for streamID=36 size=0 2020/11/12 12:12:35 push writeFrame for streamID=38 size=27485 2020/11/12 12:12:35 push writeFrame for streamID=40 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=36 size=6726 2020/11/12 12:12:35 push writeFrame for streamID=40 size=6293 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=42 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=44 size=0 2020/11/12 12:12:35 push writeFrame for streamID=42 size=11249 2020/11/12 12:12:35 push writeFrame for streamID=46 size=0 2020/11/12 12:12:35 push writeFrame for streamID=40 size=0 2020/11/12 12:12:35 push writeFrame for streamID=48 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=44 size=9293 2020/11/12 12:12:35 push writeFrame for streamID=46 size=2626 2020/11/12 12:12:35 push writeFrame for streamID=48 size=346 2020/11/12 12:12:35 push writeFrame for streamID=119 size=6381 2020/11/12 12:12:35 push writeFrame for streamID=42 size=0 2020/11/12 12:12:35 push writeFrame for streamID=44 size=0 2020/11/12 12:12:35 push writeFrame for streamID=36 size=0 2020/11/12 12:12:35 push writeFrame for streamID=119 size=0 2020/11/12 12:12:35 push writeFrame for streamID=38 size=0 2020/11/12 12:12:36 push writeFrame for streamID=121 size=0 2020/11/12 12:12:36 push writeFrame for streamID=121 size=1021 2020/11/12 12:12:36 push writeFrame for streamID=123 size=0 2020/11/12 12:12:36 push writeFrame for streamID=125 size=0 2020/11/12 12:12:36 push writeFrame for streamID=127 size=0 2020/11/12 12:12:36 push writeFrame for streamID=129 size=0 2020/11/12 12:12:36 push writeFrame for streamID=131 size=0 2020/11/12 12:12:36 push writeFrame for streamID=133 size=0 2020/11/12 12:12:36 push writeFrame for streamID=135 size=0 2020/11/12 12:12:36 push writeFrame for streamID=137 size=0 2020/11/12 12:12:36 push writeFrame for streamID=139 size=0 2020/11/12 12:12:36 push writeFrame for streamID=141 size=0 2020/11/12 12:12:36 push writeFrame for streamID=143 size=0 2020/11/12 12:12:36 push writeFrame for streamID=145 size=0 2020/11/12 12:12:36 push writeFrame for streamID=147 size=0 2020/11/12 12:12:36 push writeFrame for streamID=149 size=0 2020/11/12 12:12:36 push writeFrame for streamID=151 size=0 2020/11/12 12:12:36 push writeFrame for streamID=153 size=0 2020/11/12 12:12:36 push writeFrame for streamID=155 size=0 2020/11/12 12:12:36 push writeFrame for streamID=157 size=0 2020/11/12 12:12:36 push writeFrame for streamID=159 size=0 2020/11/12 12:12:36 push writeFrame for streamID=161 size=0 2020/11/12 12:12:36 push writeFrame for streamID=163 size=0 2020/11/12 12:12:36 push writeFrame for streamID=165 size=0 2020/11/12 12:12:36 push writeFrame for streamID=167 size=0 2020/11/12 12:12:36 push writeFrame for streamID=169 size=0 2020/11/12 12:12:36 push writeFrame for streamID=171 size=0 2020/11/12 12:12:36 push writeFrame for streamID=173 size=0 2020/11/12 12:12:36 push writeFrame for streamID=175 size=0 2020/11/12 12:12:36 push writeFrame for streamID=201 size=0 2020/11/12 12:12:36 push writeFrame for streamID=179 size=0 2020/11/12 12:12:36 push writeFrame for streamID=181 size=0 2020/11/12 12:12:36 push writeFrame for streamID=183 size=0 2020/11/12 12:12:36 push writeFrame for streamID=185 size=0 2020/11/12 12:12:36 push writeFrame for streamID=187 size=0 2020/11/12 12:12:36 push writeFrame for streamID=207 size=0 2020/11/12 12:12:36 push writeFrame for streamID=203 size=0 2020/11/12 12:12:36 push writeFrame for streamID=209 size=0 2020/11/12 12:12:36 push writeFrame for streamID=205 size=0 2020/11/12 12:12:36 push writeFrame for streamID=189 size=0 2020/11/12 12:12:36 push writeFrame for streamID=213 size=0 2020/11/12 12:12:36 push writeFrame for streamID=211 size=0 2020/11/12 12:12:36 push writeFrame for streamID=221 size=0 2020/11/12 12:12:36 push writeFrame for streamID=193 size=0 2020/11/12 12:12:36 push writeFrame for streamID=191 size=0 2020/11/12 12:12:36 push writeFrame for streamID=197 size=0 2020/11/12 12:12:36 push writeFrame for streamID=199 size=0 2020/11/12 12:12:36 push writeFrame for streamID=195 size=0 2020/11/12 12:12:36 push writeFrame for streamID=215 size=0 2020/11/12 12:12:36 push writeFrame for streamID=217 size=0 2020/11/12 12:12:36 push writeFrame for streamID=223 size=0 2020/11/12 12:12:36 push writeFrame for streamID=219 size=0 2020/11/12 12:12:36 push writeFrame for streamID=225 size=0 2020/11/12 12:12:36 push writeFrame for streamID=227 size=0 2020/11/12 12:12:36 push writeFrame for streamID=123 size=3464 2020/11/12 12:12:36 push writeFrame for streamID=225 size=312351 2020/11/12 12:12:36 push writeFrame for streamID=225 size=0 2020/11/12 12:12:36 push writeFrame for streamID=229 size=0 2020/11/12 12:12:36 push writeFrame for streamID=231 size=0 2020/11/12 12:12:37 push writeFrame for streamID=177 size=0
Popped Frames
2020/11/12 12:12:35 sched pop writeFrame for streamID=0 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=15169 2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=26 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=5566 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=28 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=16384 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=11626 2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=32 size=354 2020/11/12 12:12:35 sched pop writeFrame for streamID=30 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=34 size=483 2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=16384 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=6293 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=11249 2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=9293 2020/11/12 12:12:35 sched pop writeFrame for streamID=48 size=346 2020/11/12 12:12:35 sched pop writeFrame for streamID=46 size=2626 2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=6726 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=6381 2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=11101 2020/11/12 12:12:35 sched pop writeFrame for streamID=40 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=42 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=44 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=36 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=119 size=0 2020/11/12 12:12:35 sched pop writeFrame for streamID=38 size=0 2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=0 2020/11/12 12:12:36 sched pop writeFrame for streamID=121 size=1021 2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=0 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0 2020/11/12 12:12:36 sched pop writeFrame for streamID=123 size=3464 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=16384 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=1055 2020/11/12 12:12:36 sched pop writeFrame for streamID=225 size=0
What did you expect to see?
No lockups.
What did you see instead?
Random lockups.
Activity
dtelyukh commentedon Nov 12, 2020
Possible related issue #23559
networkimprov commentedon Nov 12, 2020
cc @fraenkel
fraenkel commentedon Nov 13, 2020
@dtelyukh We are going to need something that can reproduce the issue. It would also help to enable http2 debug and a thread dump when it hangs.
fraenkel commentedon Nov 13, 2020
Your title says the bundled version of http2 has this issue. Are you implying that if you use the latest x/net/http2 you dont?
dtelyukh commentedon Nov 13, 2020
It's not easy to prepare code, which can reproduce this problem with 100% guarantee, but I think I could try.
Here is debug.log for
GODEBUG=http2debug=2
http2.debug.log
and goroutines dump with
pprof
goroutine.dump.log
h2_bundle.go used by third-party code. I didn't try to use x/net/http2 directly. Do you mean that I should do that?
fraenkel commentedon Nov 13, 2020
Don't worry. I am going to need something that can reproduce this issue.
I can see why nothing is making progress but I don't know why.
The debug log is incomplete or slightly broken, but from what I do see there is an oddity.
Notice the stream for the PUSH is 1511 but the above is the first time I see that Framer. And the rest are in the 300s. I don't exactly see how this happened.
There are multiple PUSH_PROMISE all with the same stream id which is also odd.
dtelyukh commentedon Nov 13, 2020
I truncated log-file after each successful request until it was hanged. Maybe that is why the log-file was broken.
I attach here other log-file, which was made when I caught problem from the first time. This log-file was never truncated.
http2.debug.log
dtelyukh commentedon Nov 14, 2020
@fraenkel, we prepared test application for problem reproducing. My apologies for so complicated app. We cannot extract some small piece of code, because we don't know where is the problem exactly.
I sent credentials to michael.fraenkel@gmail.com.
To have more chances to catch the problem it should to remove proxy cache:
kill -SIGTERM <caddy process id>
rm -fR /home/user/caddy-cache
./caddy run&
To patch or debug server:
cd /home/user/go/src/github.com/caddyserver/caddy/cmd/caddy
CGO_ENABLED=0 go build
mv ./caddy ~/caddy
cd
sudo setcap 'cap_net_bind_service=+ep' ./caddy
kill -SIGTERM <caddy process id>
./caddy run&
fraenkel commentedon Nov 15, 2020
@dtelyukh I did find a way to cause the hang locally, from my machine it would never happen.
for i in {1..1000}; do echo $i; nghttp https://cardonecapital.hc04.dorofeev.me/ -n; done
would eventually hang.Once I attempted to compile a new Go, I could never get caddy to rebuild. I always ended up with
qtls init failure. Attempting to fix that, I didn't realize at the time that the src tree was a bit special so I can no longer make any progress since I cannot download your smart-cache module.
If you could fix the tree, at least next time I know to make a copy of the entire tree before doing anything. I am a bit concerned that a simple
go mod tidy
prevented any further compilation of caddy.fraenkel commentedon Nov 15, 2020
Never mind, I got it working again....
fraenkel commentedon Nov 15, 2020
So one thing I did verify is that using the latest golang/x/net/http2 code does not cause the hang I see with my simple testcase.
dtelyukh commentedon Nov 15, 2020
@fraenkel, what can I help?
fraenkel commentedon Nov 15, 2020
You can see the one line change I made to caddyserver with a
go mod tidy
. See if this new version hangs for you as well.32 remaining items
geropl commentedon May 10, 2021
👋 Any update on this issue? Any idea how and when this might be resolved?
dtelyukh commentedon Nov 1, 2021
I hope it will help: #45435
ReillyBrogan commentedon Dec 3, 2021
I believe this is fixed with Go 1.17 considering that the bundled http2 library was updated to a snapshot of x/net/http2 from May of this year.
@dtelyukh Can you give 1.17 a try with your test case (without manually using x/net/http2)?
dtelyukh commentedon Dec 5, 2021
@ReillyBrogan , I tried 1.17.3 - the problem still exists. I think, 1.18 could help.
bakape commentedon Jan 7, 2022
Potentially fixed with #49921. It's part of the Go 1.17.6 release.
dtelyukh commentedon Apr 6, 2022
This is still an issue on Go 1.18
whitewindmills commentedon Jun 14, 2023
I had the same problem using Go 1.19.
divanodestiny commentedon Apr 29, 2024
any progress now?i had the same problems with golang.org/x/net v0.0.0-20210813160813-60bc85c4be6d in branch internal-branch.go1.18-vendor
core/node/rpc: Use x/http2.Server instead of http.Server (#630)
Rohsichan commentedon Sep 9, 2024
I had the same problem using Go 1.22.6
virtyaluk commentedon Dec 30, 2024
I had the same problem using Go 1.23.1