Description
What version of Go are you using (go version)?
$ go version
go version go1.16.5 linux/amd64
Does this issue reproduce with the latest release?
Yes, tested on 1.17beta1 as well.
What operating system and processor architecture are you using (go env)?
go env Output:
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/sdk/go1.16.5"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/sdk/go1.16.5/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/go-rate-limiter-bug/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2983600181=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I created a reproduction case: https://github.com/beanieboi/go-rate-limiter-bug/blob/main/main_test.go
package main

import (
	"context"
	"testing"
	"time"

	"golang.org/x/time/rate"
)

func TestRateLimiter(t *testing.T) {
	ctx := context.Background()
	numItems := 300000
	endTime := time.Now().Add(10 * time.Second)
	timeRemaining := time.Until(endTime)
	dispatchInterval := time.Duration(int64(timeRemaining) / int64(numItems))
	limiter := rate.NewLimiter(rate.Every(dispatchInterval), 1)

	loopStart := time.Now()
	for i := 0; i < numItems; i++ {
		err := limiter.Wait(ctx)
		if err != nil {
			t.Errorf("error waiting: %v", err)
		}
		// do actual dispatching
	}

	duration := time.Since(loopStart)
	if duration > 15*time.Second {
		t.Errorf("expected execution time of less than 15s, got: %v", duration)
	}
}
In short: we are using x/time/rate's rate.NewLimiter() with limiter.Wait(ctx) and seeing much longer wait times than on 1.15.7.
What did you expect to see?
The same behaviour as on 1.15.7; the test linked above should finish within ~15s.
What did you see instead?
The limiter is waiting too long and the whole test takes ~160s instead of ~15s.
Activity
Issue retitled from "limiter.Wait() on 1.16 and higher waits too long" to "x/time/rate: Limiter.Wait waits too long".

mknyszek commented on Jul 7, 2021
Have you been able to determine whether this is due to a change in the x/time/rate package or something deeper? I will note #44343, which seems to have regressed in Go 1.16.

beanieboi commented on Jul 28, 2021
Hey @mknyszek,
I was able to bisect it and pin it down to 8fdc79e.
mknyszek commented on Aug 9, 2021
@ChrisHines @prattmic Is this the same root cause as #44343?
beanieboi commented on Aug 11, 2021
@mknyszek I saw that #44343 is tagged as "OS-Windows" and the discussion is mainly about Windows.
We are seeing this issue on Linux (Ubuntu 20.04).
mknyszek commented on Aug 11, 2021
It is, but we've been having other late-timer issues since 8fdc79e and I thought that issue spun out of that. Perhaps it's moved on from the general issues around that commit.
prattmic commented on Aug 11, 2021
This is probably the same cause as #44343. That bug is mostly discussing Windows as the problem is more pronounced there (15ms sleeps), but it does exist on Linux as well (1ms sleeps).
ChrisHines commented on Aug 12, 2021
I agree that the change in behavior goes back to 8fdc79e. After that change, timers rely entirely on the netpoller to sleep and wake up threads. It removed a variable-frequency polling loop in the sysmon thread that was helping to wake up threads more quickly in some situations (but not all).

On Linux that polling loop could run as frequently as every ~20 microseconds or as infrequently as every 10 milliseconds. By comparison, the netpoller on Linux has a wakeup granularity of ~1 millisecond. So that change reduced the worst-case sleep time from 10ms to 1ms on Linux, but also raised the best-case time. As a counterpoint, the change also helped reduce CPU load for programs that mostly do lots of sleeps (see: #27707 (comment)).
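For illustration, here is a minimal standalone sketch (not from the original thread) that makes the granularity visible: request a sleep far shorter than 1ms in a loop and measure the average elapsed time per iteration. On Go 1.16 on Linux the average tends to land near ~1ms, though the exact figure depends on the kernel and load.

package main

import (
	"fmt"
	"time"
)

func main() {
	// Ask for a 50µs sleep, far below the ~1ms netpoller wakeup granularity,
	// and report the average time each sleep actually took.
	const iterations = 1000
	start := time.Now()
	for i := 0; i < iterations; i++ {
		time.Sleep(50 * time.Microsecond)
	}
	fmt.Printf("average sleep: %v (requested 50µs)\n", time.Since(start)/iterations)
}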
The test code above calls rate.NewLimiter with a burst size of one and an event frequency of 33.333 microseconds. If timers sleep for a minimum of 1ms, then I believe on average the rate.Limiter will alternate between waiting for 1ms and 0ms. The first long sleep forces enough time to pass that a full burst is allowed on the subsequent call. But since a burst is limited to one event, no additional tokens accumulate in the bucket until the zero wait has consumed the one that's there.

As a result the test is performing the equivalent of ~150,000 sleeps of 1ms, which matches up reasonably well with the reported total runtime of ~160 seconds.
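This alternation can be observed directly. The following is a hypothetical instrumentation sketch (not part of the original repro; the 500µs cutoff is an arbitrary threshold chosen for illustration) that counts how many Wait calls take roughly a millisecond versus returning almost immediately, using the same interval and burst as the test above.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Same parameters as the failing test: ~33µs between events, burst of 1.
	limiter := rate.NewLimiter(rate.Every(33333*time.Nanosecond), 1)

	var slow, fast int
	for i := 0; i < 10000; i++ {
		start := time.Now()
		if err := limiter.Wait(context.Background()); err != nil {
			panic(err)
		}
		// Waits near ~1ms count as "slow"; near-immediate returns as "fast".
		if time.Since(start) > 500*time.Microsecond {
			slow++
		} else {
			fast++
		}
	}
	fmt.Printf("waits > 500µs: %d, waits <= 500µs: %d\n", slow, fast)
}

On a Go 1.16 runtime the two counts should come out roughly equal, matching the 1ms/0ms alternation described above.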
The above test wants to allow 300,000 events in 10 seconds, which is the same as 30 events per ms. If we know that the Go 1.16 runtime cannot reliably perform sleeps of less than 1ms, then using a burst size of ~30 would be more achievable. That should allow 30 events back-to-back, followed by a ~1ms sleep, then another 30 events, etc. A sketch of this mitigation follows.
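The sketch below is adapted from the test above (it is not code from the thread or the repro repo); the only substantive change is the burst size passed to rate.NewLimiter.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	numItems := 300000
	dispatchInterval := 10 * time.Second / time.Duration(numItems) // ~33µs per event

	// Burst of ~30 instead of 1: after each ~1ms timer wakeup roughly 30
	// tokens have accumulated, so ~30 Wait calls return immediately before
	// the limiter needs to sleep again.
	limiter := rate.NewLimiter(rate.Every(dispatchInterval), 30)

	start := time.Now()
	for i := 0; i < numItems; i++ {
		if err := limiter.Wait(context.Background()); err != nil {
			panic(err)
		}
		// do actual dispatching
	}
	fmt.Println("total:", time.Since(start))
}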