x/time/rate: Limiter.Wait waits too long #47084

Open

Description

@beanieboi

What version of Go are you using (go version)?

$ go version

1.16.5

Does this issue reproduce with the latest release?

yes, tested on 1.17beta1 as well.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/sdk/go1.16.5"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/sdk/go1.16.5/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.5"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/go-rate-limiter-bug/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2983600181=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I created a reproduction case: https://github.com/beanieboi/go-rate-limiter-bug/blob/main/main_test.go

package main

import (
	"context"
	"testing"
	"time"

	"golang.org/x/time/rate"
)

func TestRateLimiter(t *testing.T) {
	ctx := context.Background()
	numItems := 300000

	endTime := time.Now().Add(10 * time.Second)
	timeRemaining := time.Until(endTime)
	dispatchInterval := time.Duration(int64(timeRemaining) / int64(numItems))
	limiter := rate.NewLimiter(rate.Every(dispatchInterval), 1)

	loopStart := time.Now()
	for i := 0; i < numItems; i++ {
		err := limiter.Wait(ctx)
		if err != nil {
			t.Errorf("error waiting: %v", err)
		}
		// do actual dispatching
	}
	duration := time.Since(loopStart)

	if duration > 15*time.Second {
		t.Errorf("expected execution time of less than 15s, got: %v", duration)
	}
}
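(For reference: with numItems = 300000 and a 10-second window, dispatchInterval works out to 10s / 300,000 ≈ 33.3µs, i.e. the limiter hands out one token roughly every 33µs with a burst of 1, which matters for the discussion below.)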

In short: we are using x/time/rate's rate.NewLimiter() with limiter.Wait(ctx) and see much longer wait times than on 1.15.7.

What did you expect to see?

The same behaviour as on 1.15.7: the test linked above should finish within ~15s.

What did you see instead?

The limiter waits too long, and the whole test takes ~160s instead of ~15s.

Activity

Changed the title from "limiter.Wait() on 1.16 and higher waits too long" to "x/time/rate: Limiter.Wait waits too long" on Jul 7, 2021

Added this to the Unreleased milestone on Jul 7, 2021
mknyszek (Contributor) commented on Jul 7, 2021

Have you been able to determine whether this is due to a change in x/time/rate or something deeper in the runtime? I will note #44343, which seems to have regressed in Go 1.16.

Added the NeedsInvestigation label on Jul 7, 2021
beanieboi (Author) commented on Jul 28, 2021

hey @mknyszek

I was able to use git bisect and pin it down to 8fdc79e:

~/goroot$ git bisect start go1.16.1 go1.15.14
Bisecting: a merge base must be tested
[ba9e10889976025ee1d027db6b1cad383ec56de8] cmd: update golang.org/x/xerrors
$ git bisect run src/bisect.sh
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s

Bisecting: 891 revisions left to test after this (roughly 10 steps)
[0040adfd0f98e1012837b0317fabf69cf1f8855b] runtime: define ios/arm64 entry points
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 445 revisions left to test after this (roughly 9 steps)

[0932dc21180642ce1ff095b9b3e68b06c6f440b3] runtime: declare arg size/map for race version of sync/atomic functions
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.968s
Bisecting: 222 revisions left to test after this (roughly 8 steps)

[d4c1ad882973e407ff85b977f4ce5b9435451190] database/sql: fix tx stmt deadlock when rollback
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.972s
Bisecting: 111 revisions left to test after this (roughly 7 steps)

[db185e543fe471c522790b7d93291e786dc54a84] runtime: drop redundant gcBlackenEnabled reset
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 55 revisions left to test after this (roughly 6 steps)

[36c5edd8d9e6c13af26733e5c820eae0598203fe] runtime: add timeHistogram type
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 27 revisions left to test after this (roughly 5 steps)

[79a3482d9e735fc6d0373da170fd1b4ff9222bc2] cmd/compile: remove support for untyped ssa rules
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	17.042s
Bisecting: 13 revisions left to test after this (roughly 4 steps)

[333e90448a0e55f2e1161853caecf3d30ef3a74a] crypto/rand: generate random numbers using RtlGenRandom on Windows
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 6 revisions left to test after this (roughly 3 steps)

[b3f7f60129b822978115717912f4d477a46e8467] cmd/dist: fix build failure of misc/cgo/test on arm64
...Building Go...
ok  	github.com/beanieboi/go-rate-limiter-bug	10.003s
Bisecting: 3 revisions left to test after this (roughly 2 steps)

[e3bb53a7683eb9c3d04c09f28abb4cf9aa89a7c1] cmd/compile: remove isLiteral
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	17.011s
Bisecting: 0 revisions left to test after this (roughly 1 step)

[3f6b1a0d5eea4756b905db1c2b2c03e8594850d3] misc/cgo/test: test C.enum_*
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.950s
Bisecting: 0 revisions left to test after this (roughly 0 steps)

[8fdc79e18a9704185bd6471b592db1e8004bd993] runtime: reduce timer latency
...Building Go...
FAIL	github.com/beanieboi/go-rate-limiter-bug	16.989s
8fdc79e18a9704185bd6471b592db1e8004bd993 is the first bad commit
mknyszek (Contributor) commented on Aug 9, 2021

@ChrisHines @prattmic Is this the same root cause as #44343?

beanieboi (Author) commented on Aug 11, 2021

@mknyszek I saw that #44343 is tagged as "OS-Windows" and the discussion is mainly about Windows.
We are seeing this issue on Linux (Ubuntu 20.04).

mknyszek (Contributor) commented on Aug 11, 2021

It is, but we've been having other late-timer issues since 8fdc79e, and I thought that issue spun out of that. Perhaps it's moved on from the general issues around that commit.

prattmic (Member) commented on Aug 11, 2021

This probably has the same cause as #44343. That bug mostly discusses Windows, where the problem is more pronounced (15ms sleeps), but it exists on Linux as well (1ms sleeps).
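The granularity is easy to see by timing nominally short sleeps. A minimal standalone sketch (not from this thread; the durations are illustrative):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Time a few short sleeps. On Go 1.16/Linux, sub-millisecond sleeps are
	// expected to overshoot to roughly 1ms; on Windows, to roughly 15ms.
	for _, d := range []time.Duration{
		50 * time.Microsecond,
		500 * time.Microsecond,
		2 * time.Millisecond,
	} {
		start := time.Now()
		time.Sleep(d)
		fmt.Printf("asked for %v, slept %v\n", d, time.Since(start))
	}
}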

ChrisHines (Contributor) commented on Aug 12, 2021

I agree that the change in behavior goes back to 8fdc79e. After that change, timers rely entirely on the netpoller to sleep and wake up threads. It removed a variable-frequency polling loop in the sysmon thread that had been helping to wake up threads more quickly in some situations (but not all).

On Linux that polling loop could run as frequently as every ~20 microseconds or as infrequently as every 10 milliseconds. By comparison, the netpoller on Linux has a wakeup granularity of ~1 millisecond. So that change reduced the worst-case sleep time from 10ms to 1ms on Linux, but also raised the best-case time. As a counterpoint, the change also helped reduce CPU load for programs that mostly do lots of sleeps (see #27707 (comment)).

The test code above calls rate.NewLimiter with a burst size of one and an interval of 33.333 microseconds between events. If timers sleep for a minimum of 1ms, then I believe on average the rate.Limiter will alternate between waiting for 1ms and 0ms. The first long sleep forces enough time to pass that a full burst is allowed on the subsequent call. But since a burst is limited to one event, no additional tokens accumulate in the bucket until the zero-length wait has consumed the one that's there.

As a result, the test performs the equivalent of ~150,000 1ms sleeps, which matches up reasonably well with the reported total runtime of ~160 seconds.
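That alternation can be observed directly. A minimal sketch (not from the issue) using the same parameters as the test above; on Go 1.16 on Linux the printed waits should alternate between roughly 0 and roughly 1ms:

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// One token every ~33.3µs with a burst of 1, as in the test above.
	lim := rate.NewLimiter(rate.Every(33333*time.Nanosecond), 1)
	for i := 0; i < 6; i++ {
		start := time.Now()
		if err := lim.Wait(context.Background()); err != nil {
			panic(err)
		}
		fmt.Printf("wait %d took %v\n", i, time.Since(start))
	}
}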

The above test wants to allow 300,000 events in 10 seconds, which is the same as 30 events per ms. If we know that the Go 1.16 runtime cannot reliably perform sleeps shorter than 1ms, then using a burst size of ~30 would be more achievable. That should allow 30 events back-to-back, followed by a ~1ms sleep, then another 30 events, and so on.
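For example, a hypothetical variant of the test's setup along those lines (not from the issue):

package main

import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Same average rate as the test above (~33.3µs per event), but with a
	// burst of 30, so ~30 events can run back-to-back between ~1ms wakeups.
	dispatchInterval := 10 * time.Second / 300000
	limiter := rate.NewLimiter(rate.Every(dispatchInterval), 30)

	ctx := context.Background()
	for i := 0; i < 300000; i++ {
		if err := limiter.Wait(ctx); err != nil {
			return
		}
		// do actual dispatching
	}
}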
