time: Timer reset broken under heavy use since go1.16 timer optimizations added #47329

Closed
@andrewvc

Description

What version of Go are you using (go version)?

Tracked the specific issue to commit b4b0144 via git bisect

$ go version
go version devel go1.17-d568e6e075 Tue Jul 20 19:54:36 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
warning: GOPATH set to GOROOT (/home/andrewvc/projects/go) has no effect
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/andrewvc/.cache/go-build"
GOENV="/home/andrewvc/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/andrewvc/projects/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/andrewvc/projects/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/andrewvc/projects/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/andrewvc/projects/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/andrewvc/projects/beats/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build674387485=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Longstanding unit tests (~1.5 years old) started failing sporadically against Go 1.16.x.

Our program (Heartbeat) rapidly stops and resets a single timer based on user-submitted jobs, which effectively stress-tests the Go timer implementation. Further investigation revealed that Timer.Reset was no longer resetting the timer consistently. Roughly once every 10-40k iterations the call would have no effect, leaving a timer that never fired and, in our case, a deadlocked program.
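For readers without the Heartbeat code at hand, the pattern described above can be sketched roughly as follows. This is not the Heartbeat test: stressReset, demoInterval, demoWatchdog, and the iteration count are hypothetical names and values chosen for illustration, and a single-goroutine loop like this one is not guaranteed to trigger the bug. It only shows the stop/reset cycle and how a watchdog turns a silent hang into a reported failure.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical values for the demonstration; they are not taken from Heartbeat.
const (
	demoInterval = time.Microsecond
	demoWatchdog = 5 * time.Second
)

// stressReset stops and resets a single timer `iterations` times and waits
// for it to fire after each reset. A watchdog timer bounds each wait, so a
// Reset that has no effect is reported instead of blocking forever.
// It returns the index of the first iteration whose timer did not fire
// within the watchdog period, or -1 if every reset fired.
func stressReset(iterations int, interval, watchdog time.Duration) int {
	t := time.NewTimer(interval)
	defer t.Stop()
	for i := 0; i < iterations; i++ {
		// Stop the timer and drain a pending value, if any, before Reset.
		if !t.Stop() {
			select {
			case <-t.C:
			default:
			}
		}
		t.Reset(interval)

		// The watchdog is a fresh timer each time, so it does not itself
		// depend on Reset.
		wd := time.NewTimer(watchdog)
		select {
		case <-t.C:
			wd.Stop()
		case <-wd.C:
			return i
		}
	}
	return -1
}

func main() {
	if i := stressReset(10000, demoInterval, demoWatchdog); i >= 0 {
		fmt.Printf("timer did not fire after Reset on iteration %d\n", i)
		return
	}
	fmt.Println("every reset fired")
}
```

On an unaffected runtime the loop should complete and report that every reset fired; on an affected one a failure, if it occurs at all, would show up as a reported iteration index rather than a deadlock.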

We tracked the specific issue to a change introduced in Go commit b4b0144 via git bisect.

The failure can be reproduced by running from the special branch below, which contains an enhanced test suite for Heartbeat using a watchdog timer to catch the failed timer.

# Examples use a zip download to prevent a full repo clone
curl -L https://github.com/andrewvc/beats/archive/refs/heads/broken-timer.zip -o broken-timer.zip
unzip -q broken-timer.zip
cd beats-broken-timer/heartbeat
go test -timeout 30s -run '^TestStress$' github.com/elastic/beats/v7/heartbeat/scheduler/timerqueue

We are now avoiding Reset in favor of NewTimer in a workaround PR, elastic/beats#27006. You can validate this by deleting the Reset call here and replacing it with the NewTimer call here.
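The general shape of that workaround, sketched independently of the actual PR, is to discard the old timer and allocate a new one rather than reuse it. The helper names rearm and countFires are assumptions made for this example and do not come from the Beats code.

```go
package main

import (
	"fmt"
	"time"
)

// rearm returns a timer that fires after d. Instead of calling Reset on
// old, it stops old (if there is one) and allocates a fresh timer.
func rearm(old *time.Timer, d time.Duration) *time.Timer {
	if old != nil {
		old.Stop()
	}
	return time.NewTimer(d)
}

// countFires re-arms a timer n times using rearm and returns how many
// times the timer fired.
func countFires(n int) int {
	var t *time.Timer
	fired := 0
	for i := 0; i < n; i++ {
		t = rearm(t, time.Microsecond)
		<-t.C
		fired++
	}
	return fired
}

func main() {
	fmt.Println("timers fired:", countFires(1000))
}
```

The trade-off is one allocation per re-arm in exchange for never exercising the Reset path.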

Digging into the Go source code, I discovered that I could fix the issue by commenting out the optimization on this line inside adjusttimers. It seems that the accounting of that variable may have an issue somewhere. The code is quite tricky and heavily concurrent, and could use the eye of someone familiar with it.

What did you expect to see?

I expected the timer to fire consistently when reset.

What did you see instead?

Nothing. After enhancing the test suite for Heartbeat to dump traces, it was apparent that the program was in an idle state, with no timer scheduled and no other code blocked.
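One way to obtain such a dump, shown here only as a minimal sketch and not as the mechanism the Heartbeat test suite uses, is to capture the stacks of all goroutines with runtime.Stack. The helper name allStacks and the 1 MiB buffer size are assumptions for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// allStacks returns the stack traces of all goroutines as a string.
// The 1 MiB buffer is an arbitrary size; output beyond it is truncated.
func allStacks() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true: include every goroutine
	return string(buf[:n])
}

func main() {
	// A watchdog could call this just before failing the test.
	fmt.Print(allStacks())
}
```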

Activity

changed the title from "Timer reset broken under heavy use since go1.16 timer optimizations added" to "time: Timer reset broken under heavy use since go1.16 timer optimizations added" on Jul 21, 2021
added the NeedsInvestigation label (someone must examine and confirm this is a valid issue and not a duplicate of an existing one) on Jul 21, 2021
added this to the Go1.18 milestone on Jul 21, 2021

ianlancetaylor commented on Jul 22, 2021

Contributor

Thanks for the good test case and analysis.

changed the milestone from Go1.18 to Go1.17 on Jul 22, 2021

ianlancetaylor commented on Jul 22, 2021

Contributor

@gopherbot Please open backport to 1.16.

This bug also exists in 1.16. It can cause programs that use Timer.Reset to fail to run a timer when it is ready.

gopherbot commented on Jul 22, 2021

Contributor

Backport issue(s) opened: #47332 (for 1.16).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot commented on Jul 22, 2021

Contributor

Change https://golang.org/cl/336432 mentions this issue: runtime: don't clear timerModifiedEarliest if adjustTimers is 0

added a commit that references this issue on Jul 22, 2021
added a commit that references this issue on Jul 22, 2021

