Open
Description
What version of Go are you using (go version
)?
go version go1.10.1 linux/amd64
Does this issue reproduce with the latest release?
Yes (1.10.2
).
What operating system and processor architecture are you using (go env
)?
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/bas/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bas/go"
GORACE=""
GOROOT="/usr/lib/go-1.10"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.10/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build723289083=/tmp/go-build -gno-record-gcc-switches"
What did you do?
The following Go program calls time.Sleep
the number of times given as a commandline argument.
package main
import (
"os"
"strconv"
"time"
)
var max int
func main() {
max, _ = strconv.Atoi(os.Args[1])
n := 0
for {
time.Sleep(time.Second / 100)
n += 1
if n >= max {
return
}
}
}
If track the number of sys calls using strace -f -c
, we find
bas@fourier2:~/gosleeptest$ strace -c -f ./gosleeptest 1
strace: Process 3115 attached
strace: Process 3114 attached
strace: Process 3116 attached
strace: Process 3117 attached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 8 mmap
0.00 0.000000 0 1 munmap
0.00 0.000000 0 114 rt_sigaction
0.00 0.000000 0 14 rt_sigprocmask
0.00 0.000000 0 4 clone
0.00 0.000000 0 1 execve
0.00 0.000000 0 10 sigaltstack
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 9 gettid
0.00 0.000000 0 8 1 futex
0.00 0.000000 0 1 sched_getaffinity
0.00 0.000000 0 1 readlinkat
0.00 0.000000 0 22 pselect6
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 198 1 total
bas@fourier2:~/gosleeptest$ strace -c -f ./gosleeptest 10
strace: Process 3919 attached
strace: Process 3918 attached
strace: Process 3917 attached
strace: Process 3927 attached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 8 mmap
0.00 0.000000 0 1 munmap
0.00 0.000000 0 114 rt_sigaction
0.00 0.000000 0 14 rt_sigprocmask
0.00 0.000000 0 2 sched_yield
0.00 0.000000 0 4 clone
0.00 0.000000 0 1 execve
0.00 0.000000 0 10 sigaltstack
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 9 gettid
0.00 0.000000 0 74 12 futex
0.00 0.000000 0 1 sched_getaffinity
0.00 0.000000 0 1 readlinkat
0.00 0.000000 0 69 pselect6
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 313 12 total
bas@fourier2:~/gosleeptest$ strace -c -f ./gosleeptest 100
strace: Process 4491 attached
strace: Process 4490 attached
strace: Process 4489 attached
strace: Process 4532 attached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
89.01 0.043330 82 530 104 futex
9.76 0.004751 21 228 pselect6
0.27 0.000131 1 114 rt_sigaction
0.23 0.000114 23 5 arch_prctl
0.19 0.000091 9 10 sigaltstack
0.18 0.000086 10 9 gettid
0.13 0.000061 4 14 rt_sigprocmask
0.07 0.000035 4 8 mmap
0.07 0.000033 33 1 readlinkat
0.03 0.000017 4 4 clone
0.03 0.000017 17 1 execve
0.02 0.000009 9 1 munmap
0.01 0.000003 3 1 sched_getaffinity
------ ----------- ----------- --------- --------- ----------------
100.00 0.048678 926 104 total
What did you expect to see?
A single time.Sleep
should use approximately one syscall. (Python's time.sleep
does only use one syscall, for instance.)
What did you see instead?
Approximately seven sys calls per time.Sleep
. As a consequence, the go process also uses quite a bit of CPU time per time.Sleep
: 500us (compared to 13us for Python's time.sleep
).
Notes
I encountered this issue while debugging unexpectedly high idle CPU usage by wireguard-go
.
Metadata
Metadata
Assignees
Labels
Type
Projects
Relationships
Development
No branches or pull requests
Activity
robpike commentedon May 21, 2018
I doubt it will ever go as low as one in general. In a concurrent language like Go, unlike Python, putting a thread to sleep requires locking data structures in the concurrent runtime. Seven may be more than are truly needed, I'm not sure, but time.Sleep cannot be done in only one call to the kernel without blocking the rest of the program's running goroutines.
[-]Single `time.Sleep` requires ~7 syscalls[/-][+]time: Sleep requires ~7 syscalls[/+]agnivade commentedon May 21, 2018
What Rob said. Not sure if there's anything to be done here. /cc @rsc
bwesterb commentedon May 21, 2018
I'm a bit confused about your hunch that this is due to locking. Futexes use a
CMPXCHG
in userspace to acquire a mutex. Only when the lock is contended will they call thefutex()
syscall withFUTEX_WAIT
on the mutex to wait for it to unlock, right? Why is there contention in this example?bcmills commentedon May 21, 2018
Does the number of syscalls account for the observed CPU usage in
wireguard-go
, or is this a peripheral issue? We try to focus on concrete problems (specific programs with poor behavior) rather than abstract ones (i.e., microbenchmarks).bwesterb commentedon May 21, 2018
Yes, it did. As a workaround, the author now disables the 1Hz ticker if there is nothing to do because of the overhead. (Although that's a good idea anyway to prevent unnecessary wake-ups.)
(So the overhead is 0.05% of CPU constantly for a 1Hz ticker (on my test machine), which isn't huge — certainly not for servers. It's 43 seconds per day.)
odeke-em commentedon May 21, 2018
/cc @ianlancetaylor @aclements @randall77 too
aclements commentedon May 21, 2018
The high number of futex calls is surprising, for the reasons you pointed out. Most likely what's happening is not lock contention, but rather that the runtime is waking a sleeping thread, though I haven't confirmed that.
ianlancetaylor commentedon May 25, 2018
The Go runtime works with a set of timer goroutines sharded by P. The
time.Sleep
call is implemented by having the calling goroutine tell its local timer goroutine to wake it up at a certain time. This means adding a timer to a heap, and then waking up the timer goroutine: a futex operation.After that, the goroutine that called
time.Sleep
parks itself. The M that was running that goroutine then looks for another goroutine to run, doesn't find any, and parks itself. Parking a goroutine just means marking the G as asleep, but parking an M means sleeping until told to wake up: a futex operation.Meanwhile, the timer goroutine looks at the new timer, decides how long it has to sleep, and goes to sleep for that long: a futex operation.
Eventually the timer goroutine wakes up. It unparks the sleeping goroutine, adding it to the queue of goroutines ready to run. There aren't any M's running, so it has to wake one up: a futex operation.
The timer goroutine then sees that there are no timers, and goes back to sleep waiting for something to happen: a futex operation.
The M that was woken up starts to execute the goroutine, which immediately calls
time.Sleep
again, and the cycle starts over.Meanwhile the system monitor thread runs, poking along doing its own thing while all this is happening. On tip this mostly shows up as calls to
nanosleep
, above it shows up aspselect6
.So I think that gives me 6 system calls per
time.Sleep
system call. Above it says there are 7, so I guess I missed one somewhere. Maybe the system monitor makes another system call.Some important points here are that the system monitor thread is part of the background of a Go process. It doesn't have anything to do with
time.Sleep
, it just shows up here because nothing else is happening. The M parking and unparking itself is only happening because there are no other goroutines; if there were, the M would be picking those up and running them, steps that in themselves do not require any system calls.So I think this microbenchmark is not necessarily indicative of the cost of calling
time.Sleep
. The costs that are closely related totime.Sleep
are the futex call to wake up the timer goroutine and the futex call by the timer goroutine itself to go back to sleep. Those seem difficult to avoid, assuming that we don't want a sleeping goroutine to tie up a thread, which of course we don't. Even those system calls wouldn't necessarily happen if there were other timers, as we only need to notify the timer goroutine of a new timer if it is newer than any existing timer.All in all it's not obvious to me that there is anything to do here. We really do need three threads: the main thread, the timer goroutine, the system monitor. The main thread and the timer goroutine really do need to communicate, and they really do need to go to sleep during the
time.Sleep
call. I don't see how to avoid that even hypothetically. For this example we could save a system call if the timer goroutine spun for a while when there was nothing to do, hoping that something would come in, but that hardly seems like a good idea in general.These system calls are all fast and should not put a significant load on the system.
As mentioned above the comparison to Python is unhelpful, since Python doesn't have to support multiplexing multiple sleeping goroutines onto a small set of threads.
bwesterb commentedon May 25, 2018
My comparison with Python indeed wasn't fair.
Thanks for the explanation. If I understand correctly the current implementation forces the timer goroutine to be executed on a different machine thread than the goroutine calling it. Couldn't the machine thread of the sleeper be reused when there is no
timerproc
running yet? (Implementation complexity is of course a valid objection, if applicable.)ianlancetaylor commentedon May 30, 2018
Given the existence of timers, there has to be a thread that sleeps until the next timer is ready. In the current implementation that thread is the one running the timer goroutine. So in the ordinary steady state there's no way for the goroutine that calls
time.Sleep
to run on the same thread that runs the timer goroutine, because the latter thread is already sleeping until the next timer fires.I think it's true that in your example program, in which there is only ever one timer, when the call to
time.Sleep
happens the timer goroutine is not sleeping but is instead simply parked. So it would be possible for this example to have the thread running the goroutine callingtime.Sleep
to pick up the timer goroutine directly. But in a real program it doesn't seem like a particularly likely case, so it doesn't seem worth implementing that optimization.bcmills commentedon May 30, 2018
Is that actually true? I thought that, at least on some platforms, there is a system call you can use to deliver a signal to the process at the next timer instead: if that's true, we only need a sleeping thread if there isn't some other running thread in the process that can receive the signal.
14 remaining items