Description
The following program hangs:
```go
package main

import (
	"runtime"
	"sync/atomic"
)

func main() {
	const P = 4
	runtime.GOMAXPROCS(P)
	x := uint32(0)
	for p := 0; p < P; p++ {
		go func() {
			atomic.AddUint32(&x, 1)
			for atomic.LoadUint32(&x) != P {
			}
		}()
	}
	for atomic.LoadUint32(&x) != P {
		runtime.Gosched()
	}
}
```
```
SIGABRT: abort
PC=0x44dae0 m=0

goroutine 21 [running]:
sync/atomic.LoadUint32(0xc8200b6000)
	src/sync/atomic/asm_amd64.s:92 fp=0xc820060798 sp=0xc820060790
main.main.func1(0xc8200b6000)
	/tmp/gosched.go:15 +0x37 fp=0xc8200607b8 sp=0xc820060798
runtime.goexit()
	src/runtime/asm_amd64.s:1998 +0x1 fp=0xc8200607c0 sp=0xc8200607b8
created by main.main
	/tmp/gosched.go:17 +0x78

goroutine 1 [runnable]:
runtime.Gosched()
	src/runtime/proc.go:235 +0x14
main.main()
	/tmp/gosched.go:20 +0xa7

goroutine 18 [runnable]:
main.main.func1(0xc8200b6000)
	/tmp/gosched.go:13
created by main.main
	/tmp/gosched.go:17 +0x78

goroutine 19 [running]:
	goroutine running on other thread; stack unavailable
created by main.main
	/tmp/gosched.go:17 +0x78

goroutine 20 [running]:
	goroutine running on other thread; stack unavailable
created by main.main
	/tmp/gosched.go:17 +0x78
```
The main goroutine (goroutine 1) keeps calling runtime.Gosched, yet another runnable goroutine (goroutine 18) is starved and never gets to run.
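The root cause: Gosched puts the current goroutine onto the global run queue; the thread then checks its local run queue (empty), then checks the global run queue and picks up the same goroutine again. Meanwhile, another runnable goroutine sits in a remote per-P run queue. Because the global queue is never empty at that point, the thread never gets as far as stealing work from the other Ps, and the P that owns that queue is busy running a goroutine stuck in a non-preemptible spin loop.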
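To make the lookup order concrete, here is a toy, single-threaded model of the loop described above. It is a sketch under simplifying assumptions, not runtime code: `G`, `queue`, `findRunnable` and `simulate` are hypothetical names, only one P does any scheduling, and the real `findrunnable` also does work this model ignores (netpoll and so on).

```go
package main

import "fmt"

// G is a hypothetical stand-in for a goroutine; only its name matters here.
type G string

// queue is a FIFO run queue.
type queue []G

func (q *queue) push(g G) { *q = append(*q, g) }

func (q *queue) pop() (G, bool) {
	if len(*q) == 0 {
		return "", false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// findRunnable models the lookup order: own local queue, then the
// global queue, and only then other Ps' local queues (work stealing).
// It returns "" if nothing is runnable.
func findRunnable(local, global *queue, remote []*queue) G {
	if g, ok := local.pop(); ok {
		return g
	}
	if g, ok := global.pop(); ok {
		return g
	}
	for _, r := range remote {
		if g, ok := r.pop(); ok {
			return g
		}
	}
	return ""
}

// simulate runs n scheduling rounds on one P. Goroutine "main" calls
// Gosched every time it runs, so it is pushed onto the global queue.
// "g18" waits in another P's local queue. It returns how many rounds
// each goroutine got.
func simulate(n int) map[G]int {
	var local, global queue
	remote := &queue{"g18"}
	runs := map[G]int{}
	cur := G("main")
	for i := 0; i < n; i++ {
		runs[cur]++
		if cur == "main" {
			global.push(cur) // Gosched: requeue on the global run queue
		}
		cur = findRunnable(&local, &global, []*queue{remote})
		if cur == "" {
			break
		}
	}
	return runs
}

func main() {
	fmt.Println(simulate(1000)) // map[main:1000]: g18 never runs
}
```

Every round the global queue hands back the goroutine that just yielded, so the stealing step is never reached.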
This is probably not super critical, as it can happen only when there are goroutines in tight non-preemptible loops. Still, findrunnable could occasionally check the other Ps' local run queues ahead of the global queue. We already do the opposite hack in schedule: once in a while it checks the global queue ahead of the local one. On the other hand, this would destroy locality, which is bad for performance...
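A sketch of the suggested tweak, in the same toy model: once every `stealFirstEvery` rounds (a hypothetical knob, not a runtime setting), the other Ps' queues are tried before the global queue. The value 61 in `main` mirrors the interval at which `schedule` does its opposite check of the global queue (the `schedtick%61 == 0` test in the runtime source); here it is an arbitrary choice, and all names remain hypothetical.

```go
package main

import "fmt"

// G is a hypothetical stand-in for a goroutine; only its name matters here.
type G string

// queue is a FIFO run queue.
type queue []G

func (q *queue) push(g G) { *q = append(*q, g) }

func (q *queue) pop() (G, bool) {
	if len(*q) == 0 {
		return "", false
	}
	g := (*q)[0]
	*q = (*q)[1:]
	return g, true
}

// steal tries the other Ps' local queues in order.
func steal(remote []*queue) (G, bool) {
	for _, r := range remote {
		if g, ok := r.pop(); ok {
			return g, true
		}
	}
	return "", false
}

// findRunnable checks the own local queue, then the global queue, then
// other Ps' queues, except that on every stealFirstEvery-th tick it
// tries the other Ps' queues before the global queue.
// stealFirstEvery is a hypothetical knob; 0 disables the tweak.
func findRunnable(tick, stealFirstEvery int, local, global *queue, remote []*queue) G {
	if g, ok := local.pop(); ok {
		return g
	}
	if stealFirstEvery > 0 && tick%stealFirstEvery == 0 {
		if g, ok := steal(remote); ok {
			return g
		}
	}
	if g, ok := global.pop(); ok {
		return g
	}
	if g, ok := steal(remote); ok {
		return g
	}
	return ""
}

// simulate is the same toy scenario: "main" calls Gosched every round,
// "g18" waits in another P's local queue.
func simulate(n, stealFirstEvery int) map[G]int {
	var local, global queue
	remote := &queue{"g18"}
	runs := map[G]int{}
	cur := G("main")
	for tick := 1; tick <= n; tick++ {
		runs[cur]++
		if cur == "main" {
			global.push(cur) // Gosched: requeue on the global run queue
		}
		cur = findRunnable(tick, stealFirstEvery, &local, &global, []*queue{remote})
		if cur == "" {
			break
		}
	}
	return runs
}

func main() {
	fmt.Println(simulate(1000, 0))  // map[main:1000]
	fmt.Println(simulate(1000, 61)) // map[g18:1 main:999]
}
```

The periodic steal-first check bounds how long g18 can wait (here it runs on round 62), but on a real system it would also pull work away from the P that owns it, which is the locality cost mentioned above.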
dvyukov commented on Dec 9, 2015
@aclements @RLH @rsc @randall77 @dr2chase
aclements commented on Dec 9, 2015
Interesting, though I agree with your assessment that this seems relatively low priority. I think if we fix the problem with non-preemptible loops (#10958) it will also fix this, and I'd much rather fix non-preemptible loops than try to put a hack in the scheduler (unless it can be more generally justified).
Hopefully SSA will make it easier to fix non-preemptible loops (because SSA will make everything easier, right? :)
dvyukov commented on Dec 9, 2015
Yes, fixing non-preemptible loops is definitely better.
aclements commented on Dec 10, 2015
Closing as a dup of #10958, though we can of course reopen if we want to take a more specific approach to this problem.