Closed
While comparing dvyukov/go-fuzz with mdempsky/go114-fuzz-build, I noticed that fuzzing k8s.io/kubernetes/test/fuzz/yaml.FuzzSigYaml with libFuzzer seems to periodically run into timeouts. I haven't been able to reproduce this with Go 1.13, so this seems like a regression.
Tentatively marking release blocker, since this seems like it could be a subtle runtime and/or cgo regression.
I'm going to try bisecting to see if I can figure out what commit caused the problem.
mdempsky commented on Oct 31, 2019
I ran a bunch of times at master to see how long it takes to hang. Here's a record of the number of fuzz iterations before hang:
118784
274821
97466
41172
705368
374579
773952
135061
4180
34428
520221
199902
241640
306002
25566
160724
359761
205918
26332
518383
99168
316188
I would guess it's exponentially distributed?
Running until 2M iterations seems like a safe threshold for bisection.
Edit: Using a 1.5M iteration threshold because I'm impatient, and estimating the exponential distribution based on the above samples suggests to me the chance of 1.5M iterations without failure is 0.25%. (But I'm no statistician.)
mdempsky commented on Oct 31, 2019
/cc @aclements
Still bisecting, but it seems to be narrowing down on one of the CLs related to #10958 / #24543.
ianlancetaylor commented on Oct 31, 2019
Honestly I wouldn't be surprised if the culprit CL turns out to be https://golang.org/cl/171883, which turns on the new timers. I suspect there may be some case we are failing to handle.
It would be nice (for me) to hear that it was a different CL, though.
mdempsky commented on Oct 31, 2019
@ianlancetaylor You look safe for now. My current bisect interval is from 316fb95 (good) to 6058603 (bad), and that CL looks like it was merged outside of that window.
mdempsky commented on Oct 31, 2019
Git bisect identified 3f83411.
I double-checked that it does hang at that CL, and I'm up to 2M iterations without a hang on the previous commit (which, assuming I didn't mix up "git bisect bad" / "git bisect good" earlier, should be at least 3.5M iterations in total).
ianlancetaylor commented on Oct 31, 2019
CC @aclements @cherrymui
mdempsky commented on Nov 1, 2019
I've been able to minimize the failure to the program below, though it reproduces the issue somewhat less reliably than actually using libFuzzer.
The program should run forever printing increasing numbers, but occasionally it hangs.
(Edit: See better repro below.)
mdempsky commented on Nov 1, 2019
Simpler, much more reliable repro:
mdempsky commented on Nov 1, 2019
I'm not able to make any more progress on this. The last repro is very reliable, but I don't sufficiently understand the runtime preemption logic, and my naive debugging skills aren't sufficient here.
[Title changed from "runtime: dvyukov/go-fuzz -libfuzzer timeouts at master, but not Go 1.13" to "runtime: preemption-related deadlock when calling Go from C"]
ianlancetaylor commented on Nov 2, 2019
I found the problem. Working on adding a test.
gopherbot commented on Nov 2, 2019
Change https://golang.org/cl/204957 mentions this issue:
runtime: clear preemptStop in dropm
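The CL's one-line summary suggests the failure mode: per-goroutine preemption state left set across cgo callbacks on a reused extra M. The following is a toy model of my reading of that mechanism, not runtime source; the names `g`, `preemptStop`, and the needm/dropm bracketing are borrowed from the runtime, but the logic here is purely illustrative.

```go
package main

import "fmt"

// Toy model (NOT runtime source) of the bug suggested by the fix's
// title: state on the extra M's g can survive between C-to-Go callbacks.

type g struct {
	preemptStop bool // the GC's request for this g to park at a safe point
}

var extraG = new(g) // the g reused for cgo callbacks on a C-created thread

// callback models one cgo callback. With clearOnDrop=false (pre-fix),
// a preemptStop set during one callback leaks into the next one.
func callback(clearOnDrop bool) (wouldDeadlock bool) {
	gp := extraG // needm: reuse the extra M and its g
	if gp.preemptStop {
		// Real runtime: the g parks waiting for a resume that never
		// comes, since no suspender is tracking it anymore.
		return true
	}
	// Suppose the GC sets preemptStop just as the callback returns.
	gp.preemptStop = true
	if clearOnDrop { // dropm with the fix: clear the stale flag
		gp.preemptStop = false
	}
	return false
}

func main() {
	callback(false)
	fmt.Println("pre-fix, second callback deadlocks:", callback(false))

	extraG = new(g) // reset the toy state
	callback(true)
	fmt.Println("fixed, second callback deadlocks:", callback(true))
}
```

In this model the second pre-fix callback hits the stale flag and "hangs", while clearing the flag on the way out (as the fix does in dropm) keeps every callback clean.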