Skip to content

runtime: preemption-related deadlock when calling Go from C #35294

Closed
@mdempsky

Description

@mdempsky

While comparing dvyukov/go-fuzz with mdempsky/go114-fuzz-build, I noticed that fuzzing k8s.io/kubernetes/test/fuzz/yaml.FuzzSigYaml with libFuzzer seems to periodically run into timeouts. I haven't been able to reproduce this with Go 1.13, so this seems like a regression.

Tentatively marking release blocker, since this seems like it could be a subtle runtime and/or cgo regression.

I'm going to try bisecting to see if I can figure out what commit caused the problem.

Activity

added this to the Go1.14 milestone on Oct 31, 2019
added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Oct 31, 2019
mdempsky

mdempsky commented on Oct 31, 2019

@mdempsky
ContributorAuthor

I ran a bunch of times at master to see how long it takes to hang. Here's a record of the number of fuzz iterations before hang:

118784
274821
97466
41172
705368
374579
773952
135061
4180
34428
520221
199902
241640
306002
25566
160724
359761
205918
26332
518383
99168
316188

I would guess it's exponentially distributed?

Running until 2M iterations seems like a safe threshold for bisection.

Edit: Using a 1.5M iteration threshold because I'm impatient, and estimating the exponential distribution based on the above samples suggests to me the chance of 1.5M iterations without failure is 0.25%. (But I'm no statistician.)

mdempsky

mdempsky commented on Oct 31, 2019

@mdempsky
ContributorAuthor

/cc @aclements

Still bisecting, but it seems to be narrowing down on one of the CLs related to #10958 / #24543.

ianlancetaylor

ianlancetaylor commented on Oct 31, 2019

@ianlancetaylor
Contributor

Honestly I wouldn't be surprised if the culprit CL turns out to be https://golang.org/cl/171883, which turns on the new timers. I suspect there may be some case we are failing to handle.

It would be nice (for me) to hear that it was a different CL, though.

mdempsky

mdempsky commented on Oct 31, 2019

@mdempsky
ContributorAuthor

@ianlancetaylor You look safe for now. My current bisect interval is from 316fb95 (good) to 6058603 (bad), and that CL looks like it was merged outside of that window.

mdempsky

mdempsky commented on Oct 31, 2019

@mdempsky
ContributorAuthor

Git bisect identified 3f83411.

I double checked to make sure it does hang at that CL; and I'm up to 2M iterations without hanging for double checking the previous commit (which assuming I didn't mess up "git bisect bad" / "git bisect good" earlier should be at least 3.5M iterations in total).

ianlancetaylor

ianlancetaylor commented on Oct 31, 2019

@ianlancetaylor
Contributor
mdempsky

mdempsky commented on Nov 1, 2019

@mdempsky
ContributorAuthor

I've been able to minimize the failure below, though it seems to reproduce the issue somewhat less reliably than actually using libfuzzer.

The program should run forever printing increasing numbers, but occasionally it hangs.

$ go build -buildmode=c-archive -o fuzz.a fuzz.go
$ cc -o driver driver.c fuzz.a -lpthread
$ ./driver

(Edit: See better repro below.)

$ cat fuzz.go
package main

// #include <stdint.h>
import "C"

import (
	"unsafe"
)

//export LLVMFuzzerTestOneInput
func LLVMFuzzerTestOneInput(data *C.uint8_t, size C.size_t) C.int {
	s := (*[1 << 30]byte)(unsafe.Pointer(data))[:size:size]

	for _, x := range s {
		_ = make([]byte, x)
	}
	return 0
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

uint8_t data[1] = { 64 };

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 4096) == 0) printf("#%d\n", i);
    LLVMFuzzerTestOneInput(data, sizeof(data));
  }
}
mdempsky

mdempsky commented on Nov 1, 2019

@mdempsky
ContributorAuthor

Simpler, much more reliable repro:

$ cat fuzz.go
package main

import "C"

var sink []byte

//export Fuzz
func Fuzz() {
	sink = make([]byte, 4096)
}

func main() {
}
$ cat driver.c
// +build ignore

#include <stdio.h>

#include "fuzz.h"

int main() {
  int i = 0;
  for (;;) {
    i++;
    if ((i % 256) == 0) printf("#%d\n", i);
    Fuzz();
  }
}
mdempsky

mdempsky commented on Nov 1, 2019

@mdempsky
ContributorAuthor

I'm not able to make any more progress on this. The last repro is very reliable, but I don't sufficiently understand the runtime preemption logic, and my naive debugging skills aren't sufficient here.

changed the title [-]runtime: dvyukov/go-fuzz -libfuzzer timeouts at master, but not Go 1.13[/-] [+]runtime: preemption-related deadlock when calling Go from C[/+] on Nov 1, 2019
ianlancetaylor

ianlancetaylor commented on Nov 2, 2019

@ianlancetaylor
Contributor

I found the problem. Working on adding a test.

gopherbot

gopherbot commented on Nov 2, 2019

@gopherbot
Contributor

Change https://golang.org/cl/204957 mentions this issue: runtime: clear preemptStop in dropm

locked and limited conversation to collaborators on Nov 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.release-blocker

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @mdempsky@aclements@ianlancetaylor@gopherbot

        Issue actions

          runtime: preemption-related deadlock when calling Go from C · Issue #35294 · golang/go