Skip to content

runtime: OS thread appears to be re-used despite never releasing runtime.LockOSThread() #28979

Closed
@sipsma

Description

@sipsma

What did you do?

In Go 1.11.2, spawned a goroutine, called runtime.LockOSThread() and entered a new mount namespace via syscall.Unshare. runtime.UnlockOSThread() was never called, which according to the docs should mean that the locked thread is never re-used.

However, it appears that a separate goroutine started at the beginning of the process (which never calls unshare or does anything with mounts) will after some time be executing in a mount namespace that was created by the independent goroutine mentioned above (which is not expected to be leaking OS threads). This seems to indicate that the OS thread is somehow leaking for re-use by other goroutines despite never calling runtime.UnlockOSThread().

This behavior was originally observed in a larger piece of software, but I was able to reproduce it using the code below. The code:

  • Creates two files
    • /tmp/test-go-thread-leak/source, with file contents "source"
    • /tmp/test-go-thread-leak/dest, with file contents "dest"
  • From main, starts a goroutine that checks every second that the contents of /tmp/test-go-thread-leak/dest have not changed. If the contents changed from the expected value "dest" to "source", it panics, crashing the program.
  • From main, runs in an infinite loop a function that spins off a separate goroutine that:
    • Calls runtime.LockOSThread()
    • Enters a new mount namespace with Unshare
    • Bind mounts /tmp/test-go-thread-leak/source to /tmp/test-go-thread-leak/dest. This means that any threads/processes in the newly created mount namespace will now see the contents of "dest" have changed to be "source".

The goroutine that checks the contents of /tmp/test-go-thread-leak/dest should never read the contents as "source" unless it has suddenly begun executing on an OS thread using one of the mount namespaces created by the other goroutine. However, after 1-10s of execution, the program will crash due to the goroutine reading "source" instead of destination, indicating the OS thread leaked.

Code was compiled with just go build main.go and then run with sudo ./main (sudo needed to make the unshare call).

package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
    "runtime"
    "syscall"
    "time"
)

var (
    // Files created will be under this tmpRoot.
    // Any existing path at tmpRoot will be deleted
    // on start.
    tmpRoot = "/tmp/test-go-thread-leak"

    sourcePath = filepath.Join(tmpRoot, "source")
    sourceContents = "source"

    destPath = filepath.Join(tmpRoot, "dest")
    destContents = "dest"
)

func init() {
    // Set up the directories used to test and print some debugging information
    if err := os.RemoveAll(tmpRoot); err != nil {
        panic(err.Error())
    }

    if err := os.Mkdir(tmpRoot, 0777); err != nil {
        panic(err.Error())
    }

    if err := ioutil.WriteFile(sourcePath, []byte(sourceContents), 0777); err != nil {
        panic(err.Error())
    }

    if err := ioutil.WriteFile(destPath, []byte(destContents), 0777); err != nil {
        panic(err.Error())
    }

    fmt.Printf("Running on Go version: %s\n", runtime.Version())
    fmt.Printf("Initial mount namespace: %s\n\n", currentMountNsName())
}

// Returns the name of the mount namespace used by the thread of the calling go routine
func currentMountNsName() string {
    mountNsName, err := os.Readlink("/proc/thread-self/ns/mnt")
    if err != nil {
        panic(fmt.Sprintf("failed to get mount ns name: %v", err))
    }

    return mountNsName
}

// The function that appears to cause unexpected leaks of threads.
// It creates a separate go routine, locks to its OS thread, enters
// a new mount namespace and creates a bind mount in that new namespace.
//
// runtime.UnlockOSThread is never called, so it's expected that the OS
// thread on which this go routine executes is never re-used by other
// go-routines. That appears to not be the case though.
func tryLeak() {
    done := make(chan interface{})
    go func() {
        defer close(done)

        runtime.LockOSThread()

        // create the new namespace after locking to the os thread
        err := syscall.Unshare(syscall.CLONE_NEWNS)
        if err != nil {
            panic(err.Error())
        }

        // make sure mounts don't propagate to the previous mount namespace
        err = syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, "")
        if err != nil {
            panic(err.Error())
        }

        // Bind mount the source file to the dest file. This means any thread in
        // this mount namespace will now see that "destPath" has the same
        // contents as "sourcePath"
        err = syscall.Mount(sourcePath, destPath, "", syscall.MS_BIND|syscall.MS_PRIVATE, "")
        if err != nil {
            panic(err.Error())
        }
    }()

    <-done
}

func main() {
    // This go routine checks in a loop that the contents of "destPath" have not
    // changed to the contents of "sourcePath". If the contents did change, that
    // means this go routine is now executing on an OS thread created by the
    // tryLeak function above (even though tryLeak never calls UnlockOSThread
    // and should thus never leak threads for re-use).
    go func() {
        ticker := time.Tick(time.Second)
        for range ticker {
            runtime.LockOSThread()
            readDestContents, err := ioutil.ReadFile(destPath)
            if err != nil {
                panic(err.Error())
            }

            if string(readDestContents) == sourceContents {
                panic(fmt.Sprintf("unexpectedly able to view bind mounted file. current mount namespace: %s", currentMountNsName()))
            } else {
                fmt.Printf("found file contents: %s\n", readDestContents)
            }
            runtime.UnlockOSThread()
        }
    }()

    // Try leaking an os thread every 50ms
    ticker := time.Tick(50 * time.Millisecond)
    for range ticker {
        tryLeak()
    }
}

What did you expect to see?

That the program executed indefinitely, never crashing due to the a goroutine unexpectedly executing in a mount namespace created by a different goroutine locked to its os thread.

What did you see instead?

A mount namespace leaks to the separate go routine, causing the program to crash. The time it takes for this to occur is variable but between 1 and 10 seconds. One example:

Running on Go version: go1.11.2
Initial mount namespace: mnt:[4026531840]

found file contents: dest
found file contents: dest
found file contents: dest
panic: unexpectedly able to view bind mounted file. current mount namespace: mnt:[4026532292]

goroutine 18 [running, locked to thread]:
main.main.func1()
        /home/sipsma/tmp/goleak/main.go:112 +0x31b
created by main.main
        /home/sipsma/tmp/goleak/main.go:102 +0x39

System details

go version go1.11.2 linux/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GOPROXY=""
GORACE=""
GOTMPDIR=""
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
GOROOT/bin/go version: go version go1.11.2 linux/amd64
GOROOT/bin/go tool compile -V: compile version go1.11.2
uname -sr: Linux 4.9.124-0.1.ac.198.71.329.metal1.x86_64
/lib64/libc.so.6: GNU C Library stable release version 2.12, by Roland McGrath et al.
gdb --version: GNU gdb (GDB) Amazon Linux (7.2-50.11.amzn1)

Activity

sipsma

sipsma commented on Nov 28, 2018

@sipsma
Author

Additionally, it seems that if I make a seemingly unrelated change, the issue is no longer reproducible. The change is to just remove the done channel from tryLeak, i.e. use the same code as above but with tryLeak written as:

  func tryLeak() {
      go func() {
          runtime.LockOSThread()

          // create the new namespace after locking to the os thread
          err := syscall.Unshare(syscall.CLONE_NEWNS)
          if err != nil {
              panic(err.Error())
          }

          // make sure mounts don't propagate to the previous mount namespace
          err = syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, "")
          if err != nil {
              panic(err.Error())
          }

          // Bind mount the source file to the dest file. This means any thread in
          // this mount namespace will now see that "destPath" has the same
          // contents as "sourcePath"
          err = syscall.Mount(sourcePath, destPath, "", syscall.MS_BIND|syscall.MS_PRIVATE, "")
          if err != nil {
              panic(err.Error())
          }
      }()
  }

I obviously can't know whether removing the channel changes the behavior due to a timing difference or whether there could be a bug related to a locked go routine sharing a channel with an unlocked go routine, but seems worth mentioning.

changed the title [-]OS thread appears to be re-used despite never releasing runtime.LockOSThread()[/-] [+]runtime: OS thread appears to be re-used despite never releasing runtime.LockOSThread()[/+] on Nov 28, 2018
added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Nov 28, 2018
added this to the Go1.12 milestone on Nov 28, 2018
ianlancetaylor

ianlancetaylor commented on Nov 28, 2018

@ianlancetaylor
Contributor
sipsma

sipsma commented on Nov 28, 2018

@sipsma
Author

If helpful, I got some strace output showing a pid, 22828, that did unshare+mount calls followed up by a clone(2) to make a new thread (right before it calls _exit()). The thread it spawned with that clone, 22833, was the one that eventually found itself in the wrong mount namespace and panicked.

    [pid 22828] <... futex resumed> )       = 0
    [pid 22820] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    [pid 22828] unshare(CLONE_NEWNS <unfinished ...>
    [pid 22820] <... clock_gettime resumed> {11367, 575113001}) = 0
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22820] futex(0x571ce0, FUTEX_WAIT_PRIVATE, 0, {0, 52273938} <unfinished ...>
    [pid 22821] <... clock_gettime resumed> {11367, 575147924}) = 0
    [pid 22828] <... unshare resumed> )     = 0
    [pid 22832] <... nanosleep resumed> NULL) = 0
    [pid 22828] mount("", "/", "", MS_REC|MS_PRIVATE, NULL <unfinished ...>
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22832] futex(0xc000033640, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 22828] <... mount resumed> )       = 0
    [pid 22828] mount("/tmp/test-go-thread-leak/source", "/tmp/test-go-thread-leak/dest", 0xc00007a06b, MS_BIND|MS_PRIVATE, NULL) = 0
    [pid 22828] futex(0xc000033640, FUTEX_WAKE_PRIVATE, 1) = 1
    [pid 22832] <... futex resumed> )       = 0
    [pid 22832] nanosleep({0, 3000},  <unfinished ...>
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    [pid 22828] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22828] sigaltstack({ss_sp=0, ss_flags=SS_DISABLE, ss_size=0} <unfinished ...>
    [pid 22821] <... clock_gettime resumed> {11367, 575500977}) = 0
    [pid 22828] <... sigaltstack resumed> , NULL) = 0
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22828] clock_gettime(CLOCK_MONOTONIC, {11367, 575600016}) = 0
    [pid 22832] <... nanosleep resumed> NULL) = 0
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22832] rt_sigprocmask(SIG_SETMASK, ~[],  <unfinished ...>
    [pid 22828] <... rt_sigprocmask resumed> ~[KILL STOP], 8) = 0
    [pid 22832] <... rt_sigprocmask resumed> [], 8) = 0
    [pid 22828] clone( <unfinished ...>
    [pid 22832] clone( <unfinished ...>
    [pid 22821] <... nanosleep resumed> NULL) = 0
    Process 22833 attached
    Process 22834 attached
    [pid 22828] <... clone resumed> child_stack=0xc00006c000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 22833
    [pid 22833] gettid( <unfinished ...>
    [pid 22821] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
    [pid 22834] gettid( <unfinished ...>
    [pid 22833] <... gettid resumed> )      = 22833
    [pid 22832] <... clone resumed> child_stack=0xc00006a000, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM) = 22834
    [pid 22828] rt_sigprocmask(SIG_SETMASK, ~[KILL STOP],  <unfinished ...>
    [pid 22834] <... gettid resumed> )      = 22834
    [pid 22833] arch_prctl(ARCH_SET_FS, 0xc00005ab10 <unfinished ...>
    [pid 22832] rt_sigprocmask(SIG_SETMASK, [],  <unfinished ...>
    [pid 22828] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22821] <... clock_gettime resumed> {11367, 575837370}) = 0
    [pid 22834] arch_prctl(ARCH_SET_FS, 0xc000033910 <unfinished ...>
    [pid 22833] <... arch_prctl resumed> )  = 0
    [pid 22832] <... rt_sigprocmask resumed> NULL, 8) = 0
    [pid 22828] _exit(0)                    = ?
    [pid 22834] <... arch_prctl resumed> )  = 0
    [pid 22833] sigaltstack(NULL <unfinished ...>
    [pid 22821] nanosleep({0, 20000},  <unfinished ...>
    [pid 22834] sigaltstack(NULL <unfinished ...>
    [pid 22833] <... sigaltstack resumed> , {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
    [pid 22832] futex(0xc000033640, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
    [pid 22828] +++ exited with 0 +++

... (skip ahead)

    [pid 22833] write(2, "panic: ", 7panic: )      = 7
    [pid 22833] write(2, "unexpectedly able to view bind mounted file. current mount namespace: 
ianlancetaylor

ianlancetaylor commented on Nov 28, 2018

@ianlancetaylor
Contributor

I think the problem is that goexit0 sets _g_.m.lockedExt = 0, then jumps to mstart, which calls mexit which calls handoffp which calls startm which calls newm. The code in newm does the right thing if the g is locked, but it's not, because the lock was cleared by goexit0.

added
NeedsFixThe path to resolution is known, but the work has not been done.
on Nov 28, 2018
removed
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Nov 28, 2018
ianlancetaylor

ianlancetaylor commented on Nov 28, 2018

@ianlancetaylor
Contributor

@gopherbot Please open a backport issue for 1.10 and 1.11.

gopherbot

gopherbot commented on Nov 28, 2018

@gopherbot
Contributor

Backport issue(s) opened: #28985 (for 1.10), #28986 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot

gopherbot commented on Dec 7, 2018

@gopherbot
Contributor

Change https://golang.org/cl/153078 mentions this issue: runtime: don't clear lockedExt on locked M when G exits

mknyszek

mknyszek commented on Dec 7, 2018

@mknyszek
Contributor

@ianlancetaylor, thanks for finding the problem!

I think the solution is to just keep _g_.m.lockedExt as-is when the g exits. It feels a little naive, but I looked at all of the places _g_.m.lockedExt is actually used, and it's really just checked in newm and updated as part of LockOSThread and UnlockOSThread. At the point where the g exits and it's locked to its m, that m is really just either going to exit or get wedged if it's the main thread. Also, for the main thread case, it makes sense that it should retain its state indefinitely. The only concern I had was that the m struct still lives in the free list and that it could be used again in some meaningful way, but as it turns out, it's not.

Anyway, I put up a change. I was able to reproduce the original issue and can confirm that with my change, it's fixed.

locked and limited conversation to collaborators on Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @mknyszek@aclements@ianlancetaylor@gopherbot@sipsma

        Issue actions

          runtime: OS thread appears to be re-used despite never releasing runtime.LockOSThread() · Issue #28979 · golang/go