Skip to content

runtime/pprof: collecting a CPU profile takes ~200ms of wall time #63043

Open
@rhysh

Description

@rhysh

What version of Go are you using (go version)?

go1.21 version
go version go1.21.1 darwin/arm64

Does this issue reproduce with the latest release?

Yes, Go 1.21 is the latest release series.

What operating system and processor architecture are you using (go env)?

go env Output
$ go1.21 env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhys/Library/Caches/go-build'
GOENV='/Users/rhys/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhys/go/pkg/mod'
GONOPROXY='*'
GONOSUMDB='*'
GOOS='darwin'
GOPATH='/Users/rhys/go'
GOPRIVATE='*'
GOPROXY='direct'
GOROOT='/Users/rhys/go/version/go1.21'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/Users/rhys/go/version/go1.21/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/lf/n4ry0qv5639_3z0nhyhy3z7c0000gs/T/go-build1797745903=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

$ cat slowprof.go
package main

import (
	"flag"
	"io"
	"log"
	"runtime"
	"runtime/pprof"
)

func main() {
	cpu := flag.Bool("cpu", false, "collect a CPU profile")
	mutex := flag.Int("mutex", -1, "collect a mutex profile with specified debug level")
	block := flag.Int("block", -1, "collect a block profile with specified debug level")
	flag.Parse()

	if *cpu {
		err := pprof.StartCPUProfile(io.Discard)
		if err != nil {
			log.Fatalf("pprof.StartCPUProfile: %v", err)
		}
		pprof.StopCPUProfile()
	}

	if *mutex >= 0 {
		runtime.SetMutexProfileFraction(2)
		err := pprof.Lookup("mutex").WriteTo(io.Discard, *mutex)
		if err != nil {
			log.Fatalf(`pprof.Lookup("mutex").WriteTo: %v`, err)
		}
	}

	if *block >= 0 {
		runtime.SetBlockProfileRate(2)
		err := pprof.Lookup("block").WriteTo(io.Discard, *block)
		if err != nil {
			log.Fatalf(`pprof.Lookup("block").WriteTo: %v`, err)
		}
	}
}

$ go1.21 build -o /tmp/slowprof ./slowprof.go

$ for i in {1..10}; do time /tmp/slowprof; done                          
/tmp/slowprof  0.00s user 0.01s system 58% cpu 0.022 total
/tmp/slowprof  0.00s user 0.00s system 36% cpu 0.009 total
/tmp/slowprof  0.00s user 0.00s system 55% cpu 0.007 total
/tmp/slowprof  0.00s user 0.00s system 41% cpu 0.010 total
/tmp/slowprof  0.00s user 0.00s system 39% cpu 0.009 total
/tmp/slowprof  0.00s user 0.00s system 36% cpu 0.006 total
/tmp/slowprof  0.00s user 0.00s system 37% cpu 0.005 total
/tmp/slowprof  0.00s user 0.00s system 33% cpu 0.008 total
/tmp/slowprof  0.00s user 0.00s system 29% cpu 0.007 total
/tmp/slowprof  0.00s user 0.00s system 29% cpu 0.007 total

$ for i in {1..10}; do time /tmp/slowprof -cpu; done
/tmp/slowprof -cpu  0.00s user 0.00s system 2% cpu 0.214 total
/tmp/slowprof -cpu  0.00s user 0.00s system 2% cpu 0.213 total
/tmp/slowprof -cpu  0.00s user 0.01s system 4% cpu 0.216 total
/tmp/slowprof -cpu  0.00s user 0.00s system 3% cpu 0.215 total
/tmp/slowprof -cpu  0.00s user 0.00s system 3% cpu 0.216 total
/tmp/slowprof -cpu  0.00s user 0.00s system 3% cpu 0.215 total
/tmp/slowprof -cpu  0.00s user 0.00s system 3% cpu 0.215 total
/tmp/slowprof -cpu  0.00s user 0.00s system 3% cpu 0.216 total
/tmp/slowprof -cpu  0.00s user 0.01s system 3% cpu 0.216 total
/tmp/slowprof -cpu  0.00s user 0.00s system 2% cpu 0.213 total

What did you expect to see?

I hoped to see very little wall-time overhead as a result of enabling the CPU profile.

What did you see instead?

Starting and stopping the CPU profile adds around 200ms of wall-clock time. That's a significant slowdown when the process under test is something like "a benchmark" or "a command, such as the compiler". (It's not as noticeable when the process under test is a long-running server program.)

Furthermore, the DurationNanos field in the resulting protobuf-encoded profile includes that idle time. As a result, go tool pprof reporting that the process was less busy than it really was during the time the profile was active: the CPU profile samples represent the work the app does from the start of the call to pprof.StopCPUProfile, but the duration covers until the end of that call.

I think this relates to the 100ms sleep in runtime/pprof.profileWriter's loop at https://github.com/golang/go/blob/go1.21.1/src/runtime/pprof/pprof.go#L809

Activity

added this to the Backlog milestone on Sep 20, 2023
added
NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.
on Sep 20, 2023
mknyszek

mknyszek commented on Sep 20, 2023

@mknyszek
Contributor

This is probably at least partially a consequence of a bug fix on Darwin where accessing the profbuf is no longer blocking, and there's an explicit 100ms sleep in the profile reader loop.

Michael Pratt points out that the 100ms sleep could probably be replaced with a select on a couple channels, wherein if the profile ends sooner it wakes up, flushes whatever it can, then exits promptly.

prattmic

prattmic commented on Sep 20, 2023

@prattmic
Member

https://go.dev/cl/228886 tried to fix this, but it was reverted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    Status

    Todo

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @rhysh@mknyszek@prattmic@gopherbot

        Issue actions

          runtime/pprof: collecting a CPU profile takes ~200ms of wall time · Issue #63043 · golang/go