Skip to content

Commit d0d35a9

Browse files
committed
Added demo files showing pprof measurement limitations.
1 parent fc4a93b commit d0d35a9

File tree

4 files changed

+490
-0
lines changed

4 files changed

+490
-0
lines changed

README.md

Lines changed: 176 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,2 +1,178 @@
11
# GoPprofDemo
22
Demo of low precision and accuracy of Go pprof
3+
4+
goroutine.go is a parallel code and serial.go is a serial code.
5+
6+
run.sh automates running these two programs.
7+
8+
You can run manually in the following way:
9+
10+
## Parallel program:
11+
goroutine.go has ten exactly the same goroutines: f1-f10 and I use pprof to collect the CPU profiles via. the default OS timers.
12+
Run the following command several times and notice that each time, pprof reports a different amount of time spent in each of the ten routines.
13+
14+
`go run goroutine.go && go tool pprof -top goroutine_prof`
15+
16+
Expectation: each function (f1-10) should be attributed with exactly (or almost exactly) 10% of the execution time on each run.
17+
18+
19+
## Serial program:
20+
serial.go has ten functions (A_expect_1_82 - J_expect_18_18). The function A_expect_1_82 is expected to consume 1.82% of the total execution time and J_expect_18_18 is expected to consume 18.18% of the execution time, and so on. The code is serial and there is complete data dependence between each function and each iteration of the loop in the functions to avoid any hardware-level optimizations.
21+
Run the following command several times.
22+
23+
`go run serial.go && go tool pprof -top serial_prof`
24+
25+
Expectation: the time attribution should roughly follow the distribution shown below in each run.
26+
```
27+
28+
FUNCTION NAME EXPECTED RELATIVE TIME
29+
30+
A_expect_1_82 1.82%
31+
B_expect_3_64 3.64%
32+
C_expect_5_46 5.46%
33+
D_expect_7_27 7.27%
34+
E_expect_9_09 9.09%
35+
F_expect_10_91 10.91%
36+
G_expect_12_73 12.73%
37+
H_expect_14_546 14.546%
38+
I_expect_16_36 16.36%
39+
J_expect_18_18 18.18%
40+
```
41+
42+
## Details
43+
The experiments demonstrate that pprof CPU profiles lack accuracy (closeness to the ground truth) and precision (repeatability across different runs). The tests are highly predictive in nature and involve little runtime overheads (allocation, GC, system call, etc.). They are carefully designed so that we can compare pprof CPU profiles against our expectations. The evaluation shows a wide difference from expectation. One should not confuse this to be a runtime issue; the issue is with the use of OS timers used for sampling; OS timers are coarse-grained and have a high skid.
44+
45+
## Observation
46+
Run #1, #2, and #3 respectively show the pprof -top output of `goroutine.go`.
47+
You will notice in a single run (say Run #1), f1-f10 have a wide variance in time attributed to them; the expectation is that each of them gets 10% execution time. There is up to 6x difference in the function with the highest execution time (main.f7, 4210ms, in run1) vs. the function with the lowest execution time (main.f9, 700ms, in run1).
48+
This shows poor accuracy of pprof timer-based profiles.
49+
50+
Furthermore, the time attributed to a function widely varies from run to run. Notice how the top-10 ordering changes. The expectation is that the measurements remain the same from run to run. This shows a poor precision of pprof timer-based profiles.
51+
52+
### Run 1:
53+
```
54+
File: goroutine
55+
Type: cpu
56+
Time: Jan 27, 2020 at 3:45pm (PST)
57+
Duration: 6.70s, Total samples = 18060ms (269.37%)
58+
Showing nodes accounting for 18060ms, 100% of 18060ms total
59+
flat flat% sum% cum cum%
60+
4210ms 23.31% 23.31% 4210ms 23.31% main.f7
61+
2610ms 14.45% 37.76% 2610ms 14.45% main.f2
62+
2010ms 11.13% 48.89% 2010ms 11.13% main.f6
63+
1810ms 10.02% 58.91% 1810ms 10.02% main.f10
64+
1780ms 9.86% 68.77% 1780ms 9.86% main.f3
65+
1410ms 7.81% 76.58% 1410ms 7.81% main.f1
66+
1310ms 7.25% 83.83% 1310ms 7.25% main.f4
67+
1110ms 6.15% 89.98% 1110ms 6.15% main.f5
68+
1110ms 6.15% 96.12% 1110ms 6.15% main.f8
69+
700ms 3.88% 100% 700ms 3.88% main.f9
70+
```
71+
72+
### Run 2:
73+
```
74+
File: goroutine
75+
Type: cpu
76+
Time: Jan 27, 2020 at 3:45pm (PST)
77+
Duration: 6.71s, Total samples = 17400ms (259.39%)
78+
Showing nodes accounting for 17400ms, 100% of 17400ms total
79+
flat flat% sum% cum cum%
80+
3250ms 18.68% 18.68% 3250ms 18.68% main.f2
81+
2180ms 12.53% 31.21% 2180ms 12.53% main.f9
82+
2100ms 12.07% 43.28% 2100ms 12.07% main.f1
83+
1770ms 10.17% 53.45% 1770ms 10.17% main.f6
84+
1700ms 9.77% 63.22% 1700ms 9.77% main.f5
85+
1550ms 8.91% 72.13% 1550ms 8.91% main.f4
86+
1500ms 8.62% 80.75% 1500ms 8.62% main.f8
87+
1440ms 8.28% 89.02% 1440ms 8.28% main.f3
88+
1390ms 7.99% 97.01% 1390ms 7.99% main.f10
89+
520ms 2.99% 100% 520ms 2.99% main.f7
90+
```
91+
92+
93+
### Run 3:
94+
```
95+
File: goroutine
96+
Type: cpu
97+
Time: Jan 27, 2020 at 3:48pm (PST)
98+
Duration: 6.71s, Total samples = 17.73s (264.31%)
99+
Showing nodes accounting for 17.73s, 100% of 17.73s total
100+
flat flat% sum% cum cum%
101+
3.74s 21.09% 21.09% 3.74s 21.09% main.f7
102+
2.08s 11.73% 32.83% 2.08s 11.73% main.f9
103+
2.05s 11.56% 44.39% 2.05s 11.56% main.f2
104+
1.85s 10.43% 54.82% 1.85s 10.43% main.f10
105+
1.78s 10.04% 64.86% 1.78s 10.04% main.f1
106+
1.43s 8.07% 72.93% 1.43s 8.07% main.f3
107+
1.42s 8.01% 80.94% 1.42s 8.01% main.f8
108+
1.18s 6.66% 87.59% 1.18s 6.66% main.f6
109+
1.17s 6.60% 94.19% 1.17s 6.60% main.f5
110+
1.03s 5.81% 100% 1.03s 5.81% main.f4
111+
```
112+
113+
114+
115+
The output for `go run serial.go && go tool pprof -top serial_prof` for three runs is shown below.
116+
Comparing the flat% (or cum%) against the expected percentage for each function shows a large difference. For example, in Run 31, `main.H_expect_14_546` is expected to have 14.546% execution time, whereas it is attributed 25% execution time. Furthermore, run to run, there is lack of precision, for example `main.I_expect_16_36` is attributed 6.25% (20ms) execution time in Run #1, whereas it is attributed 21.88% (70ms) execution time in Run#2
117+
118+
119+
### Run 1:
120+
```
121+
File: serial
122+
Type: cpu
123+
Time: Jan 27, 2020 at 1:42pm (PST)
124+
Duration: 501.51ms, Total samples = 320ms (63.81%)
125+
Showing nodes accounting for 320ms, 100% of 320ms total
126+
flat flat% sum% cum cum%
127+
80ms 25.00% 25.00% 80ms 25.00% main.H_expect_14_546
128+
80ms 25.00% 50.00% 80ms 25.00% main.J_expect_18_18
129+
60ms 18.75% 68.75% 60ms 18.75% main.G_expect_12_73
130+
20ms 6.25% 75.00% 20ms 6.25% main.B_expect_3_64
131+
20ms 6.25% 81.25% 20ms 6.25% main.D_expect_7_27
132+
20ms 6.25% 87.50% 20ms 6.25% main.F_expect_10_91
133+
20ms 6.25% 93.75% 20ms 6.25% main.I_expect_16_36
134+
10ms 3.12% 96.88% 10ms 3.12% main.A_expect_1_82
135+
10ms 3.12% 100% 10ms 3.12% main.C_expect_5_46
136+
0 0% 100% 320ms 100% main.main
137+
0 0% 100% 320ms 100% runtime.main
138+
```
139+
140+
### Run 2:
141+
```
142+
File: serial
143+
Type: cpu
144+
Time: Jan 27, 2020 at 1:44pm (PST)
145+
Duration: 501.31ms, Total samples = 320ms (63.83%)
146+
Showing nodes accounting for 320ms, 100% of 320ms total
147+
flat flat% sum% cum cum%
148+
70ms 21.88% 21.88% 70ms 21.88% main.I_expect_16_36
149+
50ms 15.62% 37.50% 50ms 15.62% main.J_expect_18_18
150+
40ms 12.50% 50.00% 40ms 12.50% main.E_expect_9_09
151+
40ms 12.50% 62.50% 40ms 12.50% main.F_expect_10_91
152+
40ms 12.50% 75.00% 40ms 12.50% main.H_expect_14_546
153+
30ms 9.38% 84.38% 30ms 9.38% main.D_expect_7_27
154+
20ms 6.25% 90.62% 20ms 6.25% main.B_expect_3_64
155+
20ms 6.25% 96.88% 20ms 6.25% main.G_expect_12_73
156+
10ms 3.12% 100% 10ms 3.12% main.C_expect_5_46
157+
0 0% 100% 320ms 100% main.main
158+
0 0% 100% 320ms 100% runtime.main
159+
```
160+
161+
### Run 3:
162+
```
163+
File: serial
164+
Type: cpu
165+
Time: Jan 27, 2020 at 1:45pm (PST)
166+
Duration: 501.39ms, Total samples = 310ms (61.83%)
167+
Showing nodes accounting for 310ms, 100% of 310ms total
168+
flat flat% sum% cum cum%
169+
110ms 35.48% 35.48% 110ms 35.48% main.J_expect_18_18
170+
70ms 22.58% 58.06% 70ms 22.58% main.G_expect_12_73
171+
60ms 19.35% 77.42% 60ms 19.35% main.F_expect_10_91
172+
30ms 9.68% 87.10% 30ms 9.68% main.I_expect_16_36
173+
20ms 6.45% 93.55% 20ms 6.45% main.H_expect_14_546
174+
10ms 3.23% 96.77% 10ms 3.23% main.B_expect_3_64
175+
10ms 3.23% 100% 10ms 3.23% main.C_expect_5_46
176+
0 0% 100% 310ms 100% main.main
177+
0 0% 100% 310ms 100% runtime.main
178+
```

goroutine.go

Lines changed: 187 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,187 @@
1+
package main
2+
3+
import (
4+
"fmt"
5+
"log"
6+
"os"
7+
"runtime/pprof"
8+
"sync"
9+
_ "time"
10+
)
11+
12+
var wg sync.WaitGroup
13+
14+
func f1() {
15+
defer wg.Done()
16+
17+
var sum int
18+
for i := 0; i < 500000000; i++ {
19+
sum -= i / 2
20+
sum *= i
21+
sum /= i/3 + 1
22+
sum -= i / 4
23+
}
24+
25+
fmt.Println(sum)
26+
}
27+
28+
func f2() {
29+
defer wg.Done()
30+
31+
var sum int
32+
for i := 0; i < 500000000; i++ {
33+
sum -= i / 2
34+
sum *= i
35+
sum /= i/3 + 1
36+
sum -= i / 4
37+
}
38+
39+
fmt.Println(sum)
40+
}
41+
42+
func f3() {
43+
defer wg.Done()
44+
45+
var sum int
46+
for i := 0; i < 500000000; i++ {
47+
sum -= i / 2
48+
sum *= i
49+
sum /= i/3 + 1
50+
sum -= i / 4
51+
}
52+
53+
fmt.Println(sum)
54+
}
55+
56+
func f4() {
57+
defer wg.Done()
58+
59+
var sum int
60+
for i := 0; i < 500000000; i++ {
61+
sum -= i / 2
62+
sum *= i
63+
sum /= i/3 + 1
64+
sum -= i / 4
65+
}
66+
67+
fmt.Println(sum)
68+
}
69+
70+
func f5() {
71+
defer wg.Done()
72+
73+
var sum int
74+
for i := 0; i < 500000000; i++ {
75+
sum -= i / 2
76+
sum *= i
77+
sum /= i/3 + 1
78+
sum -= i / 4
79+
}
80+
81+
fmt.Println(sum)
82+
}
83+
84+
func f6() {
85+
defer wg.Done()
86+
87+
var sum int
88+
for i := 0; i < 500000000; i++ {
89+
sum -= i / 2
90+
sum *= i
91+
sum /= i/3 + 1
92+
sum -= i / 4
93+
}
94+
95+
fmt.Println(sum)
96+
}
97+
98+
func f7() {
99+
defer wg.Done()
100+
101+
var sum int
102+
for i := 0; i < 500000000; i++ {
103+
sum -= i / 2
104+
sum *= i
105+
sum /= i/3 + 1
106+
sum -= i / 4
107+
}
108+
109+
fmt.Println(sum)
110+
}
111+
112+
func f8() {
113+
defer wg.Done()
114+
115+
var sum int
116+
for i := 0; i < 500000000; i++ {
117+
sum -= i / 2
118+
sum *= i
119+
sum /= i/3 + 1
120+
sum -= i / 4
121+
}
122+
123+
fmt.Println(sum)
124+
}
125+
126+
func f9() {
127+
defer wg.Done()
128+
129+
var sum int
130+
for i := 0; i < 500000000; i++ {
131+
sum -= i / 2
132+
sum *= i
133+
sum /= i/3 + 1
134+
sum -= i / 4
135+
}
136+
137+
fmt.Println(sum)
138+
}
139+
140+
func f10() {
141+
defer wg.Done()
142+
143+
var sum int
144+
for i := 0; i < 500000000; i++ {
145+
sum -= i / 2
146+
sum *= i
147+
sum /= i/3 + 1
148+
sum -= i / 4
149+
}
150+
151+
fmt.Println(sum)
152+
}
153+
154+
func run() error {
155+
timerFile, err := os.Create("goroutine_prof")
156+
if err != nil {
157+
return err
158+
}
159+
defer timerFile.Close()
160+
161+
if err := pprof.StartCPUProfile(timerFile); err != nil {
162+
return err
163+
}
164+
165+
defer pprof.StopCPUProfile()
166+
wg.Add(10)
167+
defer wg.Wait()
168+
169+
go f1()
170+
go f2()
171+
go f3()
172+
go f4()
173+
go f5()
174+
go f6()
175+
go f7()
176+
go f8()
177+
go f9()
178+
go f10()
179+
180+
return nil
181+
}
182+
183+
func main() {
184+
if err := run(); err != nil {
185+
log.Fatal(err)
186+
}
187+
}

run.sh

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
go run goroutine.go && go tool pprof -top goroutine_prof
2+
go run goroutine.go && go tool pprof -top goroutine_prof
3+
go run goroutine.go && go tool pprof -top goroutine_prof
4+
5+
go run serial.go && go tool pprof -top serial_prof
6+
go run serial.go && go tool pprof -top serial_prof
7+
go run serial.go && go tool pprof -top serial_prof

0 commit comments

Comments
 (0)