cmd/go: performance regression after F_FULLFSYNC fcntl change on Darwin #27415
If I'm reading this correctly, the regression is in tip but not in 1.11? You have to bisect between 227 commits, which will take 8 steps. At ~2 minutes per step, that's less than 20 minutes. It seems doable to me. |
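For anyone checking the estimate above, here is a minimal sketch of the arithmetic. It assumes a plain binary search over a linear range of commits; the function name `bisectSteps` is made up for illustration, and `git bisect` on a real history with merges may report a slightly different step count.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bisectSteps returns the worst-case number of test runs a binary search
// needs to single out one commit among n candidates, i.e. ceil(log2(n)).
// Hypothetical helper, not part of any Go or git tooling.
func bisectSteps(n int) int {
	if n <= 1 {
		return 0
	}
	// For n >= 2, the bit length of n-1 equals ceil(log2(n)).
	return bits.Len(uint(n - 1))
}

func main() {
	const commits = 227      // commit count quoted in the comment above
	const minutesPerStep = 2 // rough per-step cost quoted in the comment above

	steps := bisectSteps(commits)
	fmt.Printf("%d steps, about %d minutes\n", steps, steps*minutesPerStep)
}
```

With the numbers from the comment this comes to 8 steps and about 16 minutes, consistent with the "less than 20 minutes" estimate.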
I'm seeing an even more drastic slowdown without using -race:
|
Okay, I bisected to find be10ad7 ( The test suite in question is exercising some concurrent operations backed by a coreos/bbolt database. The commit message details look like the commit was a necessary change:
Maybe this is just a necessary regression for file correctness on Darwin? Here's the tail of my bisect run, if that's useful information:
|
go test -race
at tip
Thank you for the investigation and report @mark-rushakoff! Indeed, the regression is present, but the change seems unavoidable for consistency and correctness, to avoid data corruption. Could we please compare the speeds and throughput on Linux vs. OS X vs. Windows? |
Unfortunately, I don't have access to a Linux or Windows machine to compare results. Fortunately, I've created a repository with a smaller reproduction case so that someone with Linux or Windows can easily help: https://github.com/mark-rushakoff/goissue27415
The case there still depends on coreos/bbolt, which is included in vendor. I tried calling
The benchmark in the repository is this:
package goissue27415_test

import (
	"encoding/binary"
	"io/ioutil"
	"os"
	"path/filepath"
	"testing"

	bolt "github.com/coreos/bbolt"
)

func BenchmarkBboltWrites(b *testing.B) {
	dir, err := ioutil.TempDir("", "")
	if err != nil {
		b.Fatal(err)
	}
	path := filepath.Join(dir, "bolt.db")
	defer os.Remove(path)

	db, err := bolt.Open(path, 0644, nil)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	var buf [8]byte
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		binary.BigEndian.PutUint64(buf[:], uint64(i))
		if err := db.Update(func(tx *bolt.Tx) error {
			// Create a bucket.
			bucket, err := tx.CreateBucketIfNotExists([]byte("numbers"))
			if err != nil {
				return err
			}
			if err := bucket.Put(buf[:], []byte("ok")); err != nil {
				return err
			}
			return nil
		}); err != nil {
			b.Error(err)
		}
	}
}
And by comparing
(Edit: updated code and benchstat for simpler test case) |
@mark-rushakoff I was able to run the tests/benchmarks remotely on both my Linux PowerEdge server and my MacBook Pro. However, as you'll see, the machines differ in core count, so the results are only meant to compare the range of tolerance/overhead across the various OSes.
Linux
BenchmarkBboltWrites/10-4 10 102957964 ns/op
BenchmarkBboltWrites/10-4 20 71143284 ns/op
BenchmarkBboltWrites/10-4 20 107467903 ns/op
BenchmarkBboltWrites/10-4 20 56733746 ns/op
BenchmarkBboltWrites/10-4 20 56198276 ns/op
BenchmarkBboltWrites/10-4 20 97683423 ns/op
BenchmarkBboltWrites/10-4 50 92507200 ns/op
BenchmarkBboltWrites/10-4 20 73986142 ns/op
BenchmarkBboltWrites/10-4 20 85993264 ns/op
BenchmarkBboltWrites/10-4 10 142830820 ns/op
BenchmarkBboltWrites/100-4 20 89995302 ns/op
BenchmarkBboltWrites/100-4 10 104773303 ns/op
BenchmarkBboltWrites/100-4 20 110999693 ns/op
BenchmarkBboltWrites/100-4 20 70266489 ns/op
BenchmarkBboltWrites/100-4 20 82401870 ns/op
BenchmarkBboltWrites/100-4 20 73706927 ns/op
BenchmarkBboltWrites/100-4 20 78203692 ns/op
BenchmarkBboltWrites/100-4 20 73272481 ns/op
BenchmarkBboltWrites/100-4 20 83584688 ns/op
BenchmarkBboltWrites/100-4 20 60500892 ns/op
BenchmarkBboltWrites/1000-4 20 80169125 ns/op
BenchmarkBboltWrites/1000-4 20 66684793 ns/op
BenchmarkBboltWrites/1000-4 20 87746197 ns/op
BenchmarkBboltWrites/1000-4 5 203727481 ns/op
BenchmarkBboltWrites/1000-4 20 87576367 ns/op
BenchmarkBboltWrites/1000-4 10 103791336 ns/op
BenchmarkBboltWrites/1000-4 10 104779844 ns/op
BenchmarkBboltWrites/1000-4 20 100336926 ns/op
BenchmarkBboltWrites/1000-4 20 88736597 ns/op
BenchmarkBboltWrites/1000-4 20 71800534 ns/op
OS X MacBook Pro
Forcibly setting GOMAXPROCS=4 go test -bench=. -count=10
BenchmarkBboltWrites/10-4 100 12040480 ns/op
BenchmarkBboltWrites/10-4 100 12081625 ns/op
BenchmarkBboltWrites/10-4 100 12340859 ns/op
BenchmarkBboltWrites/10-4 100 12650864 ns/op
BenchmarkBboltWrites/10-4 100 12846532 ns/op
BenchmarkBboltWrites/10-4 100 12731753 ns/op
BenchmarkBboltWrites/10-4 100 12683881 ns/op
BenchmarkBboltWrites/10-4 100 12340144 ns/op
BenchmarkBboltWrites/10-4 100 11625433 ns/op
BenchmarkBboltWrites/10-4 100 12977623 ns/op
BenchmarkBboltWrites/100-4 100 12117082 ns/op
BenchmarkBboltWrites/100-4 100 12740611 ns/op
BenchmarkBboltWrites/100-4 100 12934473 ns/op
BenchmarkBboltWrites/100-4 100 12111835 ns/op
BenchmarkBboltWrites/100-4 100 12959604 ns/op
BenchmarkBboltWrites/100-4 100 12464744 ns/op
BenchmarkBboltWrites/100-4 100 12544378 ns/op
BenchmarkBboltWrites/100-4 100 12591169 ns/op
BenchmarkBboltWrites/100-4 100 12499221 ns/op
BenchmarkBboltWrites/100-4 100 12667389 ns/op
BenchmarkBboltWrites/1000-4 100 12749705 ns/op
BenchmarkBboltWrites/1000-4 100 12642107 ns/op
BenchmarkBboltWrites/1000-4 100 12252822 ns/op
BenchmarkBboltWrites/1000-4 100 11995750 ns/op
BenchmarkBboltWrites/1000-4 100 12517070 ns/op
BenchmarkBboltWrites/1000-4 100 12207655 ns/op
BenchmarkBboltWrites/1000-4 100 12940175 ns/op
BenchmarkBboltWrites/1000-4 100 12508852 ns/op
BenchmarkBboltWrites/1000-4 100 12423794 ns/op
BenchmarkBboltWrites/1000-4 100 13028134 ns/op
Results
The results are quite interesting: on OS X, even with that change, those writes apparently take ~85% less time per op than on Linux (roughly 7x faster).
benchstat linux.txt osx.txt
name                old time/op  new time/op  delta
BboltWrites/10-4    88.8ms ±61%  12.4ms ± 6%  -85.99%  (p=0.000 n=10+10)
BboltWrites/100-4   82.8ms ±34%  12.6ms ± 4%  -84.82%  (p=0.000 n=10+10)
BboltWrites/1000-4  88.0ms ±24%  12.5ms ± 4%  -85.76%  (p=0.000 n=9+10)
Unfortunately I don't have a Windows machine, but I'll kindly page @alexbrainman to help with those benchmarks if available.
However, my MacBook Pro has state-of-the-art flash storage while my Linux machine has a hard drive, which might explain the discrepancy: this is an apples-to-oranges comparison. Since that change doesn't affect any other OS, I think we just need to make a judgment call on whether we can tolerate data corruption for speed. |
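As a sanity check on how the delta column relates to the two time columns, here is a small sketch of the arithmetic. The inputs are the rounded means copied from the table above, so the output can differ from benchstat's own delta in the last digit. The helper names `deltaPercent` and `speedup` are made up for illustration and are not benchstat APIs.

```go
package main

import "fmt"

// deltaPercent returns the relative change from oldT to newT in percent.
// A negative value means newT is smaller (faster) than oldT.
// Hypothetical helper, not part of benchstat.
func deltaPercent(oldT, newT float64) float64 {
	return (newT - oldT) / oldT * 100
}

// speedup returns how many times smaller newT is than oldT.
func speedup(oldT, newT float64) float64 {
	return oldT / newT
}

func main() {
	// Rounded means in ms/op, copied from the benchstat table above.
	rows := []struct {
		name       string
		linux, osx float64
	}{
		{"BboltWrites/10-4", 88.8, 12.4},
		{"BboltWrites/100-4", 82.8, 12.6},
		{"BboltWrites/1000-4", 88.0, 12.5},
	}
	for _, r := range rows {
		fmt.Printf("%-20s %+.1f%%  %.1fx\n",
			r.name, deltaPercent(r.linux, r.osx), speedup(r.linux, r.osx))
	}
}
```

Reading the delta as a ratio makes the gap easier to picture: a delta near -85% corresponds to the OS X runs finishing each op in roughly one seventh of the Linux time, on hardware that is not comparable.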
On Darwin it seems clearly correct to use |
That's fair @ianlancetaylor. I assume this performance change will surprise many other users; maybe this issue should be repurposed to track that the |
I did not bother with the benchmarks, because https://go-review.googlesource.com/c/go/+/130676 did not change any Windows code. Please let me know if I am wrong. Alex |
I added a RELNOTES tag to the CL, so we will add something to the release notes. Closing this issue. |
@alexbrainman no, we meant to say just raw performance numbers regardless of the operating system, to compare with what OS X was seeing, for ballparking how OS X would be suffering. Anyways, no need for it anymore :)
Thank you Ian! |
I'm seeing a reliable ~3x slowdown with
go test -race
on a particular set of tests doing a lot of concurrent operations in several goroutines. I haven't yet spent the time to reduce the test case or to identify the go commit introducing the regression.
Tried with go 1.11:
and go tip:
To reproduce:
(That branch will likely get rebased onto master in the near future, so I've got a fork at mark-rushakoff/platform@a8269bc with the commit as it is today.)
I'm consistently seeing times like:
For a small test suite that doesn't exercise concurrency, the slowdown is negligible:
I might spend some time bisecting go to identify where the slowdown was introduced, or I might take a look at trace output, but I wanted to file this issue in case anyone else had time and interest to look at it before I do.