crypto/tls: significant drop in throughput from Go 1.6 #15713
Comments
Probably it's because of the dynamic record sizing: https://go-review.googlesource.com/19591. What happens when you run the bench with &tls.Config{DynamicRecordSizingDisabled: true}?
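For reference, a minimal sketch of how that server-side comparison could be set up. The certificate paths and listen address are placeholders, not taken from the benchmark in this thread:

```go
// Sketch: serving with dynamic record sizing disabled so throughput can be
// compared against the default behavior.
package main

import (
	"crypto/tls"
	"log"
)

func main() {
	cert, err := tls.LoadX509KeyPair("server.crt", "server.key") // placeholder files
	if err != nil {
		log.Fatal(err)
	}
	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},
		// Write every record at the maximum size (~16KB), i.e. the Go 1.6
		// behavior, instead of starting the connection with small records.
		DynamicRecordSizingDisabled: true,
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:4433", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	// ... accept connections and stream the benchmark payload ...
}
```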
Looks like dynamic record sizing, indeed. Going from 1.6.2 to tip, I can reproduce a slowdown:
Going from 1.6 to tip with
So maybe the only things to do here are to call out dynamic record sizing in the 1.7 release notes and add this benchmark to the repo. Back to you, @rsc.
/cc @tombergan
DynamicRecordSizing writes the first 1MB in ~1KB chunks instead of ~16KB chunks. Some loss of throughput is expected as this trades throughput for latency early in the connection. The 1MB constant may benefit from tuning. I borrowed that constant from the Google frontend servers. Unfortunately, it doesn't seem like much thought went into that constant (or at least, the thought process was not documented anywhere) and I could not find benchmark data. I'm not even sure what the right benchmarks would be -- there are competing interests and the right value will likely depend on the underlying network speed and specifics of the application layer.
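In other words, the policy being described is roughly the following. This is a sketch of the heuristic, not the actual crypto/tls code:

```go
// Sketch of the record-sizing policy described above: small records early
// in the connection, maximum-size records after the 1MB threshold under
// discussion.
const (
	smallRecordSize = 1 << 10  // ~1KB records, optimized for latency
	maxRecordSize   = 16 << 10 // ~16KB, the TLS maximum plaintext per record
	boostThreshold  = 1 << 20  // the 1MB constant borrowed from the frontend servers
)

// nextRecordSize reports how much plaintext the next record should carry,
// given how many bytes have already been written on the connection.
func nextRecordSize(bytesSent int64) int {
	if bytesSent < boostThreshold {
		return smallRecordSize
	}
	return maxRecordSize
}
```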
Just to verify that the loss in throughput is limited to the start of the connection, I ran Russ's benchmark with various transfer sizes. There is noise, but the overhead generally decreases as the transfer size increases.
@tombergan, given that we don't even know what the right values are and the ones we do use cause a significant performance drop, maybe dynamic record sizing should default to off for now or be reverted? Maybe it should be "DynamicRecordSize int" and "DynamicRecordLimit int64", where 0 and 0 mean off, and the current behavior would be DynamicRecordSize=1024 and DynamicRecordLimit=1048576.
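A sketch of what that suggestion would look like. These fields are hypothetical and were never added to crypto/tls; they only illustrate the proposal:

```go
// Hypothetical knobs illustrating the proposal above; crypto/tls does not
// have these fields.
type Config struct {
	// DynamicRecordSize is the size of the small records written early in
	// the connection. Zero (together with DynamicRecordLimit == 0) means
	// dynamic record sizing is off.
	DynamicRecordSize int

	// DynamicRecordLimit is how many bytes are written as small records
	// before switching to maximum-size records.
	DynamicRecordLimit int64

	// ... existing crypto/tls fields ...
}

// The behavior at the time of this discussion would correspond to
// DynamicRecordSize = 1024 and DynamicRecordLimit = 1048576.
```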
It causes a significant performance drop when measured by throughput, but a performance increase when measured by latency, particularly for slow connections. For a simple web page served via HTTPS over a simulated 2G connection, I measured a ~500ms improvement in page load time with dynamic record sizing. There's more background in the first comment of issue #14376 and in the discussion with agl in the linked CL. I can try to cook up a more reusable latency-centric benchmark if you like.

I tend to believe that most clients will use TLS via HTTPS, and for those clients, latency is more important than throughput early in the connection. I am somewhat worried about over-tuning for a throughput benchmark when many uses are not actually throughput-limited. That said, I don't feel too strongly about whether this should default to "on" or "off". If you feel strongly one way or the other, let's go with that. We tried to avoid knobs other than a single on/off bool, but I'm happy to change the bool to an int if you think that will help (I actually had an int in a prior version of the CL).

I do feel strongly about not reverting this -- given that it demonstrably improves latency, I think the feature should stay in some form.
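For illustration, a minimal sketch of what such a latency-centric measurement could look like (time to the first application byte on a fresh connection). This is not the benchmark referenced in this thread; the certificate paths are placeholders, and on a fast local loopback it will not reproduce the slow-connection gains described above:

```go
package tlsbench

import (
	"crypto/tls"
	"io"
	"net"
	"testing"
)

// benchmarkFirstByte times connect + handshake + first byte of application
// data on a fresh connection for each iteration.
func benchmarkFirstByte(b *testing.B, disableDynamic bool) {
	cert, err := tls.LoadX509KeyPair("server.crt", "server.key") // placeholder files
	if err != nil {
		b.Fatal(err)
	}
	srvCfg := &tls.Config{
		Certificates:                []tls.Certificate{cert},
		DynamicRecordSizingDisabled: disableDynamic,
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:0", srvCfg)
	if err != nil {
		b.Fatal(err)
	}
	defer ln.Close()

	payload := make([]byte, 64<<10)
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				defer c.Close()
				c.Write(payload) // server writes as soon as the handshake completes
			}(c)
		}
	}()

	cliCfg := &tls.Config{InsecureSkipVerify: true} // benchmark-only: skip cert verification
	buf := make([]byte, 1)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c, err := tls.Dial("tcp", ln.Addr().String(), cliCfg)
		if err != nil {
			b.Fatal(err)
		}
		if _, err := io.ReadFull(c, buf); err != nil {
			b.Fatal(err)
		}
		c.Close()
	}
}

func BenchmarkFirstByteDynamic(b *testing.B)    { benchmarkFirstByte(b, false) }
func BenchmarkFirstByteMaxRecords(b *testing.B) { benchmarkFirstByte(b, true) }
```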
How about:
Rationale:
I added a simple latency benchmark to Russ's tlsbench_test.go. Results below from my workstation:
Maybe it would be sensible to make the size of the "small records" grow over time?
CL https://golang.org/cl/23487 mentions this issue.
To follow up on the above CL, I tried loading a very simple web page (a 16KB HTML file that imports one 16KB script) served over HTTP/2 on a simulated 2G connection (1200ms RTT, 200kbps throughput). Page load time results, median of 5 runs:

Max record sizing: 6.2s

Web benchmarking is hard. The benchmark I chose is a "best case" for dynamic record sizing in some ways, with a very slow client and a very simple page. The reason dynamic record sizing is slightly slower after cl/23487 is that the HTML actually starts on the third data packet -- the first packet has h2 settings and the second packet has h2 response headers. If I were to propose any changes, I might propose changing the sequence to 1, 1, 1, 2, 3, 4, 5, etc. But I think this is a good enough compromise between throughput and latency. We can revisit this later if it becomes a problem.

/cc @rsc
The current code, introduced after Go 1.6 to improve latency on low-bandwidth connections, sends 1 kB packets until 1 MB has been sent, and then sends 16 kB packets (the maximum record size). Unfortunately this decreases throughput for 1-16 MB responses by 20% or so.

Following discussion on golang#15713, change the cutoff to 128 kB sent and also grow the size allowed for successive packets: 1 kB, 2 kB, 3 kB, ..., 15 kB, 16 kB. This fixes the throughput problems: the overhead is now closer to 2%.

I hope this still helps with latency but I don't have a great way to test it. At the least, it's not worse than Go 1.6.

Comparing MaxPacket vs DynamicPacket benchmarks:

name               maxpkt time/op  dyn. time/op  delta
Throughput/1MB-8    5.07ms ± 7%     5.21ms ± 7%   +2.73%  (p=0.023 n=16+16)
Throughput/2MB-8    15.7ms ±201%     8.4ms ± 5%        ~  (p=0.604 n=20+16)
Throughput/4MB-8    14.3ms ± 1%     14.5ms ± 1%   +1.53%  (p=0.000 n=16+16)
Throughput/8MB-8    26.6ms ± 1%     26.8ms ± 1%   +0.47%  (p=0.003 n=19+18)
Throughput/16MB-8   51.0ms ± 1%     51.3ms ± 1%   +0.47%  (p=0.000 n=20+20)
Throughput/32MB-8    100ms ± 1%      100ms ± 1%   +0.24%  (p=0.033 n=20+20)
Throughput/64MB-8    197ms ± 0%      198ms ± 0%   +0.56%  (p=0.000 n=18+7)

The small MB runs are bimodal in both cases, probably GC pauses. But there's clearly no general slowdown anymore.

Fixes golang#15713.

Change-Id: I5fc44680ba71812d24baac142bceee0e23f2e382
Reviewed-on: https://go-review.googlesource.com/23487
Reviewed-by: Ian Lance Taylor <[email protected]>
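The schedule described in this change can be sketched as follows; this illustrates the described behavior, not the exact crypto/tls code:

```go
// Sketch of the dynamic record-size schedule described above: record sizes
// grow 1 kB, 2 kB, ..., 16 kB for successive packets until 128 kB have been
// sent on the connection; after that every record uses the 16 kB maximum.
const (
	recordSizeStep     = 1 << 10   // grow in 1 kB increments
	maxPlaintextPerRec = 16 << 10  // 16 kB, the maximum plaintext per record
	recordSizeBoostAt  = 128 << 10 // bytes sent before always using the maximum
)

// dynamicRecordSize reports the plaintext size for the next record given
// how many bytes and packets have already been sent on the connection.
func dynamicRecordSize(bytesSent, packetsSent int64) int {
	if bytesSent >= recordSizeBoostAt {
		return maxPlaintextPerRec
	}
	size := int(packetsSent+1) * recordSizeStep // 1 kB, 2 kB, 3 kB, ...
	if size > maxPlaintextPerRec {
		size = maxPlaintextPerRec
	}
	return size
}
```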
Attached is a stab at a TLS benchmark (crypto/tls has none!).
It runs b.N TLS connections in sequence, each transferring 16 MB in each direction in 64 kB writes.
tlsbench_test.go.txt
Through Go 1.5 it was hovering around 30 MB/s on my workstation.
Go 1.6 made it jump by almost 10X, to around 295 MB/s.
At tip, however, it is down a bit, around 235 MB/s.
It would be good to understand this. This might be something to do soon after the beta is released.
/cc @agl
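The attached tlsbench_test.go is not reproduced in this thread. A minimal sketch of a throughput benchmark in the same spirit (streaming a fixed payload over TLS and reporting bytes per second) might look like this; the certificate paths are placeholders, and unlike the attachment it measures only one direction:

```go
package tlsbench

import (
	"crypto/tls"
	"io"
	"io/ioutil"
	"net"
	"testing"
)

// BenchmarkThroughput opens one TLS connection per iteration; the server
// streams 16 MB in 64 kB writes and the client reads until EOF.
func BenchmarkThroughput(b *testing.B) {
	cert, err := tls.LoadX509KeyPair("server.crt", "server.key") // placeholder files
	if err != nil {
		b.Fatal(err)
	}
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		b.Fatal(err)
	}
	defer ln.Close()

	const total = 16 << 20        // 16 MB per connection
	chunk := make([]byte, 64<<10) // written in 64 kB pieces
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				defer c.Close()
				for sent := 0; sent < total; sent += len(chunk) {
					if _, err := c.Write(chunk); err != nil {
						return
					}
				}
			}(c)
		}
	}()

	cliCfg := &tls.Config{InsecureSkipVerify: true} // benchmark-only: skip cert verification
	b.SetBytes(total)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c, err := tls.Dial("tcp", ln.Addr().String(), cliCfg)
		if err != nil {
			b.Fatal(err)
		}
		if _, err := io.Copy(ioutil.Discard, c); err != nil {
			b.Fatal(err)
		}
		c.Close()
	}
}
```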