Hi!
Recently I ran a lot of benchmarks to measure the effects of Profile-Guided Optimization (PGO) on different projects (including some libraries); the results are available here. So I decided to test PGO with tonic as well.
My test setup is a MacBook M1 Pro running macOS 13.4 Ventura. All tests were done on the same hardware, with Rust 1.72. PGO was applied with cargo-pgo. I used the Tonic decode benchmarks as both the training and the evaluation workload. Every measurement was repeated several times and I picked the run with the least variation (to be honest, the runs were pretty much the same anyway). The background load was kept the same, as far as I can guarantee that on macOS. The results are as follows.
Tonic Release (cargo bench):
test chunk_size_100 ... bench: 530 ns/iter (+/- 25) = 1896 MB/s
test chunk_size_1005 ... bench: 343 ns/iter (+/- 22) = 2930 MB/s
test chunk_size_500 ... bench: 385 ns/iter (+/- 18) = 2610 MB/s
test message_count_1 ... bench: 311 ns/iter (+/- 19) = 1623 MB/s
test message_count_10 ... bench: 1,217 ns/iter (+/- 108) = 4149 MB/s
test message_count_20 ... bench: 2,237 ns/iter (+/- 146) = 4514 MB/s
test message_size_10k ... bench: 1,550 ns/iter (+/- 237) = 12909 MB/s
test message_size_1k ... bench: 456 ns/iter (+/- 24) = 4407 MB/s
test message_size_5k ... bench: 851 ns/iter (+/- 85) = 11762 MB/s
Tonic Release + PGO instrumentation (cargo pgo bench). I am posting these numbers so you can see how much slower Tonic is with PGO instrumentation enabled:
test chunk_size_100 ... bench: 825 ns/iter (+/- 33) = 1218 MB/s
test chunk_size_1005 ... bench: 460 ns/iter (+/- 18) = 2184 MB/s
test chunk_size_500 ... bench: 545 ns/iter (+/- 23) = 1844 MB/s
test message_count_1 ... bench: 449 ns/iter (+/- 19) = 1124 MB/s
test message_count_10 ... bench: 1,642 ns/iter (+/- 79) = 3075 MB/s
test message_count_20 ... bench: 2,889 ns/iter (+/- 222) = 3496 MB/s
test message_size_10k ... bench: 2,001 ns/iter (+/- 134) = 10000 MB/s
test message_size_1k ... bench: 622 ns/iter (+/- 26) = 3231 MB/s
test message_size_5k ... bench: 1,138 ns/iter (+/- 193) = 8796 MB/s
Tonic Release + PGO optimized (cargo pgo optimize bench):
test chunk_size_100 ... bench: 508 ns/iter (+/- 16) = 1978 MB/s
test chunk_size_1005 ... bench: 309 ns/iter (+/- 17) = 3252 MB/s
test chunk_size_500 ... bench: 360 ns/iter (+/- 20) = 2791 MB/s
test message_count_1 ... bench: 287 ns/iter (+/- 15) = 1759 MB/s
test message_count_10 ... bench: 1,154 ns/iter (+/- 73) = 4376 MB/s
test message_count_20 ... bench: 2,047 ns/iter (+/- 91) = 4934 MB/s
test message_size_10k ... bench: 1,530 ns/iter (+/- 148) = 13078 MB/s
test message_size_1k ... bench: 424 ns/iter (+/- 23) = 4740 MB/s
test message_size_5k ... bench: 794 ns/iter (+/- 107) = 12607 MB/s
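To make the three result sets easier to compare, here is a minimal sketch that converts the ns/iter figures above into percentage changes relative to the plain release build. The `percent_change` helper and the `RESULTS` table are illustrative names introduced here (they are not part of Tonic or cargo-pgo); the numbers are copied from the benchmark output above.

```rust
/// Relative change of `measured_ns` against `baseline_ns`, in percent.
/// A negative value means fewer ns/iter, i.e. faster than the baseline.
/// (Hypothetical helper, not part of Tonic or cargo-pgo.)
fn percent_change(baseline_ns: u32, measured_ns: u32) -> f64 {
    (measured_ns as f64 - baseline_ns as f64) / baseline_ns as f64 * 100.0
}

// (benchmark, release, PGO-instrumented, PGO-optimized) in ns/iter,
// copied from the three result listings above.
const RESULTS: [(&str, u32, u32, u32); 9] = [
    ("chunk_size_100", 530, 825, 508),
    ("chunk_size_1005", 343, 460, 309),
    ("chunk_size_500", 385, 545, 360),
    ("message_count_1", 311, 449, 287),
    ("message_count_10", 1217, 1642, 1154),
    ("message_count_20", 2237, 2889, 2047),
    ("message_size_10k", 1550, 2001, 1530),
    ("message_size_1k", 456, 622, 424),
    ("message_size_5k", 851, 1138, 794),
];

fn main() {
    println!("{:<18} {:>14} {:>14}", "benchmark", "instrumented", "optimized");
    for (name, release, instrumented, optimized) in RESULTS {
        // Both columns are relative to the plain release build.
        println!(
            "{:<18} {:>+13.1}% {:>+13.1}%",
            name,
            percent_change(release, instrumented),
            percent_change(release, optimized),
        );
    }
}
```

By this calculation the PGO-optimized build is faster in every benchmark, from roughly 1% (message_size_10k) to roughly 10% (chunk_size_1005). For message_size_10k the 20 ns difference is smaller than the reported +/- spread, so that particular gain may be within noise.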
This information can be helpful:
- For the Tonic users who want to optimize their gRPC routines
- For benchmark purposes as an additional way to extract more performance
It would probably be a good idea to mention PGO somewhere in the Tonic documentation, README, or wiki.