Support PGO for clang-cl #130090

Closed
chris-eibl opened this issue Feb 13, 2025 · 2 comments
Labels
build The build process and cross-build OS-windows type-feature A feature request or enhancement

Comments

@chris-eibl
Member

chris-eibl commented Feb 13, 2025

Feature or enhancement

Proposal:

Support PGO (profile guided optimization) for clang-cl on Windows using a similar approach as done in the Linux makefiles for clang.
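
For readers unfamiliar with the clang flow, here is a minimal sketch of an instrumentation-based PGO build expressed as four command lines. The flags `-fprofile-instr-generate`, `-fprofile-instr-use=` and `llvm-profdata merge -output=` are standard clang/LLVM options; the function name, the file names and the training step are hypothetical placeholders, and the actual PCbuild integration may well look different.

```python
def clang_pgo_steps(sources, exe="app.exe", merged="app.profdata"):
    """Return the command lines of a clang instrumentation-based PGO build.

    Nothing is executed here; the lists only illustrate the order of the phases.
    """
    # Phase 1: build an instrumented binary that writes raw profiles when run.
    instrument = ["clang-cl", "/O2", "-fprofile-instr-generate", *sources, f"/Fe{exe}"]
    # Phase 2: run a representative workload (placeholder: just start the binary).
    # Without an explicit file name, clang writes "default.profraw".
    train = [exe]
    # Phase 3: merge the raw profiles into an indexed profile.
    merge = ["llvm-profdata", "merge", f"-output={merged}", "default.profraw"]
    # Phase 4: rebuild, letting the optimizer consume the merged profile.
    optimize = ["clang-cl", "/O2", f"-fprofile-instr-use={merged}", *sources, f"/Fe{exe}"]
    return [instrument, train, merge, optimize]


if __name__ == "__main__":
    for step in clang_pgo_steps(["main.c"]):
        print(" ".join(step))
```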

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

Discussion has started in the PR #129907 while being draft.

Linked PRs

64-bit pyperformance results on my Windows 10 PC (dusty i5-4570 CPU), run with --fast --affinity 0 for commit 9db1a29:

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 | msvc.pgo.9db1a297d9 | clang.pgo.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.27x faster | 1.28x faster | 1.47x faster |

| Benchmark | msvc.release.9db1a297d9 | clang.release.9db1a297d9 |
|---|---|---|
| Geometric mean | (ref) | 1.27x faster |

With PGO, clang 18.1.8 is faster than 19.1.1, and 20.1.0.rc2 with tail calling is the fastest:

| Benchmark | msvc.pgo.9db1a297d9 | clang.pgo.18.1.8.9db1a297d9 | clang.pgo.9db1a297d9 | clang.pgo.tc.20.1.0.rc2.9db1a297d9 |
|---|---|---|---|---|
| Geometric mean | (ref) | 1.19x faster | 1.15x faster | 1.25x faster |
Details

Benchmarks with tag 'apps':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
2to3 586 ms 491 ms: 1.19x faster 462 ms: 1.27x faster 426 ms: 1.38x faster
docutils 4.27 sec 3.75 sec: 1.14x faster 3.50 sec: 1.22x faster 3.31 sec: 1.29x faster
html5lib 104 ms 81.6 ms: 1.28x faster 77.9 ms: 1.34x faster 74.5 ms: 1.40x faster
Geometric mean (ref) 1.20x faster 1.28x faster 1.35x faster
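
As a rough cross-check of how the "Geometric mean" rows relate to the individual rows, the sketch below recomputes the clang.pgo figure for the 'apps' tag from the rounded timings shown above. It assumes the mean is taken over per-benchmark speedups (reference time divided by new time); the variable and function names are illustrative only. With these inputs it reproduces the 1.35x in the table.

```python
import math

# (msvc.release reference, clang.pgo) timings in milliseconds,
# copied from the rounded values in the 'apps' table.
apps_ms = {
    "2to3": (586.0, 426.0),
    "docutils": (4270.0, 3310.0),
    "html5lib": (104.0, 74.5),
}


def geometric_mean_speedup(timings):
    """Geometric mean of ref/new ratios, computed via logarithms."""
    ratios = [ref / new for ref, new in timings.values()]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


speedup = geometric_mean_speedup(apps_ms)
print(f"clang.pgo vs msvc.release on 'apps': {speedup:.2f}x faster")
```

The geometric mean is used rather than the arithmetic mean so that a 2x speedup and a 2x slowdown cancel out.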

Benchmarks with tag 'asyncio':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
async_tree_none 511 ms 383 ms: 1.33x faster 394 ms: 1.30x faster 357 ms: 1.43x faster
async_tree_cpu_io_mixed 933 ms 805 ms: 1.16x faster 749 ms: 1.25x faster 697 ms: 1.34x faster
async_tree_cpu_io_mixed_tg 891 ms 776 ms: 1.15x faster 716 ms: 1.24x faster 665 ms: 1.34x faster
async_tree_eager 209 ms 153 ms: 1.37x faster 160 ms: 1.31x faster 133 ms: 1.57x faster
async_tree_eager_cpu_io_mixed 656 ms 630 ms: 1.04x faster 567 ms: 1.16x faster 535 ms: 1.23x faster
async_tree_eager_cpu_io_mixed_tg 830 ms 741 ms: 1.12x faster 681 ms: 1.22x faster 646 ms: 1.28x faster
async_tree_eager_io 1.12 sec 870 ms: 1.29x faster 874 ms: 1.28x faster 817 ms: 1.37x faster
async_tree_eager_io_tg 1.12 sec 890 ms: 1.26x faster 898 ms: 1.25x faster 840 ms: 1.33x faster
async_tree_eager_memoization 393 ms 304 ms: 1.29x faster 304 ms: 1.29x faster 281 ms: 1.40x faster
async_tree_eager_memoization_tg 546 ms 420 ms: 1.30x faster 427 ms: 1.28x faster 397 ms: 1.37x faster
async_tree_eager_tg 408 ms 312 ms: 1.31x faster 321 ms: 1.27x faster 297 ms: 1.38x faster
async_tree_io 1.14 sec 868 ms: 1.31x faster 889 ms: 1.28x faster 824 ms: 1.38x faster
async_tree_io_tg 1.14 sec 871 ms: 1.31x faster 877 ms: 1.30x faster 807 ms: 1.41x faster
async_tree_memoization 649 ms 493 ms: 1.32x faster 509 ms: 1.28x faster 458 ms: 1.42x faster
async_tree_memoization_tg 605 ms 453 ms: 1.34x faster 462 ms: 1.31x faster 425 ms: 1.42x faster
async_tree_none_tg 497 ms 371 ms: 1.34x faster 382 ms: 1.30x faster 352 ms: 1.41x faster
Geometric mean (ref) 1.26x faster 1.27x faster 1.38x faster

Benchmarks with tag 'math':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
float 145 ms 108 ms: 1.35x faster 116 ms: 1.25x faster 96.8 ms: 1.50x faster
nbody 203 ms 155 ms: 1.31x faster 171 ms: 1.19x faster 128 ms: 1.58x faster
pidigits 245 ms 250 ms: 1.02x slower 250 ms: 1.02x slower 240 ms: 1.02x faster
Geometric mean (ref) 1.20x faster 1.13x faster 1.34x faster

Benchmarks with tag 'regex':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
regex_compile 237 ms 180 ms: 1.31x faster 180 ms: 1.31x faster 157 ms: 1.51x faster
regex_dna 226 ms 256 ms: 1.14x slower 210 ms: 1.07x faster 211 ms: 1.07x faster
regex_effbot 4.05 ms not significant 3.66 ms: 1.11x faster 3.39 ms: 1.20x faster
regex_v8 38.7 ms 35.7 ms: 1.08x faster 33.7 ms: 1.15x faster 29.8 ms: 1.30x faster
Geometric mean (ref) 1.06x faster 1.16x faster 1.26x faster

Benchmarks with tag 'serialize':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
json_dumps 19.6 ms 16.9 ms: 1.16x faster 15.0 ms: 1.31x faster 12.9 ms: 1.52x faster
json_loads 48.1 us 46.7 us: 1.03x faster 36.8 us: 1.31x faster 32.7 us: 1.47x faster
pickle 21.5 us 17.9 us: 1.20x faster 19.1 us: 1.13x faster 15.0 us: 1.44x faster
pickle_dict 46.0 us 34.3 us: 1.34x faster 43.2 us: 1.07x faster 27.6 us: 1.67x faster
pickle_list 8.16 us 6.19 us: 1.32x faster 6.89 us: 1.18x faster 5.05 us: 1.62x faster
pickle_pure_python 672 us 455 us: 1.48x faster 463 us: 1.45x faster 378 us: 1.78x faster
tomli_loads 3.84 sec 2.79 sec: 1.38x faster 2.88 sec: 1.33x faster 2.38 sec: 1.61x faster
unpickle 26.2 us 24.0 us: 1.09x faster 19.8 us: 1.32x faster 17.9 us: 1.46x faster
unpickle_list 7.29 us 6.03 us: 1.21x faster 6.87 us: 1.06x faster 5.38 us: 1.36x faster
unpickle_pure_python 505 us 321 us: 1.57x faster 336 us: 1.50x faster 257 us: 1.96x faster
xml_etree_parse 232 ms 228 ms: 1.02x faster 200 ms: 1.16x faster 210 ms: 1.10x faster
xml_etree_iterparse 185 ms 160 ms: 1.16x faster 154 ms: 1.21x faster 145 ms: 1.27x faster
xml_etree_generate 181 ms 148 ms: 1.22x faster 135 ms: 1.35x faster 119 ms: 1.53x faster
xml_etree_process 128 ms 100 ms: 1.28x faster 94.4 ms: 1.36x faster 82.0 ms: 1.56x faster
Geometric mean (ref) 1.24x faster 1.26x faster 1.51x faster

Benchmarks with tag 'startup':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
python_startup 45.4 ms not significant 43.1 ms: 1.05x faster 43.7 ms: 1.04x faster
python_startup_no_site 37.1 ms not significant 35.4 ms: 1.05x faster 35.9 ms: 1.03x faster
Geometric mean (ref) 1.00x faster 1.05x faster 1.04x faster

Benchmarks with tag 'template':

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
django_template 75.6 ms 55.6 ms: 1.36x faster 52.1 ms: 1.45x faster 42.1 ms: 1.79x faster
genshi_text 44.5 ms 31.4 ms: 1.42x faster 32.5 ms: 1.37x faster 26.3 ms: 1.69x faster
genshi_xml 102 ms 74.0 ms: 1.37x faster 74.6 ms: 1.36x faster 63.1 ms: 1.61x faster
mako 23.3 ms 17.7 ms: 1.31x faster 16.7 ms: 1.39x faster 14.4 ms: 1.61x faster
Geometric mean (ref) 1.36x faster 1.39x faster 1.67x faster

All benchmarks:

Benchmark msvc.release.9db1a297d9 clang.release.9db1a297d9 msvc.pgo.9db1a297d9 clang.pgo.9db1a297d9
2to3 586 ms 491 ms: 1.19x faster 462 ms: 1.27x faster 426 ms: 1.38x faster
async_generators 696 ms 565 ms: 1.23x faster 577 ms: 1.21x faster 514 ms: 1.35x faster
async_tree_none 511 ms 383 ms: 1.33x faster 394 ms: 1.30x faster 357 ms: 1.43x faster
async_tree_cpu_io_mixed 933 ms 805 ms: 1.16x faster 749 ms: 1.25x faster 697 ms: 1.34x faster
async_tree_cpu_io_mixed_tg 891 ms 776 ms: 1.15x faster 716 ms: 1.24x faster 665 ms: 1.34x faster
async_tree_eager 209 ms 153 ms: 1.37x faster 160 ms: 1.31x faster 133 ms: 1.57x faster
async_tree_eager_cpu_io_mixed 656 ms 630 ms: 1.04x faster 567 ms: 1.16x faster 535 ms: 1.23x faster
async_tree_eager_cpu_io_mixed_tg 830 ms 741 ms: 1.12x faster 681 ms: 1.22x faster 646 ms: 1.28x faster
async_tree_eager_io 1.12 sec 870 ms: 1.29x faster 874 ms: 1.28x faster 817 ms: 1.37x faster
async_tree_eager_io_tg 1.12 sec 890 ms: 1.26x faster 898 ms: 1.25x faster 840 ms: 1.33x faster
async_tree_eager_memoization 393 ms 304 ms: 1.29x faster 304 ms: 1.29x faster 281 ms: 1.40x faster
async_tree_eager_memoization_tg 546 ms 420 ms: 1.30x faster 427 ms: 1.28x faster 397 ms: 1.37x faster
async_tree_eager_tg 408 ms 312 ms: 1.31x faster 321 ms: 1.27x faster 297 ms: 1.38x faster
async_tree_io 1.14 sec 868 ms: 1.31x faster 889 ms: 1.28x faster 824 ms: 1.38x faster
async_tree_io_tg 1.14 sec 871 ms: 1.31x faster 877 ms: 1.30x faster 807 ms: 1.41x faster
async_tree_memoization 649 ms 493 ms: 1.32x faster 509 ms: 1.28x faster 458 ms: 1.42x faster
async_tree_memoization_tg 605 ms 453 ms: 1.34x faster 462 ms: 1.31x faster 425 ms: 1.42x faster
async_tree_none_tg 497 ms 371 ms: 1.34x faster 382 ms: 1.30x faster 352 ms: 1.41x faster
asyncio_tcp 1.64 sec 1.55 sec: 1.06x faster 1.48 sec: 1.11x faster not significant
asyncio_websockets 732 ms 578 ms: 1.27x faster 758 ms: 1.04x slower not significant
chaos 132 ms 88.6 ms: 1.48x faster 90.8 ms: 1.45x faster 74.3 ms: 1.77x faster
comprehensions 34.7 us 24.5 us: 1.42x faster 25.2 us: 1.38x faster 19.2 us: 1.80x faster
bench_mp_pool 213 ms 196 ms: 1.09x faster 177 ms: 1.20x faster 190 ms: 1.12x faster
bench_thread_pool 1.95 ms 1.74 ms: 1.12x faster 1.68 ms: 1.16x faster 1.63 ms: 1.19x faster
coroutines 45.3 ms 33.9 ms: 1.34x faster 36.1 ms: 1.25x faster 26.9 ms: 1.68x faster
coverage 130 ms 119 ms: 1.09x faster 120 ms: 1.09x faster 103 ms: 1.26x faster
crypto_pyaes 147 ms 109 ms: 1.35x faster 109 ms: 1.35x faster 86.3 ms: 1.70x faster
deepcopy 516 us 391 us: 1.32x faster 388 us: 1.33x faster 309 us: 1.67x faster
deepcopy_reduce 5.30 us 4.19 us: 1.26x faster 3.95 us: 1.34x faster 3.23 us: 1.64x faster
deepcopy_memo 67.1 us 41.6 us: 1.61x faster 46.8 us: 1.44x faster 34.8 us: 1.93x faster
deltablue 7.72 ms 4.52 ms: 1.71x faster 4.92 ms: 1.57x faster 3.80 ms: 2.03x faster
django_template 75.6 ms 55.6 ms: 1.36x faster 52.1 ms: 1.45x faster 42.1 ms: 1.79x faster
docutils 4.27 sec 3.75 sec: 1.14x faster 3.50 sec: 1.22x faster 3.31 sec: 1.29x faster
dulwich_log 156 ms 141 ms: 1.11x faster 129 ms: 1.20x faster 131 ms: 1.19x faster
fannkuch 770 ms 592 ms: 1.30x faster 637 ms: 1.21x faster 516 ms: 1.49x faster
float 145 ms 108 ms: 1.35x faster 116 ms: 1.25x faster 96.8 ms: 1.50x faster
create_gc_cycles 1.62 ms 1.71 ms: 1.05x slower not significant 1.71 ms: 1.05x slower
gc_traversal 5.03 ms not significant 4.02 ms: 1.25x faster 5.71 ms: 1.13x slower
generators 65.1 ms 40.4 ms: 1.61x faster 44.4 ms: 1.47x faster 36.0 ms: 1.81x faster
genshi_text 44.5 ms 31.4 ms: 1.42x faster 32.5 ms: 1.37x faster 26.3 ms: 1.69x faster
genshi_xml 102 ms 74.0 ms: 1.37x faster 74.6 ms: 1.36x faster 63.1 ms: 1.61x faster
go 255 ms 147 ms: 1.73x faster 170 ms: 1.50x faster 132 ms: 1.94x faster
hexiom 13.4 ms 8.49 ms: 1.58x faster 9.22 ms: 1.46x faster 7.11 ms: 1.89x faster
html5lib 104 ms 81.6 ms: 1.28x faster 77.9 ms: 1.34x faster 74.5 ms: 1.40x faster
json_dumps 19.6 ms 16.9 ms: 1.16x faster 15.0 ms: 1.31x faster 12.9 ms: 1.52x faster
json_loads 48.1 us 46.7 us: 1.03x faster 36.8 us: 1.31x faster 32.7 us: 1.47x faster
logging_format 21.2 us 16.4 us: 1.29x faster 14.7 us: 1.44x faster 13.6 us: 1.56x faster
logging_silent 213 ns 143 ns: 1.49x faster 152 ns: 1.40x faster 109 ns: 1.95x faster
logging_simple 19.4 us 14.6 us: 1.33x faster 13.5 us: 1.44x faster 12.2 us: 1.60x faster
mako 23.3 ms 17.7 ms: 1.31x faster 16.7 ms: 1.39x faster 14.4 ms: 1.61x faster
mdp 3.99 sec 4.12 sec: 1.03x slower 3.76 sec: 1.06x faster 3.37 sec: 1.18x faster
meteor_contest 175 ms 133 ms: 1.32x faster 139 ms: 1.26x faster 124 ms: 1.41x faster
nbody 203 ms 155 ms: 1.31x faster 171 ms: 1.19x faster 128 ms: 1.58x faster
nqueens 179 ms 129 ms: 1.38x faster 131 ms: 1.37x faster 103 ms: 1.73x faster
pathlib 278 ms 266 ms: 1.04x faster 256 ms: 1.09x faster 262 ms: 1.06x faster
pickle 21.5 us 17.9 us: 1.20x faster 19.1 us: 1.13x faster 15.0 us: 1.44x faster
pickle_dict 46.0 us 34.3 us: 1.34x faster 43.2 us: 1.07x faster 27.6 us: 1.67x faster
pickle_list 8.16 us 6.19 us: 1.32x faster 6.89 us: 1.18x faster 5.05 us: 1.62x faster
pickle_pure_python 672 us 455 us: 1.48x faster 463 us: 1.45x faster 378 us: 1.78x faster
pidigits 245 ms 250 ms: 1.02x slower 250 ms: 1.02x slower 240 ms: 1.02x faster
pprint_safe_repr 1.46 sec 1.09 sec: 1.34x faster 1.09 sec: 1.34x faster 934 ms: 1.57x faster
pprint_pformat 3.00 sec 2.22 sec: 1.35x faster 2.23 sec: 1.35x faster 1.91 sec: 1.57x faster
pyflate 875 ms 626 ms: 1.40x faster 668 ms: 1.31x faster 537 ms: 1.63x faster
python_startup 45.4 ms not significant 43.1 ms: 1.05x faster 43.7 ms: 1.04x faster
python_startup_no_site 37.1 ms not significant 35.4 ms: 1.05x faster 35.9 ms: 1.03x faster
raytrace 587 ms 385 ms: 1.52x faster 414 ms: 1.42x faster 321 ms: 1.83x faster
regex_compile 237 ms 180 ms: 1.31x faster 180 ms: 1.31x faster 157 ms: 1.51x faster
regex_dna 226 ms 256 ms: 1.14x slower 210 ms: 1.07x faster 211 ms: 1.07x faster
regex_effbot 4.05 ms not significant 3.66 ms: 1.11x faster 3.39 ms: 1.20x faster
regex_v8 38.7 ms 35.7 ms: 1.08x faster 33.7 ms: 1.15x faster 29.8 ms: 1.30x faster
richards 102 ms 65.3 ms: 1.56x faster 64.7 ms: 1.58x faster 49.7 ms: 2.05x faster
richards_super 116 ms 74.3 ms: 1.57x faster 74.7 ms: 1.56x faster 56.2 ms: 2.07x faster
scimark_fft 664 ms 485 ms: 1.37x faster 493 ms: 1.35x faster 358 ms: 1.85x faster
scimark_lu 227 ms 159 ms: 1.43x faster 164 ms: 1.39x faster 132 ms: 1.72x faster
scimark_monte_carlo 138 ms 91.6 ms: 1.51x faster 101 ms: 1.37x faster 74.6 ms: 1.85x faster
scimark_sor 256 ms 176 ms: 1.46x faster 195 ms: 1.31x faster 151 ms: 1.69x faster
scimark_sparse_mat_mult 8.76 ms 6.31 ms: 1.39x faster 6.06 ms: 1.45x faster 5.01 ms: 1.75x faster
spectral_norm 179 ms 136 ms: 1.32x faster 151 ms: 1.19x faster 110 ms: 1.63x faster
sqlglot_normalize 204 ms 156 ms: 1.31x faster 151 ms: 1.35x faster 131 ms: 1.55x faster
sqlglot_optimize 97.0 ms 77.7 ms: 1.25x faster 74.5 ms: 1.30x faster 66.2 ms: 1.47x faster
sqlglot_parse 2.52 ms 1.72 ms: 1.46x faster 1.81 ms: 1.39x faster 1.51 ms: 1.66x faster
sqlglot_transpile 3.02 ms 2.15 ms: 1.41x faster 2.21 ms: 1.37x faster 1.85 ms: 1.63x faster
sqlite_synth 4.08 us 3.81 us: 1.07x faster 3.75 us: 1.09x faster 3.44 us: 1.18x faster
sympy_expand 818 ms 681 ms: 1.20x faster 640 ms: 1.28x faster 578 ms: 1.42x faster
sympy_integrate 33.5 ms 28.2 ms: 1.19x faster 27.1 ms: 1.24x faster 24.2 ms: 1.38x faster
sympy_sum 258 ms 222 ms: 1.16x faster 213 ms: 1.21x faster 199 ms: 1.29x faster
sympy_str 484 ms 405 ms: 1.20x faster 383 ms: 1.26x faster 344 ms: 1.41x faster
telco 13.1 ms 11.1 ms: 1.17x faster 10.7 ms: 1.22x faster 9.37 ms: 1.40x faster
tomli_loads 3.84 sec 2.79 sec: 1.38x faster 2.88 sec: 1.33x faster 2.38 sec: 1.61x faster
typing_runtime_protocols 296 us 239 us: 1.24x faster 223 us: 1.32x faster 193 us: 1.53x faster
unpack_sequence 152 ns 58.2 ns: 2.61x faster 84.8 ns: 1.79x faster 59.3 ns: 2.56x faster
unpickle 26.2 us 24.0 us: 1.09x faster 19.8 us: 1.32x faster 17.9 us: 1.46x faster
unpickle_list 7.29 us 6.03 us: 1.21x faster 6.87 us: 1.06x faster 5.38 us: 1.36x faster
unpickle_pure_python 505 us 321 us: 1.57x faster 336 us: 1.50x faster 257 us: 1.96x faster
xml_etree_parse 232 ms 228 ms: 1.02x faster 200 ms: 1.16x faster 210 ms: 1.10x faster
xml_etree_iterparse 185 ms 160 ms: 1.16x faster 154 ms: 1.21x faster 145 ms: 1.27x faster
xml_etree_generate 181 ms 148 ms: 1.22x faster 135 ms: 1.35x faster 119 ms: 1.53x faster
xml_etree_process 128 ms 100 ms: 1.28x faster 94.4 ms: 1.36x faster 82.0 ms: 1.56x faster
Geometric mean (ref) 1.27x faster 1.28x faster 1.47x faster

Benchmark hidden because not significant (1): asyncio_tcp_ssl

More benchmarks (including clang-cl 18.1.8, 20.1.0.rc2, computed gotos and tailcall) can be found in https://gist.github.com/chris-eibl/114a42f22563956fdb5cd0335b28c7ae.

Raw data is here https://gist.github.com/chris-eibl/c73b02762a7c467e9a410a0aa19c7701.

@zooba
Member

zooba commented Mar 4, 2025

This is now checked in, at least for 32-bit and 64-bit and natively-built ARM64 (cross-compiled ARM64 apparently needs some options that we don't know right now).

Thanks @chris-eibl!

@chris-eibl
Member Author

@zanieb asked for build time numbers in #129907 (comment):

Numbers in the following tables represent seconds. Please note that the sum of the detailed project times does not match the total time, because most of the projects are built in parallel, except _freeze_module and python314. Therefore, I have listed them first in the details table and then sorted the others by build time of the first column, see #131005.

In the pgupdate details we still see _freeze_module, because #130420 is not on that branch. Hence, we see what that PR saves us.

I intentionally branched off commit 9db1a29 to have the same environment.

  • MSVC is much faster for both debug and release builds
  • older clangs are faster than newer in debug builds

Debug build times:

| | debug_clang_18.1.8 | debug_clang_19.1.1 | debug_clang_20.1.0-rc2 | debug_msvc |
|---|---|---|---|---|
| total time | 128.3 | 154.8 | 163.5 | 84.9 |

Release build times:

| | release_clang_18.1.8 | release_clang_19.1.1 | release_clang_20.1.0-rc2 | release_msvc |
|---|---|---|---|---|
| total time | 278.3 | 277.2 | 274.2 | 172.3 |

PGO build times:

  • MSVC is still faster in the pginstr phase
  • but the instrumented binaries take much longer to execute (pgo is short for the PGO task in the table)
  • kill is due to call :Kill in case of build.bat --pgo; it can be ignored and takes almost no time
  • the pgupd phase takes longer for MSVC
  • so the overall build.bat --pgo times are longer for MSVC

| | pgo_clang_18.1.8 | pgo_clang_19.1.1 | pgo_clang_20.1.0-rc2 | pgo_clang_thin_20.1.0-rc2 | pgo_msvc |
|---|---|---|---|---|---|
| pginstr | 288.7 | 279.5 | 297.2 | 219.3 | 155.9 |
| pgo | 77.0 | 70.0 | 70.0 | 69.0 | 559.0 |
| kill | 1.1 | 1.2 | 1.2 | 0.5 | 1.1 |
| pgupd | 284.8 | 271.5 | 282.8 | 231.7 | 359.0 |
| total time | 651.7 | 622.1 | 651.2 | 520.6 | 1075.0 |
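
To make the phase breakdown easier to read, the following sketch sums the four phases per toolchain and reports how much of the build is spent running the instrumented binaries. The numbers are copied from the PGO table above; the dictionary layout and names are just one illustrative way to hold them.

```python
# Seconds per phase, copied from the PGO build times table.
phases = {
    "pgo_clang_18.1.8": {"pginstr": 288.7, "pgo": 77.0, "kill": 1.1, "pgupd": 284.8},
    "pgo_clang_19.1.1": {"pginstr": 279.5, "pgo": 70.0, "kill": 1.2, "pgupd": 271.5},
    "pgo_clang_20.1.0-rc2": {"pginstr": 297.2, "pgo": 70.0, "kill": 1.2, "pgupd": 282.8},
    "pgo_clang_thin_20.1.0-rc2": {"pginstr": 219.3, "pgo": 69.0, "kill": 0.5, "pgupd": 231.7},
    "pgo_msvc": {"pginstr": 155.9, "pgo": 559.0, "kill": 1.1, "pgupd": 359.0},
}

# "total time" row of the same table, used only as a sanity check.
reported_total = {
    "pgo_clang_18.1.8": 651.7,
    "pgo_clang_19.1.1": 622.1,
    "pgo_clang_20.1.0-rc2": 651.2,
    "pgo_clang_thin_20.1.0-rc2": 520.6,
    "pgo_msvc": 1075.0,
}

summary = {}
for name, times in phases.items():
    total = sum(times.values())
    # Share of the whole build spent executing the profiling task.
    summary[name] = (total, times["pgo"] / total)
    print(f"{name:28s} sum={total:7.1f}s  reported={reported_total[name]:7.1f}s  "
          f"pgo share={summary[name][1]:.0%}")
```

The sums agree with the reported totals to within rounding, and the profiling run accounts for roughly half of the MSVC build but only about a tenth of the clang builds.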

Very interesting: pyexpat and _elementtree take much longer for 20.1.0-rc2 in the pginstr phase (see details), but come back to "normal" with --flto=thin. Because these are such outliers, I retested several times with the same result :-O

Detailed build times:

Detailed debug build times

debug_clang_18.1.8 debug_clang_19.1.1 debug_clang_20.1.0-rc2 debug_msvc
_freeze_module 31.0 36.5 38.5 16.8
python314 44.9 56.5 62.7 31.4
liblzma 14.8 15.7 14.3 7.5
sqlite3 8.7 8.0 7.8 6.9
_bz2 5.2 6.4 5.1 7.7
_wmi 5.0 5.3 4.5 4.7
_ctypes 5.0 5.9 5.7 7.9
_decimal 4.2 5.5 5.2 3.4
_testcapi 4.0 6.4 7.0 2.2
_ssl 3.4 5.0 3.8 2.3
_overlapped 3.1 3.9 3.6 2.5
_uuid 3.1 1.9 4.2 4.0
_socket 3.0 3.3 3.0 4.7
_tkinter 3.0 3.6 3.5 2.2
_sqlite3 2.9 4.0 4.2 1.4
_hashlib 2.4 2.9 2.8 4.2
venvwlauncher 2.4 2.8 2.8 4.5
_elementtree 2.4 2.8 2.5 2.3
_testlimitedcapi 2.3 3.6 3.9 1.4
_multiprocessing 2.3 3.0 2.6 1.8
_asyncio 2.3 2.8 2.7 3.5
pyshellext 2.2 2.3 2.4 3.3
_zoneinfo 2.1 2.7 2.5 3.1
unicodedata 2.0 2.2 2.0 2.9
py 1.9 2.1 2.2 3.7
pyw 1.9 2.1 2.2 4.0
_queue 1.9 2.0 2.1 3.5
venvlauncher 1.9 1.9 1.5 3.8
pyexpat 1.8 1.7 1.6 1.8
_ctypes_test 1.6 1.6 1.7 1.1
select 1.6 1.7 2.3 2.9
_testinternalcapi 1.5 2.0 2.0 1.1
winsound 1.4 1.8 1.7 7.5
_testclinic 1.1 1.3 1.4 0.8
_testembed 1.0 1.2 1.3 0.8
pythonw 0.9 1.1 1.1 0.7
_testconsole 0.8 1.0 1.1 0.7
_testbuffer 0.8 0.9 1.0 0.6
_lzma 0.8 1.0 1.1 1.1
_testimportmultiple 0.7 0.8 0.9 0.5
python 0.7 1.4 1.0 0.6
_testmultiphase 0.7 0.9 1.0 0.6
_testclinic_limited 0.7 0.8 0.9 0.5
_testsinglephase 0.7 0.9 1.0 0.6
python3 0.5 0.5 0.5 0.5
total 186.8 221.8 227.1 169.8
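
As noted earlier, the per-project times add up to more than the wall-clock total because most projects build in parallel. A small sketch of that relationship for the debug builds, using the "total" row above and the "total time" row of the debug summary table (the names are illustrative):

```python
# (sum of per-project times, wall-clock total time) in seconds for the debug builds.
debug_builds = {
    "debug_clang_18.1.8": (186.8, 128.3),
    "debug_clang_19.1.1": (221.8, 154.8),
    "debug_clang_20.1.0-rc2": (227.1, 163.5),
    "debug_msvc": (169.8, 84.9),
}

# Ratio > 1 means project builds overlapped; a larger ratio means more overlap.
overlap = {name: summed / wall for name, (summed, wall) in debug_builds.items()}

for name, ratio in overlap.items():
    print(f"{name:24s} summed/wall-clock = {ratio:.2f}")
```

A ratio of 1.0 would mean a fully serial build; the serial _freeze_module and python314 steps keep the ratio well below the number of available cores.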

Detailed release build times

release_clang_18.1.8 release_clang_19.1.1 release_clang_20.1.0-rc2 release_msvc
_freeze_module 26.4 35.5 37.6 13.8
python314 147.0 135.6 131.0 98.2
sqlite3 50.5 45.8 44.2 18.8
liblzma 14.6 15.3 15.3 11.7
_decimal 10.9 11.0 10.8 7.0
_bz2 7.4 7.8 7.2 11.5
_ctypes 7.3 7.3 7.2 9.7
_testcapi 6.0 7.9 8.3 2.7
pyexpat 5.6 5.0 4.3 5.9
_ssl 5.0 5.3 5.5 5.2
_wmi 4.8 4.4 5.0 6.2
_tkinter 4.4 4.4 4.3 6.0
_ctypes_test 3.9 3.7 3.6 6.3
_socket 3.8 4.0 4.3 5.1
_elementtree 3.5 3.5 3.6 5.0
_uuid 3.4 5.0 4.0 5.2
_testlimitedcapi 3.4 4.5 4.8 2.7
_lzma 3.3 3.0 3.1 6.0
_asyncio 3.3 3.5 3.5 1.9
_hashlib 3.0 3.3 3.7 4.9
_overlapped 3.0 3.2 3.4 3.4
venvwlauncher 2.8 3.0 2.8 6.0
_zoneinfo 2.7 4.0 3.1 4.8
pyw 2.6 2.8 2.8 6.4
unicodedata 2.5 2.5 2.4 2.6
_sqlite3 2.4 3.1 3.1 1.4
py 2.4 2.5 2.6 3.4
pyshellext 2.4 2.7 2.7 6.0
_multiprocessing 2.0 2.7 1.9 3.3
_testclinic 2.0 2.0 2.0 1.0
_testinternalcapi 1.9 2.3 2.4 1.2
venvlauncher 1.9 1.8 1.9 3.5
_queue 1.6 2.3 2.4 2.3
select 1.5 1.6 1.6 2.6
_testembed 1.3 1.5 1.5 0.8
winsound 1.2 1.7 1.9 7.6
_testbuffer 1.2 1.4 1.4 0.7
_testconsole 0.8 1.1 1.2 0.7
pythonw 0.8 1.1 1.1 0.7
_testmultiphase 0.8 1.0 1.0 0.6
_testsinglephase 0.7 0.9 1.0 0.6
_testclinic_limited 0.7 0.9 0.9 0.5
python 0.7 0.9 1.0 0.6
_testimportmultiple 0.6 0.9 0.9 0.5
xxlimited 0.6 0.9 0.8 0.5
xxlimited_35 0.6 0.8 0.8 0.5
python3 0.5 0.5 0.5 0.4
total 359.9 365.9 360.3 296.1

Detailed pginstrument build times

pgo_clang_18.1.8 pgo_clang_19.1.1 pgo_clang_20.1.0-rc2 pgo_clang_thin_20.1.0-rc2 pgo_msvc
_freeze_module 26.6 35.1 38.5 40.0 14.0
python314 159.6 139.7 141.5 81.3 86.6
sqlite3 50.0 45.1 46.0 42.4 18.2
_ctypes 14.3 8.6 6.9 7.5 8.1
_bz2 12.7 8.4 7.0 4.9 6.1
liblzma 12.7 18.3 18.2 16.5 11.3
_decimal 11.4 10.9 12.4 7.7 3.6
pyexpat 10.4 6.1 52.7 3.9 6.2
_testcapi 5.8 7.6 8.3 7.1 2.7
_asyncio 5.5 4.4 4.0 5.2 3.4
_elementtree 5.0 5.4 51.8 5.3 3.2
_wmi 4.9 6.1 4.5 3.0 5.7
_lzma 4.5 3.5 3.8 1.8 6.2
_ssl 3.9 5.6 3.7 5.5 5.6
_ctypes_test 3.9 3.6 3.7 3.4 6.3
venvwlauncher 3.6 2.8 3.3 2.7 4.1
_testlimitedcapi 3.4 4.4 4.9 4.3 2.7
_sqlite3 3.0 3.4 3.4 2.8 1.4
_overlapped 2.9 3.2 4.5 3.2 2.5
_zoneinfo 2.9 3.6 3.1 3.4 3.3
_socket 2.8 4.2 2.4 3.7 4.6
unicodedata 2.7 2.6 2.7 3.0 2.4
_tkinter 2.6 4.6 2.2 4.1 3.4
_multiprocessing 2.5 1.7 3.5 2.7 9.8
pyw 2.4 2.8 2.7 2.7 3.2
py 2.4 2.5 2.6 2.5 3.2
pyshellext 2.3 2.8 2.7 2.6 2.9
_testclinic 2.0 2.0 2.0 1.9 1.0
_hashlib 1.9 3.3 1.8 3.1 5.0
_testinternalcapi 1.8 2.2 2.4 2.2 1.2
venvlauncher 1.7 1.8 1.8 1.7 2.7
select 1.4 1.6 1.8 2.2 2.3
_uuid 1.4 1.6 1.6 3.2 2.4
_queue 1.4 1.7 1.6 2.3 2.8
winsound 1.4 1.6 1.7 3.3 2.4
_testembed 1.3 1.4 1.5 1.5 0.8
_testbuffer 1.3 1.3 1.4 1.3 0.7
_testconsole 0.8 1.0 1.1 1.1 0.7
pythonw 0.8 1.1 1.1 1.1 0.7
_testmultiphase 0.7 0.9 1.0 1.0 0.6
_testsinglephase 0.7 0.9 1.0 1.0 0.6
_testclinic_limited 0.7 0.9 0.9 0.9 0.6
python 0.7 0.9 1.0 0.9 0.6
_testimportmultiple 0.6 0.8 0.9 0.9 0.6
python3 0.5 0.5 0.5 0.5 0.5
total 385.7 372.4 465.8 303.3 257.1

Detailed pgupdate build times

pgo_clang_18.1.8 pgo_clang_19.1.1 pgo_clang_20.1.0-rc2 pgo_clang_thin_20.1.0-rc2 pgo_msvc
_freeze_module 26.7 34.7 38.0 39.5 13.8
python314 154.3 137.6 141.9 95.4 287.1
sqlite3 47.4 46.1 44.4 42.9 16.1
_ctypes 12.1 6.9 8.0 7.2 6.3
liblzma 12.0 16.6 17.3 16.5 7.9
_bz2 11.0 7.1 7.8 5.5 9.1
_decimal 10.0 10.8 11.2 8.7 6.1
_testcapi 6.0 7.6 8.6 7.3 2.7
pyexpat 5.3 4.4 4.6 3.6 4.7
_elementtree 4.1 3.4 3.5 4.5 12.0
_lzma 4.1 3.3 3.2 1.9 4.8
_ctypes_test 3.8 3.6 3.7 3.4 6.4
unicodedata 3.6 4.1 3.2 3.0 3.0
venvwlauncher 3.5 2.8 3.1 3.0 5.2
_testlimitedcapi 3.4 4.4 5.0 4.2 2.7
_ssl 3.3 5.1 5.2 5.6 2.6
_asyncio 3.0 4.7 4.5 4.6 3.4
_overlapped 2.9 2.9 3.5 3.7 7.2
_uuid 2.9 3.4 2.6 2.8 7.2
_zoneinfo 2.8 3.1 3.2 3.2 2.5
_wmi 2.6 3.3 3.5 3.1 5.3
_sqlite3 2.6 2.9 3.1 2.7 1.7
pyw 2.4 2.5 2.6 2.6 2.7
_socket 2.4 3.8 4.3 3.5 14.6
py 2.3 2.5 2.6 2.7 2.5
pyshellext 2.3 2.5 2.7 2.6 2.4
_tkinter 2.2 3.6 4.0 4.2 4.4
_testclinic 2.0 2.0 2.0 1.9 1.0
_testinternalcapi 1.9 2.2 2.4 2.2 1.2
venvlauncher 1.7 1.6 1.7 1.5 2.4
_multiprocessing 1.5 1.8 2.8 2.6 2.7
select 1.5 1.6 1.6 2.0 3.5
_queue 1.5 1.7 1.9 2.2 2.5
_hashlib 1.5 3.5 3.1 3.3 3.2
winsound 1.3 1.6 1.8 3.0 4.5
_testembed 1.3 1.5 1.5 1.4 0.8
_testbuffer 1.2 1.3 1.4 1.3 0.8
_testconsole 0.8 1.1 1.1 1.0 0.7
pythonw 0.8 1.0 1.1 1.1 0.7
_testmultiphase 0.7 1.0 1.0 1.1 0.6
_testsinglephase 0.7 0.9 1.0 1.0 0.6
python 0.7 0.9 1.0 0.9 0.6
_testclinic_limited 0.6 0.9 0.9 0.9 0.6
_testimportmultiple 0.6 0.8 0.9 0.9 0.6
python3 0.5 0.5 0.5 0.5 0.4
total 360.0 359.7 372.9 316.8 472.1
