You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Auto merge of #40454 - djzin:fast-swap, r=sfackler
speed up mem::swap
I would have thought that the mem::swap code didn't need an intermediate variable precisely because the pointers are guaranteed never to alias. And.. it doesn't! It seems that llvm will also auto-vectorize this case for large structs, but alas it doesn't seem to have all the aliasing info it needs and so will add redundant checks (and even not bother with autovectorizing for small types). Looks like a lot of performance could still be gained here, so this might be a good test case for future optimizer improvements.
Here are the current benchmarks for the simd version of mem::swap; the timings are in cycles (code below) measured with 10 iterations. The timings for sizes > 32 which are not a multiple of 8 tend to be ever so slightly faster in the old code, but not always. For large struct sizes (> 1024) the new code shows a marked improvement.
\* = latest commit
† = subtracted from other measurements
| arr_length | noop<sup>†</sup> | rust_stdlib | simd_u64x4\* | simd_u64x8
|------------------|------------|-------------------|-------------------|-------------------
8|80|90|90|90
16|72|177|177|177
24|32|76|76|76
32|68|188|112|188
40|32|80|60|80
48|32|84|56|84
56|32|108|72|108
64|32|108|72|76
72|80|350|220|230
80|80|350|220|230
88|80|420|270|270
96|80|420|270|270
104|80|500|320|320
112|80|490|320|320
120|72|528|342|342
128|48|360|234|234
136|72|987|387|387
144|80|1070|420|420
152|64|856|376|376
160|68|804|400|400
168|80|1060|520|520
176|80|1070|520|520
184|32|464|228|228
192|32|504|228|228
200|32|440|248|248
208|72|987|573|573
216|80|1464|220|220
224|48|852|450|450
232|72|1182|666|666
240|32|428|288|288
248|32|428|308|308
256|80|860|770|770
264|80|1130|820|820
272|80|1340|820|820
280|80|1220|870|870
288|72|1227|804|804
296|72|1356|849|849
0 commit comments