Open

Description
While pointing out to someone else that dead Vec<_> writes are always emitted, I decided to look into the same thing for arrays.
It seems that for any [T; N] where T is not a ZST or an enum with a single possible value, and N >= 3, dead array writes are always emitted. This includes redundant memcpys, memclrs, memfills, and all related operations.
https://rust.godbolt.org/z/7vrT5xnxq
As for Vec<T>, as long as T isn't one of the types described above, unobservable writes and operations are usually emitted.
https://rust.godbolt.org/z/jhM75Mj5E
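For readers who can't open the links, here is a minimal sketch of the kind of code being described. It is an assumption about the general shape of such a reproduction, not the actual content of the godbolt links, and both function names are hypothetical. Each function takes its argument by value, writes to it, and then drops it, so the caller can never observe the write.

```rust
/// Hypothetical reproduction: `arr` is owned by this function and is never
/// read after the store, so the store is unobservable ("dead").
pub fn dead_array_write(mut arr: [u32; 4]) {
    arr[0] = 1;
}

/// Hypothetical reproduction for `Vec<T>`: the element is written and then
/// the vector is dropped (and its buffer freed) at the end of the function.
pub fn dead_vec_write(mut v: Vec<u32>) {
    if let Some(first) = v.first_mut() {
        *first = 1;
    }
}
```

Whether the stores survive in the generated assembly depends on the compiler version and optimization level, so it is worth checking the output rather than assuming either way.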
Activity
erikdesjardins commented on Apr 29, 2022
For [T; N], this is because we don't add lifetime markers to by-move arguments passed indirectly: https://rust.godbolt.org/z/vjd78n38a.
I believe we should be adding a lifetime.end marker at the end of the function.
(Note that byval also fixes this, but it should be avoided because it changes the ABI in a way that limits optimization potential, by requiring the argument to be in a specific stack position.)
For Vec<T>, LLVM should be able to see that the stores are dead because it knows that __rust_dealloc frees memory.
It appears to be failing here because we conditionally call __rust_dealloc, and LLVM doesn't have enough information to optimize out the condition, so __rust_dealloc doesn't actually postdominate the store.
For the simplified case of Box<[T]>, see https://rust.godbolt.org/z/vnPM5o1K4. Note that Box<[u32]> has an extra:
This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).
The store is optimized out for Box<[u8]>, because size_of::<u8>() == 1 and the two conditions are trivially equivalent.
Despite this, it's still not optimized out for Vec<u8>, because again __rust_dealloc doesn't postdominate the store. In this case it seems like it's because the bounds check is based on x.len() and the dealloc check is based on x.capacity(), and LLVM doesn't know that x.len() <= x.capacity().
ghost commented on Jun 11, 2022
Is there any way that LLVM could be informed that x.len() <= x.capacity()?
At a minimum, we could toss in an extra
to the Vec::{len, capacity} accessors? It could give LLVM the extra information necessary to optimize this code path. Otherwise, maybe an upstream issue could be opened about introducing an attribute for marking members/variables/values/etc. as always satisfying a condition such as the above, so we could bind it via a macro?
Any suggestions that could be valuable for improving this?
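To make the idea concrete, here is a minimal sketch of how the len <= capacity invariant could be handed to the optimizer from stable Rust. hinted_len is a hypothetical helper written for illustration, not something that exists in std, and whether this hint is enough for LLVM to drop the dead stores is not verified here.

```rust
use std::hint::unreachable_unchecked;

/// Hypothetical helper (not part of std): returns `v.len()` after telling
/// the optimizer that `len <= capacity` always holds.
pub fn hinted_len<T>(v: &Vec<T>) -> usize {
    let len = v.len();
    if len > v.capacity() {
        // SAFETY: `Vec` guarantees `len <= capacity`, so this branch is
        // never taken; marking it unreachable only informs the optimizer.
        unsafe { unreachable_unchecked() }
    }
    len
}
```

The helper returns the same value as v.len(); the only difference is the extra fact it exposes to the optimizer. Hints like this can also get in the way of other optimizations, so any real change along these lines would need to be measured.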
erikdesjardins commented on Jun 13, 2022
Yeah, that would likely work, or equivalently std::intrinsics::assume(x.len() <= x.capacity()). Might be worth opening a PR to run perf and make sure it's not too disruptive (assumes can interfere with other optimizations).
erikdesjardins commented on Jun 13, 2022
I'm working on this; it should be straightforward to use unchecked multiplication in that size calculation.
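For illustration, here is a rough model of why the kind of multiplication matters. Both functions are hypothetical stand-ins for the size calculation, not the real liballoc code. With a wrapping multiply, a byte size of zero does not prove that the length is zero, so the optimizer cannot merge the size check with an earlier length check. Promising that the multiply never overflows makes the two checks equivalent for any non-zero element size.

```rust
use std::mem::size_of;

/// Hypothetical model of the size calculation as the optimizer sees it when
/// it has no overflow information: the multiply is allowed to wrap.
pub fn slice_bytes_wrapping(len: usize) -> usize {
    len.wrapping_mul(size_of::<u32>())
}

/// Hypothetical model of the "unchecked" variant.
///
/// # Safety
/// `len * size_of::<u32>()` must not overflow `usize`.
pub unsafe fn slice_bytes_unchecked(len: usize) -> usize {
    match len.checked_mul(size_of::<u32>()) {
        Some(bytes) => bytes,
        // SAFETY: the caller promises the product does not overflow.
        None => unsafe { std::hint::unreachable_unchecked() },
    }
}
```

In the wrapping model, a length of usize::MAX / 4 + 1 multiplies out to zero bytes even though the length is non-zero, which is exactly the case the optimizer has to account for unless it is told overflow cannot happen.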
Rollup merge of rust-lang#98078 - erikdesjardins:uncheckedsize, r=pet…
@llvm.lifetime.end for moved arguments passed indirectly #98121
erikdesjardins commented on Jun 17, 2022
For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.
A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.
However this is a bit less important now than when this issue was first opened, since MIR DSE is now able to remove dead stores in some cases, including the simple motivating case for [T; N]: https://rust.godbolt.org/z/MGe59WrxY.
This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.
ghost commented on Jun 17, 2022
Ah, is this the only thing still inhibiting optimization of more complicated cases/situations regarding dead code / dead stores? It does seem correct that the lifetime of the argument copy has ended within the function, yet I saw it mentioned in the PR that inlining caused complications with that?
Ouch, well.. this does impact arrays of Move-only types too! I'd certainly consider it to be in the same vein of issue, e.g.:
Hopefully the developers working on LLVM can eventually get a working implementation of what is described in that discussion, good luck.
ghost commented on Jun 17, 2022
@rustbot label A-LLVM