Skip to content

Unobservable vec, and arrays writes always always emit for size >= 3 #96497

Open
@ghost

Description

While revealing that dead Vec<_> writes always emit, to someone else, I decided to look into this for arrays.

It seems that, for any [T; N], such that T is not a ZST or enum with a single possible value, and 3 <= N, then array writes are always emit. This applies to allowing emitting redundant memcpys, memclrs, memfills, and all related operations.

https://rust.godbolt.org/z/7vrT5xnxq

As for Vec<T>, as long as T isn't as described above, then unobservable writes and operations are usually emit.
https://rust.godbolt.org/z/jhM75Mj5E

Activity

erikdesjardins

erikdesjardins commented on Apr 29, 2022

@erikdesjardins
Contributor

For [T; N], this is because we don't add lifetime markers to by-move arguments passed indirectly: https://rust.godbolt.org/z/vjd78n38a.

I believe we should be adding a lifetime.end marker at the end of the function.

(Note that byval also fixes this, but it should be avoided because it changes the ABI in a way that limits optimization potential, by requiring the argument to be in a specific stack position.)


For Vec<T>, LLVM should be able to see that the stores are dead because it knows that __rust_dealloc frees memory.

It appears to be failing here because we conditionally call __rust_dealloc, and LLVM doesn't have enough information to optimize out the condition, so __rust_dealloc doesn't actually postdominate the store.

For the simplified case of Box<[T]>, see https://rust.godbolt.org/z/vnPM5o1K4. Note that Box<[u32]> has an extra:

%6 = shl i64 %1, 2
%7 = icmp eq i64 %6, 0

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

The store is optimized out for Box<[u8]>, because size_of::<u8> == 1 and the two conditions are trivially equivalent.

Despite this, it's still not optimized out for Vec<u8>, because again __rust_dealloc doesn't postdominate the store. In this case it seems like it's because the bounds check is based on x.len() and the dealloc check is based on x.capacity(), and LLVM doesn't know that x.len() <= x.capacity().

ghost

ghost commented on Jun 11, 2022

@ghost

Is there any way that LLVM could be informed x.len() <= x.capacity()?
At a minimum, we could toss in an extra

unsafe {
 if !(x.len() <= x.capacity()) {
  std::hint::unreachable_unchecked();
 }
}

to Vec::{len, capacity} accessors? It could give LLVM the extra information necessary to optimize this code path? Otherwise, maybe an upstream issue could be made about introducing an attribute for marking members/variables/values/etc as always following a condition such as the above, so we could bind via a macro?
Any suggestions that could be valuable for improving this?

erikdesjardins

erikdesjardins commented on Jun 13, 2022

@erikdesjardins
Contributor

Yeah, that would likely work, or equivalently std::intrinsics::assume(x.len() <= x.capacity()). Might be worth opening a PR to run perf and make sure it's not too disruptive (assumes can interfere with other optimizations).

erikdesjardins

erikdesjardins commented on Jun 13, 2022

@erikdesjardins
Contributor

This is doing something like x.len() * size_of::<u32>() == 0, to decide whether or not to call __rust_dealloc. This check shouldn't be necessary, because we just checked x.len() == 0 above that, and we know the multiplication never wraps (but LLVM doesn't).

I'm working on this, should be straightforward to use unchecked multiplication in that size calculation

added a commit that references this issue on Jun 14, 2022

Rollup merge of rust-lang#98078 - erikdesjardins:uncheckedsize, r=pet…

d18d756
added a commit that references this issue on Jun 15, 2022

Rollup merge of rust-lang#98078 - erikdesjardins:uncheckedsize, r=pet…

2722c2a
erikdesjardins

erikdesjardins commented on Jun 17, 2022

@erikdesjardins
Contributor

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

However this is a bit less important now than when this issue was first opened, since MIR DSE is now able to remove dead stores in some cases, including the simple motivating case for [T; N]: https://rust.godbolt.org/z/MGe59WrxY.

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.

pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}
ghost

ghost commented on Jun 17, 2022

@ghost

For the case of [T; N], from #98121 it seems that emitting lifetime.end is not viable.

A more viable option would be to add an attribute to upstream LLVM, like noreadonunwind/noreadonreturn discussed here: https://groups.google.com/g/llvm-dev/c/i0Z1FC51KVI.

Ah, is this the only thing still inhibiting optimization of more complicated cases/situations regarding dead code / dead stores? It does seem correct that the lifetime of the argument copy has ended within the function, yet I saw it mentioned in the PR that inlining caused complications with that?

This is still a problem when the argument is directly overwritten though (https://rust.godbolt.org/z/G3odYbq78), e.g.

pub fn example_f(mut x: String) {
    x = "foo".to_string(); // this still writes to memory in the caller's stack frame
}

Ouch, well.. this does impact arrays of Move-only types too! I'd certainly consider it to be in the same vein of issue, e.g.:

pub fn f(mut x: [Vec<u32>; 2]) {
    // three drops are emit: x[0], x[0], x[1], and an allocation for the new vector
    x[0] = vec![1, 2];
}

Hopefully the developers working on LLVM can eventually get a working implementation of what is described in that discussion, good luck.

ghost

ghost commented on Jun 17, 2022

@ghost

@rustbot label A-LLVM

added
A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
on Jun 17, 2022
added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Apr 5, 2023

1 remaining item

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.A-codegenArea: Code generationT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @erikdesjardins@workingjubilee@rustbot@Noratrieb

        Issue actions

          Unobservable vec, and arrays writes always always emit for size >= 3 · Issue #96497 · rust-lang/rust