LLVM IR for generic functions sometimes fails to inline post MergeFuncs with LTO #97552
Comments
See also #89389 |
I think this is just due to the missing |
MergeFunc merges functions with compatible but different struct return types. When modules are linked for LTO, call sites of the merged functions then include extra casts. Such calls cannot be inlined unless they are first rewritten into the form the inliner expects, but InstCombine doesn't have the necessary transform: I would expect that this would be a non-issue with opaque pointers. |
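To make the comment above concrete, here is a minimal sketch of the kind of Rust code that could produce such a pair of functions. The names `view`, `bytes`, and `words` are hypothetical and do not come from the issue. The assumption is that, on a rustc that still emitted typed pointers, the two instances of `view` would return `{ [0 x i8]*, i64 }` and `{ [0 x i64]*, i64 }` respectively while having otherwise identical bodies, which would make them candidates for merging.

```rust
// Hypothetical illustration; none of these names appear in the issue.

/// Borrows the contents of a `Vec<T>` as a slice. This goes through
/// `Deref for Vec<T>`, so each `T` gets its own instance of this function.
#[inline(never)] // keep each instance as a separate function for inspection
pub fn view<T>(v: &Vec<T>) -> &[T] {
    v
}

/// Instance for `u8`: returns a (pointer, length) pair for a byte slice.
pub fn bytes(v: &Vec<u8>) -> &[u8] {
    view(v)
}

/// Instance for `u64`: same logic, but the pointee type of the returned
/// pointer differs, which is the only difference between the two instances.
pub fn words(v: &Vec<u64>) -> &[u64] {
    view(v)
}
```

Both wrappers behave identically apart from the element type, so whether the two `view` instances are actually merged, and whether the merged call is later inlined, depends on the compiler version and on the LTO settings.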
I can't find deref in your example locally, even with opt-level=1:
_ZN2tt12do_something17h17355b1b96872ed5E:
.Lfunc_begin9:
.seh_proc _ZN2tt12do_something17h17355b1b96872ed5E
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
cmp qword ptr [rcx + 16], 0
je .LBB9_5
mov rax, qword ptr [rcx]
mov r8, qword ptr [rax + 16]
cmp r8, 4
jne .LBB9_2
mov rcx, qword ptr [rax]
lea rdx, [rip + anon.8da7dae456594b2e933d3f3e66d20965.4]
call memcmp
test eax, eax
sete al
jmp .LBB9_4
.LBB9_2:
xor eax, eax
.LBB9_4:
add rsp, 40
ret
.LBB9_5:
lea r8, [rip + anon.8da7dae456594b2e933d3f3e66d20965.3]
xor ecx, ecx
xor edx, edx
call _ZN4core9panicking18panic_bounds_check17h9bd8ac36ac287196E
ud2
.Lfunc_end9:
opt-level=2:
_ZN2tt12do_something17h17355b1b96872ed5E:
.Lfunc_begin6:
.seh_proc _ZN2tt12do_something17h17355b1b96872ed5E
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
cmp qword ptr [rcx + 16], 0
je .LBB6_5
mov rax, qword ptr [rcx]
cmp qword ptr [rax + 16], 4
jne .LBB6_2
mov rax, qword ptr [rax]
cmp dword ptr [rax], 1953719668
sete al
jmp .LBB6_4
.LBB6_2:
xor eax, eax
.LBB6_4:
add rsp, 40
ret
.LBB6_5:
lea r8, [rip + anon.8da7dae456594b2e933d3f3e66d20965.3]
xor ecx, ecx
xor edx, edx
call _ZN4core9panicking18panic_bounds_check17h9bd8ac36ac287196E
ud2
.Lfunc_end6: |
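One way to read the two listings: at opt-level=1 a four-byte comparison goes through `memcmp`, while at opt-level=2 it is folded into a single `cmp dword ptr [rax], 1953719668`. The sketch below decodes that immediate as little-endian bytes, which gives the ASCII string `test`. The `do_something` body shown here is only a guess at a function of a compatible shape (a bounds-checked index, a length check against 4, and a four-byte comparison). The real source is not included in the thread, and `immediate_as_bytes` is a hypothetical helper.

```rust
/// Hypothetical helper: splits a 32-bit immediate from the listing into the
/// bytes it would be compared against in memory on x86-64 (little-endian).
pub fn immediate_as_bytes(imm: u32) -> [u8; 4] {
    imm.to_le_bytes()
}

/// Assumed shape of `do_something`; NOT the code from the issue.
/// Indexing with `[0]` accounts for the `panic_bounds_check` call, and the
/// string comparison accounts for the length check and the 4-byte compare.
pub fn do_something(v: &Vec<String>) -> bool {
    v[0] == "test"
}
```

Passing `1953719668` to `immediate_as_bytes` yields the bytes of `b"test"`, so the opt-level=2 code compares the first four bytes of the string directly instead of calling `memcmp`.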
hm, yes, that also results in no calls to
I used this Cargo.toml:
And ran |
Try without
|
I wonder if this is related at all to the issue mentioned by @nikic here #96624 (comment) (which motivated that PR). This one is different in that it happens with LTO, although I think the builds I was doing where that came up may have had LTO enabled as well... |
The minimal LLVM reproducer (opt is able to inline the first call in h, but not the second one, for reasons described in #97552 (comment)):
; ModuleID = 'llvm-link'
source_filename = "llvm-link"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@g = unnamed_addr alias { [0 x i64]*, i64 } (), bitcast ({ [0 x i8]*, i64 } ()* @f to { [0 x i64]*, i64 } ()*)
define { [0 x i8]*, i64 } @f() unnamed_addr {
start:
ret { [0 x i8]*, i64 } undef
}
define void @h() unnamed_addr {
start:
%0 = tail call { [0 x i8]*, i64 } @f()
%1 = tail call { [0 x i64]*, i64 } @g()
ret void
} |
Note that we only update to nightly-2022-04-03 due to a problem in newer versions that leads to not inlining Vec::deref and thereby ruining performance. See rust-lang/rust#97552 for details.
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-high +T-compiler |
Add `#[inline]` to `Vec`'s `Deref/DerefMut`
This should help rust-lang#97552 (although I haven't verified).
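As an illustration of what such a change looks like, here is a minimal sketch on a stand-in type. `Buf` and its `new` constructor are hypothetical and are not the standard library's `Vec` source. The only point is where the `#[inline]` attribute sits on the `Deref` and `DerefMut` methods. Note that `#[inline]` is a hint to the compiler, not a guarantee that the call is inlined.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a growable buffer; not the real `Vec`.
pub struct Buf<T> {
    items: Vec<T>,
}

impl<T> Buf<T> {
    /// Wraps an existing vector (hypothetical constructor for this sketch).
    pub fn new(items: Vec<T>) -> Self {
        Buf { items }
    }
}

impl<T> Deref for Buf<T> {
    type Target = [T];

    // Ask the compiler to inline this small accessor at its call sites.
    #[inline]
    fn deref(&self) -> &[T] {
        self.items.as_slice()
    }
}

impl<T> DerefMut for Buf<T> {
    // Same hint for the mutable accessor.
    #[inline]
    fn deref_mut(&mut self) -> &mut [T] {
        self.items.as_mut_slice()
    }
}
```

With these impls in place, slice methods and indexing on a `Buf<T>` go through `deref` and `deref_mut`, which is why a missed inline on such tiny accessors can show up on a hot path.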
Visiting for T-compiler 2022 Q3 P-high review.
Retitled issue since the problem with Vec itself has been "addressed" by adding
Also downgrading priority since this is no longer a critical issue for the noted hot-path in
(also: We may want to form a project group or something about shifting to using opaque pointers in our LLVM backend.)
@rustbot label: -P-high +P-medium |
(also @nikic notes that we already use opaque pointers, so this issue may actually be resolved? We should test that hypothesis.) |
I just went and confirmed the hypothesis; namely, I checked 1. that I could reproduce the original problem atop nightly-2022-04-04, and 2. that even after I locally removed the |
For some reason, the compiler decides to not inline Vec::deref anymore if LTO is enabled, which ruins performance in my case. I tried to force the compiler to inline it via -Znew-llvm-pass-manager=no -Cinline-threshold=N, but even N=10000 (resulting in insane compile times and binary sizes) doesn't convince the compiler to inline the function. This behavior changed between Rust version 76d770a (inlined) and 6af09d2 (not inlined).
Code
I tried to create a more or less minimal example that reproduces the problem, which resulted in the following Rust program:
I expected to see this happen: the Deref implementation of Vec is inlined regardless of whether LTO is used or not.
Instead, this happened: If LTO is enabled, the Deref implementation of Vec is not inlined. If LTO is disabled, it is inlined. That is, with LTO, the generated assembly code looks like this:
Version it worked on
rustc --version --verbose:
In this version, Vec::deref is inlined in every Rust program I have (LTO=on). In the example above, it is also inlined regardless of whether LTO is used or not.
Version with regression
rustc --version --verbose:
In this version, Vec::deref is never inlined as it seems (LTO=on). In the example above, it is only inlined if LTO is disabled.