-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Weird copy happening during initialization #56191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
You should be able to reproduce any of these with C++ code in But also, it helps to explain what is weird here. In this case, my guess is the reads back out from the stack? Also, we should ignore the lack of SSE usage in the saner example, at least for now, right? cc @rust-lang/wg-codegen Can someone look at the LLVM IR here and try to figure out if there's anything we could tell LLVM that would make it not do this? |
The code we generate for the structure and its initalization looks something like this: %LayoutPrimitiveInfo2 = type { [0 x i32], [7 x float], [0 x i16], i16, [1 x i16] }
; in a function
; loop to initialize all floats to 0
; then...
%9 = getelementptr inbounds %LayoutPrimitiveInfo2, %LayoutPrimitiveInfo2* %0, i32 0, i32 3
store i16 0, i16* %9, align 2
; and implicitly
; %10 = getelementptr inbounds %LayoutPrimitiveInfo2, %LayoutPrimitiveInfo2* %0, i32 0, i32 4
; store [1 x i16] undef, [1 x i16]* %10, align 2 this %10 = getelementptr inbounds %LayoutPrimitiveInfo2, %LayoutPrimitiveInfo2* %0, i32 0, i32 4
store [1 x i16] zersoinitializer, [1 x i16]* %10, align 2 makes GVN generate a memset for the whole structure, rather than a part of it, which then makes instcombine happier and prone to optimizing out stuff better. |
Hmm, can we indicate to LLVM that it could put anything in the padding? |
EDIT: my bad, was looking at the wrong thing. LLVM is able to reach the code below with Wrong stuffFWIW we do not reach the "ideal" codegen in either scenario. Ideally the code would look like this do_ants:
# %bb.0:
subq $216, %rsp
movq $0, (%rsp)
xorps %xmm0, %xmm0
movups %xmm0, 200(%rsp)
movups %xmm0, 184(%rsp)
movq %rsp, %rdi
callq ext
addq $216, %rsp
retq which can be achieved by replacing initialization with
LLVM already knows (the padding is @nikic might be interested, though. |
Does it know it could write there, though? That is, |
Yes. |
I don't think that memset formation is the right place to optimize here. In the general case extending a partial memset to a memset of the full structure may not necessarily be advantageous. It would be preferable to handle this during memset-memcpy forwarding. I have something like https://gist.github.com/nikic/3ca2f08c70bf8d9c0a634d42d5533f17 in mind, which would be enough to cover this case. This will fall short though if there is interior padding. |
The equivalent c++ generates saner code regardless of the type of is_backface_visible #include <stdint.h>
#include <cstring>
struct LayoutPrimitiveInfo2 {
LayoutPrimitiveInfo2() {
memset(rect, 0, sizeof(rect));
is_backface_visible = 0;
}
float rect[7];
uint16_t is_backface_visible;
};
struct SpecificDisplayItem2 {
static SpecificDisplayItem2 PopStackingContext() {
SpecificDisplayItem2 ret;
ret.p.a.disc = 0;
return ret;
}
union {
struct A {
uint64_t disc;
} a;
struct B {
uint64_t disc;
double f[22];
} b;
} p;
};
struct Outer {
SpecificDisplayItem2 item;
LayoutPrimitiveInfo2 info;
};
extern int do_item(Outer &o);
void do_ants() {
Outer o = {
SpecificDisplayItem2::PopStackingContext(),
LayoutPrimitiveInfo2(),
};
do_item(o);
}
do_ants(): # @do_ants()
sub rsp, 216
mov qword ptr [rsp], 0
xorps xmm0, xmm0
movups xmmword ptr [rsp + 198], xmm0
movups xmmword ptr [rsp + 184], xmm0
mov rdi, rsp
call do_item(Outer&)
add rsp, 216
ret |
I've submitted https://reviews.llvm.org/D55120. Ultimately this is just papering things over though. I think a more general solution would be to instead check if a memcpy can be hoisted upwards past a store/memset chain and absorbed into an alloca. Basically what call slot optimization does, just not for calls. |
Patch landed upstream, so this is waiting on an LLVM update now. |
Would it be possible to cherry pick this into rust's llvm ahead of a new
llvm release?
…On Sat, Dec 8, 2018, 12:51 PM Nikita Popov ***@***.*** wrote:
Patch landed upstream, so this is waiting on an LLVM update now.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#56191 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAUTbZhNpxUa7cNhNvFhLPZi4JaX9voTks5u2_wHgaJpZM4YxPtq>
.
|
Yes, I think it would be reasonable to cherry-pick this change. I'd like to wait for a few miscompile fixes to land first, so we can batch them together. |
@nikic the llvm patch has baked for a while now. Do you think you could cherry pick it into rust's llvm? |
Fixed by #57675 We now get: playground::do_ants: # @playground::do_ants
# %bb.0:
subq $216, %rsp
movq $0, (%rsp)
xorps %xmm0, %xmm0
movups %xmm0, 198(%rsp)
movups %xmm0, 184(%rsp)
movq %rsp, %rdi
callq *ext@GOTPCREL(%rip)
addq $216, %rsp
retq |
This code:
compiles to
When the u16 is changed to a u32 we get the saner:
The text was updated successfully, but these errors were encountered: