-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Stage 2 performance regression in debug: big structs as function parameters are passed by value #12638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This pretty much the same as #12215, AstGen decides to do a huge copy and rest of the pipeline just goes with it. The array example works as expected since the LLVM backend is able to elide the copy, as it also manages to do for the struct example if you remove the timing calls: define internal fastcc void @a.main() unnamed_addr #0 !dbg !4356 {
Entry:
%0 = alloca %a.Test, align 4
%1 = bitcast %a.Test* %0 to i8*, !dbg !4369
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 bitcast (%a.Test* @1 to i8*), i64 4000000, i1 false), !dbg !4369
call void @llvm.dbg.declare(metadata %a.Test* %0, metadata !4359, metadata !DIExpression()), !dbg !4369
%2 = call fastcc i32 @a.Test.passByValueTest(%a.Test* %0), !dbg !4370
ret void, !dbg !4370
} |
Looks like the same thing I found here too: #12604 I asked about it on discord and I guess the language spec doesn't make any guarantees that large values will be passed by reference. So if you didn't want it to be pass by value it is technically a bug in the above code. |
This most definitely is a compiler bug. Zig should be expected to select the optimal method. The documentation states that "When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster.". Of course, this isn't mentioned in the language spec, as it is a compiler detail. |
I thought this too when I first read your issue. I think that the same underlying cause (#12215) could be affecting both of us. I'll try and reduce my code at some point, but it's a fairly sizable codebase, so no promises :) A starting point for me could be comparing the output from stage1 and stage2 for my code and seeing what the do similarly and differently. |
Note that the issue is not that Zig isn't passing the struct by reference but that it is first copying it to the stack. |
Actually just tested your code, I can't reproduce the performance problem. You built it in debug mode for both, so it didn't get optimised. Building it with |
It could still be argued that this should be optimised by Zig to a pass-by-reference without relying on LLVM, and so debug mode would have much better performance, though I'll leave that up to the core team to decide. |
@ominitay Good catch. That's so stupid. |
Ok, I was not able to reproduce the performance problems in release. |
My program runs around 10% percent slower in release since switching to stage2. I can get back most of the performance by explicitly using const* in some places. It makes me a bit insecure whether I should wait for the compiler to get better, because I technically don't need reference semantics, or whether I should just use const* like in C++ when I think it is faster. |
Hi, not sure if this is exactly the same problem but I had a significant change (at least 10x longer to generate terrain) in performance when switching to the new compiler, and by hoisting a big array it went back to roughly the same as before. This was when building in debug, haven't tried any other configs. The commit that fixed it (not the thread naming) : |
Thanks for the data point @Srekel. I don't think we're going to get to this by 0.10.0 unfortunately but it's certainly an important issue for the project. |
Ok, that's fine by me :) (since I got around it with my l33t optimization skills) |
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Missing elision when loading a by-ref field in airStructFieldVal. I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves ziglang#12713 Resolves ziglang#12638
Adds optimizations for by-ref types to: - .struct_field_val - .slice_elem_val - .ptr_elem_val I would have expected LLVM to be able to optimize away these temporaries since we don't leak pointers to them and they are fed straight from def to use, but empirically it does not. Resolves #12713 Resolves #12638
Uh oh!
There was an error while loading. Please reload this page.
Zig Version
0.10.0-dev.3685+dae7aeb33
Steps to Reproduce
Put the following code in
test.zig
:Run it with
zig run test.zig -Drelease-fast=true
orzig run -fstage1 test.zig -Drelease-fast=true
Edit:
-Drelease-fast=true
doesn't do anything forzig run
. The performance issue only happens in debug.Expected Behavior
The documentation says:
"Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster."
I expect that both stage1 and stage2 should pass the struct by reference, because at this size(4 MB) a copy clearly is less performant.
Actual Behavior
stage2 is significantly slower:
godbolt further reveals that stage2 is doing a
memcpy
, while stage1 doesn't.So the performance regression is definitely caused by a pass by value.
This might be related to #12518, but I do not have the time to check that(it's a lot of code and no minimal reproducible example).
Bonus
It does work with just a big array:
Output:
The text was updated successfully, but these errors were encountered: