Stage 2 performance regression in debug: big structs as function parameters are passed by value #12638

Closed
IntegratedQuantum opened this issue Aug 26, 2022 · 14 comments
Labels
frontend Tokenization, parsing, AstGen, Sema, and Liveness. optimization regression It worked in a previous version of Zig, but stopped working.
Milestone

Comments

@IntegratedQuantum
Contributor

IntegratedQuantum commented Aug 26, 2022

Zig Version

0.10.0-dev.3685+dae7aeb33

Steps to Reproduce

Put the following code in test.zig:

const std = @import("std");

const Test = struct {
    bigArray: [1000000]u32 = undefined,
    pub fn passByValueTest(self: Test) u32 {
        return self.bigArray[0];
    }
};

pub fn main() void {
    var v = Test{};
    var t1 = std.time.nanoTimestamp();
    _ = v.passByValueTest();
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}

Run it with zig run test.zig -Drelease-fast=true or zig run -fstage1 test.zig -Drelease-fast=true

Edit: -Drelease-fast=true doesn't do anything for zig run. The performance issue only happens in debug.

Expected Behavior

The documentation says:

"Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster."

I expect both stage1 and stage2 to pass the struct by reference, because at this size (4 MB) a copy is clearly less performant.

Actual Behavior

stage2 is significantly slower:

$ zig run -fstage1 test.zig -Drelease-fast=true
info: 300
$ zig run test.zig -Drelease-fast=true
info: 2245980

godbolt further reveals that stage2 is doing a memcpy, while stage1 doesn't.
So the performance regression is definitely caused by a pass by value.

This might be related to #12518, but I do not have the time to check that (it's a lot of code and no minimal reproducible example).

Bonus

It does work with just a big array:

 const std = @import("std");

pub fn passByValueTest(bigArray: [1000000]u32) u32 {
    return bigArray[0];
}

pub fn main() void {
    var v: [1000000]u32 = undefined;
    var t1 = std.time.nanoTimestamp();
    _ = passByValueTest(v);
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}

Output:

$ zig run -fstage1 test.zig -Drelease-fast=true
info: 340
$ zig run test.zig -Drelease-fast=true
info: 300
@IntegratedQuantum IntegratedQuantum added the bug Observed behavior contradicts documented or intended behavior label Aug 26, 2022
@Vexu
Member

Vexu commented Aug 26, 2022

This is pretty much the same as #12215: AstGen decides to do a huge copy and the rest of the pipeline just goes with it.

The array example works as expected since the LLVM backend is able to elide the copy, as it also manages to do for the struct example if you remove the timing calls:

define internal fastcc void @a.main() unnamed_addr #0 !dbg !4356 {
Entry:
  %0 = alloca %a.Test, align 4
  %1 = bitcast %a.Test* %0 to i8*, !dbg !4369
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 bitcast (%a.Test* @1 to i8*), i64 4000000, i1 false), !dbg !4369
  call void @llvm.dbg.declare(metadata %a.Test* %0, metadata !4359, metadata !DIExpression()), !dbg !4369
  %2 = call fastcc i32 @a.Test.passByValueTest(%a.Test* %0), !dbg !4370
  ret void, !dbg !4370
}

@Vexu Vexu added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Aug 26, 2022
@Vexu Vexu added this to the 0.10.0 milestone Aug 26, 2022
@jeffkdev

Looks like the same thing I found in #12604. I asked about it on Discord, and apparently the language spec doesn't make any guarantees that large values will be passed by reference. So if your code relies on the value not being copied, that is technically a bug in the code above.

@ominitay
Contributor

This most definitely is a compiler bug. Zig should be expected to select the optimal method. The documentation states: "When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster." Of course, this isn't mentioned in the language spec, as it is a compiler detail.

@ominitay
Contributor

@IntegratedQuantum

This might be related to #12518, but I do not have the time to check that(it's a lot of code and no minimal reproducible example).

I thought this too when I first read your issue. I think that the same underlying cause (#12215) could be affecting both of us. I'll try and reduce my code at some point, but it's a fairly sizable codebase, so no promises :)

A starting point for me could be comparing the output from stage1 and stage2 for my code and seeing what they do similarly and differently.

@Vexu
Member

Vexu commented Aug 26, 2022

Note that the issue is not that Zig isn't passing the struct by reference but that it is first copying it to the stack.

@jeffkdev

This most definitely is a compiler bug. Zig should be expected to select the optimal method. The documentation states that "When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster.". Of course, this isn't mentioned in the language spec, as it is a compiler detail.

Just letting you know what I was told. This was part of the conversation if you are curious to read more. It was referred to as a "breaking change" for stage2. You are expected to account for the fact the value may be copied if you aren't using a const pointer.
(screenshot of the Discord conversation)

@ominitay
Contributor

Actually, I just tested your code and can't reproduce the performance problem. You built it in debug mode for both, so it didn't get optimised. Building it with -OReleaseFast yields near-identical performance. @Vexu this should probably be closed.

@Vexu Vexu modified the milestones: 0.10.0, 0.11.0 Aug 26, 2022
@ominitay
Contributor

It could still be argued that this should be optimised by Zig to a pass-by-reference without relying on LLVM, and so debug mode would have much better performance, though I'll leave that up to the core team to decide.

@IntegratedQuantum
Contributor Author

@ominitay Good catch. That's so stupid.
But in this case the example is actually so simple that the entire function call gets eliminated. I'll try to find an example where it doesn't get optimized away.

@IntegratedQuantum
Contributor Author

Ok, I was not able to reproduce the performance problems in release.
But the issue Vexu mentioned remains.

@IntegratedQuantum IntegratedQuantum changed the title Stage 2 performance regression: big structs as function parameters are passed by value Stage 2 performance regression in debug: big structs as function parameters are passed by value Aug 27, 2022
@Opioid

Opioid commented Sep 14, 2022

My program runs around 10% slower in release since switching to stage2. I can get back most of the performance by explicitly using *const in some places. It makes me a bit unsure whether I should wait for the compiler to get better, since I technically don't need reference semantics, or whether I should just use *const the way I'd use a const pointer in C++ when I think it is faster.
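
For reference, a minimal sketch of the *const workaround described above (the struct and method names are hypothetical, not taken from this issue's code):

```zig
const std = @import("std");

const Big = struct {
    data: [1000000]u32 = undefined,

    // Implicit parameter: stage2 may copy the whole 4 MB struct in debug.
    pub fn byValue(self: Big) u32 {
        return self.data[0];
    }

    // Explicit const pointer: never copied, in any build mode.
    pub fn byConstPtr(self: *const Big) u32 {
        return self.data[0];
    }
};

pub fn main() void {
    var b = Big{};
    b.data[0] = 7;
    std.debug.print("{} {}\n", .{ b.byValue(), b.byConstPtr() });
}
```

The call syntax is the same for both methods; Zig takes the address automatically for pointer receivers.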

@Srekel

Srekel commented Sep 30, 2022

Hi, not sure if this is exactly the same problem, but I saw a significant performance regression (terrain generation took at least 10x longer) when switching to the new compiler, and hoisting a big array got it back to roughly the same as before. This was when building in debug; I haven't tried any other configs.

The commit that fixed it (not the thread naming):
Srekel/tides-of-revival@0077041
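
For anyone hitting the same thing, a guess at the general shape of that workaround (the function and variable names are hypothetical, not taken from the linked commit):

```zig
// Before: a large scratch buffer lives inside a hot function. Under the
// debug-mode regression, moving such a value around by value can trigger
// multi-megabyte memcpys on every call.
fn generateChunkSlow() u32 {
    var scratch: [1000000]u32 = undefined;
    scratch[0] = 1;
    return scratch[0];
}

// After: hoist the buffer to container scope so it exists once and is
// only ever accessed through its memory, never moved by value.
var g_scratch: [1000000]u32 = undefined;

fn generateChunkFast() u32 {
    g_scratch[0] = 1;
    return g_scratch[0];
}
```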

@andrewrk andrewrk added optimization regression It worked in a previous version of Zig, but stopped working. and removed bug Observed behavior contradicts documented or intended behavior labels Sep 30, 2022
@andrewrk
Member

Thanks for the data point @Srekel. I don't think we're going to get to this by 0.10.0 unfortunately but it's certainly an important issue for the project.

@Srekel

Srekel commented Sep 30, 2022

Ok, that's fine by me :) (since I got around it with my l33t optimization skills)

topolarity pushed a commit to topolarity/zig that referenced this issue Oct 5, 2022
Missing elision when loading a by-ref field in airStructFieldVal.
I would have expected LLVM to be able to optimize away these
temporaries since we don't leak pointers to them and they are fed
straight from def to use, but empirically it does not.

Resolves ziglang#12713
Resolves ziglang#12638
andrewrk pushed a commit that referenced this issue Nov 12, 2022
Adds optimizations for by-ref types to:
  - .struct_field_val
  - .slice_elem_val
  - .ptr_elem_val

I would have expected LLVM to be able to optimize away these
temporaries since we don't leak pointers to them and they are fed
straight from def to use, but empirically it does not.

Resolves #12713
Resolves #12638
@andrewrk andrewrk modified the milestones: 0.11.0, 0.10.1 Jan 9, 2023
7 participants