Stage 2 performance regression in debug: big structs as function parameters are passed by value #12638

Closed
IntegratedQuantum opened this issue Aug 26, 2022 · 14 comments
Labels
frontend Tokenization, parsing, AstGen, Sema, and Liveness. optimization regression It worked in a previous version of Zig, but stopped working.
Milestone

Comments

@IntegratedQuantum
Contributor

IntegratedQuantum commented Aug 26, 2022

Zig Version

0.10.0-dev.3685+dae7aeb33

Steps to Reproduce

Put the following code in test.zig:

const std = @import("std");

const Test = struct {
    bigArray: [1000000]u32 = undefined,
    pub fn passByValueTest(self: Test) u32 {
        return self.bigArray[0];
    }
};

pub fn main() void {
    var v = Test{};
    var t1 = std.time.nanoTimestamp();
    _ = v.passByValueTest();
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}

Run it with zig run test.zig -Drelease-fast=true or zig run -fstage1 test.zig -Drelease-fast=true

Edit: -Drelease-fast=true doesn't do anything for zig run. The performance issue only happens in debug.

Expected Behavior

The documentation says:

"Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster."

I expect both stage1 and stage2 to pass the struct by reference, because at this size (4 MB) a copy is clearly less performant.

Actual Behavior

stage2 is significantly slower:

$ zig run -fstage1 test.zig -Drelease-fast=true
info: 300
$ zig run test.zig -Drelease-fast=true
info: 2245980

godbolt further reveals that stage2 is doing a memcpy, while stage1 doesn't.
So the performance regression is definitely caused by a pass by value.

This might be related to #12518, but I do not have the time to check that (it's a lot of code and no minimal reproducible example).

Bonus

It does work with just a big array:

 const std = @import("std");

pub fn passByValueTest(bigArray: [1000000]u32) u32 {
    return bigArray[0];
}

pub fn main() void {
    var v: [1000000]u32 = undefined;
    var t1 = std.time.nanoTimestamp();
    _ = passByValueTest(v);
    var t2 = std.time.nanoTimestamp();
    std.log.info("{}", .{t2 -% t1});
}

Output:

$ zig run -fstage1 test.zig -Drelease-fast=true
info: 340
$ zig run test.zig -Drelease-fast=true
info: 300
@IntegratedQuantum IntegratedQuantum added the bug Observed behavior contradicts documented or intended behavior label Aug 26, 2022
@Vexu
Member

Vexu commented Aug 26, 2022

This is pretty much the same as #12215: AstGen decides to do a huge copy and the rest of the pipeline just goes with it.

The array example works as expected since the LLVM backend is able to elide the copy, as it also manages to do for the struct example if you remove the timing calls:

define internal fastcc void @a.main() unnamed_addr #0 !dbg !4356 {
Entry:
  %0 = alloca %a.Test, align 4
  %1 = bitcast %a.Test* %0 to i8*, !dbg !4369
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %1, i8* align 4 bitcast (%a.Test* @1 to i8*), i64 4000000, i1 false), !dbg !4369
  call void @llvm.dbg.declare(metadata %a.Test* %0, metadata !4359, metadata !DIExpression()), !dbg !4369
  %2 = call fastcc i32 @a.Test.passByValueTest(%a.Test* %0), !dbg !4370
  ret void, !dbg !4370
}

@Vexu Vexu added the frontend Tokenization, parsing, AstGen, Sema, and Liveness. label Aug 26, 2022
@Vexu Vexu added this to the 0.10.0 milestone Aug 26, 2022
@jeffkdev

Looks like the same thing I found in #12604. I asked about it on Discord, and apparently the language spec doesn't make any guarantees that large values will be passed by reference. So if your code relies on the value not being copied, that is technically a bug in the code above.

@ominitay
Contributor

This most definitely is a compiler bug. Zig should be expected to select the optimal method. The documentation states: "When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster." Of course, this isn't mentioned in the language spec, as it is a compiler detail.

@ominitay
Contributor

@IntegratedQuantum

This might be related to #12518, but I do not have the time to check that(it's a lot of code and no minimal reproducible example).

I thought this too when I first read your issue. I think that the same underlying cause (#12215) could be affecting both of us. I'll try and reduce my code at some point, but it's a fairly sizable codebase, so no promises :)

A starting point for me could be comparing the output from stage1 and stage2 for my code and seeing what they do similarly and differently.

@Vexu
Member

Vexu commented Aug 26, 2022

Note that the issue is not that Zig isn't passing the struct by reference but that it is first copying it to the stack.

@jeffkdev

This most definitely is a compiler bug. Zig should be expected to select the optimal method. The documentation states that "When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster.". Of course, this isn't mentioned in the language spec, as it is a compiler detail.

Just letting you know what I was told. This was part of the conversation if you are curious to read more. It was referred to as a "breaking change" for stage2. You are expected to account for the fact the value may be copied if you aren't using a const pointer.
(screenshot of the Discord conversation)

@ominitay
Contributor

Actually, I just tested your code and can't reproduce the performance problem. You built it in debug mode for both, so it didn't get optimised. Building it with -OReleaseFast yields near-identical performance. @Vexu this should probably be closed.

@Vexu Vexu modified the milestones: 0.10.0, 0.11.0 Aug 26, 2022
@ominitay
Contributor

It could still be argued that this should be optimised by Zig to a pass-by-reference without relying on LLVM, and so debug mode would have much better performance, though I'll leave that up to the core team to decide.

@IntegratedQuantum
Contributor Author

@ominitay Good catch. That's so stupid.
But in this case the example is actually so simple that the entire function call gets eliminated. I'll try to find an example where it doesn't get optimized away.

@IntegratedQuantum
Contributor Author

Ok, I was not able to reproduce the performance problems in release.
But the issue Vexu mentioned remains.

@IntegratedQuantum IntegratedQuantum changed the title Stage 2 performance regression: big structs as function parameters are passed by value Stage 2 performance regression in debug: big structs as function parameters are passed by value Aug 27, 2022
@Opioid

Opioid commented Sep 14, 2022

My program runs around 10% slower in release since switching to stage2. I can get back most of the performance by explicitly using *const in some places. It makes me a bit unsure whether I should wait for the compiler to get better, since I technically don't need reference semantics, or whether I should just use *const the way I'd use a const pointer in C++ when I think it is faster.
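
For reference, a minimal sketch of the *const workaround described above (the struct and method names are hypothetical, not taken from this issue's code):

```zig
const std = @import("std");

const Big = struct {
    data: [1000000]u32 = undefined,

    // Implicit parameter: stage2 may copy the whole 4 MB struct in debug.
    pub fn byValue(self: Big) u32 {
        return self.data[0];
    }

    // Explicit const pointer: never copied, in any build mode.
    pub fn byConstPtr(self: *const Big) u32 {
        return self.data[0];
    }
};

pub fn main() void {
    var b = Big{};
    b.data[0] = 7;
    std.debug.print("{} {}\n", .{ b.byValue(), b.byConstPtr() });
}
```

The call syntax is the same for both methods; Zig takes the address automatically for pointer receivers.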

@Srekel

Srekel commented Sep 30, 2022

Hi, not sure if this is exactly the same problem, but I saw a significant performance regression (terrain generation took at least 10x longer) when switching to the new compiler, and hoisting a big array got it back to roughly the same as before. This was when building in debug; I haven't tried any other configs.

The commit that fixed it (not the thread naming):
Srekel/tides-of-revival@0077041
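
For anyone hitting the same thing, a guess at the general shape of that workaround (the function and variable names are hypothetical, not taken from the linked commit):

```zig
// Before: a large scratch buffer lives inside a hot function. Under the
// debug-mode regression, moving such a value around by value can trigger
// multi-megabyte memcpys on every call.
fn generateChunkSlow() u32 {
    var scratch: [1000000]u32 = undefined;
    scratch[0] = 1;
    return scratch[0];
}

// After: hoist the buffer to container scope so it exists once and is
// only ever accessed through its memory, never moved by value.
var g_scratch: [1000000]u32 = undefined;

fn generateChunkFast() u32 {
    g_scratch[0] = 1;
    return g_scratch[0];
}
```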

@andrewrk andrewrk added optimization regression It worked in a previous version of Zig, but stopped working. and removed bug Observed behavior contradicts documented or intended behavior labels Sep 30, 2022
@andrewrk
Member

Thanks for the data point @Srekel. I don't think we're going to get to this by 0.10.0 unfortunately but it's certainly an important issue for the project.

@Srekel

Srekel commented Sep 30, 2022

Ok, that's fine by me :) (since I got around it with my l33t optimization skills)

topolarity pushed a commit to topolarity/zig that referenced this issue Oct 5, 2022
Missing elision when loading a by-ref field in airStructFieldVal.
I would have expected LLVM to be able to optimize away these
temporaries since we don't leak pointers to them and they are fed
straight from def to use, but empirically it does not.

Resolves ziglang#12713
Resolves ziglang#12638
andrewrk pushed a commit that referenced this issue Nov 12, 2022
Adds optimizations for by-ref types to:
  - .struct_field_val
  - .slice_elem_val
  - .ptr_elem_val

I would have expected LLVM to be able to optimize away these
temporaries since we don't leak pointers to them and they are fed
straight from def to use, but empirically it does not.

Resolves #12713
Resolves #12638
@andrewrk andrewrk modified the milestones: 0.11.0, 0.10.1 Jan 9, 2023
7 participants