Skip to content

crash / wrong behavior mixing runtime and comptime conditionals in inline for? #4364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Feb 1, 2020 · 8 comments
Closed
Labels
bug Observed behavior contradicts documented or intended behavior stage1 The process of building from source via WebAssembly and the C backend.
Milestone

Comments

@ghost
Copy link

ghost commented Feb 1, 2020

See latest comment for minimal case #4364 (comment)

This was broken by the ir-clean-up-vars PR.


Apologies in advance, I haven't been able to create a minimal reproduction.

Oxid contains a really big comptime function and the compiler is now asserting while analyzing this function. (However I'm not sure if the problem has anything to do with comptime.) It worked before #4152 was merged in.

Commit that works: d896500
Commit that doesn't work: 76fba5b

What happens when I build (my tests will hit it):

~/oxid $ zig build test
when analyzing /Users/dbandstra/oxid/lib/gbe/gbe.zig:378:28: assertion failed. This is a bug in the Zig compiler.

The line in my code where it died isn't very informative, at least to me:

https://github.com/dbandstra/oxid/blob/2459c2cd2395f25412c15e8002d9b2af5a4090af/lib/gbe/gbe.zig#L378

lldb excerpt (on commit 76fba5b):

    frame #15: 0x000000010014af86 zig`stage2_panic(ptr="assertion failed. This is a bug in the Zig compiler.", len=52) at stage1.zig:35:5
    frame #16: 0x00000001000121d5 zig`src_assert(ok=false, source_node=0x00000001087bb6c0) at analyze.cpp:9242:5
    frame #17: 0x00000001000896f8 zig`ir_assert(ok=false, source_instruction=0x000000010dd47120) at ir.cpp:10343:5
    frame #18: 0x00000001000c0a75 zig`ir_analyze_instruction_phi(ira=0x000000010dd4d420, phi_instruction=0x000000010dd47120) at ir.cpp:20344:13
    frame #19: 0x00000001000888a9 zig`ir_analyze_instruction_base(ira=0x000000010dd4d420, instruction=0x000000010dd47120) at ir.cpp:29562:20
    frame #20: 0x000000010008775a zig`ir_analyze(codegen=0x000000010a004a00, old_exec=0x000000010b99f080, new_exec=0x000000010b99eed0, expected_type=0x000000010b05aee0, expected_type_source_node=0x00000001087b0a00, result_ptr=0x0000000000000000) at ir.cpp:29878:38
    frame #21: 0x0000000100028f15 zig`analyze_fn_ir(g=0x000000010a004a00, fn=0x000000010b99ee50, return_type_node=0x00000001087b0a00) at analyze.cpp:4743:34
    frame #22: 0x000000010001752d zig`analyze_fn_body(g=0x000000010a004a00, fn_table_entry=0x000000010b99ee50) at analyze.cpp:4856:5
    frame #23: 0x00000001000185fe zig`semantic_analyze(g=0x000000010a004a00) at analyze.cpp:4968:13
    frame #24: 0x000000010004c861 zig`gen_root_source(g=0x000000010a004a00) at codegen.cpp:9460:9
    frame #25: 0x000000010004aad3 zig`codegen_build_and_link(g=0x000000010a004a00) at codegen.cpp:10473:13
    frame #26: 0x0000000100007e21 zig`main(argc=11, argv=0x00007ffeefbff7d8) at main.cpp:1289:17

The code in ir.cpp where the assertion happens:

static IrInstGen *ir_analyze_instruction_phi(IrAnalyze *ira, IrInstSrcPhi *phi_instruction) {
    Error err;

    if (ira->const_predecessor_bb) {
        // [...]
    }

    ResultLocPeerParent *peer_parent = phi_instruction->peer_parent;
    if (peer_parent != nullptr && !peer_parent->skipped && !peer_parent->done_resuming &&
        peer_parent->peers.length >= 2)
    {
        if (peer_parent->resolved_type == nullptr) {
            IrInstGen **instructions = allocate<IrInstGen *>(peer_parent->peers.length);
            for (size_t i = 0; i < peer_parent->peers.length; i += 1) {
                ResultLocPeer *this_peer = peer_parent->peers.at(i);

                IrInstGen *gen_instruction = this_peer->base.gen_instruction;
                if (gen_instruction == nullptr) {
                    // unreachable instructions will cause implicit_elem_type to be null
                    if (this_peer->base.implicit_elem_type == nullptr) {
                        instructions[i] = ir_const_unreachable(ira, &this_peer->base.source_instruction->base);
                    } else {
                        instructions[i] = ir_const(ira, &this_peer->base.source_instruction->base,
                                this_peer->base.implicit_elem_type);
                        instructions[i]->value->special = ConstValSpecialRuntime;
                    }
                } else {
                    instructions[i] = gen_instruction;
                }

            }
            ZigType *expected_type = ir_result_loc_expected_type(ira, &phi_instruction->base.base, peer_parent->parent);
            peer_parent->resolved_type = ir_resolve_peer_types(ira,
                    peer_parent->base.source_instruction->base.source_node, expected_type, instructions,
                    peer_parent->peers.length);
            if (type_is_invalid(peer_parent->resolved_type))
                return ira->codegen->invalid_inst_gen;

//
// this is the assertion that fails:
            // the logic below assumes there are no instructions in the new current basic block yet
            ir_assert(ira->new_irb.current_basic_block->instruction_list.length == 0, &phi_instruction->base.base);
//

            // In case resolving the parent activates a suspend, do it now
            // [...]

            // If the above code generated any instructions in the current basic block, we need
            // to move them to the peer parent predecessor.
            // [...]
        }
        // [...]
@andrewrk
Copy link
Member

andrewrk commented Feb 2, 2020

Thanks for the report. I will look into this

@andrewrk andrewrk added this to the 0.6.0 milestone Feb 2, 2020
@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior stage1 The process of building from source via WebAssembly and the C backend. labels Feb 2, 2020
@andrewrk
Copy link
Member

andrewrk commented Feb 2, 2020

This diff makes it compile for me:

--- a/src/ir.cpp
+++ b/src/ir.cpp
@@ -20387,8 +20387,10 @@ static IrInstGen *ir_analyze_instruction_phi(IrAnalyze *ira, IrInstSrcPhi *phi_i
             if (type_is_invalid(peer_parent->resolved_type))
                 return ira->codegen->invalid_inst_gen;
 
-            // the logic below assumes there are no instructions in the new current basic block yet
-            ir_assert(ira->new_irb.current_basic_block->instruction_list.length == 0, &phi_instruction->base.base);
+            ZigList<IrInstGen *> instrs_to_move = {};
+            while (ira->new_irb.current_basic_block->instruction_list.length != 0) {
+                instrs_to_move.append(ira->new_irb.current_basic_block->instruction_list.pop());
+            }
 
             // In case resolving the parent activates a suspend, do it now
             IrInstGen *parent_result_loc = ir_resolve_result(ira, &phi_instruction->base.base, peer_parent->parent,
@@ -20400,7 +20402,6 @@ static IrInstGen *ir_analyze_instruction_phi(IrAnalyze *ira, IrInstSrcPhi *phi_i
             }
             // If the above code generated any instructions in the current basic block, we need
             // to move them to the peer parent predecessor.
-            ZigList<IrInstGen *> instrs_to_move = {};
             while (ira->new_irb.current_basic_block->instruction_list.length != 0) {
                 instrs_to_move.append(ira->new_irb.current_basic_block->instruction_list.pop());
             }

The title screen comes up, but when I play the game, I get this:

Segmentation fault at address 0x0
/home/andy/Downloads/oxid/src/oxid/systems/player_movement.zig:26:20: 0x2a4a97 in oxid.systems.player_movement.think (oxid)
    if (self.player.spawn_anim_y_remaining > 0) {
                   ^
/home/andy/Downloads/oxid/lib/gbe/gbe.zig:495:37: 0x21e711 in .gbe.Impl.run (oxid)
                const result = think(gs, entry, ctx);
                                    ^
/home/andy/Downloads/oxid/src/oxid/frame.zig:27:55: 0x21cb38 in oxid.frame.gameFrame (oxid)
            @import("systems/player_movement.zig").run(gs, context);
                                                      ^
/home/andy/Downloads/oxid/src/main_sdl.zig:656:22: 0x2267b1 in tick (oxid)
            gameFrame(&self.main_state.session, frame_context, draw, paused);
                     ^
/home/andy/Downloads/oxid/src/main_sdl.zig:246:21: 0x22479f in main (oxid)
                tick(self, refresh_rate);
                    ^
/home/andy/Downloads/zig/lib/std/start.zig:247:29: 0x226c90 in std.start.main (oxid)
            return root.main();
                            ^

are you able to tell if this is progress, or does that crash seem related to the gbe code?

@ghost
Copy link
Author

ghost commented Feb 3, 2020

Thanks for looking into it. The crash is absolutely related to the gbe code. Your patch does help me look a little deeper with debug prints.

Quick explainer: the entity iterator takes custom structs like this one: https://github.com/dbandstra/oxid/blob/master/src/oxid/systems/player_movement.zig#L15 . The next method looks for a game "entity" (entities are just IDs) for which an instance of each component listed in the struct exists. It sets the fields in the custom struct to point to those components and returns it. So the fact that self.player points to garbage means the iterator next method did something wrong (didn't get to the code that set those struct fields).

By adding some debug prints, I figured out that next doesn't go wrong until it gets to this inline for loop: https://github.com/dbandstra/oxid/blob/2459c2cd2395f25412c15e8002d9b2af5a4090af/lib/gbe/gbe.zig#L375 . In the first iteration, it visits the first field in the struct, id: EntityId, which is handled specially (line 381). But then, it drops out of the loop instead of continuing. This is the bug. The compiler assertion I was hitting before your patch pointed to line 378 here, which seems significant.

Because this loop broke before it was supposed to, fields of the returned struct end up unset (left undefined), hence the garbage pointers and runtime crash.

There are inline for loops earlier in the function which seem to work fine.


You might not need this, but I pushed a branch with debug prints added. Here is the same code with prints added: https://github.com/dbandstra/oxid/blob/zig-issue-4364/lib/gbe/gbe.zig#L394

And the output I get, demonstrating that the loop is being exited early (id is the first field in the struct):

- id
  is EntityId
  end of inline for iteration
  outside the loop

Should have seen it continue with another field instead of printing "outside the loop".

@andrewrk
Copy link
Member

andrewrk commented Feb 3, 2020

Thanks for the clues. I'll have another look this week. This might be a good time for me to experiment with creduce.

@ghost
Copy link
Author

ghost commented Feb 3, 2020

Reduction (I'm still running your zig patch):

const std = @import("std");

const T = struct {
    id: u64,
    one: u32,
    two: u32,
    three: u32,
};

pub fn main() void {
    var runtime_value = false;

    inline for (@typeInfo(T).Struct.fields) |field, field_index| {
        std.debug.warn("- {}\n", .{field.name});

        if (runtime_value) {

        } else if (runtime_value) {

        } else if (field.field_type == u64) {

        }
    }
}

Output:

- id

Delete the else if (runtime_value) branch and it segfaults...

Put the field_type branch (comptime known condition) at the beginning and it works.

So it seems the problem has something to do with the pattern if (runtime_condition) ... else if (comptime_condition) ...

edit: I pushed a workaround to oxid, so no rush with the fix. We should change the issue title though.

@ghost ghost changed the title oxid build broken by ir-clean-up-vars PR crash / wrong behavior mixing runtime and comptime conditionals in inline for? Feb 16, 2020
@ghost
Copy link
Author

ghost commented Feb 17, 2020

Strongly resembles #4257 which was marked as a duplicate of #2727

@andrewrk
Copy link
Member

Ah, that's really helpful. I prioritized #2727 - I'll solve that bug and then we can revisit this one.

@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Apr 13, 2020
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Aug 13, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 Nov 6, 2020
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021
@Vexu
Copy link
Member

Vexu commented Dec 28, 2022

Fixed by 4618c41

@Vexu Vexu closed this as completed Dec 28, 2022
@andrewrk andrewrk modified the milestones: 0.12.0, 0.11.0 Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior stage1 The process of building from source via WebAssembly and the C backend.
Projects
None yet
Development

No branches or pull requests

2 participants