Skip to content

Bring windows segfault handler on par with linux #4319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 30, 2020

Conversation

Rocknest
Copy link
Contributor

Adds a new way to dump stack traces on windows that closely resembles linux segfault handler. Works on i386, x86_64 (tested) and aarch64 (not tested).

I don't know where to put these definitions since there are no 'bits' for windows by architecture.

Basically changes are only seen on i386. Test code:

pub fn main() void {
    test1();
}

fn test1() void {
    illegal();
}

fn illegal() void {
    asm volatile ("nop");
    asm volatile ("ud2");
}

Before:

D:\zig>zig run ill.zig -target i386-windows-gnu
Illegal Instruction

After:

D:\zig>zig run ill.zig -target i386-windows-gnu
Illegal instruction at address 0xb6b034
D:\zig\ill.zig:10:5: 0xb6b034 in illegal (run.o)
    asm volatile ("nop");
    ^
D:\zig\ill.zig:6:12: 0xb64a17 in test1 (run.o)
    illegal();
           ^
D:\zig\ill.zig:2:10: 0xb52a97 in main (run.o)
    test1();
         ^
D:\zig\lib\zig\std\start.zig:125:65: 0xb519e2 in std.start.WinMainCRTStartup (run.o)                             
    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
                                                                ^
???:?:?: 0x74f76358 in ??? (???)
???:?:?: 0x777d7b73 in ??? (???)
???:?:?: 0x777d7b43 in ??? (???)
!integer overflow
!D:\zig\lib\zig\std\debug.zig:140:65: 0xb6adbd in std.debug.dumpStackTraceFromBase (run.o)
!        printSourceAtAddress(debug_info, stderr, return_address - 1, tty_config) catch return;
!                                                                ^
!D:\zig\lib\zig\std\debug.zig:2331:31: 0xb649ca in std.debug.handleSegfaultWindowsExtra (run.o)
!        dumpStackTraceFromBase(regs.bp, regs.ip);
!                              ^
!D:\zig\lib\zig\std\debug.zig:2313:76: 0xb52a7a in std.debug.handleSegfaultWindows (run.o)
!        windows.EXCEPTION_ILLEGAL_INSTRUCTION => handleSegfaultWindowsExtra(info, 2, null),
!                                                                           ^
!???:?:?: 0x777dbba5 in ??? (???)
!???:?:?: 0x777d808a in ??? (???)
!???:?:?: 0x777e4215 in ??? (???)
!D:\zig\ill.zig:6:12: 0xb64a17 in test1 (run.o)
!    illegal();
!           ^
!D:\zig\ill.zig:2:10: 0xb52a97 in main (run.o)
!    test1();
!         ^
!D:\zig\lib\zig\std\start.zig:125:65: 0xb519e2 in std.start.WinMainCRTStartup (run.o)
!    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
!                                                                ^
!???:?:?: 0x74f76358 in ??? (???)
!???:?:?: 0x777d7b73 in ??? (???)
!???:?:?: 0x777d7b43 in ??? (???)

Highlighted is a panic caused by underflow in dumpStackTraceFromBase when it received 0 from StackIterator. I'm sure its a bug.

There is also a problem that stack trace points to "nop" and not "ud2" instruction.

else => return windows.EXCEPTION_CONTINUE_SEARCH,
}
}

// zig won't let me use an anon enum here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that? What error does it give?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D:\zig>zig run ill.zig -target x86_64-windows-gnu
.\lib\zig\std\debug.zig:2311:85: error: expected type 'std.debug.enum:2320:80', found 'std.debug.enum:2320:80'
        windows.EXCEPTION_DATATYPE_MISALIGNMENT => handleSegfaultWindowsExtra(info, .fmt, "Unaligned Memory Access"),
                                                                                    ^
.\lib\zig\std\debug.zig:2320:80: note: std.debug.enum:2320:80 declared here
fn handleSegfaultWindowsExtra(info: *windows.EXCEPTION_POINTERS, comptime msg: enum {fmt, segv, ill}, comptime format: ?[]const u8) noreturn {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah the infamous enum bug, I think it stops being a problem if you don't make it a comptime param.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, comptime does not make a difference

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue for this? I'll take a look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3707, the anonymous enum is analyzed twice and produces two disjoint types.

@andrewrk
Copy link
Member

There is also a problem that stack trace points to "nop" and not "ud2" instruction.

Could this be related to 7336b75?

@Rocknest
Copy link
Contributor Author

@andrewrk no, that function actually not even called once when i tested.

I've managed to make it work, now it points to ud2 asm block. I have no idea why adding 1 to ip fixes it. Also stack iterator won't return nulls now.

D:\zig>zig run ill.zig -target i386-windows-gnu
Illegal instruction at address 0xdeb054
D:\zig\ill.zig:10:5: 0xdeb055 in illegal (run.o)
    asm volatile ("ud2");
    ^
D:\zig\ill.zig:6:12: 0xde4a17 in test1 (run.o)
    illegal();
           ^
D:\zig\ill.zig:2:10: 0xdd2a97 in main (run.o)
    test1();
         ^
D:\zig\lib\zig\std\start.zig:125:65: 0xdd19e2 in std.start.WinMainCRTStartup (run.o)
    std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
                                                                ^
???:?:?: 0x74f76358 in ??? (???)
???:?:?: 0x777d7b73 in ??? (???)
???:?:?: 0x777d7b43 in ??? (???)

D:\zig>

@andrewrk
Copy link
Member

Thanks @Rocknest. This looks like a good change. Happy to merge after @LemonBoy's review comment #4319 (comment) is solved

@LemonBoy
Copy link
Contributor

I've managed to make it work, now it points to ud2 asm block. I have no idea why adding 1 to ip fixes it.

Hold your horses, this is not correct. Get a debugger and you'll see that the exception address is indeed correct and points to the ud2 instruction, there's something fishy in the debug info and/or its handling.

@Rocknest
Copy link
Contributor Author

@LemonBoy Yes the instruction is exactly where ip points to, but it looks like there is some extra -1 in printSourceAtAddress which is causing off by one error.

@LemonBoy
Copy link
Contributor

Yes the instruction is exactly where ip points to, but it looks like there is some extra -1 in printSourceAtAddress which is causing off by one error.

There's no -1 in printSourceAtAddressWindows, the -1 offset is needed only when you're dealing with return addresses as you may be past the function boundary, but this is not the case as you have the ip.

@Rocknest
Copy link
Contributor Author

Rocknest commented Jan 29, 2020

@LemonBoy i suspect that off-by-one is debug.zig:474. <= instead of <.

@LemonBoy
Copy link
Contributor

i suspect that off-by-one is debug.zig:472. <= instead <.

If the comment above is to be trusted then yes, it should be <. I don't know too much about pdbs to say if that's correct or not, try it and see? :)

@andrewrk
Copy link
Member

The comment is from a60ecdc

@Rocknest
Copy link
Contributor Author

Looks like that's a proper fix. Stack traces look identical to linux (up to main fn).

@andrewrk
Copy link
Member

💯 Wonderful. Thank you both.

Nevermind the sr.ht build failure. It appears to be a temporary CI issue.

LastExceptionFromRip: DWORD64,

pub fn getRegs(ctx: *const CONTEXT) struct {bp: usize, ip: usize} {
return .{.bp = @intCast(usize, ctx.Rbp), .ip = @intCast(usize, ctx.Rip)};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@intCast not needed here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got another nonsensical compile error

D:\zig>zig run ill.zig
.\lib\zig\std\os\windows\bits.zig:1036:29: error: array access of non-array type 'std.os.windows.bits.struct:1035:49'
                return .{ctx.Rbp, ctx.Rip};
                            ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the thing to do is to mark a workaround like this:

// TODO: The @as is a workaround for https://github.com/ziglang/zig/issues/4310
return .{.bp = @as(usize, ctx.Rbp), .ip = @as(usize, ctx.Rip)}; 

(replace 4310 with the issue number corresponding to this bug)

Does @as provide a workaround? That would be a preferable workaround to @intCast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a confusing compile error. Actually the problem is that anon list literal cannot coerce into given type.

I think it is #4148

@andrewrk andrewrk merged commit 0303e7b into ziglang:master Jan 30, 2020
@andrewrk
Copy link
Member

Nice work!

@Rocknest Rocknest deleted the windows-traces branch January 30, 2020 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants