Skip to content

Self hosted: @memset causes failure in debug build (freestanding) #13422

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NerdySouth opened this issue Nov 3, 2022 · 10 comments
Closed

Self hosted: @memset causes failure in debug build (freestanding) #13422

NerdySouth opened this issue Nov 3, 2022 · 10 comments
Labels
arch-arm 32-bit ARM bug Observed behavior contradicts documented or intended behavior
Milestone

Comments

@NerdySouth
Copy link

Zig Version

0.10.0

Steps to Reproduce and Observed Behavior

I have a GPIO blink program written for the RPI zero/A+ which I know successfully blinks the light. Due to this being a bare metal environment, and a building block for my kernel, I followed the age old wisdom to zero the bss section of my program in my zigMain. zigMain is the first function called after my asm entry point, which only sets up the sp and turns off interrupts.
With the stage 1 compiler (0.9.1) all is well, and I call this code at the start of zigMain:

extern var __bss_start: u8;
extern var __bss_end: u8;
@memset(@as(*volatile [1]u8, &__bss_start), 0, @ptrToInt(&__bss_end) - @ptrToInt(&__bss_start));

This seemingly worked fine with the stage1 compiler, however, with the self hosted, it causes the program not to blink my LED. If I compile with optimizations (fast or small) then the program will work (likely optimizing out this memset since my binary never actually had anything in the bss section. Does Zig even use that section?)

Expected Behavior

Should call @Memset to zero the bss section and NOT change program correctness.

@NerdySouth NerdySouth added the bug Observed behavior contradicts documented or intended behavior label Nov 3, 2022
@NerdySouth
Copy link
Author

#13421 Has the project files for this.

@llogick
Copy link
Contributor

llogick commented Nov 3, 2022

Make sure the resulting binary does have (valid) __bss_start and __bss_end otherwise you might be memsetting who knows what :)

@NerdySouth
Copy link
Author

Hm. I see I included a version where I had commented out my use of the bss_start and bss__end symbols. In digging into your suggestion though I did notice what I think to be the culprit. In zig9, nothing was ever being put in my bss section, and so a memset call to it just so happened to be a memset on size 0, and was likely optimized out.

However, with the self hosted compiler, it seems that the <os.main> symbol is being put into the bss section, and now the bss has size > 1, and thus my memset is causing some weird behavior. I am including the project zip this time so if anyone wants to tinker they can, but I still find it weird that the memset would be wrong. If the bss__start/bss__end symbols aren't getting included in the output binary for my zig code to use, then that is also a problem is it not?

For reference, I am using llvm-objdump with llvm 15.0.2 for this.

project:
pi-zig.zip

@DanB91
Copy link

DanB91 commented Nov 12, 2022

I'm thinking that this might be due to broken memset and memclr referenced in #13530. I did a quick disassembly of your executable and found this:

0000c7a8 <__aeabi_memset>:
    c7a8:       e3510000        cmp     r1, #0
    c7ac:       012fff1e        bxeq    lr
    c7b0:       e92d4800        push    {fp, lr}
    c7b4:       e1a0b00d        mov     fp, sp
    c7b8:       ebfffffa        bl      c7a8 <__aeabi_memset>
    c7bc:       e8bd8800        pop     {fp, pc}

0000c7c0 <__aeabi_memclr>:
    c7c0:       e3510000        cmp     r1, #0
    c7c4:       012fff1e        bxeq    lr
    c7c8:       e92d4800        push    {fp, lr}
    c7cc:       e1a0b00d        mov     fp, sp
    c7d0:       ebfffffa        bl      c7c0 <__aeabi_memclr>
    c7d4:       e8bd8800        pop     {fp, pc}

@NerdySouth
Copy link
Author

I think you are right @DanB91. It appears to be the same issue as yours because if i define my own implementation in assembly it works fine.

@daurnimator
Copy link
Contributor

Expected Behavior

Should call @Memset to zero the bss section and NOT change program correctness.

I believe I've run into this before on riscv. https://github.com/im-tomu/fomu-workshop/blob/1018c590028b1fef578023360952a2b84d1809ec/riscv-zig-blink/src/fomu/start.zig#L29-L42

The answer I found at the time was that there is nothing to say that the implementation of memset doesn't use symbols in bss itself, and hence the clearing of bss during initialisation should always be done with inline assembly.

@Vexu Vexu added the arch-arm 32-bit ARM label Dec 5, 2022
@Vexu Vexu added this to the 0.11.0 milestone Dec 5, 2022
@matu3ba
Copy link
Contributor

matu3ba commented May 4, 2023

Please provide a minimal reduction + instruction to reproduce including the assembly (extern function, which can be linked to reproduce the problem).
If only an example with start code is viable, then please add proper bss zeroing and stack setup into the _start function.

@Srekel
Copy link

Srekel commented May 9, 2023

Hi, I'm not sure if this is the same bug or not but I found a very small repro for a @memset crash: #15634

@frmdstryr
Copy link
Contributor

frmdstryr commented May 10, 2023

Can confirm it is still broken. The old std.mem.set worked fine but not that was dropped.

Not really any new info here but:

// Set by linker 
extern var __text_end: u32;
extern var __data_start: u32;
extern var __data_size: u32;
extern var __bss_start: u32;
extern var __bss_size: u32;

export fn Reset_Handler() void {
    const data_size = @ptrToInt(&__data_size);
    const data = @ptrCast([*]u8, &__data_start);
    const text_end = @ptrCast([*]u8, &__text_end);
    @memcpy(data[0..data_size], text_end[0..data_size]);

    const bss_size = @ptrToInt(&__bss_size);
    const bss = @ptrCast([*]u8, &__bss_start);
    // This works fine
    // for (0..bss_size) |i| {
    //     bss[i] = 0;
    // }
    @memset(bss[0..bss_size], 0);
    while (true) {}
}

Bss and size are as follows

(gdb) p bss
$6 = (u8 *) 0x20000010 <builtin.os> "\030"
(gdb) p bss_size
$7 = 164

At start of @memset It pushes bss into r0, and the length into r1, then jumps to __aeabi_memclr4

 |    0x5f2 <Reset_Handler+326>                       movw    r0, #160x5f6 <Reset_Handler+330>                       movt    r0, #8192       ; 0x20000x5fa <Reset_Handler+334>                       movw    r1, #164        ; 0xa40x5fe <Reset_Handler+338>                       movt    r1, #0> 0x602 <Reset_Handler+342>                       bl      0x5d4c <__aeabi_memclr4>           

Registers are

r0             0x20000010          536870928
r1             0xa4                164
sp             0x2000ffe0          0x2000ffe0

memclr4 jumps to memclr

>0x5d4c <__aeabi_memclr4>        b.w     0x5d3c <__aeabi_memclr>                                                                                                                                       

Then sits in a recursive loop comparing the bss ptr to 0 until it hard faults.

|    > 0x5d3c <__aeabi_memclr>         cmp     r1, #00x5d3e <__aeabi_memclr+2>       it      eq0x5d40 <__aeabi_memclr+4>       bxeq    lr0x5d42 <__aeabi_memclr+6>       push    {r7, lr}                                                                                                                                                       
│   0x5d44 <__aeabi_memclr+8>       mov     r7, sp0x5d46 <__aeabi_memclr+10>      bl      0x5d3c <__aeabi_memclr>  

Not sure why it replaces @memset with memclr, but maybe is optimizing memset out? Edit: Doesn't work with memset either.

@matu3ba
Copy link
Contributor

matu3ba commented Jun 27, 2023

I think this issue can be closed, since the ARM compiler_rt function has been fixed in #15655.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm 32-bit ARM bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

No branches or pull requests

8 participants