ROM Region Merging is not padding correctly #11140

Closed
DrynnBavis opened this issue Aug 1, 2019 · 73 comments
@DrynnBavis

DrynnBavis commented Aug 1, 2019

Description

I've got a managed bootloader application for which I've set target.restrict_size to 0x40000. When I use this binary as the bootloader for my main application, I get the following in my build logs:

Using ROM regions bootloader, application in this build.
  Region bootloader: size 0x10000, offset 0x8000000
  Region application: size 0x170000, offset 0x8010000
Compile [100.0%]: mbed_application.c

This is problematic because the bootloader calls mbed_app_start(POST_APPLICATION_ADDR), and POST_APPLICATION_ADDR has a value of 0x8040000. Essentially, the linker is placing my main program at 0x8010000 while the bootloader is actually booting into 0x8040000. As a result, my application just "hangs" after the bootloader completes.
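A possible explanation for the mismatch, stated as an assumption rather than something confirmed in this thread: when an application is built with `target.bootloader_img`, the tools may size the bootloader region from the size of the bootloader binary rounded up to the next flash-sector boundary, rather than from the bootloader's `target.restrict_size`. The sketch below does that rounding for an assumed STM32F413xH sector layout (four 16 KB sectors, one 64 KB sector, then eleven 128 KB sectors, 1.5 MB in total). Under that assumption, any bootloader image larger than 48 KB and no larger than 64 KB puts the application at 0x08010000, while the bootloader itself jumps to start + `restrict_size`. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed STM32F413xH flash layout; verify against the reference manual. */
#define FLASH_BASE_ADDR 0x08000000u
#define NUM_SECTORS     16u

static uint32_t sector_size(unsigned idx)
{
    if (idx < 4u) {
        return 16u * 1024u;   /* sectors 0-3 */
    }
    if (idx == 4u) {
        return 64u * 1024u;   /* sector 4 */
    }
    return 128u * 1024u;      /* sectors 5-15 */
}

/* First sector boundary at or after start + image_size, or 0 if the image
 * would run past the end of flash. Assumes start >= FLASH_BASE_ADDR. */
static uint32_t align_to_sector(uint32_t start, uint32_t image_size)
{
    uint32_t end = start + image_size;
    uint32_t boundary = FLASH_BASE_ADDR;

    for (unsigned i = 0; ; i++) {
        if (boundary >= end) {
            return boundary;
        }
        if (i == NUM_SECTORS) {
            return 0u;
        }
        boundary += sector_size(i);
    }
}

/* Address a bootloader built with target.restrict_size jumps to. */
static uint32_t post_application_addr(uint32_t start, uint32_t restrict_size)
{
    return start + restrict_size;
}
```

For example, a hypothetical 52 KB (0xD000) bootloader image gives `align_to_sector(0x08000000, 0xD000)` = 0x08010000, matching the log above, while `post_application_addr(0x08000000, 0x40000)` gives 0x08040000.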

I've also printed the address, sp, and pc just before the assembly code in start_new_application() is called, and got the following results:

calling start_new_application
address -> 0x08040000
sp ------> 0xFFFFFFFF
pc ------> 0xFFFFFFFF

After I set target.restrict_size back down to something smaller like 0x1000, I see more expected behaviour (but it still seems to hang):

calling start_new_application
address -> 0x08010000
sp ------> 0x20050000
pc ------> 0x080132DD

Is this a bug on mbed's side? Or do I need to do something in my main program's config to pad out until 0x8040000?
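The all-ones values in the first dump are what erased STM32 flash reads back as, which is consistent with nothing having been programmed at 0x08040000. A bootloader could check for this before jumping. The sketch below is one way to do it, assuming 320 KB of SRAM at 0x20000000 (so an initial SP of 0x20050000 is the top of RAM) and 1.5 MB of flash at 0x08000000; both are assumptions for the STM32F413xH, not values taken from Mbed. `check_vector_table` and its status codes are hypothetical names.

```c
#include <stdint.h>

/* Assumed STM32F413xH memory map (check the datasheet):
 * 320 KB SRAM at 0x20000000, 1.5 MB flash at 0x08000000. */
#define SRAM_START  0x20000000u
#define SRAM_END    0x20050000u   /* one past the last SRAM byte */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08180000u   /* one past the last flash byte */
#define ERASED_WORD 0xFFFFFFFFu   /* erased STM32 flash reads back as all ones */

typedef enum {
    VT_OK,
    VT_ERASED,   /* both words erased: nothing programmed at this address */
    VT_BAD_SP,   /* initial SP does not point into SRAM */
    VT_BAD_PC    /* reset vector does not point into flash */
} vt_status;

/* words[0] is the initial SP, words[1] the reset vector. */
static vt_status check_vector_table(const uint32_t *words)
{
    uint32_t sp = words[0];
    uint32_t pc = words[1];

    if (sp == ERASED_WORD && pc == ERASED_WORD) {
        return VT_ERASED;
    }
    /* The stack grows downwards, so an initial SP equal to SRAM_END is valid. */
    if (sp <= SRAM_START || sp > SRAM_END) {
        return VT_BAD_SP;
    }
    if (pc < FLASH_START || pc >= FLASH_END) {
        return VT_BAD_PC;
    }
    return VT_OK;
}
```

On the target it would be called as `check_vector_table((const uint32_t *)POST_APPLICATION_ADDR)` before `mbed_start_application()`.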

As requested, @AGlass0fMilk, here are the contents of the files you asked for.
bootloader's mbed_app.json:

{
    "config": {
        "default_target" : {
            "value": "TRACK_PILOT2"
        }
    },
    "target_overrides": {
        "*": {
            "mbed-trace.enable": 1,
            "platform.stdio-baud-rate": 115200,
            "platform.stdio-convert-newlines": true
        },
        "MY_CUSTOM_TARGET": {
            "target.restrict_size": "0x00020000"
        }
    }
}

main app's mbed_app.json:

"MY_CUSTOM_TARGET": {
    "enable-swo": "0",
    "spif-driver.SPI_MOSI": "FLASH_SPI_MOSI",
    "spif-driver.SPI_MISO": "FLASH_SPI_MISO",
    "spif-driver.SPI_CLK": "FLASH_SPI_SCK",
    "spif-driver.SPI_CS": "FLASH_SPI_CS",
    "spif-driver.SPI_FREQ": 40000000,
    "target.features_add": ["STORAGE"],
    "target.components_add": ["SPIF"],
    "target.bootloader_img": "../bootloader/build/develop/bootloader.bin"
}

target definition:

"MY_CUSTOM_TARGET": {
    "inherits": ["FAMILY_STM32"],
    "core": "Cortex-M4F",
    "extra_labels_add": [
        "STM32F4",
        "STM32F413xx",
        "STM32F413RH",
        "STM32F413xH",
        "STM32F413ZH"
    ],
    "config": {
        "clock_source": {
            "help": "Mask value : USE_PLL_HSE_EXTC | USE_PLL_HSE_XTAL (need HW patch) | USE_PLL_HSI",
            "value": "USE_PLL_HSE_EXTC|USE_PLL_HSI",
            "macro_name": "CLOCK_SOURCE"
        },
        "lpticker_lptim": {
            "help": "This target supports LPTIM. Set value 1 to use LPTIM for LPTICKER, or 0 to use RTC wakeup timer",
            "value": 1
        }
    },
    "overrides": {
        "lpticker_delay_ticks": 4
    },
    "macros_add": [
        "MBED_TICKLESS",
        "USB_STM_HAL"
    ],
    "device_has_add": [
        "ANALOGOUT",
        "CAN",
        "SERIAL_ASYNCH",
        "TRNG",
        "FLASH",
        "MPU"
    ],
    "bootloader_supported": true,
    "release_versions": [
        "5"
    ],
    "device_name": "STM32F413ZHTx"
}

Issue request type

Target: custom MCU with STM32F413RH processor
Toolchain: GCC_ARM 8.2.1
Tool: mbed-cli
Vers: cfa7938 (HEAD, tag: mbed-os-5.12.2)

[x] Question
[ ] Enhancement
[ ] Bug
@DrynnBavis DrynnBavis changed the title Region Merging is not padding correctly ROM Region Merging is not padding correctly Aug 1, 2019
@AGlass0fMilk
Member

I can't comment on the padding issue at the moment -- can you post both of your 'mbed_app.json' files for the bootloader and the application? Also post the 'custom_target.json' if you have one.

As for sp and pc, the first one is the initial stack pointer and the second one is the initial program counter (address of next instruction to be executed). In this case, it's pointing to the Reset Vector of your application. This is the real entry point of your application and does all the hardware/system/RTOS initialization before jumping to your main function.

See this diagram below showing the reset vector format on Cortex M4 devices:

[Image: Cortex-M4 vector table format (page 37 of the linked file)]

So in your second case, the bootloader/application merging seems to be working -- the sp and pc values are reasonable, indicating it really is the vector table of your application binary. However, there is a problem!

The reason your program is hanging is because the initial program counter pointer is unaligned -- that is the value 0x080132DD is not word-aligned (since it's a 32-bit processor, must be divisible by 4). 0x080132DD is an odd number. Attempting to execute an unaligned instruction is illegal in CM4 processors and will cause a HardFault/MemoryManagement Fault.

Unfortunately this kind of bug pops up from time to time, so it's good to know the root cause of this issue.

This is certainly a problem with the online compiler tools...

Until someone from Mbed addresses this issue I would recommend trying to get set up with Mbed-CLI and building offline, that way you have a lot more control over the build and various tool/software versions.

@DrynnBavis
Author

DrynnBavis commented Aug 1, 2019

can you post both of your 'mbed_app.json' files for the bootloader and the application? Also post the 'custom_target.json' if you have one.

@AGlass0fMilk I've edited my OP to include the 3 files you asked for. Just as a reminder, the device_name listed in my target definition isn't the processor I'm actually using (since my processor - STM32F413RH - doesn't seem to have support from mbed), so instead I'm using STM32F413ZHTx since it has sectors defined in arm_pack_manager/index.json that seem to be similar to my processor's ROM mapping. Please see #11120 for more info on that.

I had a feeling that 0x080132DD wasn't a valid address. I don't see what I'm doing wrong on my end that's causing this, though. Recall that in mbed_start_application() (mbed_application.c), pc is defined as

pc = *((void **)address + 1);

To print this, I was c-style casting it to a char pointer like so

printf("pc ------> 0x%08X\r\n", (char*)pc);

Maybe I'm just printing this wrong? Or does the logic for pc make sense? My interpretation is that we're casting address to a pointer to a void pointer (void **), incrementing that pointer by 1 (whatever that means, please enlighten me here!), and then dereferencing it back to a single void pointer.
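Pointer arithmetic in C counts in elements, not bytes: `(void **)address + 1` advances by `sizeof(void *)`, which is 4 on a 32-bit Cortex-M4, so `*((void **)address + 1)` reads the word at `address + 4` (the reset vector), while `*(void **)address` reads the initial SP at `address + 0`. The sketch below does the same with explicit `uint32_t` elements, so the byte offsets are identical even on a 64-bit host where `sizeof(void *)` is 8; the function names are hypothetical. It also prints with `PRIX32`: passing a `char *` to `%X` is undefined behaviour in C, although on a 32-bit target it usually prints the expected value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reads the first two vector-table words the same way
 * mbed_start_application() does, but with explicit 32-bit elements. */
static void read_vector_entries(const uint32_t *table, uint32_t *sp, uint32_t *pc)
{
    *sp = *table;        /* element 0, byte offset 0: initial stack pointer */
    *pc = *(table + 1);  /* element 1, byte offset 4: reset vector */
}

/* Prints the two words with a format specifier that matches uint32_t. */
void print_vector_entries(const uint32_t *table)
{
    uint32_t sp;
    uint32_t pc;

    read_vector_entries(table, &sp, &pc);
    printf("sp ------> 0x%08" PRIX32 "\r\n", sp);
    printf("pc ------> 0x%08" PRIX32 "\r\n", pc);
}
```

On the target, `print_vector_entries((const uint32_t *)POST_APPLICATION_ADDR)` would print the same two words that `mbed_start_application()` loads.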

This is certainly a problem with the online compiler tools...

Until someone from Mbed addresses this issue I would recommend trying to get set up with Mbed-CLI and building offline, that way you have a lot more control over the build and various tool/software versions.

I am indeed using Mbed-CLI and building offline. What do you recommend I do to fix this?

@0Grit

0Grit commented Aug 1, 2019

I recall ST parts having memory regions all over the place versus 1 contiguous flash region.


Gut feel is you should confirm those sectors that seem to be similar.

@DrynnBavis
Author

DrynnBavis commented Aug 1, 2019

Gut feel is you should confirm those sectors that seem to be similar.

@loverdeg-ep tagged you on the other issue with an update to my understanding of the sectors for this chip.

@AGlass0fMilk
Member

AGlass0fMilk commented Aug 1, 2019

To verify the actual value programmed into the device for the initial program counter/reset vector you can use a hex editor to view the binary.

I recently had a situation where the build tools were putting an illegal address in the reset vector location...

Try opening the binary up with something like https://hexed.it/?hl=en and take a screenshot.

@DrynnBavis
Author

To verify the actual value programmed into the device for the initial program counter/reset vector you can use a hex editor to view the binary.

Sorry, what address is this exactly? I've got the file open in HexEd.it, I'm just not sure which address you want to see.

@AGlass0fMilk
Member

The beginning of the application. So whatever that address is that the bootloader jumps to.

@DrynnBavis
Author

DrynnBavis commented Aug 1, 2019

The beginning of the application. So whatever that address is that the bootloader jumps to.

[Image: hex dump of `main.bin` at offset 0x00020000]

The first image above is the hex at the offset of `0x00020000`, the expected offset of the main program from the `main.bin` compiled file of the master application with the merged bootloader in it.

[Image: hex dump of `main_application.bin`]

This second image is from `main_application.bin`. Notice that the content starting at 0x00020000 in `main.bin` starts here in the application binary at 0x000210C0. Does this seem suspicious too?

@DrynnBavis
Author

DrynnBavis commented Aug 1, 2019

@AGlass0fMilk Something else interesting, I've changed the target on my bootloader and main application to an older MCU I had on hand that I previously had success with, and this was the sp and pc print:

calling start_new_application
address -> 0x08020000
sp ------> 0x20030000
pc ------> 0x08024DC5

Again, here the program counter is pointing to an odd number, and thus seemingly unaligned. However the program does seem to be working fine. My blinky application successfully booted from the bootloader. Nothing changed but the target.

So my hypothesis is that these addresses might just be random garbage at the point of time when I'm printing them.

This odd pc value might be a dead end.
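For reference, an odd reset vector is expected on Cortex-M rather than a sign of misalignment. Bit 0 of every handler address in the vector table is the Thumb bit and must be 1, because the core only executes Thumb code; the handler itself starts at the address with bit 0 cleared, and Thumb instructions only need 2-byte alignment. A vector with bit 0 clear is the invalid case: the core faults with INVSTATE when it tries to execute from it. A minimal sketch with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a Cortex-M vector-table entry marks the handler as Thumb code.
 * It must be set; a clear bit 0 makes the core fault (INVSTATE). */
static bool vector_has_thumb_bit(uint32_t vector)
{
    return (vector & 1u) != 0u;
}

/* Address of the handler's first instruction: the vector with bit 0 cleared.
 * This is the address to disassemble or put a breakpoint on. */
static uint32_t handler_address(uint32_t vector)
{
    return vector & ~1u;
}
```

With the values from this thread, `handler_address(0x080132DD)` is 0x080132DC and `handler_address(0x08024DC5)` is 0x08024DC4, both valid halfword-aligned instruction addresses.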

@DrynnBavis
Author

Just a bump for the new week
@AGlass0fMilk what do you think of this from @40Grit:

In the past @AGlass0fMilk was getting a hardfault after booting mcuboot. I believe it was due to an invalid vector table relocation?
VTOR is the register to confirm.

How would I go about confirming that vtor is indeed pointing to the correct address of the vector table? I guess I'm asking how would I check the value in vtor after the bootloader boots, and then where would I find what the expected address of the vector table should be?

@0Grit

0Grit commented Aug 6, 2019

@DrynnBavis
@AGlass0fMilk is travelling currently.

The vector table is at address 0 by default. Where it gets relocated to must be aligned correctly.
The first thing the processor does after power on reset is load the value contained at the 0th entry of the table into the stack pointer and then executes the reset vector.

As I recall Mbed OS 5 generally ends up relocating the vector table to somewhere in SRAM

See the following from http://infocenter.arm.com/help/topic/com.arm.doc.dui0553b/DUI0553.pdf

[Image: excerpt from the linked DUI0553B PDF]
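As a rough aid for the alignment point: on ARMv7-M the table address written to VTOR must be aligned to the table size (16 system exceptions plus the device IRQs, 4 bytes per entry) rounded up to a power of two, and to at least 128 bytes. On the target, VTOR itself can be read as `SCB->VTOR` through CMSIS, or with `x/wx 0xE000ED08` in GDB. The IRQ count of 102 used below is an assumption for the STM32F413 and should be checked against the reference manual; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required VTOR alignment in bytes: the table size (16 system exceptions
 * plus num_irqs device interrupts, 4 bytes each) rounded up to a power
 * of two, and never less than 128 bytes. */
static uint32_t vtor_alignment(uint32_t num_irqs)
{
    uint32_t table_bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;

    while (align < table_bytes) {
        align <<= 1;
    }
    return align;
}

static bool vtor_is_aligned(uint32_t table_addr, uint32_t num_irqs)
{
    return (table_addr & (vtor_alignment(num_irqs) - 1u)) == 0u;
}
```

With 102 IRQs the table is 472 bytes, so it needs 512-byte alignment: 0x08020000 and 0x20000000 qualify, 0x08020100 does not.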

@0Grit

0Grit commented Aug 6, 2019

@kjbracey-arm may or may not be able to cut to the heart of the topic for you, upon returning from leave.

@DrynnBavis
Author

@loverdeg-ep thanks for the response. I was looking at this datasheet yesterday, and I'm pretty convinced it is indeed being relocated to SRAM. I'm just frustrated trying to figure out how to debug this properly.

What I've learned is that the bootloader has its own vector table, and after it finishes (i.e. when I call mbed_app_start(POST_APPLICATION_ADDR);) something happens, and I think the sp is supposed to point at the vector table of my new application, which has now been relocated to some place in SRAM. Ideally at this point a reset will occur, and the initialisation work the processor needs to do will be performed to configure the RTOS and such before the application starts.

What exactly is expected to happen between my calling of mbed_app_start(POST_APPLICATION_ADDR); from the bootloader and the start of my main application? Knowing this would help a lot, if anyone knows.
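Roughly, and based on the `mbed_application.c` code quoted in this thread: no hardware reset happens. `powerdown_scb()` points VTOR at the application's vector table, then the inline assembly in `start_new_application()` switches thread mode back to MSP, loads the application's initial SP from word 0, unmasks interrupts and branches to the reset vector from word 1. From there the application's own reset handler does the startup work (clock setup, C runtime initialisation, starting the RTOS), and, judging by VTOR = 0x20000000 in a later register dump, the table ends up relocated to SRAM. The host-side model below only illustrates that sequence; `core_state` and `model_start_application` are hypothetical, not Mbed APIs.

```c
#include <stdint.h>

/* Hypothetical host-side model of the core registers touched by the jump. */
typedef struct {
    uint32_t vtor;     /* SCB->VTOR */
    uint32_t msp;      /* main stack pointer */
    uint32_t psp;      /* process stack pointer (the RTOS thread stack) */
    uint32_t control;  /* CONTROL; bit 1 (SPSEL) = thread mode uses PSP */
    uint32_t primask;  /* 1 = interrupts masked */
    uint32_t pc;
} core_state;

/* Applies the steps in order; it does not touch any hardware. */
static void model_start_application(core_state *cpu, const uint32_t *app_table,
                                    uint32_t app_addr)
{
    uint32_t initial_sp = app_table[0];  /* word 0 of the application's table */
    uint32_t reset_vec = app_table[1];   /* word 1: reset vector, Thumb bit set */

    cpu->primask = 1u;           /* __disable_irq() */
    cpu->vtor = app_addr;        /* powerdown_scb(address) */
    cpu->control = 0u;           /* msr control, r2: select MSP, clear FPCA */
    cpu->msp = initial_sp;       /* mov sp, %0: with SPSEL = 0 this is MSP */
    cpu->primask = 0u;           /* msr primask, r2: enable interrupts */
    cpu->pc = reset_vec & ~1u;   /* bx %1: bit 0 selects Thumb state */
}
```

The example values in the check below come from the GDB session later in the thread (address 0x08020000, sp 0x20050000, pc 0x0802635D). PSP is left untouched: after the switch the core simply stops using it.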

@40Grit

40Grit commented Aug 7, 2019

@DrynnBavis are you able to step debug?

If so, have you loaded the .elf files for both your bootloader and application?

Edit: have you loaded the .elf's into gdb

@DrynnBavis
Author

@40Grit I haven't tried yet. For some reason when I build with the debug profile I get this crash from __disable_irq() within mbed_start_application() from mbed_application.c:

++ MbedOS Error Info ++
Error Status: 0x8001012F Code: 303 Module: 1
Error Message: Error - writing to a file in an ISR or critical section

Location: 0x8006BE5
Error Value: 0x1
Current Thread: main  Id: 0x200012B0 Entry: 0x8007AFF StackSize: 0x1000 StackMem: 0x20001C18 SP: 0x200028F8 
For more info, visit: https://mbed.com/s/error?error=0x8001012F&tgt=TRACK_PILOT2
-- MbedOS Error Info --

@AGlass0fMilk
Member

AGlass0fMilk commented Aug 7, 2019 via email

@DrynnBavis
Author

Hey so I removed the print statements and as you suspected it solved the crash I was seeing on the debug profile. Finally, I was able to backtrace the crash and I found this:

(gdb) bt
#0  0x08006ce4 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:677
#1  0x0800602c in mbed_error_puts (str=0x2004fec8 "\n++ MbedOS Fault Handler ++\n\nFaultType: ")
    at libs/mbed-os/platform/mbed_board.c:97
#2  0x08006010 in mbed_error_vprintf (format=0x8018de4 "\n++ MbedOS Fault Handler ++\n\nFaultType: ", arg=...)
    at libs/mbed-os/platform/mbed_board.c:71
#3  0x08005fca in mbed_error_printf (format=0x8018de4 "\n++ MbedOS Fault Handler ++\n\nFaultType: ")
    at libs/mbed-os/platform/mbed_board.c:55
#4  0x08005aee in mbed_fault_handler (fault_type=16, mbed_fault_context_in=0x8018de0 <mbed_fault_context>)
    at libs/mbed-os/platform/TARGET_CORTEX_M/mbed_fault_handler.c:48
#5  0x08005ad6 in Fault_Handler () at except.S:187
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Not at all sure if there's anything helpful here, but I feel that this line:
0x08005aee in mbed_fault_handler (fault_type=16, mbed_fault_context_in=0x8018de0 <mbed_fault_context>)
might give us some context on what's going on here. I just don't know how to interpret it.
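For reading that frame: `fault_type` is the exception code that Mbed's `except.S` passes to `mbed_fault_handler()`. Assuming the constants in `mbed_fault_handler.h` for this Mbed OS version are 0x10 for HardFault, 0x20 for MemManage, 0x30 for BusFault and 0x40 for UsageFault (worth confirming in the source tree), `fault_type=16` means the core took a HardFault, and the frames above it in the backtrace are the handler trying to print its report. A hypothetical decoder:

```c
#include <stddef.h>

/* Assumed values of the fault codes in mbed_fault_handler.h (Mbed OS 5). */
#define HARD_FAULT_EXCEPTION      0x10
#define MEMMANAGE_FAULT_EXCEPTION 0x20
#define BUS_FAULT_EXCEPTION       0x30
#define USAGE_FAULT_EXCEPTION     0x40

/* Returns a readable name for fault_type, or NULL for an unknown code. */
static const char *fault_type_name(int fault_type)
{
    switch (fault_type) {
    case HARD_FAULT_EXCEPTION:
        return "HardFault";
    case MEMMANAGE_FAULT_EXCEPTION:
        return "MemManage";
    case BUS_FAULT_EXCEPTION:
        return "BusFault";
    case USAGE_FAULT_EXCEPTION:
        return "UsageFault";
    default:
        return NULL;
    }
}
```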

@AGlass0fMilk
Member

AGlass0fMilk commented Aug 7, 2019 via email

@0Grit

0Grit commented Aug 7, 2019

@DrynnBavis Are you printing to uart, SWO, or semihosting?

@0Grit

0Grit commented Aug 7, 2019

@DrynnBavis Also, if you can step debug down at the assembly level I would see if there is a particular instruction that your fault is occurring on.

@DrynnBavis
Author

@AGlass0fMilk The fault error never actually prints to serial. @loverdeg-ep I'm using UART.

@DrynnBavis
Author

Just stepping through mbed_start_application() now, found these values of the core registers after assigning the new values to sp and pc from the address:

r0             0x8020000           134348800
r1             0xe000e100          3758153984
r2             0xe000e000          3758153728
r3             0x20050000          537198592
r4             0x0                 0
r5             0x0                 0
r6             0x0                 0
r7             0x0                 0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0xffffffff          4294967295
sp             0x20002bd0          0x20002bd0 <_main_stack+4024>
lr             0x8005dbd           134241725
pc             0x8005dd2           0x8005dd2 <mbed_start_application+50>
xpsr           0x610e0000          1628307456
msp            0x2004ffb0          537198512
psp            0x20002bd0          536882128
primask        0x1                 1
basepri        0x0                 0
faultmask      0x0                 0
control        0x6                 6
fpscr          0x0                 0

@0Grit

0Grit commented Aug 7, 2019

Check value of VTOR as well.

@DrynnBavis
Author

@loverdeg-ep I read VTOR while I was stepping through powerdown_scb() and found VTOR at that time to have a value of 0x08020000 (same as the value of POST_APPLICATION_ADDR)

@DrynnBavis
Author

DrynnBavis commented Aug 7, 2019

@AGlass0fMilk @loverdeg-ep
Found the point of breaking to be right after this line:

Breakpoint 2, start_new_application (sp=0x20050000, pc=0x802635d) at libs/mbed-os/platform/mbed_application.c:175
175         __asm volatile(
(gdb) info registers
r0             0x20050000          537198592
r1             0x802635d           134374237
r2             0xe000e000          3758153728
r3             0x802635d           134374237
r4             0x0                 0
r5             0x0                 0
r6             0x0                 0
r7             0x0                 0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0xffffffff          4294967295
sp             0x20002bc8          0x20002bc8 <_main_stack+4016>
lr             0x8005f03           134242051
pc             0x8005ff2           0x8005ff2 <start_new_application+6>
xpsr           0x10e0000           17694720
msp            0x2004ffb0          537198512
psp            0x20002bc8          536882120
primask        0x1                 1
basepri        0x0                 0
faultmask      0x0                 0
control        0x6                 6
fpscr          0x0                 0
(gdb) info locals
No locals.
(gdb) print sp
$1 = (void *) 0x20050000
(gdb) print pc
$2 = (void *) 0x802635d

after stepping one more time I get the crash:

(gdb) step

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08006ca4 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:678
678             errno = EBADF;
(gdb) 

It seems to be this assembly code that's faulting:

    __asm volatile(
        "movs   r2, #0      \n"
        "msr    control, r2 \n" // Switch to main stack
        "mov    sp, %0      \n"
        "msr    primask, r2 \n" // Enable interrupts
        "bx     %1          \n"
        :
        : "l"(sp), "l"(pc)
        : "r2", "cc", "memory"
    );

How does one go about stepping through these line by line?
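Two notes on that snippet. The `control = 0x6` in the register dumps decodes as SPSEL | FPCA: thread mode running on PSP (which is why `sp` equals `psp` there) with a floating-point context active, so `msr control, r2` with r2 = 0 switches back to MSP and clears FPCA, and the following `mov sp` writes MSP. For stepping, GDB's `stepi` (`si`) executes one machine instruction, and `display/i $pc` prints each instruction as you go. A small decoder for the CONTROL bits, with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* CONTROL register bits on a Cortex-M4F (ARMv7-M with the FP extension). */
#define CONTROL_NPRIV (1u << 0)  /* 1 = thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1)  /* 1 = thread mode uses PSP, 0 = MSP */
#define CONTROL_FPCA  (1u << 2)  /* 1 = floating-point context active */

static bool control_uses_psp(uint32_t control)
{
    return (control & CONTROL_SPSEL) != 0u;
}

static bool control_unprivileged(uint32_t control)
{
    return (control & CONTROL_NPRIV) != 0u;
}

static bool control_fp_active(uint32_t control)
{
    return (control & CONTROL_FPCA) != 0u;
}
```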

@0Grit

0Grit commented Aug 7, 2019

Are you using command line GDB?

@DrynnBavis
Author

DrynnBavis commented Aug 7, 2019

Are you using command line GDB?

yes
GNU gdb (GNU Tools for Arm Embedded Processors 8-2018-q4-major) 8.2.50.20181213-git

@0Grit

0Grit commented Aug 7, 2019

I don't know off hand but you'll need the disassembly.

Might need to make it an output of your build.

@DrynnBavis
Author

DrynnBavis commented Aug 9, 2019

Took a while but here it is @40Grit:
Nothing changed except for these two peripherals. (P.S. I also compared these register readings of these peripherals earlier in the program to the step before crashing but they were identical)

Before crashing ->

Registers in SCB:
        CPUID:                 1091551809  CPUID base register
        ICSR:                     4194304  Interrupt control and state register
        VTOR:                   134348800  Vector table offset register
        AIRCR:                 -100334848  Application interrupt and reset control register
        SCR:                            0  System control register
        CCR:                          512  Configuration and control register
        SHPR1:                          0  System handler priority registers
        SHPR2:                          0  System handler priority registers
        SHPR3:                          0  System handler priority registers
        SHCRS:                          0  System handler control and state register
        CFSR_UFSR_BFSR_MMFSR:           0  Configurable fault status register
        HFSR:                           0  Hard fault status register
        MMFAR:                 -536809992  Memory management fault address register
        BFAR:                  -536809992  Bus fault address register
        AFSR:                           0  Auxiliary fault status register

After crashing ->

Registers in SCB:
        CPUID:                 1091551809  CPUID base register
        ICSR:                           3  Interrupt control and state register
        VTOR:                   536870912  Vector table offset register
        AIRCR:                 -100334848  Application interrupt and reset control register
        SCR:                            0  System control register
        CCR:                          512  Configuration and control register
        SHPR1:                          0  System handler priority registers
        SHPR2:                          0  System handler priority registers
        SHPR3:                          0  System handler priority registers
        SHCRS:                        128  System handler control and state register
        CFSR_UFSR_BFSR_MMFSR:         130  Configurable fault status register
        HFSR:                  1073741824  Hard fault status register
        MMFAR:                         72  Memory management fault address register
        BFAR:                          72  Bus fault address register
        AFSR:                           0  Auxiliary fault status register

And NVIC also changed.
Before crashing ->

Registers in NVIC:
        ISER0:       0  Interrupt Set-Enable Register
        ISER1:       0  Interrupt Set-Enable Register
        ISER2:       0  Interrupt Set-Enable Register
        ICER0:       0  Interrupt Clear-Enable Register
        ICER1:       0  Interrupt Clear-Enable Register
        ICER2:       0  Interrupt Clear-Enable Register
        ISPR0:       0  Interrupt Set-Pending Register
        ISPR1:  262144  Interrupt Set-Pending Register
        ISPR2:       0  Interrupt Set-Pending Register
        ICPR0:       0  Interrupt Clear-Pending Register
        ICPR1:  262144  Interrupt Clear-Pending Register
        ICPR2:       0  Interrupt Clear-Pending Register
        IABR0:       0  Interrupt Active Bit Register
        IABR1:       0  Interrupt Active Bit Register
        IABR2:       0  Interrupt Active Bit Register
        IPR0:        0  Interrupt Priority Register
        IPR1:        0  Interrupt Priority Register
        IPR2:        0  Interrupt Priority Register
        IPR3:        0  Interrupt Priority Register
        IPR4:        0  Interrupt Priority Register
        IPR5:        0  Interrupt Priority Register
        IPR6:        0  Interrupt Priority Register
        IPR7:        0  Interrupt Priority Register
        IPR8:        0  Interrupt Priority Register
        IPR9:        0  Interrupt Priority Register
        IPR10:       0  Interrupt Priority Register
        IPR11:       0  Interrupt Priority Register
        IPR12:       0  Interrupt Priority Register
        IPR13:       0  Interrupt Priority Register
        IPR14:       0  Interrupt Priority Register
        IPR15:       0  Interrupt Priority Register
        IPR16:       0  Interrupt Priority Register

After crashing ->

Registers in NVIC:
        ISER0:       0  Interrupt Set-Enable Register
        ISER1:  262144  Interrupt Set-Enable Register
        ISER2:       0  Interrupt Set-Enable Register
        ICER0:       0  Interrupt Clear-Enable Register
        ICER1:  262144  Interrupt Clear-Enable Register
        ICER2:       0  Interrupt Clear-Enable Register
        ISPR0:       0  Interrupt Set-Pending Register
        ISPR1:       0  Interrupt Set-Pending Register
        ISPR2:       0  Interrupt Set-Pending Register
        ICPR0:       0  Interrupt Clear-Pending Register
        ICPR1:       0  Interrupt Clear-Pending Register
        ICPR2:       0  Interrupt Clear-Pending Register
        IABR0:       0  Interrupt Active Bit Register
        IABR1:       0  Interrupt Active Bit Register
        IABR2:       0  Interrupt Active Bit Register
        IPR0:        0  Interrupt Priority Register
        IPR1:        0  Interrupt Priority Register
        IPR2:        0  Interrupt Priority Register
        IPR3:        0  Interrupt Priority Register
        IPR4:        0  Interrupt Priority Register
        IPR5:        0  Interrupt Priority Register
        IPR6:        0  Interrupt Priority Register
        IPR7:        0  Interrupt Priority Register
        IPR8:        0  Interrupt Priority Register
        IPR9:        0  Interrupt Priority Register
        IPR10:       0  Interrupt Priority Register
        IPR11:       0  Interrupt Priority Register
        IPR12:       0  Interrupt Priority Register
        IPR13:       0  Interrupt Priority Register
        IPR14:       0  Interrupt Priority Register
        IPR15:       0  Interrupt Priority Register
        IPR16:       0  Interrupt Priority Register

@40Grit

40Grit commented Aug 9, 2019

Can you do the same for the working part.

I guess just dump the peripherals that seem relevant.

it would be good to see a side by side diff of the dumps using a program like win merge or kdiff etc.

@DrynnBavis
Author

Can you do the same for the working part.

I guess just dump the peripherals that seem relevant.

it would be good to see a side by side diff of the dumps using a program like win merge or kdiff etc.

Assuming you're referring to the disco board, unfortunately I don't have a JTAG header soldered onto this board so I can't do that right now. Put an order in for one though.

@40Grit

40Grit commented Aug 9, 2019

disco has onboard st-link i thought.

you should be able to use pyocd, or some st tool to do the dump.

@0xc0170
Contributor

0xc0170 commented Aug 9, 2019

Just in case cc @ARMmbed/team-st-mcd

@DrynnBavis
Author

DrynnBavis commented Aug 10, 2019

@40Grit Got GDB going on the disco board finally. This is the same spot where we were crashing previously, with the register info from the disco board (which doesn't crash):

(gdb) disassemble
Dump of assembler code for function start_new_application:
   0x08001914 <+0>:     sub     sp, #8
   0x08001916 <+2>:     str     r0, [sp, #4]
   0x08001918 <+4>:     str     r1, [sp, #0]
   0x0800191a <+6>:     ldr     r3, [sp, #4]
   0x0800191c <+8>:     ldr     r1, [sp, #0]
   0x0800191e <+10>:    movs    r2, #0
   0x08001920 <+12>:    msr     CONTROL, r2
=> 0x08001924 <+16>:    mov     sp, r3
   0x08001926 <+18>:    msr     PRIMASK, r2
   0x0800192a <+22>:    bx      r1
   0x0800192c <+24>:    nop
   0x0800192e <+26>:    add     sp, #8
   0x08001930 <+28>:    bx      lr
End of assembler dump.
(gdb) info registers
r0             0x20050000          537198592
r1             0x802667d           134375037
r2             0x0                 0
r3             0x20050000          537198592
r4             0x0                 0
r5             0x0                 0
r6             0x0                 0
r7             0x0                 0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x20002c08          536882184
sp             0x2004ffb0          0x2004ffb0
lr             0x800182b           134223915
pc             0x8001924           0x8001924 <start_new_application+16>
xPSR           0x410f0000          1091502080
fpscr          0x0                 0
msp            0x2004ffb0          0x2004ffb0
psp            0x20002808          0x20002808 <_main_stack+4040>
primask        0x1                 1
basepri        0x0                 0
faultmask      0x0                 0
control        0x0                 0

@40Grit

40Grit commented Aug 10, 2019

can you show it as a side by side diff?
I'm on my phone after hours usually. harder to diff with my thumb

@40Grit

40Grit commented Aug 10, 2019

screenshot of,

kdiff, git diff, winmerge. MS paint with highlighted differences.

any of the above

@DrynnBavis
Author

DrynnBavis commented Aug 10, 2019

[Image: side-by-side diff of the register dumps from the custom board and the disco board]

@40Grit
There's a bit of word wrap on the sp and pc lines, but I condensed it for you.

@40Grit

40Grit commented Aug 10, 2019

you said these are the same parts except for package?

@DrynnBavis
Author

DrynnBavis commented Aug 10, 2019

you said these are the same parts except for package?

@40Grit Just realised I was on line 16 rather than 6 in the disco debugging when I printed the registers; see the comment above with the new pic.

@40Grit

40Grit commented Aug 10, 2019

same exact binary flashed to both parts?!?

@DrynnBavis
Author

DrynnBavis commented Aug 10, 2019

same exact binary flashed to both parts?!?

No, compiled for and flashed to different targets (my custom one and the disco), but both with the exact same chip. Technically they have different packages (disco uses STM32F413ZH, our target uses STM32F413RH); however, both target definitions have "device_name": "STM32F413ZH"

@40Grit

40Grit commented Aug 10, 2019

I assume they share datasheets?
Give me a link to it.

@DrynnBavis
Author

@40Grit

40Grit commented Aug 10, 2019

that is the reference manual. wanted datasheet to see the packages etc.

we should move this offline at this point.

contact me at my work email.
@LoVerdeg-ep has it in my profile

@DrynnBavis
Author

DrynnBavis commented Aug 10, 2019

@40Grit Link to pdf is here: https://www.st.com/resource/en/datasheet/stm32f413rh.pdf

EDIT: fixed link

@40Grit

40Grit commented Aug 10, 2019

Wrong datasheet linked but I got the right one.

I'd need to sit down and re-evaluate all the information at this point.

I recommend pulling all the data you have collected together in an organized fashion and updating your main post.

@DrynnBavis
Author

DrynnBavis commented Aug 12, 2019

Wrong datasheet linked but I got the right one.

I'd need to sit down and re-evaluate all the information at this point.

I recommend pulling all the data you have collected together in an organized fashion and updating your main post.

Hey @40Grit just found something really interesting. I've gone and stripped the main.cpp of my bootloader to this:

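#include "drivers/FlashIAP.h"
#include "features/storage/blockdevice/BlockDevice.h"
#include "platform/mbed_application.h"
#include <cstdio>

// Disabled while bisecting the hang
//BlockDevice *bd = BlockDevice::get_default_instance();
mbed::FlashIAP flash;

int main()
{
    // Print before mbed_start_application(): it disables interrupts, and
    // printing after that point triggers the "ISR or critical section" error.
    printf("Booting to main app space 0x%08lX\r\n", (unsigned long)POST_APPLICATION_ADDR);
    // Does not return: jumps to the application's reset vector.
    mbed_start_application(POST_APPLICATION_ADDR);
}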

And this actually works, remarkably. To be explicit, I mean I've successfully booted from the bootloader into my main application (I know this because I see prints from my main application in my terminal).

So now I've started adding back the other components of my bootloader and I've found two things are causing the previous hanging bootloader behaviour once added back in:

  • BlockDevice *bd = BlockDevice::get_default_instance();
      • Note 1: This seems to fail silently. My pins are indeed correct, but when I read the program size, erase size, or size of the bd, they all return 0.
      • Note 2: The error returned from GDB is pretty generic, with no helpful backtrace. It just returns 0x08005b44 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681 681 errno = EBADF;
  • int rc = mbed_trace_init();
      • Note 1: This function returns 0 (assuming success), and I am able to use tracing after also setting the config. So although it doesn't seem to be failing, it does cause the bootloader to hang.
      • Note 2: Same debugging difficulty as with the block device: neither bt nor the fault messages help in determining why the faults are occurring.

Results from debugging around mbed_app_start:

  • The "working" bootloader (no block device or tracing) has pretty much exactly the same register values as the non-working ones in this region of machine code. It even shows the same error: sp=<error reading variable: Cannot access memory at address 0x20050004>, pc=<error reading variable: Cannot access memory at address 0x20050000> (despite this, it still works). The only differences in register values are the expected offsets in the PC and LR registers.
  • The faults in the "broken" bootloader both appear after the jump in mbed_app_start(). I can't step any further past this point because gdb loses context. Are we then faulting in the main app space (i.e. has the bootloader succeeded, and it's instead the main app that's faulting)?
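A possible explanation for the identical "Cannot access memory at address 0x20050004 / 0x20050000" messages, offered as a hedged sketch: assuming the STM32F413's 320 KiB of SRAM is contiguous from 0x20000000, the initial stack pointer 0x20050000 is the first address past the end of RAM. Cortex-M stacks are full-descending, so the first push decrements SP before writing, and that end-of-RAM value is normal. Once start_new_application has loaded it into SP, any stack-relative read GDB makes to display the sp and pc arguments lands at or above 0x20050000, outside RAM. That would explain why the message appears in the working bootloader too. The memory-map constants below are assumptions to check against the datasheet.

```cpp
#include <cassert>
#include <cstdint>

// Assumed memory map (check the STM32F413 datasheet/reference manual):
// 320 KiB of contiguous SRAM starting at 0x20000000.
constexpr std::uint32_t kSramBase = 0x20000000u;
constexpr std::uint32_t kSramSize = 320u * 1024u;  // 0x50000 bytes

// Cortex-M stacks are full-descending: the vector table's initial SP is
// normally the first address past the end of RAM, and each push
// decrements SP before storing.
constexpr std::uint32_t initial_stack_pointer(std::uint32_t base, std::uint32_t size) {
    return base + size;
}

// True if a 4-byte read starting at addr stays inside SRAM.
constexpr bool word_in_sram(std::uint32_t addr) {
    return addr >= kSramBase && addr - kSramBase + 4u <= kSramSize;
}

// The sp value seen in this thread (0x20050000) is exactly end-of-RAM...
static_assert(initial_stack_pointer(kSramBase, kSramSize) == 0x20050000u,
              "320 KiB of SRAM from 0x20000000 ends at 0x20050000");
// ...so the words GDB tried to read at 0x20050000 and 0x20050004 are outside RAM.
static_assert(!word_in_sram(0x20050000u) && !word_in_sram(0x20050004u),
              "reads at or above the initial SP fall outside SRAM");
static_assert(word_in_sram(0x2004FFFCu), "the last word of SRAM is readable");
```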

This is what I mean when I say I believe the bootloader has finished and booted to the main app:

Breakpoint 7, start_new_application (sp=0x20050000, pc=0x8026e0d) at libs/mbed-os/platform/mbed_application.c:175
175         __asm volatile(
(gdb) stepi
0x08002694      175         __asm volatile(
(gdb) 
0x08002696      175         __asm volatile(
(gdb) 
0x08002698      175         __asm volatile(
(gdb) 
0x0800269c in start_new_application (sp=0x3, pc=0x0) at libs/mbed-os/platform/mbed_application.c:175
175         __asm volatile(
(gdb) 
0x0800269e in start_new_application (sp=<error reading variable: Cannot access memory at address 0x20050004>, 
    pc=<error reading variable: Cannot access memory at address 0x20050000>)
    at libs/mbed-os/platform/mbed_application.c:175
175         __asm volatile(
(gdb) 
0x080026a2      175         __asm volatile(
(gdb) 
0x08026e0c in ?? () // This is where I think we've exited the bootloader, but I could be wrong
(gdb) 
0x08026e10 in ?? ()
(gdb) 
0x08026e12 in ?? ()

I'm starting to think there's something in the main app that's actually causing the faults here. But this conflicts with the fact that it's my changes in the bootloader that are causing this to fail. I'd like to know what is going on between mbed_start_application(POST_APPLICATION_ADDR); in my bootloader and the int main() of my main application (or if someone could point me to some good literature on it). @40Grit @AGlass0fMilk Do any new ideas come to mind from what I've found above?
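On what happens between mbed_start_application(POST_APPLICATION_ADDR) and the application's main(): as I understand platform/mbed_application.c, mbed_start_application() reads two words from the start of the application image (the initial stack pointer at offset 0 and the reset handler address at offset 4), and start_new_application() then loads SP and branches to that reset handler. The application's reset handler and C runtime startup run from there; the disassembly later in this thread shows mbed_init() calling mbed_rtos_init() on the way to main(). The host-side sketch below decodes those two words from an image and applies a few sanity checks. The flash and RAM bounds are assumptions for a 1.5 MiB / 320 KiB STM32F413RH. Bit 0 of the reset vector marks Thumb code, which is why pc=0x8026e0d in start_new_application shows up as execution at 0x08026e0c, and an all-0xFFFFFFFF pair would indicate erased flash at that address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed STM32F413RH memory map: 1.5 MiB flash, 320 KiB SRAM.
constexpr std::uint32_t kFlashStart = 0x08000000u;
constexpr std::uint32_t kFlashEnd   = 0x08180000u;  // exclusive
constexpr std::uint32_t kRamStart   = 0x20000000u;
constexpr std::uint32_t kRamEnd     = 0x20050000u;  // one past the last byte

// First two entries of a Cortex-M vector table.
struct VectorHead {
    std::uint32_t initial_sp;     // offset 0: loaded into SP before the jump
    std::uint32_t reset_handler;  // offset 4: branched to (bit 0 = Thumb)
};

// Cortex-M images are little-endian.
static std::uint32_t read_le32(const std::vector<std::uint8_t>& img, std::size_t off) {
    assert(off + 4 <= img.size());
    return static_cast<std::uint32_t>(img[off]) |
           (static_cast<std::uint32_t>(img[off + 1]) << 8) |
           (static_cast<std::uint32_t>(img[off + 2]) << 16) |
           (static_cast<std::uint32_t>(img[off + 3]) << 24);
}

// Decode SP and reset vector from the start of an application image.
std::optional<VectorHead> read_vector_head(const std::vector<std::uint8_t>& img) {
    if (img.size() < 8) return std::nullopt;
    return VectorHead{read_le32(img, 0), read_le32(img, 4)};
}

// Heuristic checks a bootloader could make before jumping.
bool looks_bootable(const VectorHead& v) {
    const bool sp_ok = v.initial_sp > kRamStart && v.initial_sp <= kRamEnd;
    const bool pc_ok = v.reset_handler >= kFlashStart && v.reset_handler < kFlashEnd;
    const bool thumb = (v.reset_handler & 1u) != 0;
    return sp_ok && pc_ok && thumb;  // erased flash (0xFFFFFFFF) fails sp_ok and pc_ok
}
```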

@40Grit

40Grit commented Aug 12, 2019

I'd need to dig in and simmer at this point.

Is this a standard bootloader based on mbed-bootloader?

Also, if you want to see past the bootloader in gdb, you should be able to load the .elf of your application into gdb via command line alongside the one already loaded for the bootloader.

You may or may not need to set a source location for the application as well.

@DrynnBavis
Author

DrynnBavis commented Aug 12, 2019

Is this a standard bootloader based on mbed-bootloader?

Not sure if there's a strict definition of an "mbed-bootloader", but I'd say it's pretty standard. I'm using target.restrict_size and I built it from their example here.

You may or may not need to set a source location for the application as well.

Do you mean this in the context of loading a second elf for my blinky (main) application? Or is this unrelated to that, maybe an mbed_app target setting?

Also, if I haven't said it yet, thank you very much for your help so far, @40Grit.

@40Grit

40Grit commented Aug 12, 2019

You may need to set the source location when loading the second elf.

@sethitow
Contributor

And this is actually working, remarkably. To be explicit, I mean I've successfully booted from bootloader and into my main application (I know this because I see prints from my main application in my terminal).

two things are causing the previous hanging bootloader behaviour once added back in:

  • BlockDevice *bd = BlockDevice::get_default_instance();
  • int rc = mbed_trace_init();

Will each of these lines individually cause the error, or do they both need to be present?
Are the error and register values around the error the same in all three cases (one, the other, both)?

You might need to de-initialize the peripherals before booting into the main app. What happens if you call bd->deinit() before mbed_start_application()?
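A hypothetical host-side sketch of the suggestion above: register a teardown callback for each peripheral the bootloader brings up, and run them in reverse order just before jumping. Peripheral, BootHandoff, and the jump callback are illustrative stand-ins, not mbed APIs; on the target, the callbacks would wrap calls such as bd->deinit(), and the jump would be mbed_start_application(POST_APPLICATION_ADDR). The motivation is that driver-level state (clocks, pins, bus peripherals) may survive the jump and interfere with the application's own initialisation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver the bootloader initialised.
struct Peripheral {
    std::string name;
    std::function<void()> deinit;  // on target this might wrap bd->deinit()
};

class BootHandoff {
public:
    // Call right after a peripheral has been initialised successfully.
    void track(Peripheral p) { live_.push_back(std::move(p)); }

    // Tear down in reverse order of initialisation, then transfer control.
    void start_application(const std::function<void()>& jump) {
        assert(jump && "a jump target is required");
        for (auto it = live_.rbegin(); it != live_.rend(); ++it) {
            it->deinit();
        }
        live_.clear();
        jump();  // on target: mbed_start_application(...), which does not return
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::vector<Peripheral> live_;
};
```

Tearing down in reverse order mirrors initialisation dependencies: a block device sitting on a bus driver is released before the bus itself.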

@DrynnBavis
Author

DrynnBavis commented Aug 12, 2019

Will each of these lines individually cause the error, or do they both need to be present?
Are the error and register values around the error the same in all three cases (one, the other, both)?

Each one on its own causes the error; both aren't needed, though having both fails as well. I'm currently step-debugging with a block device present. Using the tip from @40Grit about loading the second elf file, I've been able to see what happens after mbed_start_application(). It's failing in mbed_init(), specifically on the call to mbed_rtos_init(). See line 51 for context here. This is my debugging so far:

(gdb) disassemble
Dump of assembler code for function mbed_init:
   0x08023b30 <+0>:     push    {r3, lr}
   0x08023b32 <+2>:     bl      0x8021f64 <mbed_mpu_init>
=> 0x08023b36 <+6>:     bl      0x8023b64 <mbed_cpy_nvic>
   0x08023b3a <+10>:    bl      0x802927c <mbed_sdk_init>
   0x08023b3e <+14>:    bl      0x8029ff4 <us_ticker_init>
   0x08023b42 <+18>:    bl      0x8023bac <mbed_rtos_init>
   0x08023b46 <+22>:    nop
   0x08023b48 <+24>:    pop     {r3, pc}
End of assembler dump.
(gdb) next
78          mbed_sdk_init();
(gdb) 
80          us_ticker_init();
(gdb) 
82          mbed_rtos_init();
(gdb) 

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08005b44 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681             

You might need to de-initialize the peripherals before booting into the main app. What happens if you call bd->deinit() before mbed_start_application()?

Yeah, I was thinking of this earlier, but I never had to do this before, so what would have changed now? Additionally, even without calling bd->init() I still see the problem, so it seems to be something in the constructor causing the fuss here. I'll try deinitialising and destroying it after I finish drilling in with GDB. Maybe I can find exactly what in the RTOS initialisation is killing me.

@DrynnBavis
Author

DrynnBavis commented Aug 12, 2019

@40Grit, one more thing you might be able to help me with here. I've narrowed it down to a fault inside the assembly source irq_cm4f.S, reached via SVC_Handler() from osKernelInitialize(). My understanding from 10 minutes of Google research is that SVCs are made when something wants to access a hardware resource (perhaps this has something to do with my flash from before!).

I'm guessing I can't just load an assembly source file into GDB like I could with the elf files previously... I did find this specific assembly source file here, but I'm wary of the line numbers. How does assembly handle whitespace and comments? Do they count towards line numbers or not?

Breakpoint 14, __svcKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:485
485     SVC0_0 (KernelInitialize,       osStatus_t)
(gdb) next
SVC_Handler () at irq_cm4f.S:52
52      irq_cm4f.S: No such file or directory.
(gdb) 
53      in irq_cm4f.S
(gdb) 
54      in irq_cm4f.S
(gdb) 
55      in irq_cm4f.S
(gdb) 
57      in irq_cm4f.S
(gdb) 
58      in irq_cm4f.S
(gdb) 
59      in irq_cm4f.S
(gdb) 
61      in irq_cm4f.S
(gdb) 
SVC_Handler () at irq_cm4f.S:62
62      in irq_cm4f.S
(gdb) 
63      in irq_cm4f.S
(gdb) 
64      in irq_cm4f.S
(gdb) 
SVC_Handler () at irq_cm4f.S:65
65      in irq_cm4f.S
(gdb) 
68      in irq_cm4f.S
(gdb) 
69      in irq_cm4f.S
(gdb) 
70      in irq_cm4f.S
(gdb) 
71      in irq_cm4f.S
(gdb) 
72      in irq_cm4f.S
(gdb) 
74      in irq_cm4f.S
(gdb) 
86      in irq_cm4f.S
(gdb) 
88      in irq_cm4f.S
(gdb) 
89      in irq_cm4f.S
(gdb) 
90      in irq_cm4f.S
(gdb) 
93      in irq_cm4f.S
(gdb) 
HardFault_Handler () at except.S:50
50      except.S: No such file or directory.
(gdb) 
51      in except.S
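For context on the SVC path in this trace, a hedged sketch: on Cortex-M, an svc instruction raises the SVCall exception, and RTX5 uses it to run kernel functions such as osKernelInitialize() in privileged handler mode, not specifically to access hardware. A typical SVC handler picks the stack the exception frame was pushed to (bit 2 of the EXC_RETURN value in LR selects PSP or MSP), reads the stacked return PC from that frame, and decodes the 8-bit immediate from the svc instruction at PC - 2 (Thumb encoding 0xDF00 | imm8). As far as I can tell, RTX5's SVC_Handler follows this pattern and, for SVC 0, calls the function whose address was passed in R12, so a fault shortly after it may come from the kernel function it dispatched to rather than from the handler itself. The code below only simulates the decoding on the host; ExceptionFrame and CodeRegion are hypothetical types.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Basic (non-FP) exception frame pushed by hardware on exception entry.
struct ExceptionFrame {
    std::uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

// EXC_RETURN bit 2: 1 = frame is on the process stack (PSP), 0 = main stack (MSP).
constexpr bool frame_on_psp(std::uint32_t exc_return) { return (exc_return & 0x4u) != 0; }

// Hypothetical simulated flash region holding a few Thumb instructions.
struct CodeRegion {
    std::uint32_t base;
    std::array<std::uint8_t, 16> bytes;

    std::optional<std::uint16_t> read_halfword(std::uint32_t addr) const {
        if (addr < base || addr - base > bytes.size() - 2) return std::nullopt;
        const std::uint32_t off = addr - base;
        return static_cast<std::uint16_t>(bytes[off] | (bytes[off + 1] << 8));
    }
};

// The stacked PC is the return address (the instruction after the svc),
// so the 16-bit svc instruction sits at pc - 2. Encoding: 0xDF00 | imm8.
std::optional<std::uint8_t> svc_number(const ExceptionFrame& frame, const CodeRegion& code) {
    const auto insn = code.read_halfword(frame.pc - 2);
    if (!insn || (*insn & 0xFF00u) != 0xDF00u) return std::nullopt;
    return static_cast<std::uint8_t>(*insn & 0xFFu);
}
```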

@40Grit

40Grit commented Aug 12, 2019

Your build should be able to generate an assembly listing. I don't know for sure, but I would think you can use it to step through the assembly and it would retain symbols.

@DrynnBavis
Author

Closing this issue as it's moved pretty far from the original issue I identified about incorrect padding. Taking the summary from all this debugging and effort and moving it to a new issue with a better title: #11205

@DrynnBavis
Author

Also, to address your comment @40Grit: I was unable to get gdb to step into the source files. I even tried adding them using dir <path_to_source_files>, but it still wouldn't step in and always said "No such file or directory". The best I can do is open the assembly source and follow along in another window.
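On the earlier question of whether whitespace and comments count: line numbers in the debug info should refer to physical lines of the source file, so blank and comment-only lines count too (for a preprocessed .S file, the preprocessor's line markers map back to the original file). The gaps in the irq_cm4f.S trace (52, 53, 54, 55, 57, ...) fit that, since GDB only stops on lines that produced instructions. When following along in another window, a small helper like the hypothetical source_window() below prints the lines around the one GDB reports, counting every physical line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns lines [line - radius, line + radius] (1-based) from `in`, each
// prefixed with its physical line number; the reported line is marked ">".
// Blank and comment-only lines are counted like any other line.
std::vector<std::string> source_window(std::istream& in, std::size_t line, std::size_t radius) {
    std::vector<std::string> out;
    const std::size_t first = line > radius ? line - radius : 1;
    const std::size_t last = line + radius;
    std::string text;
    for (std::size_t n = 1; n <= last && std::getline(in, text); ++n) {
        if (n >= first) {
            out.push_back(std::to_string(n) + (n == line ? " > " : "   ") + text);
        }
    }
    return out;
}
```

Usage would be something like opening irq_cm4f.S with std::ifstream and calling source_window(file, 93, 5) for the last line stepped before the HardFault.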

7 participants