Skip to content

Custom target silently fails to start main app after adding mbed_trace_init(); to bootloader #11205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DrynnBavis opened this issue Aug 13, 2019 · 72 comments

Comments

@DrynnBavis
Copy link

DrynnBavis commented Aug 13, 2019

Description

The following is some barebones code I'm using to just boot to the main app space at POST_APPLICATION_ADDR:

#include "mbed.h"
#include "mbed-trace/mbed_trace.h"

int main()
{
    //mbed_trace_init();
    mbed_start_application(POST_APPLICATION_ADDR);
}

This code works, and I successfully boot from the bootloader into the main app space as expected. However, once I uncomment mbed_trace_init();, my app hangs after the bootloader and is unable to start the main app.

Using gdb, I've found that the fault seems to arrive at some point within __scvKernelInitialize (). This is the disassemble print out of its definition:

Dump of assembler code for function osKernelInitialize:
   0x0802463c <+0>:	push	{lr}
   0x0802463e <+2>:	sub	sp, #12
   0x08024640 <+4>:	bl	0x8024638 <osRtxKernelPreInit>
   0x08024644 <+8>:	bl	0x80240a0 <IsIrqMode>
   0x08024648 <+12>:	mov	r3, r0
   0x0802464a <+14>:	cmp	r3, #0
   0x0802464c <+16>:	bne.n	0x8024658 <osKernelInitialize+28>
   0x0802464e <+18>:	bl	0x80240ba <IsIrqMasked>
   0x08024652 <+22>:	mov	r3, r0
   0x08024654 <+24>:	cmp	r3, #0
   0x08024656 <+26>:	beq.n	0x8024668 <osKernelInitialize+44>
   0x08024658 <+28>:	mvn.w	r0, #5
   0x0802465c <+32>:	bl	0x8023df4 <EvrRtxKernelError>
   0x08024660 <+36>:	mvn.w	r3, #5
   0x08024664 <+40>:	str	r3, [sp, #4]
   0x08024666 <+42>:	b.n	0x8024672 <osKernelInitialize+54>
=> 0x08024668 <+44>:	ldr	r3, [pc, #16]	; (0x802467c <osKernelInitialize+64>)
   0x0802466a <+46>:	mov	r12, r3
   0x0802466c <+48>:	svc	0
   0x0802466e <+50>:	mov	r3, r0
   0x08024670 <+52>:	str	r3, [sp, #4]
   0x08024672 <+54>:	ldr	r3, [sp, #4]
   0x08024674 <+56>:	mov	r0, r3
   0x08024676 <+58>:	add	sp, #12
   0x08024678 <+60>:	ldr.w	pc, [sp], #4
   0x0802467c <+64>:	rors	r1, r3
   0x0802467e <+66>:	lsrs	r2, r0, #32

From there, a few things happen inside of irq_cm4f.S before hitting an exception on line 93 of said file. After that except.S is called from there a final fault message is seen in the debugger:

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003320 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681	        errno = EBADF;

Question

What's causing the fault within irq_cm4f.S?

Issue request type

My custom target uses an STM32F413RH chip, but inside custom_targets.json I've specified the device name to be "device_name": "STM32F413ZH" because this has sector information filled out and is virtually the same processor but with a larger package / more pins.

Target: Custom target with STM32F413RH processor
Toolchain: GCC_ARM 8.2.1
Tool: mbed-cli
Vers: mbed-os 5.13

[x] Question
[ ] Enhancement
[ ] Bug
@DrynnBavis
Copy link
Author

For what it's worth, this is the diff of the progress through that SVC0 from irq_cm4f.S between a bootloader not using trace (AKA no faults, left side) and bootloader that uses trace (AKA a fault, right side).

image

@DrynnBavis
Copy link
Author

In the faulting condition, it seems that it diverges from 72 because the contents of print osRtxInf show that thread.run.curr = 0x10 and thread.run.next = 0x0. Because these aren't equal, a different branch is jumped to. I'm guessing this means that when trace is included there's an extra thread that needs to close. Again, just guessing here.

Moving on, line 93 is where the exception is caused. This is the line:

STR      R12,[R1,#TCB_SP_OFS]   ; Store SP

The values involved here are as follows:

  • R12: 0x2004ff98
  • R01: 0x10
  • #TCB_SP_OFS: 0x80048A5*

I don't really understand the STR op code but something in the above values is disagreeing with it.

*n.b. I got this value in gdb using print (uint32_t)TCB_SP_OFS -- it's possible I chose the wrong cast type here.

@DrynnBavis
Copy link
Author

@AGlass0fMilk @40Grit @0xc0170 Tagging you guys now that I've got something worth showing and the issue thread has moved.

@kjbracey
Copy link
Contributor

kjbracey commented Aug 13, 2019

The bootloader->main image transition basically tries to "reset" hardware things back to a safe state, then effectively runs the main image as if through the reset vector, so everything gets reinitialised.

there's an extra thread that needs to close.

All software state is lost, so it's not about closing - we effectively do a software "reset" in mbed_start_application and run an entirely new stand-alone image.

But if the hardware is not correctly reset by either the bootloader's shutdown or main image's start-up code, then the new image could be being confused by hardware not being in the expected state.

In this case though, it seems like maybe it's just a failure to initialize RAM correctly? At the point that first SVC_Handler is called, RTX should not have been initialised yet, meaning all its static osRtxInfo structure should be in its default state (as specified by its initialiser in rtx_kernel.c).

I would expect that structure to have been placed in the .data area of the linker map, and then copy-initialised by the assembler CopyDataInit in startup_stm32f413xx.S early in the main image boot.

Check to see why that apparently hasn't happened, or has gone wrong. Does the linker map show it between _sdata and _edata?

Maybe stick a watchpoint on the location of osThreadInfo.thread.run.curr. That '0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.

I don't really understand the STR op code but something in the above values is disagreeing with it.

It's expecting R1 to be a pointer to a osRtxThread_t structure, and it's trying to set its sp member (TCB_SP_OFS is offsetof(osRtxThread_t, sp)). But R1 is not a valid pointer.

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 13, 2019

Did a comparison of osRtxInfo between the working bootloader (not calling mbed_trace_init()) and the failing code captured at line 69 of this SVC handling source file for context. Here's the dif (left side works, right side fails):
image
image
Interesting that the os_id and version are miss-matching. Definitely would have figured these to be the same.

Anyway, @kjbracey-arm I've compared the .map files of the successful bootloader (left) and the failing (right) at the region between _sdata and _edata:
image

Not sure what to make of this. What's left out of the screen shot is just more of the same symbols with the right side just offset by 0x40 all the way to _edata. Interesting to see .data.m_trace show up here though. Does anything look bizarre here? Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application, blinky_application.map.

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 13, 2019

Maybe stick a watchpoint on the location of osThreadInfo.thread.run.curr. That '0x10` is nonsense - it should be 0 at startup, and later become a valid pointer value.

Couldn't find a var named "osThreadInfo" so I'm assuming you meant "osRtxInfo". Here's the watchpoint trace of this var from start to fail:

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x10
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
    at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91	  ldr  r0, =_sdata
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102	irq_cm4f.S: No such file or directory.
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x200012a4 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001198 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102	in irq_cm4f.S
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001198 <_main_obj>
New value = (osRtxThread_t *) 0x0
0x08026e1c in OS_Tick_Setup (freq=134244071, handler=0x4cc28)
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:77
77	}
(gdb) c
Continuing.

Breakpoint 13, 0x08024668 in osKernelInitialize ()
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:519
519	  if (IsIrqMode() || IsIrqMasked()) {
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:93
93	  memset(&osRtxInfo.kernel, 0, sizeof(osRtxInfo) - offsetof(osRtxInfo_t, kernel));
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08005ae8 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681	        errno = EBADF;

That last line where the value of 0x10 is assigned, is actually given this value in the working bootloader:

	New value = (osRtxThread_t *) 0x20001104 <os_timer_thread_cb>

Not sure why having mbed_trace present is preventing this from being assigned to osRtxInfo.thread.run.curr rather than 0x10...

@kjbracey
Copy link
Contributor

Not sure if this is exactly what you wanted to see. Perhaps you meant the map files of the main application

What I'm looking for is where osRtxInfo is in the main image map - that should be somewhere between _sdata and _edata so that it gets correctly initialised.

From the dump, it looks like there's been no attempt to actually initialise that structure at the point you've stopped in the SVC Handler.

The initialisation code should be copying the initialised data from _sidata in the ROM to _sdata in the RAM.

@kjbracey
Copy link
Contributor

To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?

Also, I'd like to see the lines from _sidata to _sdata in each image's map.

@kjbracey
Copy link
Contributor

Are all those watchpoints from the main image execution - so the Reset_Handler is after your mbed_start_application?

It looks like it is being set to zero, but I don't see where that 0x10 has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;, which might be setting a value to 16.

So mismatch between loaded image and ELF in the debugger?

Is this is a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?

The fact that SVC_Handler is being called before osKernelInitialize makes very little sense. As does being in OS_Tick_Setup. That shouldn't happen until osKernelStart, which should be after osKernelInitialize.

Are you sure you've got the right ELF file in the debugger?

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 14, 2019

What I'm looking for is where osRtxInfo is in the main image map - that should be somewhere between _sdata and _edata so that it gets correctly initialised.

To be really clear here, I was previously using the .map file from the bootloader. That was probably wrong, so here's the section of the main application's .map file with the copy of osRtxInfo:

(osRtxInfo business occurs around 0x00000000200002ec)

.iplt           0x0000000008033830        0x0
 .iplt          0x0000000008033830        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o

.ARM.extab
 *(.ARM.extab* .gnu.linkonce.armextab.*)
                0x0000000008033830                __exidx_start = .

.ARM.exidx      0x0000000008033830        0x8
 *(.ARM.exidx* .gnu.linkonce.armexidx.*)
 .ARM.exidx     0x0000000008033830        0x8 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/crt0.o
 .ARM.exidx     0x0000000008033838        0x0 projects/blinky/build/debug/libs/mbed-os/platform/TARGET_CORTEX_M/TOOLCHAIN_GCC/except.o
                                         0x28 (size before relaxing)
 .ARM.exidx     0x0000000008033838        0x0 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/TOOLCHAIN_GCC/TARGET_RTOS_M4_M7/irq_cm4f.o
                                         0x18 (size before relaxing)
 .ARM.exidx     0x0000000008033838        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/libgcc.a(_udivmoddi4.o)
                                          0x8 (size before relaxing)
                0x0000000008033838                __exidx_end = .
                0x0000000008033838                __etext = .
                0x0000000008033838                _sidata = .

.rel.dyn        0x0000000008033838        0x0
 .rel.iplt      0x0000000008033838        0x0 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o

.data           0x00000000200001d8      0xb78 load address 0x0000000008033838
                0x00000000200001d8                __data_start__ = .
                0x00000000200001d8                _sdata = .
 *(vtable)
 *(.data*)
 .data          0x00000000200001d8        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                0x00000000200001d8                __dso_handle
 .data.irq_handler
                0x00000000200001dc        0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_lp_ticker_api.o
 .data.irq_handler
                0x00000000200001e0        0x4 projects/blinky/build/debug/libs/mbed-os/hal/mbed_us_ticker_api.o
 .data._ZL11filehandles
                0x00000000200001e4      0x100 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
 .data._ZZ5_sbrkE4heap
                0x00000000200002e4        0x4 projects/blinky/build/debug/libs/mbed-os/platform/mbed_retarget.o
 .data._ZL14idle_hook_fptr
                0x00000000200002e8        0x4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/mbed_rtx_idle.o
 .data.os       0x00000000200002ec       0xa4 projects/blinky/build/debug/libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.o
                0x00000000200002ec                osRtxInfo
 .data.SystemCoreClock
                0x0000000020000390        0x4 projects/blinky/build/debug/libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/device/system_stm32f4xx.o
                0x0000000020000390                SystemCoreClock
 .data._impure_ptr
                0x0000000020000394        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
                0x0000000020000394                _impure_ptr
 .data.impure_data
                0x0000000020000398      0x428 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-impure.o)
 .data.__malloc_av_
                0x00000000200007c0      0x408 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x00000000200007c0                __malloc_av_
 .data.__malloc_sbrk_base
                0x0000000020000bc8        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x0000000020000bc8                __malloc_sbrk_base
 .data.__malloc_trim_threshold
                0x0000000020000bcc        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-mallocr.o)
                0x0000000020000bcc                __malloc_trim_threshold
 .data.__global_locale
                0x0000000020000bd0      0x16c /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/softfp/libc.a(lib_a-locale.o)
                0x0000000020000bd0                __global_locale
                0x0000000020000d40                . = ALIGN (0x8)
 *fill*         0x0000000020000d3c        0x4 
                0x0000000020000d40                PROVIDE (__preinit_array_start = .)
 *(.preinit_array)
                0x0000000020000d40                PROVIDE (__preinit_array_end = .)
                0x0000000020000d40                . = ALIGN (0x8)
                0x0000000020000d40                PROVIDE (__init_array_start = .)
 *(SORT_BY_NAME(.init_array.*))
 *(.init_array)
 .init_array    0x0000000020000d40        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                0x0000000020000d44                PROVIDE (__init_array_end = .)
                0x0000000020000d48                . = ALIGN (0x8)
 *fill*         0x0000000020000d44        0x4 
                [!provide]                        PROVIDE (__fini_array_start = .)
 *(SORT_BY_NAME(.fini_array.*))
 *(.fini_array)
 .fini_array    0x0000000020000d48        0x4 /Users/brydavis/repos/pick-to-light/tools/gcc-arm-none-eabi-8-2018-q4-major-mac/bin/../lib/gcc/arm-none-eabi/8.2.1/thumb/v7e-m+fp/softfp/crtbegin.o
                [!provide]                        PROVIDE (__fini_array_end = .)
 *(.jcr*)
                0x0000000020000d50                . = ALIGN (0x8)
 *fill*         0x0000000020000d4c        0x4 
                0x0000000020000d50                __data_end__ = .
                0x0000000020000d50                _edata = .

*also note that this map file was generated from mbed-cli and I found it in the build folder of my main app project.

The chunk of the .map file above is identical between .map file produced when the application includes trace init (fails) and omits it (succeeds).

@DrynnBavis
Copy link
Author

To double-check - the crash is happening in the main image, right? As it tries to initialise its OS?

To verify -- indeed the crash occurs within the main image as it tries to initialise the OS (I think more specifically it's failing to initialise the Kernel)

Also, I'd like to see the lines from _sidata to _sdata in each image's map.

So this region in the main app is in my comment above. That same region in the bootloader_application.map is as follows (left works, right fails):


image

image

image

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 14, 2019

Are all those watchpoints from the main image execution - so the Reset_Handler is after your mbed_start_application?

I have both the bootloader and main application ELF files loaded into GDB. I've set a breakpoint at mbed_application.c:175 which is called from the bootloader - not the main applicaition - and I've recorded the watch points here for you with this breakpoint in place:

(gdb) monitor reset
Resetting target
(gdb) c
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x200012a4 <os_idle_thread_stack+284>
New value = (osRtxThread_t *) 0x0
Reset_Handler ()
    at libs/mbed-os/targets/TARGET_STM/TARGET_STM32F4/TARGET_STM32F413xH/device/TOOLCHAIN_GCC_ARM/startup_stm32f413xx.S:91
91	  ldr  r0, =_sdata
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
SVC_Handler () at irq_cm4f.S:102
102	irq_cm4f.S: No such file or directory.
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001144 <os_timer_thread_cb>
New value = (osRtxThread_t *) 0x20001038 <_main_obj>
SVC_Handler () at irq_cm4f.S:102
102	in irq_cm4f.S
(gdb)
Continuing.

Breakpoint 18, start_new_application (sp=0x20050000, pc=0x8026e0d <OS_Tick_Setup+52>)
    at libs/mbed-os/platform/mbed_application.c:175
175	    __asm volatile(
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x20001038 <_main_obj>
New value = (osRtxThread_t *) 0x0
OS_Tick_Setup (freq=<error reading variable: Cannot access memory at address 0x20050004>,
    handler=<error reading variable: Cannot access memory at address 0x20050000>)
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/Source/os_systick.c:72
72	  SysTick->VAL  =  0U;
(gdb)
Continuing.

Hardware watchpoint 15: osRtxInfo.thread.run.curr

Old value = (osRtxThread_t *) 0x0
New value = (osRtxThread_t *) 0x10
0x0802421c in svcRtxKernelInitialize () at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:94
94	  uint32_t offset = offsetof(osRtxInfo_t, kernel);
(gdb)
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x08003880 in write (fildes=2, buf=0x2004fec8, length=0) at libs/mbed-os/platform/mbed_retarget.cpp:681
681	        errno = EBADF;

So it seems the majority of the watch points are hit before entering the main app.

It looks like it is being set to zero, but I don't see where that 0x10 has come from. That doesn't correspond to the code you're stopping at. Except that I can see a nearby osRtxInfo.isr_queue.max = osRtxConfig.isr_queue.max;, which might be setting a value to 16.

So perhaps there's an offset here causing this 0x10 that was actually meant for the field osRtxConfig? Would this explain the bizarre values in os_id = 0x5 "v", as well?

So mismatch between loaded image and ELF in the debugger?

See my bottom response in this comment.

Is this is a custom bootloader you've created that has the RTOS in use? Our own bootloaders disable the RTOS, so you may be hitting some sort of problem with interrupts not being properly shut down that we haven't?

I've had a few people ask me about the bootloader. It's a managed mbed bootloader where the contents of main.cpp can be seen in my OP above. I've used target.restrict_size: 0x00020000 in the mbed_app.json and that's about it. I don't think I'm explicitly using any RTOS features (unless trace does?). And this bootloader works fine (with trace) on supported mbed targets. It's my own custom target that's causing this trouble. Is there a way I can explicitly check the RTOS is being disabled here?

The fact that SVC_Handler is being called before osKernelInitialize makes very little sense. As does being in OS_Tick_Setup. That shouldn't happen until osKernelStart, which should be after osKernelInitialize.

SVC_Handler is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before osKernelInitialize gets called?

Are you sure you've got the right ELF file in the debugger?

So I was making the mistakes of assuming that when I typed > load into gdb, that it would flash the combined image of my main application with the bootloader. However, experimentally I've determined this not to be the case; instead it's loading both new versions of the ELF files, but only flashing the new image of the bootloader. I'm now using JFlashLite to flash the full main.bin. So I had the new bootloader flashed, with both the new elf files loaded into gdb, but the main application was never being updated. Not sure if this matters as much as I thought it may have since I was never really changing anything in my main.cpp, but there might be more in the binary that was changing other than the active app space that has to due with the bootloader from before it...

@kjbracey-arm What information would you like me to redo now that I'm properly flashing the binaries and loading the elfs to match?

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 14, 2019

Interesting update

Seeing above that there was some offset business I thought - at a high level here - perhaps this offset is coming from trace existing in the bootloader but not existing in the main application? So I initialised trace in my blinky (main) program as well and reflashed the image.

Sure enough, the bootloader successfully boots to the main app now and both are able to use trace messages.

[INFO][BOOT]: I'm tracing!
[INFO][MAIN]: Now in main!

I'm not sure what this tells us exactly, but I think this means for now I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?

EDIT: Something else that sucks is this problem exists backwards too. Meaning, if I have trace defined in my main app, but not in my bootloader, then it faults as well.

EDIT2: Another thing that may or may not be relevant -> The custom target we're using does not have an external low freq clock (no LSE). We have an 8 MHz external that we're using as for high freq stuff configured as the HSE, but for the low freq stuff it's being done with the LSI. I think I've configured my target correctly with this:

        "config": {
            "clock_source": {
                "help": "Mask value : USE_PLL_HSE_EXTC | USE_PLL_HSE_XTAL (need HW patch) | USE_PLL_HSI",
                "value": "USE_PLL_HSE_EXTC|USE_PLL_HSI",
                "macro_name": "CLOCK_SOURCE"
            },
            "lpticker_lptim": {
                "help": "This target supports LPTIM. Set value 1 to use LPTIM for LPTICKER, or 0 to use RTC wakeup timer",
                "value": 1
            }
        },
        "overrides": {
            "lpticker_delay_ticks": 4,
            "lse_available": 0
        },

But if I've made a mistake maybe that could be causing some troubles.

@40Grit
Copy link

40Grit commented Aug 14, 2019

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

@DrynnBavis
Copy link
Author

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

How/Where would I confirm this?

@kjbracey
Copy link
Contributor

kjbracey commented Aug 15, 2019

I have both the bootloader and main application ELF files loaded into GDB.

This makes me nervous - I've no experience with loading two ELF files simultaneously into GDB, and I'm not sure how you disambiguate. Some of your output would be consistent with you printing the content of the bootloaders osRtxInfo (or the memory where it would have been) while actually running the main image.

Could you instead just load the ELF file for the main image into GDB?

I don't think I'm explicitly using any RTOS features

No, but it is an RTOS build, which means the RTOS is getting initialised, and you're running your bootloader main as an RTOS thread. As of 5.12/5.13, the RTOS can be excluded via a requires: [ "bare-metal" ] in your mbed_app.json, which would make it more like other bootloaders (and smaller). Still, the fact that your bootloader is RTOS is not supposed to break anything.

As @kjbracey-arm stated the bootloader is supposed to do a soft reset back into the reset vector which should avoid any of this weirdness.

Well, it's not really a reset, it's just "manually safe hardware and jump to Reset_Handler". Where that fails is for any hardware that was disrupted by the bootloader and not restored, and assumed to be in reset state by the main image.

SVC_Handler is used for software interrupts right? Is there a way we can trace back to see who's calling these interrupts before osKernelInitialize gets called?

If using an up-to-date pyOCD for your connection, then a breakpoint on SVC_Handler should show you - the calling point would be shown as a different "thread" - there's 1 thread for handler mode, and 1 (or more) threads for process mode.

If using a more archaic tool, then you'd have to dump the 8 words from SP while stopped on a SVC_Handler breakpoint - the return address for the SVC would be in the seventh word at SP+24. Show all 8 words in a dump, and that will tell us exactly what the supervisor call was. (SP+16 should be the address of an RTX function - look it up in the map).

These really should not be happening.

Anyway, all your memory maps look correct, so don't need to worry about them any more. But we still need to figure out what's going on with that 0x10 pointer that makes it abort - can you go around again with correctly matched ELF (main image only) and run through the boot? You won't have source references then for the bootloader bit, but that doesn't matter.

I can have a working bootloader so long as the peripherals I use in it are also present in the main application as well. Surely this isn't intended behaviour?

There isn't a very rigorous cleanup, tbh. I can imagine a situation where a peripheral is left active, then after entering the main image there's no driver (or init) code for it. Its interrupt generation should at least be masked by the powerdown_nvic call in mbed_start_application, which should minimise the opportunity for it to actually immediately break things. Power wastage would be more likely.

There's no general system "shut everything down" notification to give things a chance to clean-up before the start_new_application.

@40Grit
Copy link

40Grit commented Aug 15, 2019

@kjbracey-arm

Off topic but need to ask:
Is there any case here for the mbed bootloader to be linked to the same mbed-os instance in flash as the application?

@kjbracey
Copy link
Contributor

In principle, no, there shouldn't be. In practice, maybe?

In bigger systems, main images (like the Linux kernel) tend to be very paranoid about what their bootloader may or may not have done, so they tend have init code that manually sets/reset pretty much everything on entry. The bootloader could be years behind the kernel, so they just do not trust "reset" hardware state.

That does take code space though. It would be more space-efficient to trust the bootloader to put everything back it knows it touched, and have the main image just assume everything is in reset state.

As we're currently in a situation where neither of those is really happening, then I guess having bootloader and main image as close as possible does in practice minimise the chances of errors - such as this case potentially is.

@kjbracey
Copy link
Contributor

An ideal would be to have a chip that did support "real reset into secondary image". Have a register that the bootloader could write to that did "reset, but jump into this handler".

Or maybe you could even do that in bootloader software - if you could reliably indicate "reset reason", then the bootloader itself could do NVIC_Reset, re-enter itself, detect the reset reason and jump straight to main image this time.

I don't know if that approach has been attempted in Mbed OS - I've only seen the manual simulated-reset approach in mbed_start_application.

@40Grit
Copy link

40Grit commented Aug 15, 2019

My earlier comment about soft b reset was under the assumption that Mbed-os was using reset-reason and potentially some other core register.

I see now that is not the case.

@40Grit
Copy link

40Grit commented Aug 15, 2019

Thinking further that would require some sort of industry standards between BL and application to use the core in that way.

I'll make any further enquires into this topic in a separate issue.

@DrynnBavis
Copy link
Author

@40Grit @kjbracey-arm Going to try the bare metal setting when I get back, but I'll be out until tuesday. Just FYI so you don't think I've given up haha

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 20, 2019

Sorry for the delay, but I've returned and tried adding "requires bare-metal" to my mbed_app.json which now looks like this:

        "CUSTOM_TARGET": {
            "target.features_add": ["STORAGE"],
            "target.components_add": ["SPIF"],
            "target.components": ["SPIF"],
            "target.requires" : [ "bare-metal" ],
            "spif-driver.SPI_MOSI": "FLASH_SPI_MOSI",
            "spif-driver.SPI_MISO": "FLASH_SPI_MISO",
            "spif-driver.SPI_CLK": "FLASH_SPI_SCK",
            "spif-driver.SPI_CS": "FLASH_SPI_CS",
            "spif-driver.SPI_FREQ": 40000000,
            "target.restrict_size": "0x00020000"
        },

I'm still crashing inside of svcRtxKernelInitialize unfortunately. I can follow the crash up until this line here inside of svcRtxKernelInitialize:

0x0802445e <+622>:	ldr.w	pc, [sp], #4

where pc = 0x802445e <svcRtxKernelInitialize+622> and sp = 0x2004ffac

After this if I try to go forward (stepping), I end up at 0x08004842 and the elf file is undefined for this ROM region (using the main app's elf file as requested per @kjbracey-arm). It's some point after this that the crash occurs.

I'm confused with what's going on here. Why is it accessing code at 0x08004842?

EDIT: After adding the bootloader elf again to see what's going on, it seems to be accessing SVC_Handler code and just crashing as before:

Breakpoint 17, 0x0802445e in svcRtxKernelInitialize ()
    at libs/mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_kernel.c:186
186	}
(gdb) stepi
SVC_Handler () at irq_cm4f.S:64
64	irq_cm4f.S: No such file or directory.

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 20, 2019

@kjbracey-arm @40Grit Just for the sake of sanity, could either of you two reiterate the requirements for defining a custom target? I'm worried I may have missed a very basic step here and perhaps this is the source of the trouble here. I've read over this section from you guys a number of times but I want to be sure here.

I'm specifically concerned about this section here:

MCUs are required to have, and Families and Subfamilies may have:

release_versions.
supported_toolchains.
default_lib.
public.
device_has.

Is my target definition for an MCU, Family, or Subfamily? Perhaps I'm missing one of these fields and that's what's causing this.

@40Grit
Copy link

40Grit commented Aug 20, 2019

I have no direct affiliation with ARM. I just work for one of their partners (Embedded Planet)

Without watching the config, build, and debug session right in front if me I'm starting to fish for ideas.

Another sanity check could be to check the parts errata.

What linker file does your build end up using?

@kjbracey-arm might the image built for the nucleo board run in the processor that @DrynnBavis is using since it really only differs by package?

@DrynnBavis
Copy link
Author

DrynnBavis commented Aug 21, 2019

Another sanity check could be to check the parts errata.

I've been through this for the STM32F4xH already and nothing jumped out at me.

What linker file does your build end up using?

Pretty sure I'm using this one here

@DrynnBavis
Copy link
Author

Some more poking around I found the the osKernelPreInit section of assembly code is simply two lines, first is a nop and the second is just a branch back to the lr. Off-topic here, but is there any point in this and is this OK?

@40Grit
Copy link

40Grit commented Aug 21, 2019

@DrynnBavis if your bootloader has no operational dependency on peripheral IO, I would see what happens if you flash the working binaries built for the nucleo board.

@40Grit
Copy link

40Grit commented Aug 23, 2019

I'm missing something.

Apparently the bootloader and application work fine on the nucleo board. (f413zh)

@kjbracey
Copy link
Contributor

kjbracey commented Aug 23, 2019

It's interesting that this target code feels the need to explicitly set VTOR for ROM anyway.

Given that we've just reset, we surely must have entered our Reset Handler, so VTOR must be correctly set?

The only issue is that the bootloader itself doesn't modify VTOR before jumping into the reset handler manually. That would be a reason to set VTOR in main image, but surely the bootloader should have done it itself...

Edit: actually it does, powerdown_scb does set VTOR to the address of the new image.

@kjbracey
Copy link
Contributor

Apparently the bootloader and application work fine on the nucleo board. (f413zh)

According to whom? Whose bootloader? @DrynnBavis's?

Looking at that code, I don't see how any bootloader would work - FLASH_BASE is just defined to 0x08000000U, so VTOR will not be right, and mbed_copy_nvic will copy from the wrong place.

Mind you, I'm just asserting that by code inspection. No idea who's using a Nucleo F413ZH.

@40Grit
Copy link

40Grit commented Aug 23, 2019

#11140 (comment)

@kjbracey
Copy link
Contributor

But was that working by jumping into the bootloader and all the data happening to be in the same place, as it does on the custom board sometimes, if the bootloader is RTOS-based?

@40Grit
Copy link

40Grit commented Aug 23, 2019

#11140 (comment)

@kjbracey
Copy link
Contributor

kjbracey commented Aug 23, 2019

I see on that issue confirmation that VTOR was correct (0x08020000) entering the image, but by reading the code, SystemInit would then set it to 0x08000000, then mbed_copy_nvic would copy from 0x08000000 into RAM.

The moral of the story may be that SystemInit should not be setting VTOR.

@40Grit
Copy link

40Grit commented Aug 23, 2019

dumb luck that nucleo is working then?

Is this in ST's court now?

@kjbracey
Copy link
Contributor

If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.

ST should look at the inconsistency between their targets.

@learytomlin
Copy link

learytomlin commented Aug 23, 2019

I have a very similar issue with my custom bootloader based on the SAML21J18. I can use mbed-trace just fine; however, when I use SDBlockDevice the application fails to start. Should I create a new issue before I go into any more detail here?

@0Grit
Copy link

0Grit commented Aug 23, 2019

New issue please. Reference this one if you want.

@DrynnBavis
Copy link
Author

If all this analysis is correct, @DrynnBavis can modify his own custom target's VTOR setup to match a working STM target. Or just take the VTOR setting out altogether - I don't think it's required.

@kjbracey-arm or @40Grit (your time zone matches mine a little better), what code am I modifying here? Can you give me a file name and line number? Thanks

@0Grit
Copy link

0Grit commented Aug 23, 2019

@DrynnBavis #11205 (comment)

I the comment linked above should give a starting point.

@DrynnBavis
Copy link
Author

Somehow missed this on first read, thanks @loverdeg-ep. I've got to leave for something right now but I'll try this when I return in a few hours.

@0Grit
Copy link

0Grit commented Aug 23, 2019

@DrynnBavis and we actually are probably in the same timezone. I just get up super early and watch github.

It is the only chance I have to get in contact with the experts in Oulu.

@DrynnBavis
Copy link
Author

YES. IT'S WORKING!!!

@kjbracey-arm I tried using NVIC_FLASH_VECTOR_ADDRESS but it was undefined, so I commented this out enitrely:

//     /* Configure the Vector Table location add offset address ------------------*/
// #ifdef VECT_TAB_SRAM
//     SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
// #else
//     SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
// #endif

This seems to have fixed my troubles for now. Will having this commented out cause any problems in the future? Curious to know why this works on the ZH chip but not the RH chip...

To confirm: I did indeed copy system_clock.c from the nucleo board. Commenting out the Vector Table location and offset address section within SystemInit of system_clock.c fixed my issue.

@DrynnBavis
Copy link
Author

What's the best protocol here in terms of closing this issue? Is there a PR I can make to give a better target definition for the RH chips?

I'm also still interested in exactly why this solved the problem. So I uncommented that section (back to the faulty code) and set a break point right before this point:

#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
#endif
  • using info macro VECT_TAB_SRAM in GDB, I found it to be undefined
  • with the same tool, I found SRAM_BASE = SRAM1_BASE = 0x20000000U
  • and FLASH_BASE = 0x08000000U
  • and finally VECT_TAB_OFFSET = 0x00

So because VECT_TAB_SRAM is undefined, it would seem we are indeed using the bootloader's vector table in ROM rather than the main application's vector table in SRAM. But surely this exact code inside of system_clock.c is being used for the ZH targets? Why is this not a problem there?

Does this have anything to do with me using an LSI rather than an LSE?

@0Grit
Copy link

0Grit commented Aug 23, 2019

I've been through this type of exercise multiple times myself and had the Cortex-M user-manual next to me when writing custom boot-loaders. I go step by step through the assembly and watch the status of the register. This usually gets me the type of understanding you are looking for with issues like these.

Even still I don't know the architecture well enough to diagnose this without spending a couple hours of analysis.

Figure out who is most active from ST in Mbed and copy them on this thread. Or open a ticket with ST and point them to this issue.

However It will probably fall to you to learn the "rule book" and definitively prove exactly why two parts which should? have the same memory map, programmed with the same binary behave differently.

The next step i would take is prove that both parts have all the same peripherals mapped to all the same places in memory.

If that is true I would think the binary should work the same in both as long as neither part were relying on any external signals.

@kjbracey
Copy link
Contributor

Curious to know why this works on the ZH chip but not the RH chip...

There's no evidence there's any chip difference here is there? The code sets VTOR wrong, but gets away with it if (a) there is no bootloader (so "wrong" happens to be right), or (b) the bootloader has an RTOS and the memory layout is the same (or similar enough)

Presumably in all your ZH builds, you've ended up with images with matching RTOS memory layout. If adding/removing trace doesn't affect the ZH, then there's likely no deep meaningful significance - some difference in link order/padding means trace ends up not shifting osRtxInfo in that build. Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.

@DrynnBavis
Copy link
Author

Feel free to investigate the maps, in case it reveals some inconsistency, but it may be nothing.

Very behind on my work now since this issue came up, so first I'll have to work on that. But later this week, if I've the time then I'll definitely look into it.

@LMESTM
Copy link
Contributor

LMESTM commented Aug 27, 2019

@LMESTM - could it be a task for you to make the start-up code for various STM devices consistent, if only some support bootloaders?

@DrynnBavis @kjbracey-arm - sorry I was away when copied here and I missed it.
Most of the vector relocation mechanism has been mostly pushed by arm engineers and has evolved over the past year, but it was indeed not updated consistently over all STM32 targets but rather on a case by case when enabling bootloader feature.

Anyway I agree that we probably have to clean-up thinkgs a little bit now.

To make things clear, does you custom target boots ok if you're modifying system_clock.c file as below rather than commenting out those lines:

    /* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif

@kjbracey
Copy link
Contributor

In an ideal world, you shouldn't need to set VTOR on boot, if intending to carry on using the ROM vectors. It seems reasonable to assume that as you've just booted, you came through the Reset_Handler, so VTOR must be correct.

The catch is that a bootloader might have entered your reset vector without adjusting VTOR. Our bootloader does set it, but others might not. So setting it seems reasonable.

@LMESTM
Copy link
Contributor

LMESTM commented Aug 27, 2019

This implementation was introduced back in 2017 here: #3798
but I'm also fine with removing completely those lines from all our STM targets if we consider it reliable in mbed-os scope.

@kjbracey
Copy link
Contributor

I've just had a chat with someone who is looking at booting into Mbed OS from another bootloader altogether, and he believes that it doesn't set VTOR, so better safe than sorry. (Mind you, it doesn't even set MSP or CONTROL correctly either, so you'd probably need to add more...).

@40Grit
Copy link

40Grit commented Aug 27, 2019

@kjbracey-arm Maybe this is naive but it would be nice if there were some standards surrounding this stuff.

"CMSIS-boot"

@kjbracey
Copy link
Contributor

kjbracey commented Aug 27, 2019

Well, if the bootloader is jumping into the vector table reset handler of a "raw" standalone image, then it's effectively avoiding having its own standard and using the "reset" standard. In which case it's kind of the bootloader's responsibility to have the chip in perfect reset state, right?

So by that standard, we shouldn't have to write VTOR. But in practice, the main image is the malleable one while bootloaders tend to be locked in, so it's the main image that has to cope with whatever bootloaders do.

@40Grit
Copy link

40Grit commented Aug 27, 2019

The fact that they are "locked in" I'd say is more reason to develop a standard.

That way an application could be better prepared to know what it is dealing with.

I'll keep thinking on this one and find the right place to converse further.

@DrynnBavis
Copy link
Author

To make things clear, does you custom target boots ok if you're modifying system_clock.c file as below rather than commenting out those lines:

    /* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
    SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
    SCB->VTOR = NVIC_FLASH_VECTOR_ADDRESS; /* Vector Table Relocation in Internal FLASH */
#endif

@LMESTM To confirm, no that macro NVIC_FLASH_VECTOR_ADDRESS is undefined. I've commented that chunk completely to get my bootloader/main app working. I haven't the time yet to check where that macro should be defined and see if maybe I could fix that instead.

@DrynnBavis
Copy link
Author

For future readers that have scrolled this far looking for a solution: the fix for me was to comment out that entire chunk of code in my comment just above this inside of the system_clock.c file that I had inside the folder of my custom target (I had copied this file from the folder nucleo_f413zh -- a mbed supported target with an STM32F413ZH chip while I was using an STM32F413RH).

Thank you everyone involved on debugging this. Quite the struggle but I'm really happy we actually came to a fix.

I'm going to close this issue now as the problem has a working solution. Though I'll still be following here (or in another thread if we do that instead)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants