mbed os crash on connecting ethernet adapter connect K64 #5680

Closed
NirSonnenschein opened this issue Dec 10, 2017 · 16 comments

Comments

@NirSonnenschein
Contributor

NirSonnenschein commented Dec 10, 2017

Description

  • Type: Bug
  • Priority: Major

We see an error when connecting the Ethernet adaptor on a K64F running Mbed OS compiled with ARMCC. This doesn't always happen, but it occurs quite often.
When it occurs, we see the following print:
Thread 00000000 error -4: Parameter error


Bug

Target
K64F

Toolchain:
ARM (mostly on armcc)

Toolchain version:

mbed-cli version:
5.6 and 5.7

mbed-os sha:
(git log -n1 --oneline)

DAPLink version:

Expected behavior
Our code creates an Ethernet interface object and calls the connect function.
The connect call should either succeed or return an error.

Actual behavior
the test hangs and we see the following print:
Thread 00000000 error -4: Parameter error

Steps to reproduce
we run the following code as part of the network initialization for the K64F

netInterface = new EthernetInterface();
printf("new interface created\r\n");
status = netInterface->connect();
if (NSAPI_ERROR_OK == status)
{
    printf("interface registered : OK \r\n");
}
@0xc0170
Contributor

0xc0170 commented Dec 11, 2017

@NirSonnenschein somewhere in the code, osErrorParameter is captured, but it is not clear where it comes from. Were you able to at least find out which function is causing this parameter error? From the code snippet you shared above, I would guess connect(), or is anything missing there?

@kjbracey-arm @SeppoTakalo

@kjbracey
Contributor

I assume this is a debug profile build? The message comes from EvrRtxThreadError, which is a hook to catch RTX errors.

If you could stick a breakpoint on that to get a stack backtrace, it would help - I can see about a dozen places that could possibly call it with (NULL, osErrorParameter), and it's not obvious what the culprit would be. Most of them are simply calling functions with a NULL thread pointer, plus a couple of others.

This is during the connect call, right?

@NirSonnenschein
Contributor Author

Hi @0xc0170 and @kjbracey-arm,
Just to clarify, the error almost definitely happens in the connect call. We have a print before and after the call (in cases of success or failure), and when this happens we don't see any of the prints after connect.

As general background, this happens occasionally in our nightly tests (e.g. the nightly from two nights ago had it, but last night's didn't). The particular configuration which failed in this case was Mbed OS compiled with armcc in debug mode. The issue doesn't seem to reproduce cleanly when testing locally. I'll try again today; if I'm able to reproduce it locally, I can try to use a breakpoint.
I can also provide the bin / elf for the image in question if that will help.

@kjbracey
Contributor

It seems moderately likely it might be the consequence of connection failure - some sort of teardown when giving up not going cleanly. Maybe you could encourage it by persuading connect failure - yank the cable at the crucial moment...

@NirSonnenschein
Contributor Author

NirSonnenschein commented Dec 11, 2017

Hi @kjbracey-arm ,
Thanks for the tip, I'll try this if I'm not able to locally reproduce the issue by normal means

@NirSonnenschein
Contributor Author

I've tried reproducing locally (including disconnecting the Ethernet cable during testing) and so far I have not been able to reproduce the issue. It seems to be more readily reproducible in the Jenkins test environment.
When disconnecting the cable during the connect step, the tests halt for a while (presumably waiting for DHCP to complete) and then fail (no crash observed).

@kjbracey
Contributor

Any chance it's this bug? #5587

Can't immediately see why we'd hit it, but it is the same error printout.

@alekshex
Contributor

small update, happens on gcc arm also (caught in debug):
new interface created
Thread 0x0 error -4: Parameter error

@NirSonnenschein
Contributor Author

yes the issue seems to reproduce more easily in the Jenkins lab environment (happens there pretty often but I was not able to reproduce on the local network).

@ryankurte
Contributor

ryankurte commented Dec 20, 2017

I'm having a similar / possibly the same issue during network stack init with mbed commit 4d81eadb2 using gcc-arm on the EFR32FG12_BRD4254A target.
Mentioned in #5579 and manually applied the patch from #5587 with no effect.

Error occurs at rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_thread.c:1349 in uint32_t svcRtxThreadFlagsSet (osThreadId_t thread_id, uint32_t flags)

   1346      // Check parameters
   1347      if ((thread == NULL) || (thread->id != osRtxIdThread) ||
   1348          (flags & ~((1U << osRtxThreadFlagsLimit) - 1U))) {
B+>1349        EvrRtxThreadError(thread, osErrorParameter);
   1350        return ((uint32_t)osErrorParameter);
   1351      }

Serial output:

[INFO][brro]: PANID: 691
[INFO][brro]: NET_IPV6_BOOTSTRAP_AUTONOMOUS
[WARN][brro]: Security NOT enabled
[DBG ][core]: NS Root task Init

[DBG ][sck ]: Socket Tasklet Generated
[sck ]: Socket Task
Thread 0x0 error -4: Parameter error

Backtrace:

Breakpoint 3, svcRtxThreadFlagsSet (thread_id=0x0 <osRegisterForOsEvents>, flags=512) at ./mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_thread.c:1349
(gdb) bt
#0  svcRtxThreadFlagsSet (thread_id=0x0 <osRegisterForOsEvents>, flags=512) at ./mbed-os/rtos/TARGET_CORTEX/rtx5/RTX/Source/rtx_thread.c:1349
#1  0x0004f324 in SVC_Handler () at irq_cm4f.S:59
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p thread
$9 = (osRtxThread_t *) 0x0 <osRegisterForOsEvents>
(gdb) p *thread
$10 = {id = 0 '\000', state = 0 '\000', flags = 4 '\004', attr = 32 ' ',
  name = 0x53f41 <Reset_Handler> "H\200G\006I\aJ\aK\232B\276\277Q\370\004\vB\370\004\v\370\347\254\367", <incomplete sequence \371\135\100\005>, thread_next = 0x53f6d <WTIMER1_IRQHandler>,
  thread_prev = 0x58c09 <HardFault_Handler()>, delay_next = 0x53f6d <WTIMER1_IRQHandler>, delay_prev = 0x53f6d <WTIMER1_IRQHandler>, thread_join = 0x53f6d <WTIMER1_IRQHandler>,
  delay = 343917, priority = 109 'm', priority_base = 63 '?', stack_frame = 5 '\005', flags_options = 0 '\000', wait_flags = 343917, thread_flags = 343917,
  mutex_list = 0x4f311 <SVC_Handler>, stack_mem = 0x53f6d <WTIMER1_IRQHandler>, stack_size = 343917, sp = 324519, thread_addr = 324535, tz_memory = 343917,
  context = 0x5eb55 <FRC_PRI_IRQHandler>}
(gdb)

It appears something is triggering an SVC call with an invalid thread ID, but I'm not sure how, and I haven't worked out how to catch it prior to execution yet.
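
As a side note on the parameter check in the listing: the flags value from the backtrace (512) looks valid, so the NULL thread pointer is presumably the part that trips it. A minimal host-side sketch of that three-part check follows. It assumes osRtxThreadFlagsLimit is 31 (worth confirming in your tree), and FakeThread, kIdThread and params_ok are made-up names for illustration, not RTX code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: RTX5 sets osRtxThreadFlagsLimit to 31; confirm in rtx_os.h.
constexpr uint32_t kThreadFlagsLimit = 31U;

// Hypothetical marker value; the real check compares against osRtxIdThread.
constexpr uint8_t kIdThread = 1U;

// Hypothetical stand-in for the RTX thread control block.
// Only the id byte matters for this check.
struct FakeThread {
    uint8_t id;
};

// Mirrors the shape of the check at rtx_thread.c lines 1347-1348.
bool params_ok(const FakeThread *thread, uint32_t flags) {
    // With a limit of 31 this mask is 0x7FFFFFFF.
    const uint32_t valid_mask = (1U << kThreadFlagsLimit) - 1U;
    if (thread == nullptr) {
        return false;                 // the case seen in the backtrace
    }
    if (thread->id != kIdThread) {
        return false;                 // pointer does not refer to a thread object
    }
    if ((flags & ~valid_mask) != 0U) {
        return false;                 // a bit outside the allowed range is set
    }
    return true;
}
```

With a valid FakeThread, params_ok(&t, 512) passes, while params_ok(nullptr, 512) fails, which matches thread_id=0x0 and flags=512 in the backtrace.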

@kjbracey
Contributor

kjbracey commented Dec 20, 2017

Ta for the info!

That was enough to pin it down. (Despite the annoyance that debuggers keep failing to get through exception stack frames.)

It's an ordering error in the K64F driver - it's installing its interrupt handler in low_level_init via

ENET_SetCallback(&g_handle, ethernet_callback, netif);

ethernet_callback calls osThreadFlagsSet(k64f_enetdata.thread).

k64f_enetdata.thread isn't initialised until later, so there's a brief window where a receive interrupt can happen and ethernet_callback will use a null thread ID.

This is not terribly harmful, but the "trap errors" thing in the debug build intercepts it, reasonably enough.
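
To make the window concrete, here is a rough host-side model of the ordering, not the driver code itself. EnetData, fake_thread_flags_set, fire_rx_interrupt and init_buggy_order are all hypothetical names, and the "interrupt" is just a function call placed at the point where a real one could land. The only value taken from the report is the -4 error code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the driver state described above.
struct EnetData {
    void *thread = nullptr;   // worker thread handle, assigned late in init
};

constexpr int32_t kErrorParameter = -4;   // matches "error -4" in the log

static EnetData g_enet;
static void (*g_rx_callback)() = nullptr;
static int32_t g_last_result = 0;

// Toy version of a thread-flags call: rejects a null thread handle.
int32_t fake_thread_flags_set(void *thread, uint32_t flags) {
    (void)flags;
    return (thread == nullptr) ? kErrorParameter : 0;
}

// Plays the role of the receive callback installed by the driver.
void ethernet_callback_sketch() {
    g_last_result = fake_thread_flags_set(g_enet.thread, 1U);
}

// Simulates the hardware raising a receive interrupt.
void fire_rx_interrupt() {
    if (g_rx_callback != nullptr) {
        g_rx_callback();
    }
}

// Buggy order: callback installed first, thread handle assigned afterwards.
int32_t init_buggy_order(bool packet_arrives_in_window) {
    static int worker;                          // its address acts as a handle
    g_enet = EnetData{};
    g_last_result = 0;
    g_rx_callback = ethernet_callback_sketch;   // callback is live from here
    if (packet_arrives_in_window) {
        fire_rx_interrupt();                    // lands before the handle is set
    }
    g_enet.thread = &worker;                    // thread handle assigned too late
    return g_last_result;
}
```

In this model, init_buggy_order(false) returns 0 and init_buggy_order(true) returns -4. That would be consistent with the failure being intermittent, and possibly more frequent on a network where a packet is likely to arrive during init.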

Possible fixes:

  • change the start-up order so the thread is initialised first (should probably kill it again if low_level_init errors)
  • delay the ENET_SetCallback until after thread init (means you might process packets received during init much later - effectively existing behaviour)
  • make ethernet_callback check for thread id being NULL (same effect as previous)
  • don't use thread flags, use event flags, which means the callback can start setting them before the thread is initialised, and the thread will consume as soon as it starts
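
The last option can be sketched on a host as follows. This is only an illustration of the latching idea, not an RTOS API: LatchedFlags is a hypothetical class built on std::atomic, standing in for an event-flags object that exists before either the callback or the thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical latch standing in for an RTOS event-flags object.
class LatchedFlags {
public:
    // Safe to call before any consumer exists; bits simply accumulate.
    void set(uint32_t bits) {
        bits_.fetch_or(bits, std::memory_order_release);
    }
    // Returns whatever is pending and clears it.
    uint32_t take() {
        return bits_.exchange(0U, std::memory_order_acquire);
    }
private:
    std::atomic<uint32_t> bits_{0U};
};

constexpr uint32_t kFlagRx = 1U;

static LatchedFlags g_rx_flags;      // created before callback and thread
static int g_packets_handled = 0;

// Callback side: needs no thread handle, so there is nothing to be NULL.
void rx_callback_sketch() {
    g_rx_flags.set(kFlagRx);
}

// One pass of the worker loop body; a real thread would block waiting instead.
void worker_poll_once() {
    if ((g_rx_flags.take() & kFlagRx) != 0U) {
        ++g_packets_handled;
    }
}
```

If rx_callback_sketch() runs before the worker exists, the bit stays pending, and the first worker_poll_once() picks it up, so an early interrupt is delayed rather than lost.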

@SeppoTakalo
Contributor

Which Ethernet driver? The LwIP one or the Nanostack one, or both?

That later debug print looks like border router so I'm assuming it is Nanostack's driver or both.

@kjbracey
Contributor

Hang on, your #5579 is actually about a Nanostack issue. Not K64F at all. Oh well, you've helped solve this issue.

So it seems that both pieces of code probably have the same flaw - calling osThreadFlagsSet before the thread is ready. Not identified the path to it with Nanostack yet.

@ryankurte
Contributor

Yep, yep, different cause but suspect it's the same flaw. I can open another issue if you'd like?

Looks like in NanostackRfPhyEfr32.cpp callbacks are enabled at NanostackRfPhyEfr32.cpp#L374 and the thread isn't started until NanostackRfPhyEfr32.cpp#L468, will have a shot at reordering it and see if that helps.

I wonder what changed to make this a runtime error now, and how many other things it is likely to affect.

@kjbracey
Contributor

This is only a runtime error with the RTX error trapping on, which is only in debug builds since 5.6 I think, unless that's changed. More people testing debug builds now?

kjbracey added a commit to kjbracey/mbed-os that referenced this issue Dec 21, 2017
The K64F Ethernet driver installs an interrupt handler that sets thread
flags, and this could be called before the thread was initialised, so it
would use a NULL thread ID.

This triggers an RTX error-checking trap in debug builds, and could also
lead to other problems with received packets not being processed.

Adjusted so the RX interrupt handler does nothing if the thread isn't
initialised yet, and manually trigger a RX event flag after initialising
the thread in case any interrupts were ignored.

An alternative would have been to implement eth_arch_enable_interrupts,
but this mechanism is not present in the EMAC world - drivers will have
to start returning interrupts in their power up.

Fixes ARMmbed#5680
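
The approach in the commit message (ignore early interrupts, then trigger once by hand after thread init) can be modelled on a host roughly like this. It is a sketch of the idea only. Worker, signal_rx, rx_callback_guarded and finish_thread_init are hypothetical names and do not come from the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical worker state: pending flag bits plus a wake-up counter.
struct Worker {
    uint32_t pending = 0U;
    int wakeups = 0;
};

constexpr uint32_t kFlagRx = 1U;

static Worker *g_thread = nullptr;   // stays null until thread init finishes
static int g_ignored = 0;            // interrupts dropped during the window

// Stand-in for setting a flag on the worker thread.
void signal_rx(Worker *w) {
    w->pending |= kFlagRx;
    ++w->wakeups;
}

// Guarded handler: does nothing if the thread is not initialised yet.
void rx_callback_guarded() {
    if (g_thread == nullptr) {
        ++g_ignored;
        return;
    }
    signal_rx(g_thread);
}

// After creating the thread, publish the handle and trigger one RX event by
// hand, in case any interrupts were ignored while the handle was still null.
void finish_thread_init(Worker *w) {
    assert(w != nullptr);
    g_thread = w;
    signal_rx(w);
}
```

The manual trigger costs at most one spurious wake-up when nothing arrived early, which seems a reasonable price for not leaving a received packet unprocessed.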
@ryankurte
Contributor

The silent / nearly impossible to debug runtime error handling in release builds cost me almost a month of head bashing before I worked out #5155, I wouldn't be surprised at all if / hope that is the case.

adbridge pushed a commit that referenced this issue Dec 29, 2017
adbridge pushed a commit that referenced this issue Dec 29, 2017
adbridge pushed a commit that referenced this issue Jan 2, 2018
ryankurte added a commit to ryankurte/mbed-os that referenced this issue Jan 7, 2018
6 participants