EventQueue timing errors when using tickless mode #13801

Closed · marcemmers opened this issue Oct 22, 2020 · 20 comments · Fixed by #13867

@marcemmers (Contributor) commented Oct 22, 2020

Description of defect

We have recently been experiencing issues with the timing of the EventQueue since switching from Mbed OS 5.12.0 to 6.1.0. After some testing we found that the issue originates from the equeue_tick function in equeue_mbed.cpp: it reads the ticks from a different source when called from an interrupt context (but only in tickless mode).

When debugging, we halt execution when it lands on main or on a breakpoint. While the core is halted the kernel clock is no longer updated, but the RTC peripheral keeps running. When we start or continue the program, the RTC clock is now ahead of the kernel clock. If an event is posted to the queue from interrupt context, the tick count is also read in interrupt context from the RTC clock, so the wakeup time for that event is ahead of the kernel time, which is used to determine the next wakeup event. This leads to a delay in event handling that is proportional to the time spent halted.

I've tried searching for an easy solution but couldn't find one. In my opinion there should be only one timebase used for the EventQueue ticks, but that's easier said than done. The kernel clock is only updated after the interrupts have been handled, and it's not easy to update it while still in the critical sleep section.

There is a thread on the forum which looks like the same issue: https://forums.mbed.com/t/eventqueue-call-target-method-triggers-much-later-than-expected/10163. It was "solved" by disabling tickless mode, but that is not an option for us.

Target(s) affected by this defect?

NRF52840 and probably NUCLEO_L476RG

Toolchain(s) (name and version) displaying this defect?

IAR 8.32.3

What version of Mbed-os are you using (tag or sha)?

mbed-os-6.1.0

What version(s) of tools are you using? List all that apply (e.g. mbed-cli)

n/a

How is this defect reproduced ?

The code from the forum should reproduce this issue. I don't know whether it is necessary to halt execution for a while to reproduce it.

#include "mbed.h"

InterruptIn btnInt(BUTTON1);                     

Thread thread_eq_events;        
EventQueue eq_events;
volatile int isrCallTime;

Timer timer;

void event_function_periodic()
{
    printf("[DEBUG - event_function_periodic] at %d ms\n", timer.read_ms());
}

void event_function_interrupt_handler(bool state)
{
    printf("[DEBUG - event_function_interrupt_handler] delayed (deferred) by %d ms (%s)\n", timer.read_ms() - isrCallTime, state ? "L->H" : "H->L");
}

void btn_interrupt_handler()
{
    isrCallTime = timer.read_ms();
    eq_events.call(&event_function_interrupt_handler, btnInt.read());
}

//////////
// MAIN //
//////////
int main()
{
    eq_events.call_every(1000, &event_function_periodic);                               
    
    btnInt.rise(&btn_interrupt_handler);
    btnInt.fall(&btn_interrupt_handler);
    
    thread_eq_events.start(callback(&eq_events, &EventQueue::dispatch_forever));

    timer.start(); 
}
@ciarmcom (Member)

@marcemmers thank you for raising this issue. Please take a look at the following comments:

It would help if you could also specify the versions of any tools you are using.

NOTE: If there are fields which are not applicable then please just add 'n/a' or 'None'. This indicates to us that at least all the fields have been considered.
Please update the issue header with the missing information; the issue will not be mirrored to our internal defect tracking system or investigated until this has been fully resolved.

@ciarmcom (Member)

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2585

@marcemmers (Contributor, Author) commented Nov 2, 2020

We have updated to OS version 6.4.0 and I can confirm that the issue is still present. It would be nice to get some feedback on this because this is kind of a showstopper for us.

@0xc0170 (Contributor) commented Nov 3, 2020

We have updated to OS version 6.4.0 and I can confirm that the issue is still present. It would be nice to get some feedback on this because this is kind of a showstopper for us.

@ARMmbed/mbed-os-core

When debugging, we halt execution when it lands on main or on a breakpoint.

Is the problem only present when debugging?

There is a thread on the forum which looks like the same issue: https://forums.mbed.com/t/eventqueue-call-target-method-triggers-much-later-than-expected/10163. It was "solved" by disabling tickless mode, but that is not an option for us.

Didn't they actually fix it by disabling the sleep manager for the period they needed to process something fast, then re-enabling it?

We have updated to OS version 6.4.0 and I can confirm that the issue is still present. It would be nice to get some feedback on this because this is kind of a showstopper for us.

Which version works, only 5.12.0 (so we should compare 5.15.0 vs 6.1.0)?

What value does MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER have? Are you using it or not?

@marcemmers (Contributor, Author) commented Nov 4, 2020

Is the problem only present when debugging?

The issue is indeed only present when debugging or halting the CPU in any way. This stems from the RTC continuing to run in the background while the SysTick IRQ is not being handled.

Didn't they actually fix it by disabling the sleep manager for the period they needed to process something fast, then re-enabling it?

It looks to me like maiorfi responded to hudakz that the issue was fixed; hudakz was talking about the tickless option.

Which version works, only 5.12.0 (so we should compare 5.15.0 vs 6.1.0)?

I think the issue originates from the changes made to equeue_mbed.cpp in this PR: #10104. There is a change from a single clock source to multiple clock sources depending on the ISR context. This means 5.13.0 should work and 5.14.0 would have the issue.

What value does MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER have? Are you using it or not?

MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER is 0, but that does not have any effect because MBED_CONF_RTOS_API_PRESENT is 1, so MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER is not even checked.
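
For reference, roughly how I read that gating (a sketch of the idea only, not the exact preprocessor logic in equeue_mbed.cpp):

#if MBED_CONF_RTOS_API_PRESENT
// RTOS build: EventQueue ticks come from the kernel clock (or os_timer when in
// ISR context, see the function below), so the low-power-ticker option is
// never consulted.
#elif MBED_CONF_EVENTS_USE_LOWPOWER_TIMER_TICKER
// Bare-metal build: ticks would come from a low-power ticker.
#else
// Bare-metal build: ticks would come from the microsecond ticker.
#endif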


I can reproduce the issue with a small bit of code which also shows the timing-source issue I was talking about. I use the SEGGER RTT functionality so there is no influence on the timing from a serial port. For the record, this was tested on an NRF52840_DK with shared_dispatch_from_application enabled.

main.cpp:

#include "mbed.h"
#include "SEGGER_RTT.h"

InterruptIn button1( BUTTON1, PullUp );
LowPowerTimer timer;

void PressedCallback();

uint32_t PreviousValue;

void TimerCallback()
{
    uint32_t tick = timer.read_ms();
    SEGGER_RTT_printf( 0, "Callback: %d, diff: %d\n", tick, tick - PreviousValue );
    button1.fall( &PressedCallback );
}

void PressedCallback()
{
    PreviousValue = timer.read_ms();
    SEGGER_RTT_printf( 0, "Pressed:  %d\n", PreviousValue );

    button1.fall( nullptr );
    mbed_event_queue()->call_in( 1s, &TimerCallback );
}

int main()
{
    button1.fall( &PressedCallback );
    timer.start();

    mbed_event_queue()->dispatch_forever();
    return 0;
}

I also added a print in the equeue_tick function:

#include "SEGGER_RTT.h"

unsigned equeue_tick()
{
    using unsigned_ms_t = std::chrono::duration<unsigned, std::milli>;

    unsigned_ms_t d;
#if defined MBED_TICKLESS || !MBED_CONF_RTOS_PRESENT
    // It is not safe to call get_ms_count from ISRs, both
    // because documentation says so, and because it will give
    // a stale value from the RTOS if the interrupt has woken
    // us out of sleep - the RTOS will not have updated its
    // ticks yet.
    if (core_util_is_isr_active()) {
        // And the documentation further says that this
        // should not be called from critical sections, for
        // performance reasons, but I don't have a good
        // current alternative!
        d = std::chrono::duration_cast<unsigned_ms_t>(mbed::internal::os_timer->get_time().time_since_epoch());
        SEGGER_RTT_printf( 0, "tick_irq: %d\n", d);
    } else {
        d = rtos::Kernel::Clock::now().time_since_epoch();
        SEGGER_RTT_printf( 0, "tick_now: %d\n", d );
    }
#else
    // And this is the legacy behaviour - if running in
    // non-tickless mode, this works fine, despite Mbed OS
    // documentation saying no. (Most recent CMSIS-RTOS
    // permits `osKernelGetTickCount` from IRQ, and our
    // `rtos::Kernel` wrapper copes too).
    d = rtos::Kernel::Clock::now().time_since_epoch();
#endif
    return d.count();
}

This is the result when the execution was halted for a while at the start:

tick_now: 142
tick_now: 142
tick_now: 143
Pressed:  4640
tick_irq: 11439
tick_now: 4783
tick_now: 4783
tick_now: 12439
Callback: 12296, diff: 7656
tick_now: 12439
Pressed:  16846
tick_irq: 23645
tick_now: 16989
tick_now: 16989
tick_now: 24645
Callback: 24502, diff: 7656
tick_now: 24645

This is the result when a hardware reset was performed without the debugger attached:

tick_now: 0
tick_now: 0
tick_now: 0
Pressed:  7455
tick_irq: 7456
tick_now: 7456
tick_now: 7456
tick_now: 8456
Callback: 8455, diff: 1000
tick_now: 8456
Pressed:  10458
tick_irq: 10458
tick_now: 10458
tick_now: 10458
tick_now: 11458
Callback: 11457, diff: 999
tick_now: 11458

Even when running into a breakpoint, the timing starts to go off. This was tested with the following steps:

  • Start the processor with a hardware reset
  • Attach to running target (using IAR)
  • Press button to see if the diff is still ok
  • Set breakpoint in TimerCallback()
  • Press button again to run into breakpoint
  • Continue running after a few seconds
  • Press button again, now with wrong timing
tick_now: 0
tick_now: 0
tick_now: 0
Pressed:  15141
tick_irq: 15141
tick_now: 15142
tick_now: 15142
tick_now: 16141
Callback: 16140, diff: 999
tick_now: 16141
Pressed:  28456
tick_irq: 28457
tick_now: 28457
tick_now: 28457
tick_now: 29457
Callback: 29456, diff: 1000
tick_now: 29472

(Breakpoint here)

Pressed:  35372
tick_irq: 35372
tick_now: 32300
tick_now: 32300
tick_now: 36372
Callback: 39443, diff: 4071
tick_now: 36473

@0xc0170 (Contributor) commented Nov 4, 2020

The issue is indeed only present when debugging or halting the cpu in any way. This stems from the RTC continuing running in the background but the systick irq not being handled.

@kjbracey-arm Can you review?

@kjbracey (Contributor) commented Nov 4, 2020

Some thoughts.

The time APIs being looked at here are all supposed to come from the same ultimate source (the HAL lpticker in this config, so the RTC), and they should be kept in sync.

This is a tickless RTOS build, right, not a non-RTOS build?

Does this happen when stopping with the debugger while running (e.g. a tight spin loop), or while sleeping, or both? That changes what's going on - while running we are ticking, while sleeping we have separate calculations.

While running, I think overrun should be dealt with by triggering multiple ticks after a hold-up. Every accumulated 1000us read out from the lpticker clock should deliver a tick to RTX - getting a back-to-back stream of them if necessary.

In the event that the overrun is occurring due to debugging, that intended tick burst might be unhelpful. I can't remember if I deliberately limited how many it would deliver, but if it is limited, then it should be tracking the ongoing total/1000 versus reported-ticks difference. Maybe it is tracked internally, but that hack assumes they're equivalent.

Aside from the problem here - you might want to change some config registers in the device to stop the clock source when stopping the CPU. It's probably ultimately less confusing overall to freeze everything. Maybe not viable if the source is the RTC.

@kjbracey (Contributor) commented Nov 4, 2020

Looking more closely, it looks like it should be generating catch-up bursts.

There are two layers here - first, insert_absolute is used to request interrupts from the TimerEvent layer every 1 ms. If the time has already run past the requested time, it should generate an interrupt immediately.

And then those interrupts in SysTimer::handler should send one tick request at a time to RTX, keeping track of how many it still needs to send in _unacknowledged_ticks. Whenever RTX acknowledges a tick, it should schedule another one immediately if there are some unacknowledged.
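
As a pseudocode-level sketch of that bookkeeping (names are illustrative, not the real mbed-os SysTimer code):

#include <cstdint>

static uint8_t _unacknowledged_ticks = 0;   // ticks delivered but not yet acknowledged by RTX

static void send_one_tick_to_rtx() { /* placeholder: notify the RTOS of exactly one tick */ }

// Timer IRQ handler, fired once per elapsed 1 ms period (back-to-back after a stall).
static void handler() {
    _unacknowledged_ticks++;                // wraps silently once more than 255 ticks are pending
    send_one_tick_to_rtx();
}

// Called when RTX has processed the tick we sent.
static void acknowledge_tick() {
    _unacknowledged_ticks--;
    if (_unacknowledged_ticks > 0) {
        send_one_tick_to_rtx();             // immediately feed the next tick from the backlog
    }
}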

I can see there's a danger of time being non-monotonic, due to the two readouts, but in principle it could be okay. During the "wake up from stop" burst, there should be back-to-back interrupts keeping us on the top "in ISR" path, giving "raw" numbers. Once we've caught up, it can use the RTX numbers.

One of the reasons for this mess is that, without it, when woken up from tickless sleep equeue_tick asks RTX what the time is, and RTX still reports the same time as when you went to sleep, because it hasn't updated the clock yet. So the equeue system goes nowhere.

Still seems like there's something wrong in the timer system, but maybe some rework to equeue could stop it asking about time in interrupt context?

@marcemmers (Contributor, Author) commented Nov 4, 2020

This is a tickless RTOS build, right, not a non-RTOS build?

This is indeed a normal RTOS build just with tickless enabled for low power.

Does this happen while stopping with the debugger while running (eg a tight spin loop), or sleeping, or both? That changes what's going on - while running we are ticking, while sleeping we have separate calculations.

When stopping while the device is in sleep, the issue is not there. I suspect that is because in that case I will be halted before the exit of hal_sleep(), so before the kernel is restarted.

As far as I've seen there will be no burst of kernel tick increments after a stop. Instead, the kernel is suspended before going to sleep and resumed after sleep; the time slept is calculated in do_timed_sleep_relative_to_acknowledged_ticks() and then passed to the kernel on the resume call.
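
As a sketch of that flow (using the CMSIS-RTOS2 suspend/resume calls; the sleep helper's name and signature here are assumptions, not the real mbed-os code):

#include <cstdint>

extern "C" uint32_t osKernelSuspend(void);           // CMSIS-RTOS2
extern "C" void     osKernelResume(uint32_t ticks);  // CMSIS-RTOS2

// Stand-in for the timed sleep done around do_timed_sleep_relative_to_acknowledged_ticks().
static uint32_t timed_sleep(uint32_t max_ticks) {
    // ... program the low-power timer, enter sleep, then measure the elapsed ticks ...
    return max_ticks; // placeholder
}

static void tickless_idle_sketch() {
    uint32_t ticks_until_next_event = osKernelSuspend(); // kernel clock stops advancing here
    // Any interrupt that wakes us runs *before* osKernelResume, which is why
    // equeue_tick cannot trust Kernel::Clock from ISR context at this point.
    uint32_t ticks_slept = timed_sleep(ticks_until_next_event);
    osKernelResume(ticks_slept);                          // kernel clock jumps forward in one step
}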

Interrupts are handled before the kernel is resumed, and thus the ticks are not yet updated. So I understand why there is a different time source in interrupts, but that is also kind of the source of the problem.

Aside from the problem here - you might want to change some config registers in the device to stop the clock source when stopping the CPU. It's probably ultimately less confusing overall to freeze everything. Maybe not viable if the source is the RTC.

I've looked into this and this is indeed not an option with the NRF52 series.

@kjbracey (Contributor) commented Nov 4, 2020

As far as I've seen there will be no burst of kernel tick increments after a stop. Instead, the kernel is suspended before going to sleep and resumed after sleep

I'd expect the catch-up burst to occur if stopped in the running state, not sleeping. While running, the timing system needs to deliver a tick call every 1 ms to RTX - there's no other way to feed it time progress, so it would have to send a burst to keep up the uniform time base.

If stopped while sleeping, I'd expect that "how long were we asleep" code for tickless should just measure it and do its job.

@marcemmers (Contributor, Author)

OK, that makes sense. So indeed the issues are occurring in the running state.

If we are halted in the running state, SysTimer<Period, IRQ>::handler() would not run either, and that is the only place where _unacknowledged_ticks is incremented. At least, it depends on the core and the debug settings whether interrupts can be handled while the debugger has halted execution.

Even if it were running, _unacknowledged_ticks being a uint8_t would cause overflows; that can easily be fixed by using a bigger type.

@kjbracey (Contributor) commented Nov 4, 2020

When you start running after a debugger freeze, there would be a pending timer IRQ, so it would call handler once immediately, and that should request the next timer event at the absolute time "last scheduled tick time + 1 tick" - i.e. already in the past. That should reschedule handler immediately again, and that would repeat hundreds or thousands of times, until we finally request something in the future.

All of this was designed only for occasional overruns in normal operation, and the uint8_t overflows fast in this case. Try making it a uint32_t.

The system could be optimised to handle this by noting the large chunk of missing time and just doing a bulk increment to the unacknowledged ticks, but I'm not sure if that might complicate the normal fast path.
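
Concretely, the change being suggested is just widening that counter, along these lines (illustrative only; not a verbatim excerpt from the SysTimer header):

// Before:
//     uint8_t _unacknowledged_ticks;   // wraps silently after 255 pending ticks
// After:
uint32_t _unacknowledged_ticks = 0;      // wide enough to absorb long debugger halts
                                         // (~49 days of backlog at 1 ms per tick)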

@kjbracey (Contributor) commented Nov 4, 2020

I have a vague recollection that part of the intent might have been that I didn't want to be calling catch-up code thousands of times after a big debugger-type freeze, so making that count a uint8_t would just quietly wrap and we'd throw some time away. But apparently that wrap leaves everything out of sync.

EDIT: the comment in SysTimer::acknowledge_tick agrees with that memory.

@marcemmers (Contributor, Author) commented Nov 4, 2020

Switching to uint32_t indeed fixes the issue, so the tick handling works as intended.

As you were saying, the only issue with this is the number of operations required to process all the unacknowledged ticks after several seconds of halted state.

Unfortunately there is no API available in the RTX layer to increment the ticks by a set amount without suspending and resuming the kernel or calling the RTX tick handler; the latter also performs the round-robin routine, which is unnecessary when there are more ticks to follow.

@kjbracey (Contributor) commented Nov 4, 2020

The coyness about not having a big tick backlog to RTX seems futile in this situation, because we are still doing an awful lot of insert_absolute and handler calls back-to-back. There's no limit to the backlog there.

Maybe to speed that up there could be a variant of insert_absolute that, rather than doing an "immediate callback" for a "requested time in the past", returns a value to say "no, you've missed that time, and here's by how much". Even if we had that answer, as you say, there's no way to feed it to RTX smoothly, but we could at least push it straight into _unacknowledged_ticks.
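
A hypothetical shape for that variant (purely illustrative; no such API exists in mbed-os today):

#include <cstdint>

// Like TimerEvent::insert_absolute, but instead of firing an immediate callback for a
// time already in the past, report how far in the past it is. Returns 0 if the event
// was scheduled normally, otherwise the number of ticks already missed.
uint64_t insert_absolute_or_report_overrun(uint64_t absolute_tick);

// The timer handler could then fold a whole debugger freeze into the counter at once:
//     _unacknowledged_ticks += insert_absolute_or_report_overrun(next_tick);
// instead of looping through thousands of back-to-back interrupts.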

The aim here is to have the time base run smoothly against real time - I was thinking more of system-suspend type scenarios, and the current state of the art appears to be that monotonic timers do track time spent in system suspend, so there can sometimes be catch-up on wakeup. But most of that case in RTX would be dealt with via the osKernelResume parameter.

Whether that's the best thing to do for debugging freezes is less clear. We could choose to drop ticks beyond some threshold.
However that's done (a wrap of the basic counter, or seeing a big response from an insert_absolute variant), we could log the dropped ticks in a _dropped_ticks member, include that in the internal calculations, and make sure that equeue incorporates it in its 2-mode behaviour.

But for now I'd be inclined to keep it simple, and just raise _unacknowledged_ticks to 32-bit, if that clears this problem, and you're not yet seeing other bad effects on resume.

@marcemmers (Contributor, Author)

But for now I'd be inclined to keep it simple, and just raise _unacknowledged_ticks to 32-bit, if that clears this problem, and you're not yet seeing other bad effects on resume.

I agree that this is the easiest solution for the issue we have now. Since this only happens when someone is debugging and halting the system, I suppose the impact of a large backlog is less of a concern: if you can afford to halt the system for a certain amount of time, there should be no issue in processing the backlog after continuing execution.

It would be nice to add a method to prevent all the cyclic context switches when processing the backlog, to improve performance, but that's really a nice-to-have.

I will try testing the 32-bit solution with the normal software and see where that brings us.

@kjbracey (Contributor) commented Nov 5, 2020

The same problem has been seen without a debugger in #13772. Pretty nasty system starvation from a flash write, but they otherwise get away with it because it's just a one-off start-up event. A 32-bit counter would fix that too.

@marcemmers (Contributor, Author)

I did search for similar issues but I guess I missed that one.

In my tests there is the same multiple of 256 ms plus 1000 ms for the second sleep time.

Do you want me to make a PR or will you do it yourself?

@kjbracey (Contributor) commented Nov 5, 2020

I was mentioned a few times in that one but missed it. Seems @mikaleppanen was ahead of us.

Could you make a PR for 32-bit? (And some intelligent adjustment to that comment about 8-bit - add a link to this issue.)

@0xc0170 (Contributor) commented Nov 5, 2020

Good findings here!

@ciarmcom removed the mirrored label Feb 3, 2021