Skip to content

Application hangs and is not responsive to user's actions #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wmarkow opened this issue Feb 25, 2018 · 15 comments
Open

Application hangs and is not responsive to user's actions #3

wmarkow opened this issue Feb 25, 2018 · 15 comments

Comments

@wmarkow
Copy link
Owner

wmarkow commented Feb 25, 2018

I have noticed that since some time (when I started to use RDS) application hangs. It is not responsive: the LCD screen freezes and do not displays new RDS messages or do not displays current volume. Moreover, it is not possible to change volume; nothing works. After power reset it works fine, until it hangs again.

This issue is highly reproducible when I power up the system from USB wall charger by Chicony Power Technology, AC Adapter, Model W12-010N3B. When I power up the system from my laptop's USB it hangs very rare. So it has something to do with the power supply.

However it would be good to know at what line of the source code it hangs. It would be a good idea to use watchdog to reset the board.

@wmarkow
Copy link
Owner Author

wmarkow commented Feb 25, 2018

I have found Megunolink/ArduinoCrashMonitor which uses watchdog feature to:

  • help to find a source code line that was being executed when watchdog timeouted
  • resets the board after watchdog timeout

After some tries I have found two different places, where it freezes:

  • Wire/src/utility/twi.c method uint8_t twi_readFrom(uint8_t address, uint8_t* data, uint8_t length, uint8_t sendStop) line 196:
while(TWI_MRX == twi_state){
    continue;
}
  • Wire/src/utility/twi.c method uuint8_t twi_writeTo(uint8_t address, uint8_t* data, uint8_t length, uint8_t wait, uint8_t sendStop) line 278:
while(wait && (TWI_MTX == twi_state)){
    continue;
}

Interesting.

wmarkow added a commit that referenced this issue Feb 25, 2018
…duinoCrashMonitor to find where the code freezes.
@wmarkow
Copy link
Owner Author

wmarkow commented Feb 28, 2018

I'm using arduino hardware in version 1.6.17.

@wmarkow wmarkow changed the title Application hangs and is not responsible to user's actions Application hangs and is not responsive to user's actions Mar 2, 2018
@wmarkow
Copy link
Owner Author

wmarkow commented Mar 2, 2018

I just want to summarize some things, before I proceed with investigations:

  • I'm communicating with three I2C devices: LCD through port expander, PT2314 (preamplifier), RDA5807 (radio tuner),
  • I'm updating LCD once per 250ms
  • I'm updating preamplifier very often (directly in Arduino's loop)
  • I'm communicating with radio tuner very often (directly in Arduino's loop)
  • my wall USB charger (which I'm using to power up the device) seems to be not good enough as mostly with this power supply the issue is reproducible

It could be that all of the facts above make, that it is easy for me to reproduce the issue when the program freezes in Wire library. It may be a good entry point to investigate the Wire library and try to improve something.

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 2, 2018

As I have written before, I started to use ArduinoCrashMonitor. I'm using this library to:

  • find the source code line that was being executed when watchdog timeouted
  • resets the board after watchdog timeout (I set the timeout to 4 seconds)

Additionally I added a piece of code in the Arduino's loop, that will blink the internal diode. If the diode is blinking then it means that loop is being constantly executed and the program doesn't freeze.
When the program freezes (for example somewhere in Wire library) the internal diode doesn't blink for about 4 seconds, then Arduino is restarted and program is back to life.

The plan is: download Adruino sources and try to modify internal Wire library. Some hints can be found:

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 2, 2018

With 703496c commit (and a commit before) and standard Wire library and ArduinoCrashMonitor, I have noticed the following behavior:

  • the internal LED blinks
  • when the code freezes, the internal diode stops blinking and program doesn't respond to user actions (like changing volume, LCD is not updated)
  • the freeze code address is reported by ArduinoCrashMonitor, it freezes mostly in Wire library around while(TWI_MRX == twi_state){ codes in twi_readFrom or twi_writeTo methods.
  • after 4-6 seconds Arduino get restarted and everything works fine again

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 3, 2018

With wmarkow/Arduino@4028bdc commit of my Arduino's fork I have modified the Wire library so it have a timeouts in twi_readFrom or twi_writeTo methods where it checks the twi_state variable.
This is only a partial fix and not all of the while loops have been modified. As a default the timeouts are disabled, so it will behave in the same as original Wire library.
To enable the timeout you need to call a proper method in your Arduino setup():
Wire.setTimeoutInMillis(50);
this will set timeout to 50 milliseconds (but of course you can set a different timeout value).

With my modified Wire library and ArduinoCrashMonitor, I have noticed be following behavior:

  • the internal LED blinks
  • sometimes the I2C communication is broken (LCD is not updated and teh volume change doesn't work) but the internal LED is blinking. I have noticed that it doesn't blink so fast as it is set in my program;
    it blinks slower, like the time has been slowed down. It may be the result of broken I2C state and the timeouts I have introduced are now used and it take more time to execute one Arduino loop().

Sometimes the the code freezes, the internal diode stops blinking and program doesn't respond to user actions (like changing volume, LCD is not updated). Probably this is a result of not all
while loops being modified. After a few seconds Arduino gets restarted by watchdog, but this time no crash has been reported by ArduinoCrashMonitor. Why? I started to take a look
again at the Wire library code and I have found a while loop in void twi_stop(void). This method is only called from TWI interrupt, so it probably block in the interrupt. If TWI interrupt
is being executed the watchdog interrupt will wait but will never be executed (because TWI is in infinite loop). This results in the fact that ArduinoCrashMonitor will not be able report any crash.
Fortunatelly the AVR watchdog will finally reset Arduino in that case as well.

Next part of my investigation is a loop in void twi_stop(void) method.

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 3, 2018

With wmarkow/Arduino@afc25a6 commit of my Arduino's fork I have modified the Wire library so it now has a timeout in void twi_stop(void). Since this method is executed from interrupt
service routine I couldn't use millis() because it will not work correctly (it may give false time results when it is used in ISR). I used some counter that counts how many times the
while loops is executed; when some limit is reached the method timeouts.

With this solution I have never observed a freeze (however I didn't modified all while loops yet). The LED still blinks (but blinks slower, so the timouts are working), so the Arduino's loop
is being constantly executed.

Good thing that Arduino not freezes but my program is still not responsive to user: probably because there is something wrong with I2C and I can not communicate with my devices by I2C: LCD is not
updated, volume doesn't work, and so on. Previously at least watchdog restarted Arduino and everything was fine :) When I press Arduino reset button, then everything starts working correctly.

Maybe I'm missing something and to recover from that state I need to do something else (like setting some specific TWI register)?

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 6, 2018

With arduino/Arduino@b325055 commit I have managed to recover TWI interface (at least it works in my project).

@MauroMombelli
Copy link

MauroMombelli commented Mar 8, 2018

hello, i just notice your work. Instead of millis() you can use micros(), that will be updated even in a ISR state, as is done by HW.

Rember that when the system freeze, the best solution in reset the HW i2c and restart teh operation.

I suggest using the TWI library directly (is what wire uses under the hood) has it will give you control over what you need

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 9, 2018

@MauroMombelli, thank your your comment. Actually in my work I have used your pull request mixed with DanNixon's pull request. Maybe I have reinvented the wheel again a bit, but by the way I learned new things.
I will take a look at using micros(), as you mentioned.

By using TWI library directly you mean to use what is in *Wire\src\utility* directory (twi.c and twi.h)?

@MauroMombelli
Copy link

the way DanNixon uses the twi_guard() function is really elegant. The advantages of micros() would be to make the timeout idipendenet from the clock source; it may avoid future issues

and yes, that is the twi library, there you see how I2C work at register level, and is quite easy.

@wmarkow
Copy link
Owner Author

wmarkow commented Mar 14, 2018

I agree that using millis() or micros() will make the timeout independed from the clock source but I'm not so convinced that using micros() in interrupt service routine will help.

Calling millis() in ISR will mostly return always the same value.

What will happen if we use micros() in ISR? We need to take a look in the Arduino source code, where micros() will return something like:

return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());

m variable will mostly have the same value, it can be incremented by 1 in some specific case
t variable is derived from TCNT0 (this is a hardware register) and it varies from 0 to 255
clockCyclesPerMicrosecond() on Arduino Uno is 16.
When you call micros() in ISR the value will change but not so much, after some time (when t will overflow and starts counting from zero) it will start to return the same values.

We can even calculate that tiem by putting the values into the equation above:

 ((1 << 8) + 255) * (64 / 16)

which becomes

(256 + 255) * 4

which is 2044 us (2ms).

I have the feeling that using micros() in ISR may work (on Arduino Uno) at least for 2044us. Other words, when you put a timeout more that 2044 us then it will still hang in twi_stop() method.

@wmarkow
Copy link
Owner Author

wmarkow commented May 20, 2020

After two years I come back to the issue. In the meantime I have installed a new RF antenna so my RDA5870 chip gets a better reception and the issue is now harder to reproduce (I have tried to reproduce it with the same USB wall charger).

But I have found a very easy way to reproduce it:

  1. open the case so there is an easy acces to RDA5870 chip
  2. make sure that the device is powered up (power source is here not relevant, it may be usb wall charger or laptop)
  3. pull out the power voltage cable from the RDA5870 board
  4. check if the device reacts on the volume adjusting, if it still reacting put in the power voltage cable to RDA5870 board and go back to step 3
  5. when the device doesn't react on user input then the issue is reproduced, after 4-6 seconds the ArduinoCrashMonitor library will reset the board

@wmarkow
Copy link
Owner Author

wmarkow commented May 25, 2020

I did further tests with the latest fix from arduino/ArduinoCore-avr#107
I have tested the sources based on the 427d772 commit.

In my coe I'm checking for the TWI timeout flag by calling Wire.getWireTimeoutFlag() and reset I2C Display at that moment; the display blinks then, so I can see that the timeout occured.

When I:

  • do not call Wire.setWireTimeoutUs(uint32_t, boolean) at all then I can reproduce the freezing
  • call Wire.setWireTimeoutUs(1500ul, true) then the timeout occurs constantly and the LCD blinks all the time. I'm assuming that 1500us is to low for my I2C devices so at least one of them doesn't respond so fast. The device doesn't freeze so the fix for the issue works.
  • call Wire.setWireTimeoutUs(2000ul, true) then I do not have any spontanious timeouts reported. To get the timeout I need to perform my test routine described in my previous comment. The device doesn't freeze so the fix for the issue works.

In my solution I put a 10000us as a timeout value. The fix helps me to recover from the endless loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants