mmc0: timeout waiting for hardware interrupt. #3446
The trace shows two separate 10-second timeouts while waiting to write a group of sectors. It's impossible to know the cause: whether the card failed, or whether the DMA operation somehow went wrong. What you can do is try to rule out DMA by disabling it and seeing whether the problem recurs. Try adding
@pelwell Thanks for the insight! When I reread the card, these speed drops became much less frequent (almost gone). Assuming the microSD card (EVO+, 64 GB) has some onboard intelligence, maybe that means it reallocated sectors? Is such behaviour known?
SD cards are meant to have a number of spare sectors to allow faulty ones to be mapped out, so some kind of recovery might be possible. If we don't hear from you in a month or so, we'll close the issue.
@pelwell Thanks, then I presume that is what happened. I will replace the card in my production system in any case and then run tests on the potentially broken one. If the issue recurs within the next 30 days in the same system with the new card, we'll know it is most likely a software problem.
Discards aren't propagated to the card as such. Instead, they are translated into block erasures for the unallocated holes in the filesystem. Regularly running fstrim may have unintended side effects: by forcing the card to erase sectors, you're likely interfering with the wear-levelling and hot/cold page algorithms that the card uses. The card's internal erase blocks are huge compared to a 4k filesystem sector, so trimming small, discontiguous ranges greatly amplifies the number of flash cells that get forcibly erased.
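To put a rough number on that amplification, here is a minimal worst-case sketch. It assumes a hypothetical 4 MiB internal erase block (real cards do not publish their geometry) and 4 KiB filesystem blocks, and it assumes that every erase block touched by a trimmed range ends up being erased in full. The helper names are made up for illustration.

```python
# Worst-case model of trim-induced erase amplification (hypothetical sizes).
SECTOR = 4 * 1024                # filesystem block size in bytes
ERASE_BLOCK = 4 * 1024 * 1024    # ASSUMED internal erase block size in bytes


def erase_blocks_touched(ranges):
    """Return the set of erase-block indices overlapped by (start, length) byte ranges."""
    touched = set()
    for start, length in ranges:
        first = start // ERASE_BLOCK
        last = (start + length - 1) // ERASE_BLOCK
        touched.update(range(first, last + 1))
    return touched


def amplification(ranges):
    """Bytes of flash in touched erase blocks divided by bytes actually trimmed."""
    trimmed = sum(length for _, length in ranges)
    return len(erase_blocks_touched(ranges)) * ERASE_BLOCK / trimmed


# 100 small, discontiguous 4 KiB holes spaced 10 MiB apart
holes = [(i * 10 * 1024 * 1024, SECTOR) for i in range(100)]
print(amplification(holes))  # 1024.0
```

Under these assumptions each 4 KiB hole costs a whole 4 MiB erase block, so the ratio is 1024; a single trim covering an entire aligned erase block would give 1.0.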
@P33M Sorry for following up on the issue tracker; I'll try to keep it short (feel free to move this discussion to a forum or another place where you are active)...
Thanks in advance!
Flash erase blocks generally have 3 states - erased, partially written and fully written.
The SD card's flash translation layer manages which blocks get written to via a logical-to-physical mapping, and it reclaims "full" blocks back to "erased" blocks by doing copy-on-write. Doing a block erase will likely force the card into copy-on-write for the allocated flash pages if there are remapped sectors in that block. It also counts towards the total lifetime erasure count of the underlying flash. In older versions of the SD spec there is no such thing as a) a background operation or b) a discard operation: cards are either busy doing something as a result of a host-initiated command or they are idle[0], because the interface is designed to be hot-swappable. This means that all flash maintenance operations happen during a read or write command. [0] Recent versions of the eMMC and SD specs introduce application performance classes and support for background maintenance tasks, but these are intended to be implemented on hosts where the card is captive (e.g. inside a phone).
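The mapping and reclaim behaviour described above can be illustrated with a toy model. This is a hypothetical sketch, not how any real card works: actual FTLs are proprietary, the block and page counts are arbitrary, the "erase the block with the most stale pages" victim policy is an assumption, and for simplicity valid pages are copied back into the freshly erased victim instead of into a separate spare block. The three states mirror the erased / partially written / fully written states mentioned above.

```python
from enum import Enum


class State(Enum):
    ERASED = "erased"
    PARTIAL = "partially written"
    FULL = "fully written"


class ToyFTL:
    """Hypothetical toy FTL: logical-to-physical map plus copy-on-write reclaim."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.ppb = pages_per_block
        # Each physical block is a list of logical sector numbers; None marks a stale page.
        self.blocks = [[] for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks  # lifetime erase count per block
        self.l2p = {}                         # logical sector -> (block, page index)
        self.open_block = 0                   # block currently receiving new writes
        self.copied_pages = 0                 # pages moved during reclaim (write amplification)

    def state(self, b):
        used = len(self.blocks[b])
        if used == 0:
            return State.ERASED
        return State.FULL if used == self.ppb else State.PARTIAL

    def _program(self, b, lba):
        """Append one page to block b and point the logical sector at it."""
        self.blocks[b].append(lba)
        self.l2p[lba] = (b, len(self.blocks[b]) - 1)

    def _reclaim(self):
        """Erase the full block with the most stale pages, copying its valid pages back."""
        candidates = [b for b in range(len(self.blocks)) if self.state(b) is State.FULL]
        victim = max(candidates, key=lambda b: self.blocks[b].count(None))
        valid = [lba for lba in self.blocks[victim] if lba is not None]
        if len(valid) == self.ppb:
            raise RuntimeError("no stale pages left to reclaim: device full")
        self.blocks[victim] = []              # block erase
        self.erase_counts[victim] += 1
        for lba in valid:                     # copy-on-write of still-valid data
            self._program(victim, lba)
            self.copied_pages += 1
        return victim

    def write(self, lba):
        if lba in self.l2p:                   # rewrite: invalidate the old copy
            old_b, old_p = self.l2p[lba]
            self.blocks[old_b][old_p] = None
        if self.state(self.open_block) is State.FULL:
            erased = [b for b in range(len(self.blocks)) if self.state(b) is State.ERASED]
            self.open_block = erased[0] if erased else self._reclaim()
        self._program(self.open_block, lba)
```

Writing 8 logical sectors once and then rewriting them repeatedly quickly exhausts the erased blocks, after which every further write relies on the reclaim path; `erase_counts` and `copied_pages` then show the erase and copy overhead that the comment above describes.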
@P33M Very enlightening indeed! This would mean that for an almost-full card (from the card's point of view, i.e. after it has been filled with data once and never erased), data is effectively copied-on-write to less-used blocks when writing, and there is no real gain from having many erased blocks available. Only now do I understand how heavy the write amplification of this model is. That might be one of the many reasons for UFS and other standards emerging. Thanks for enlightening me!
Happy to be closed?
@JamesH65 Yes, it has not happened again so far, and I learnt a lot from this thread 😄.
Describe the bug
Randomly, I encounter this error message followed by a lot of debug traces. Just now, this corrupted the filesystem.
To reproduce
It happens randomly; it is not even tied to high load.
System
Logs
mmcerr.txt
Additional context
The SD card is 3 years old now, so it could be that the card is just starting to fail (or the 3-year-old power supply, although nothing related to the power supply shows up in `dmesg`). I hope the logs are conclusive enough for an expert to tell whether this is a hardware failure or a kernel bug.