Skip to content

bcm2835-sdhost: wait at least 150us between read retries #1486

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

bcm2835-sdhost: wait at least 150us between read retries #1486

wants to merge 1 commit into from

Conversation

xdsopl
Copy link

@xdsopl xdsopl commented May 23, 2016

With the Compute Module I get timeout's while initializing the eMMC:

mmc0: command never completed.

And the system won't boot properly.
With this patch it boots every single time successfully and won't ever show the above error.
Also I don't see any performance regressions whatsoever.
150us seems to be a magic number, as wait times below doesn't always make the eMMC initialize successfully on first retry.
So use 150us to 200us as usleep range for between read retries.

Signed-off-by: Ahmet Inan [email protected]

With the Compute Module I get timeout's while initializing the eMMC:

mmc0: command never completed.

And the system won't boot properly.
With this patch it boots every single time successfully and won't ever show the above error.
Also I don't see any performance regressions whatsoever.
150us seems to be a magic number, as wait times below doesn't always make the eMMC initialize successfully on first retry.
So use 150us to 200us as usleep range for between read retries.

Signed-off-by: Ahmet Inan <[email protected]>
@pelwell
Copy link
Contributor

pelwell commented May 23, 2016

Please explain why increasing the polling interval improves booting on eMMC. I'm reluctant to apply a patch that seems to work without an understanding of how.

@xdsopl
Copy link
Author

xdsopl commented May 23, 2016

If we don't wait at least that 150 microseconds of time before, every single of the tons of bcm2835_sdhost_read retries we issue in that 100 milliseconds timeout window will fail.
I've got two Compute Modules here with the same problem and it only seems to affect some people.
BTW, u-boot seems to have the same issue with the eMMC on my CMs but didn't bother to look into its source as I won't need it anymore.
Besides, that longer delay only happens in the retry code path, so the main code path remains unaffected by that extra delay.

@pelwell
Copy link
Contributor

pelwell commented May 23, 2016

I think you've misunderstood that code slightly. That isn't a retry path, it's a slower poll path. The read you see isn't a read of the card it is a read of the register that indicates the status of the command.

I'm concerned that the reason this patch works is that it can delay the time between the command being reported as complete and the next operation by up to 150-200us. The reason I say "up to" is that if the command completes at the end of that 150-200us window then the added delay could be zero, which might lead to a failure. In addition, this added delay may only be required for one command. As an experiment, can you put the polling interval back to what it was, but add an explicit delay after the loop (or at the point of the break) and see if that also fixes the problem and what the minimum delay is.

@xdsopl
Copy link
Author

xdsopl commented May 23, 2016

Thanks, will investigate and come back to you :)

@xdsopl
Copy link
Author

xdsopl commented May 23, 2016

150us seems to be the absolute minimum, otherwise you will see an occasional "mmc0: command never completed."
Also putting the sleep below the loop doesn't help.
It has to be right after the break.
Should I prepare a new patch or investigate more?
About to go home, so we could continue tomorrow.

@pelwell
Copy link
Contributor

pelwell commented May 23, 2016

Tomorrow would be better for me.

@xdsopl
Copy link
Author

xdsopl commented May 24, 2016

Good morning,
I've added a counter to that loop to see how many iterations the commands need to succeed:

Before my patch:
8 to 55 iterations for all commands where most of them need around 50 iterations.

After my patch:
1 to 5 iterations, most of them needing 2 iterations.

Also discovered (maybe side effect of pr_err) that 150us is not enough, as I see one "mmc0: command never completed":

Increasing delay to 200-300us:
1 to 3 iterations, most of them needing just 1 iteration.

I still think that the extra delay before the read is the correct place, as we already have exhausted the cmd_quick_poll_retries in the quick poll loop and now we are doing things at a slower pace in the slow poll loop path and so no need to rush into things :)

Don't have the programming manual (just the datasheet) for this KLM4G1FEAC but my guess is that we are "disturbing" the command from finishing if we poll to fast - as putting the timeout even higher (2 Seconds!) doesn't help before my patch.

@pelwell
Copy link
Contributor

pelwell commented May 24, 2016

Don't have the programming manual (just the datasheet) for this KLM4G1FEAC but my guess is that we are "disturbing" the command from finishing if we poll to fast - as putting the timeout even higher (2 Seconds!) doesn't help before my patch.

The problem with this theory is that the read is only checking the status of the SDHOST interface to see if the command has completed. This doesn't trigger any activity on the SD bus - once the command is sent, all the interface can do is wait for the card.

@xdsopl
Copy link
Author

xdsopl commented May 24, 2016

Thanks, that was the info I was missing and now understand your concern.
To investigate further, I added this right to the top of bcm2835_sdhost_finish_command, which helps:
if (irq_flags) { spin_unlock_irqrestore(&host->lock, *irq_flags); usleep_range(150, 200); spin_lock_irqsave(&host->lock, *irq_flags); }

Anything below 150 gives "mmc0: command never completed"

@xdsopl
Copy link
Author

xdsopl commented May 24, 2016

FYI: That extra delay really only helps inside or anywhere above the loop, but at no point below it.

@xdsopl
Copy link
Author

xdsopl commented May 25, 2016

I made another patch and a new pull request:
#1492

This way it won't slow down the other command polls.

@xdsopl xdsopl closed this May 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants