This is a catch-all thread consolidating analysis of the issue that arises when:
- Multiple devices with multiple high-speed bulk IN endpoints are attached to the Pi
- Many of the endpoints are actively being polled (have URBs outstanding)
- Many of the endpoints return no data for extended periods of time (tens of milliseconds or more)
Under these conditions, the host channels in the OTG core will poll endpoints indefinitely without ever interrupting the host until one of three conditions is met:
- The maximum data transfer length has been received
- A short packet is received
- A bus error occurs
For endpoints that have no data available, this implies that the host channel is in a stable, busy state for a long time.
As interrupts on transfer completion or error states are required in order to release host channels to service other transfers, the maximum throughput on USB drops and the endpoint servicing interval for all non-periodic endpoints increases dramatically. In extreme cases, this ends up overflowing internal device data FIFOs.
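The three exit conditions above can be written down as a small predicate. This is only a sketch for reasoning about the behaviour, not driver code: `enum in_response` and `channel_completes` are hypothetical names, and the model assumes the core retries silently on every NAK, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what a device can answer to one IN token. */
enum in_response {
    RESP_NAK,       /* endpoint has no data */
    RESP_DATA,      /* endpoint returned a data packet */
    RESP_BUS_ERROR  /* transaction error on the bus */
};

/*
 * Returns true if the modelled host channel would stop polling and
 * interrupt the host after this response, false if it would silently
 * issue another IN token.
 */
static bool channel_completes(enum in_response resp, size_t packet_len,
                              size_t received_so_far, size_t xfer_len,
                              size_t max_packet)
{
    switch (resp) {
    case RESP_BUS_ERROR:
        return true;                        /* bus error */
    case RESP_NAK:
        return false;                       /* keeps the channel busy */
    case RESP_DATA:
        if (packet_len < max_packet)
            return true;                    /* short packet */
        /* full packet: done only once the transfer length is reached */
        return received_so_far + packet_len >= xfer_len;
    }
    return false;
}
```

With a 512-byte maximum packet size (the high-speed bulk value), an endpoint that only ever answers NAK never makes this predicate true, which is the stable, busy state described above.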
If we can somehow break this behaviour intentionally, we can then implement (yet another) software workaround that at least results in somewhat timely servicing of multiple endpoints.
We need two things to happen:
a) The host channel needs to raise an interrupt the first time it goes round the polling loop
b) We must not throw data on the floor if the first response the host channel gets is a data packet.
The only way we can stop a channel in DMA mode is to set the channel disable bit and hope for the best. If we receive the interrupt that says condition a) is fulfilled (i.e. a NAK) and then disable the channel, the core could already be in the process of sending another IN token to the same endpoint. If the endpoint then returns data, we could potentially truncate the packet and/or erroneously signal ACK to the endpoint.
I think we can satisfy the conditions (in the FIQ) if we do some deliberately bad things:
- Unmask the NAK interrupt in addition to the channel halted interrupt
- Set the data toggle PID to be incorrect on purpose, so an ACK will not be issued
- For the first interrupt we get, if it was a NAK then disable the channel
- If we got a valid packet on the first interrupt, we will have thrown it away, so restart the channel with the correct toggle
Then we collapse into two state flows:
- Waiting for channel disable:
  - Next interrupt could occur after the channel has performed another transfer
  - By setting data toggle incorrectly we will still not issue an ACK in the case of a valid data packet
  - If it was a valid packet, restart the channel with correct toggle
  - If not, poke the IRQ to release the channel for other transfers
- Waiting for normal completion:
  - Usual flow resumes - pass the resulting interrupt state to the IRQ
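The two state flows can be sketched as a small state machine. This is an untested illustration of the idea only: every name here (`chan_model`, `fiq_step`, the `EV_*` events and `ACT_*` actions) is hypothetical and does not come from the dwc_otg driver, and it assumes the FIQ can tell "device sent data that was dropped because of the wrong toggle" apart from a plain NAK or halt.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in the driver. */
enum chan_state {
    ST_FIRST_POLL,       /* started with a deliberately wrong data toggle */
    ST_WAIT_DISABLE,     /* saw a NAK, channel disable has been requested */
    ST_WAIT_COMPLETION,  /* restarted with the correct toggle */
    ST_DONE              /* interrupt state handed over to the IRQ */
};

enum chan_event {
    EV_NAK,             /* endpoint had no data */
    EV_DATA_DISCARDED,  /* valid packet seen, dropped due to wrong toggle */
    EV_HALTED,          /* channel halted with no data seen */
    EV_COMPLETE         /* normal completion or error */
};

enum fiq_action {
    ACT_NONE,
    ACT_DISABLE_CHANNEL,
    ACT_RESTART_CORRECT_TOGGLE,
    ACT_HAND_TO_IRQ
};

struct chan_model {
    enum chan_state state;
    bool toggle_correct;
};

static void chan_start(struct chan_model *ch)
{
    ch->state = ST_FIRST_POLL;
    ch->toggle_correct = false;  /* wrong on purpose: no ACK is issued */
}

static enum fiq_action restart_with_correct_toggle(struct chan_model *ch)
{
    ch->toggle_correct = true;
    ch->state = ST_WAIT_COMPLETION;
    return ACT_RESTART_CORRECT_TOGGLE;
}

static enum fiq_action hand_to_irq(struct chan_model *ch)
{
    ch->state = ST_DONE;
    return ACT_HAND_TO_IRQ;
}

/* One FIQ invocation: pick the action for this interrupt. */
static enum fiq_action fiq_step(struct chan_model *ch, enum chan_event ev)
{
    switch (ch->state) {
    case ST_FIRST_POLL:
        if (ev == EV_NAK) {
            ch->state = ST_WAIT_DISABLE;
            return ACT_DISABLE_CHANNEL;
        }
        if (ev == EV_DATA_DISCARDED)
            return restart_with_correct_toggle(ch);
        return hand_to_irq(ch);
    case ST_WAIT_DISABLE:
        /* The core may have fitted in one more transfer before halting. */
        if (ev == EV_DATA_DISCARDED)
            return restart_with_correct_toggle(ch);
        return hand_to_irq(ch);  /* release the channel for other work */
    case ST_WAIT_COMPLETION:
        return hand_to_irq(ch);  /* usual flow resumes */
    case ST_DONE:
        break;
    }
    return ACT_NONE;
}
```

In this model a silent endpoint costs one NAK interrupt and one halt interrupt before the channel is released, while an endpoint with data costs one discarded packet and a restart.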
This is a pretty sledgehammer (and untested) approach to fixing a rather obscure edge case, but there's no guarantee that some future wonder device won't use the maximum number of bulk endpoints.
The alternative would be to get slave mode working, but the resulting storm of interrupts would likely make this a very suboptimal option.
Wontfix. There is no sensible way to nobble the bulk IN state machine of a dwc2 channel while no data is being returned from the device.
If multiple devices are connected where they have bulk IN endpoints that only infrequently return data, then the user should ensure that fewer than 6 endpoints are polled at any one time.
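On the application side, one way to respect that limit is a simple admission counter around the code that submits bulk IN reads. This is a minimal sketch under stated assumptions: `poll_budget` and its functions are hypothetical names, the cap of 5 is just "fewer than 6" from the advice above, and the counter is not thread-safe as written.

```c
#include <stdbool.h>

/* "Fewer than 6" endpoints polled at once, per the advice above. */
#define MAX_POLLED_ENDPOINTS 5u

/* Hypothetical bookkeeping for endpoints with reads outstanding. */
struct poll_budget {
    unsigned active;
};

/* Call before submitting a read; returns false if the cap is reached. */
static bool poll_budget_acquire(struct poll_budget *b)
{
    if (b->active >= MAX_POLLED_ENDPOINTS)
        return false;
    b->active++;
    return true;
}

/* Call when the read completes or is cancelled. */
static void poll_budget_release(struct poll_budget *b)
{
    if (b->active > 0)
        b->active--;
}
```

A caller that fails to acquire a slot would defer polling that endpoint until another read completes, rotating through the idle endpoints instead of keeping a read outstanding on all of them.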
c.f:
raspberrypi/firmware#582
#1692
#2023