I was going through a libp2p tutorial and found that I couldn't reliably run libp2p-lookup against libp2p-relay_v2. libp2p-lookup would work the first time I ran libp2p-relay_v2 after a reboot, but it would fail if I stopped and restarted libp2p-relay_v2. Here is a discussion on the libp2p repo. Initially I thought it was a performance problem, as debug builds could fail but release builds worked. Also, I didn't see the problem on my desktop machine, only on a very small Digital Ocean VM with 1 CPU and 1 GB of RAM. More recently, I tested a tokio build of libp2p-relay_v2 and it worked fine.
With `RUST_LOG=trace` the "timeout" symptom would "always" occur, so I started adding copious amounts of logging to narrow down the problem. To shorten the story, the problem appears to be that async-io can sometimes miss `waker.wake` events, and it is resolved by a one-line change to async-io in `Source::poll_ready`. I change `state[dir].ticks = Some((Reactor::get().ticker(), state[dir].tick));` to `state[dir].ticks = Some((state[dir].tick, 0));`.
My analysis of the problem is that using the current `Reactor::ticker` value in `ticks` is not always correct. It is my understanding that when placing a ticker value into `Direction::ticks` there is a guarantee that the `waker.wake` call has completed and the associated task has been run. The problem is that the increment of the ticker value in `ReactorLock::react` typically runs on a different CPU thread than `Source::poll_ready` (called by `poll_readable` and `poll_writable`), which saves the ticker value in `ticks` via `state[dir].ticks = Some((Reactor::get().ticker(), state[dir].tick));`.
Because the increment and the use of the ticker value happen on different threads, the required guarantee cannot be fulfilled. I'm proposing a fix that uses only the current `state[dir].tick` and not the ticker value. Another possible solution is to set `state[dir].ticks = None`, but the proposed solution is a "smaller" change, so I've chosen it.
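To make the suspected interleaving concrete, here is a minimal single-threaded sketch. It is a simplified model and not async-io's actual code: `DirState`, `event_seen`, and `replay` are hypothetical names, the "new event" check is assumed to compare `tick` against both recorded values, and the specific ordering of steps is an assumption about how the race could play out.

```rust
// Simplified model of the readiness bookkeeping discussed above.
// NOT async-io's code: all names here are hypothetical.

#[derive(Default)]
struct DirState {
    /// Tick of the reactor pass that last delivered an event.
    tick: usize,
    /// Values recorded when a waker was registered.
    ticks: Option<(usize, usize)>,
}

/// Assumed check: an event counts as new only if `tick` differs
/// from both values recorded at registration time.
fn event_seen(state: &DirState) -> bool {
    match state.ticks {
        Some((a, b)) => state.tick != a && state.tick != b,
        None => false,
    }
}

/// Replays one assumed interleaving of the reactor thread and the task
/// thread, and returns whether the task observes the delivered event.
fn replay(use_reactor_ticker: bool) -> bool {
    let mut reactor_ticker: usize = 0;
    let mut state = DirState::default();

    // Reactor thread: starts a pass and increments the global ticker.
    reactor_ticker += 1;
    let pass_tick = reactor_ticker;

    // Task thread: registers its waker before any event is delivered.
    state.ticks = if use_reactor_ticker {
        Some((reactor_ticker, state.tick)) // current code: records (1, 0)
    } else {
        Some((state.tick, 0)) // proposed change: records (0, 0)
    };

    // Reactor thread: delivers the event, stamped with this pass's tick.
    state.tick = pass_tick;

    // Task thread: polls again and checks for a new event.
    event_seen(&state)
}

fn main() {
    println!("current code sees event:    {}", replay(true));
    println!("proposed change sees event: {}", replay(false));
}
```

In this model the task records the already-incremented ticker, the reactor then stamps the event with that same value, and the check treats the event as stale. With the proposed change the recorded values no longer match the delivered tick, so the event is observed.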
Of course, I fully expect that async-io experts may provide other solutions, or even identify the problem as something else altogether. But, at the moment, this solution does resolve the problem.
winksaville added a commit to winksaville/async-io that referenced this issue on May 6, 2022
My analysis of the problem is that using the current `Reactor::ticker`
value in `ticks` is not always correct. It is my understanding that when
placing a ticker value into `Direction::ticks` there is a guarantee that
the `waker.wake` call has completed and the associated task has been run.
The problem is that the increment of the ticker value in
`ReactorLock::react` typically runs on a different CPU thread than
`Source::poll_ready` (called by `poll_readable`, `poll_writable`), which
saves the ticker value in `ticks`:
    state[dir].ticks = Some((Reactor::get().ticker(), state[dir].tick));
Because the increment and the use of the ticker value happen on different
threads, the required guarantee cannot be fulfilled. I'm proposing the
following fix in this PR, which uses only the current `state[dir].tick`
and not the ticker value:
    state[dir].ticks = Some((state[dir].tick, 0));
fix smol-rs#78