Skip to content

feat: Make the IoPausedGuard a simple sender #4184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 22, 2023

Conversation

flub
Copy link
Contributor

@flub flub commented Mar 20, 2023

This replaces the mechanism by which the IoPauseGuard makes sure the IO scheduler is resumed: it really is a drop guard now by sending a single message on drop.

This makes it not have to hold on to anything like the context so makes it a lot easier to use.

The trade-off is that a long-running task is spawned when the guard is created, this task needs to receive the message from the drop guard in order for the scheduler to resume.

#skip-changelog


Targets flub/backup-provider-pause-io to minimise rebasing. Should be merged independently after #4182.

@link2xt what do you think about this approach to async drop guard? It is a lot simpler to manage. I don't think it ever runs into issues of things not being alive when the world is dropped. The main thing is that it relies on a long-running task, I don't have any issues with this but my only worry is that phones might kill our executor in funny ways? Is it possible for the spawned task to disappear without the Context itself being destroyed? I guess not since we'd be running into all sorts of issues if our spawned tasks disappear.

flub added 3 commits March 20, 2023 12:01
This makes the BackupProvider automatically invoke pause-io while it
is needed.

It needed to make the guard independent from the Context lifetime to
make this work.  Which is a bit sad.
This replaces the mechanism by which the IoPauseGuard makes sure the
IO scheduler is resumed: it really is a drop guard now by sending a
single message on drop.

This makes it not have to hold on to anything like the context so
makes it a lot easier to use.

The trade-off is that a long-running task is spawned when the guard is
created, this task needs to receive the message from the drop guard in
order for the scheduler to resume.
@flub flub requested a review from link2xt March 20, 2023 12:08
src/scheduler.rs Outdated
///
/// Does nothing special, simply drops this guard.
pub(crate) fn resume(self) {
drop(self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguably we could remove this method. i couldn't make up my mind so feel free to have an opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on removing this.

@link2xt
Copy link
Collaborator

link2xt commented Mar 20, 2023

Maybe not entirely related, but what happens if you have overlapping pause guards?

  1. Start the scheduler.
  2. Create pause guard A, which stops I/O and remembers that it needs to restart I/O.
  3. Create pause guard B, which stops I/O and will not restart I/O on drop.
  4. Drop guard B. We are no longer "paused", paused is false, but there is still a guard and some operation running which expects that guard A pauses the scheduler.
  5. Start I/O. It just starts and breaks operation A.

Shouldn't the guard also hold the lock on the scheduler place, so it cannot be started?

Base automatically changed from flub/backup-provider-pause-io to master March 20, 2023 18:57
@iequidoo
Copy link
Collaborator

iequidoo commented Mar 20, 2023

Btw, why do we have pause guards instead of "run guards" locking some mutex in the scheduler? So, when we start IO, we lock that mutex and monitor whether we're requested to stop IO and release the mutex. And then try to acquire it again if the program isn't requested to terminate / a user didn't issue the stop_io command?

@link2xt
Copy link
Collaborator

link2xt commented Mar 20, 2023

We don't even care about I/O start/stop, but a general write lock on the storage, the database plus blobdir. E.g. keeping IMAP connections is fine if we don't download messages, there is no need to close them. If we acquire a write lock when entering the "fetch messages" procedure and release it before going to idle, we don't need to stop I/O at all. Not sure what the situation on iOS is, but I expect that it is not required to close all sockets before going to background sleep.

So maybe with #4165 we don't even need to stop the I/O to take database snapshot, and if we don't modify or garbage-collect the blobs while running, we can even keep using the database while setting up a new device.

@flub
Copy link
Contributor Author

flub commented Mar 21, 2023

IIUC IOS does never stop IO and only starts IO whenever it is woken up, brought to the foreground or similar just to make sure things are really running.

@flub
Copy link
Contributor Author

flub commented Mar 21, 2023

Maybe not entirely related, but what happens if you have overlapping pause guards?

Good catch, I created a #4193 separately as it's not really what this PR tries to address.

@flub
Copy link
Contributor Author

flub commented Mar 21, 2023

I'd also like to use this PR as a model to improve the ongoing process API, which is simpler anyway as there can only be one.

So what do we think about the merits of this implementation? Is it an improvement? I think it is.

src/scheduler.rs Outdated
///
/// Does nothing special, simply drops this guard.
pub(crate) fn resume(self) {
drop(self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on removing this.

}
let (tx, rx) = oneshot::channel();
tokio::spawn(async move {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, why can't it be spawned in drop()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.rs/tokio/latest/tokio/fn.spawn.html#panics

If you happen to drop this outside of the runtime, which you have effectively no control over, things start going wrong. This could potentially happen at shutdown time. By spawning here you have guaranteed correct semantics even when things are being torn down.

If you want to dive deep into this there's a super long essay exploring async drop that I still haven't read myself: https://sabrinajewson.org/blog/async-drop

Copy link
Collaborator

@iequidoo iequidoo Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, another approach is to make a generic async function that:

  • Pauses IO
  • Awaits a future passed to it
  • Resumes IO
  • Returns the result of the awaited future

This way no async drop is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, callback based APIs are often more awkward to use though

@flub flub merged commit 616eabc into master Mar 22, 2023
@flub flub deleted the flub/pause-io-drop-guard-simple-sender branch March 22, 2023 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants