gh-90155: Fix broken asyncio.Semaphore and strengthen FIFO guarantee. #93222

Merged: 14 commits into python:main on Sep 22, 2022

Conversation

cykerway
Contributor

gh-90155: Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee.

The current asyncio.Semaphore may become broken under certain workflows. Tasks waiting on a broken Semaphore can hang forever. This PR not only fixes this problem but also strengthens the FIFO guarantee on Semaphore waiters. The test cases show the details.

@cykerway cykerway requested review from 1st1 and asvetlov as code owners May 25, 2022 16:34
@cykerway cykerway changed the title Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee. [3.12] GH-90155 Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee. May 25, 2022
@cykerway cykerway changed the title [3.12] GH-90155 Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee. [3.12] GH-90155: Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee. May 25, 2022
@cykerway cykerway changed the title [3.12] GH-90155: Fix broken :class:asyncio.Semaphore and strengthen FIFO guarantee. gh-90155: Fix broken asyncio.Semaphore and strengthen FIFO guarantee. May 26, 2022
@mguentner

@cykerway Thank you for your work on this. I can confirm that this is indeed an issue in the wild.

I spent quite some time debugging an application that was blocking in rare cases after cancelled tasks were seen. Finally I checked the internal state of the Semaphore and saw that _wakeup_scheduled was True with no _waiters left to set it back to False, which blocks it forever.

Before I found this PR, I came up with another solution which I have documented here together with a program that reproduces the race condition (inspired by the tests added by this PR).
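
For reference, here is a minimal sketch of the kind of unluckily timed cancellation being described (this is not the reproducer from the gist above and not part of this PR's test suite): a waiter is woken by release() and then cancelled before it can run, which on the affected releases described above can leave the semaphore stuck even though it is free.

```python
import asyncio

async def main():
    sem = asyncio.Semaphore(1)

    await sem.acquire()                         # hold the only slot
    waiter = asyncio.create_task(sem.acquire())
    await asyncio.sleep(0)                      # let `waiter` block on its internal future
    sem.release()                               # schedules a wakeup for `waiter`...
    waiter.cancel()                             # ...but cancel it before it can resume
    await asyncio.sleep(0)                      # let the cancellation be processed

    # The semaphore is free again at this point, but on the affected releases
    # the lost wakeup can leave it stuck and this acquire() times out.
    try:
        await asyncio.wait_for(sem.acquire(), timeout=1)
        print("acquired")
    except asyncio.TimeoutError:
        print("semaphore is stuck")

asyncio.run(main())
```

On a fixed interpreter the final acquire() succeeds immediately, since the semaphore really is free at that point.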

Member

@gvanrossum gvanrossum left a comment

I'm not going to lie, I don't understand this code well enough to approve it. :-(

I wonder where we went wrong in asyncio's design that the simple version (which has long been fixed and fixed again) didn't work. :-(

@cykerway
Contributor Author

I wonder where we went wrong in asyncio's design that the simple version (which has long been fixed and fixed again) didn't work. :-(

(If I remember correctly)

  1. The original implementation, before #90155 (asyncio.Semaphore waiters deque doesn't work), was working, just not FIFO. But the docs don't say semaphores are fair or FIFO. People who don't need FIFO semaphores could use the original one.

  2. Someone was unsatisfied with task starvation and opened #90155 (asyncio.Semaphore waiters deque doesn't work). The changes introduced for it (such as 9d59381) were aimed at bringing fairness to semaphores, but the implementation was flawed and introduced a regression that makes the semaphore unusable under certain race conditions. This is what I showed in #90155 (comment).

  3. Those patches have been officially merged into 3.10 and other versions, so semaphores are currently broken. I mean the Python installed by system package managers here, not a nightly build. This includes the current version, 3.10.7.

  4. This PR is meant to be a hotfix for this problem. It doesn't set out to settle the FIFO vs. non-FIFO question, but tries to bring back a usable semaphore with minimal changes to the existing implementation. Another way is to just revert everything to the very beginning, but I think the advantage of taking this PR over reverting everything is that it prevents task starvation. In either case the tests added by this PR are still useful for preventing future regressions. You are welcome to add more tests to ensure this PR itself doesn't introduce new regressions.

Member

@gvanrossum gvanrossum left a comment

I approve of the new solution. I have some nits about the tests; I'm not too sure about those, so push back if you think I misunderstand how it works!

Member

@gvanrossum gvanrossum left a comment

I think I've cracked the sleep(0.01) mystery.

`Semaphore` docstring says the counter can never go below zero.
Member

@gvanrossum gvanrossum left a comment

There are several more sleep(0.01) cases that should be sleep(0), and I had a suggestion for the wait_for(..., timeout=0.01) too.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@kumaraditya303
Contributor

FYI, 3.9 is now in security-fixes-only mode, so most likely this cannot be backported to 3.9.

@cykerway
Contributor Author

Do you think that version would pass all the tests you added?

That version can pass all the tests so far, and may look better if you have the "obsession over minimizing the number of sleep(0) calls needed to get progress". But that version was made without knowledge of the loop internals. I didn't find a spec anywhere. For example, the documentation doesn't say what exactly happens when you cancel a task that is waiting on a future, and I don't know what can be depended on in that case. The new version is cleaner: waking up waiters one by one is the safer way to go. Should you really want to wake up more and process them in a batch, there had better be a spec for async loops so we know what we (and users) can rely on.
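
For illustration only, here is a minimal sketch of the "wake waiters one by one" shape being described (this is not the code in the PR; the class and the _wake_up_first helper are made up for the example): each release() completes at most one pending waiter future, and a waiter that is woken but then cancelled passes its wakeup on.

```python
import asyncio
import collections

class OneByOneSemaphore:
    """Illustrative shape only: each release() wakes at most one waiter."""

    def __init__(self, value=1):
        self._value = value
        self._waiters = collections.deque()

    def _wake_up_first(self):
        # Complete the first waiter future that is still pending.
        for fut in self._waiters:
            if not fut.done():
                fut.set_result(True)
                return

    def release(self):
        self._value += 1
        self._wake_up_first()

    async def acquire(self):
        if self._value > 0 and not self._waiters:
            self._value -= 1          # fast path: free slot, nobody queued
            return True
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)     # FIFO queue of waiter futures
        try:
            await fut
        except asyncio.CancelledError:
            self._waiters.remove(fut)
            if fut.done() and not fut.cancelled():
                # We were woken but then cancelled: pass the wakeup on.
                self._wake_up_first()
            raise
        self._waiters.remove(fut)
        self._value -= 1
        return True
```

Because each release() hands out at most one wakeup, correctness doesn't depend on the order in which the loop later runs the woken tasks; the cost, discussed further below, is that at most one queued waiter makes progress per loop iteration.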

@gvanrossum
Member

Okay, we'll go with this version.

@gvanrossum gvanrossum merged commit 24e0379 into python:main Sep 22, 2022
@miss-islington
Contributor

Thanks @cykerway for the PR, and @gvanrossum for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11.
🐍🍒⛏🤖

@bedevere-bot

GH-97019 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Sep 22, 2022
@bedevere-bot

GH-97020 is a backport of this pull request to the 3.10 branch.

@miss-islington
Contributor

Sorry, @cykerway and @gvanrossum, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 24e03796248ab8c7f62d715c28156abe2f1c0d20 3.9

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 22, 2022
…antee (pythonGH-93222)

The main problem was that an unluckily timed task cancellation could cause
the semaphore to be stuck. There were also doubts about strict FIFO ordering
of tasks allowed to pass.

The Semaphore implementation was rewritten to be more similar to Lock.
Many tests for edge cases (including cancellation) were added.
(cherry picked from commit 24e0379)

Co-authored-by: Cyker Way <[email protected]>
@gvanrossum
Member

@cykerway Do you want to do the 3.9 backport by hand? We can also leave it be, 3.9 is in security-fix mode anyways.

miss-islington added a commit that referenced this pull request Sep 22, 2022
…H-93222)

The main problem was that an unluckily timed task cancellation could cause
the semaphore to be stuck. There were also doubts about strict FIFO ordering
of tasks allowed to pass.

The Semaphore implementation was rewritten to be more similar to Lock.
Many tests for edge cases (including cancellation) were added.
(cherry picked from commit 24e0379)

Co-authored-by: Cyker Way <[email protected]>
@cykerway
Contributor Author

Do you want to do the 3.9 backport by hand? We can also leave it be, 3.9 is in security-fix mode anyways.

No idea about that. I'm not very familiar with the project management, and I don't even know what the backport conflict is. Perhaps @kumaraditya303 knows that topic better.

@kumaraditya303
Contributor

3.9 is in security fixes only now, so better just leave it as it is, no need to backport.

@gvanrossum
Member

That's cool.

@gvanrossum gvanrossum removed the needs backport to 3.9 only security fixes label Sep 22, 2022
@gvanrossum
Member

gvanrossum commented Sep 25, 2022

Okay, so after merging this I still couldn't stop thinking about it, and I came up with a scenario where this is substantially worse (orders of magnitude) than before.

The test program spawns oodles of trivial tasks and tries to rate-limit them by making each task acquire a semaphore first. The semaphore allows 50 tasks at a time. With python 3.11rc2, on my Mac it does nearly 25,000 iterations per second. With the new code it does about 900, or about 27 times slower.

The first 50 tasks take the easy path through acquire(), every following task gets put in the queue first. What makes it so slow is that once 50 tasks have acquired the semaphore in a single event loop iteration, they will also release it all in a single iteration -- but the new algorithm only wakes up the first waiting task, and the next 49 release() calls do nothing to the queue. So from then on we only wake up one task per iteration.

Of course, it's easy to complain when the broken code is fast. :-) But I think we can do better.

(UPDATE: The same code without using a semaphore can do over 200,000 loops/sec. Same if I keep the semaphore but set the initial value to 50,000. If I also remove the sleep(0) from the task it can do over 400,000 loops/sec. A loop that doesn't create tasks but calls sleep(0) and bumps the counter runs around 50,000.)
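
For context, a rough stand-in for the kind of benchmark described above (this is not the actual gist; the batch size, duration, and counting scheme here are arbitrary choices): oodles of trivial tasks are rate-limited through a Semaphore(50) and the loop reports iterations per second.

```python
import asyncio
import time

async def main():
    sem = asyncio.Semaphore(50)          # rate limit: 50 tasks at a time
    count = 0

    async def work():
        nonlocal count
        async with sem:
            await asyncio.sleep(0)       # the "trivial work": yield once
            count += 1

    start = time.perf_counter()
    while time.perf_counter() - start < 3.0:
        # Spawn a batch of trivial tasks and wait for all of them.
        await asyncio.gather(*(work() for _ in range(1000)))
    elapsed = time.perf_counter() - start
    print(f"{count / elapsed:.0f} iterations/sec")

asyncio.run(main())
```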

@cykerway
Contributor Author

cykerway commented Sep 25, 2022

Looks like the slowdown is caused by the context switches between tasks. I'm not familiar enough with the loop to estimate how much overhead there is when control is passed to the loop and then passed back. The new version implements FIFO order without depending on loop guarantees, and this is safer. It sacrifices performance because it is not aided by the loop.

There are several things here: correctness, performance, and FIFO. We definitely want correctness, and there is a tradeoff between performance and FIFO. If you try my first version, it runs almost as fast as the old one before this commit and it doesn't seem to have the starvation problem, but it's not obviously FIFO. To tackle this tradeoff there should be some explicit guarantee from the loop (about the order of execution of tasks waiting on futures when those futures are woken), yet I haven't seen one.

@gvanrossum
Member

From reading the code I know that the event loop absolutely calls callbacks that were scheduled using call_soon() in the order in which they were registered. I know that uvloop also uses this order. I feel that it's silly not to rely on this, since it gives us an important tool to fix the slowdown here. We should probably add a comment that states the dependency on this loop property.
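
A tiny check of that ordering property (relying only on the documented asyncio API; the guarantee itself is quoted from the docs a little further down):

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    order = []
    for i in range(5):
        loop.call_soon(order.append, i)   # register five callbacks
    await asyncio.sleep(0)                # yield so the queued callbacks run
    # call_soon() callbacks run in the order in which they were registered.
    assert order == [0, 1, 2, 3, 4]
    print(order)

asyncio.run(main())
```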

@kumaraditya303
Contributor

Can you create a new issue to discuss this? Your gist program looks like the worst-case scenario for this, as it does no I/O and just yields. Also, what is the throughput impact when you do some real work in the function?

@gvanrossum
Member

gvanrossum commented Sep 25, 2022

DO NOT FOLLOW UP HERE; DISCUSS AT #97545

Actually the behavior is guaranteed. Under Scheduling Callbacks I read

Callbacks are called in the order in which they are registered.

So we can definitely rely on this. (Sorry I hadn't mentioned this earlier, I wasn't actually aware of the guarantee, just of how the code works.)

Something that doesn't affect correctness but may affect performance is that the event loop goes through a cycle:

  • wait for I/O events (*)
  • register callbacks for I/O events
  • call all callbacks that are registered and ready at this point (I/O and otherwise)
  • go back to top

(*) The timeout for the I/O wait is zero if there are callbacks that are immediately ready, otherwise the time until the first callback scheduled for a particular time in the future (call_later()).

The I/O wait has expensive fixed overhead, so we want to call as many callbacks in a single iteration as possible.

Therefore I think it behooves us to make ready all futures for which we have room. I think it can work like the following abstract algorithm (a rough Python sketch follows the list):

  • Definitions:
    • Level: L = self._value
    • Waiters: W = self._waiters
    • Ready: R = [w for w in W if w.done() and not w.cancelled()]
    • Cancelled: C = [w for w in W if w.cancelled()]
    • Blocked: B = [w for w in W if not w.done()]
    • Note that R, C and B are views on W, not separate data structures
  • Invariant that should hold at all times:
    • L >= |R|
      (I.e., we should not promise more guests to seat than we have open tables)
  • Operations:
    • Equalize: while |B| > 0 and L >= |R|: make the first item of B ready (move it to R)
    • Release: L++; Equalize
    • Acquire:
      • if L > 0 and |R| == 0 and |B| == 0: L--; return
      • create a future F, append it to B, await it
      • when awaken (with or without exception):
        • assertion: F should be in either R or C
        • remove F from W (hence from R or C)
        • if no exception caught: L--; Equalize; return
        • if CancelledException caught and F.cancelled(): return
        • if CancelledException caught and not F.cancelled(): Equalize; return
        • (other exceptions are not expected and will bubble out unhandled)
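
For concreteness, here is a rough, self-contained Python sketch of the abstract algorithm above (purely illustrative, not the code that was merged and not necessarily what #97545 ended up with; the Equalize guard is taken as waking blocked waiters only while |R| < L so the stated invariant holds, and the cancelled-after-wakeup case re-raises CancelledError after passing the slot on, which is one possible reading of the last bullets):

```python
import asyncio
import collections

class EqualizeSemaphore:
    """Purely illustrative sketch of the Level / Ready / Blocked bookkeeping."""

    def __init__(self, value=1):
        self._value = value                    # L: the level
        self._waiters = collections.deque()    # W: waiter futures, FIFO

    def _equalize(self):
        # Wake blocked waiters (B) while there are fewer ready waiters (R)
        # than the level (L), so the invariant L >= |R| is preserved.
        ready = sum(1 for w in self._waiters
                    if w.done() and not w.cancelled())
        for w in self._waiters:
            if ready >= self._value:
                break
            if not w.done():
                w.set_result(True)             # move it from B to R
                ready += 1

    def release(self):
        self._value += 1                       # L++
        self._equalize()

    async def acquire(self):
        if self._value > 0 and not self._waiters:
            self._value -= 1                   # fast path: L > 0, no waiters
            return True
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)              # join B (the blocked set)
        try:
            await fut
        except asyncio.CancelledError:
            self._waiters.remove(fut)
            if fut.done() and not fut.cancelled():
                # Woken, then cancelled: give the promised slot to someone else.
                self._equalize()
            raise
        self._waiters.remove(fut)
        self._value -= 1                       # L--
        self._equalize()
        return True
```

Compared with waking a single waiter per release(), _equalize() lets one event-loop iteration hand out as many wakeups as there is capacity for, which is what the loop-cycle discussion above is aiming at.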

DO NOT FOLLOW UP HERE; DISCUSS AT #97545

@gvanrossum gvanrossum mentioned this pull request Sep 25, 2022
@python python locked as resolved and limited conversation to collaborators Sep 25, 2022
@gvanrossum
Member

(It's not really resolved, but I needed to give a reason and that was less wrong than the other options GitHub gave me.)
