Document whether asyncio.wait_for(q.get(), timeout) is safe of race-conditions.
#92824
Comments
FWIW as #90927 is implemented, @gvanrossum What do you think about deprecating |
@kumaraditya303 -- Thank you very much for contributing to this discussion. If you want, we can formulate my documentation request more generally and write: The This would cover any cancellation-based timeout implementation. Of course, it's important that this promise is actually kept. In fact, I feel quite uneasy about using exception-based mechanisms as the only way to achieve a timeout at an elementary operation such as |
I think it's way too early to think about that. I would think there's a lot of code using What we could do would be to add something to the docs for I haven't quite grasped the key topic of this issue yet, will respond about that later. |
Oops, didn't mean to close. (Seems I discovered a new GitHub keyboard shortcut. :-) |
@gvanrossum -- Let me explain the reason why I opened this ticket: Confidently writing correct code. With the current documentation of the existing functionality, I cannot be confident that my code does what I want -- and it is very difficult to work around that. Let me explain with an example:
Let's ignore for the time being the timeout context. Without it, it is quite easy to keep some promises about the outcome of the code:
Now, the timeout manager adds another promise (the program will complete at evening) but it breaks the above promises that are important to me. This means: My code is not cancellation-safe. More than that: even if my code were cancellation-safe, I would also depend on whether the function Currently, it is difficult to find documentation about how to write cancellation-safe code with python3. What tools are available for this goal? What primitives are cancellation-safe and what primitives are not? What are the precise promises that I can rely on when a cancellation-safe library function is cancelled? Maybe there is documentation about these questions and I just haven't found it. While the method As for the above algorithm, the easiest solution is to limit the timeout scope just to one call: I know that implementing timeouts consistently is very complex and messy: Timeouts simply do not compose cleanly. Adding an optional However, it's essential to document clearly what can be relied upon and what cannot. Without such documentation, the available features cannot be used confidently. |
Hi @Yaakov-Belch, Knowing what can be relied upon to be cancellation-safe is really hard, and I think it's usually better if users write code in a defensive way. Making promises about this is even harder, since there's no way to reliably test such assurances (there just are too many points where code might be cancelled, as you've demonstrated in your own 7-line code snippet). Queue data types are especially hard to get right -- for example, the non-async I do know one thing (hopefully you already figured this out). The All in all I am not inclined to add such guarantees to the documentation. If you're really worried about losing a queue item occasionally you should probably use a persistent queue implementation. |
Hi @gvanrossum, I fully understand your reasoning -- the costs (developer attention and time) involved in promising more -- and keeping such promises. Would it be fair to at least put this into the documentation? Due to possible race conditions, Such a statement (at the same place where the use of generic timeout primitives is recommended) would have saved me hours of work: Instead of searching for bug reports and hints about what's implemented or not -- I would have gone straight to implementing a workaround for my concrete use case. |
We would have to add that warning to every single API in asyncio. I don't think that's reasonable. Like all aspects of Python, it's open source and you use it at your own risk (read the license). It is looking more and more like the only point of this issue is for you to have something or someone to blame for the two hours you wasted. If you knew how much time I wasted debugging trivial bugs in my own code you would understand that I don't think that's a lot of wasted time. So I am closing this issue. |
Thanks for the clarification. I was not actually suggesting to add a warning to every API call in asyncio. I would like to see that warning at the one place where the documentation suggests the usage of generic cancellation-based timeout primitives. I very much respect your giant work creating python -- and the huge contribution of the community. If it appears that I intend to blame someone for something then this is certainly my mistake in expressing myself and not my actual intention. |
Let me clarify what my actual intention was -- and resolve this issue in a way that's good for me: I am developing a library intended to be used by my clients in production. "Normal python bugs" are extremely rare and acceptable. But possible race conditions are not. At some point I need an asynchronous As explained above: This depends on whether My search revealed that:
This answers my question: To be certain, I should implement a separate queue with the features and promises that I need. Here is the starting point of a
The call This implementation promises:
Summary: Any programmer who finds himself with the same question that I had: "Shall I use the recommended features of the standard library or shall I write my own queue implementation?" can find this closed documentation request and can assess the costs and risks involved with both sides of the choice. He can use my code as a starting point for his own
Even though this request is now closed as 'not planned', it achieves my goal of clearly documenting whether cancellation-safety is a promised feature of the standard library. I want to thank @gvanrossum and @kumaraditya303 for their time and feedback. |
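The queue code from the comment above is not preserved in this copy of the thread. The following is only a hypothetical sketch of the general idea, not the commenter's implementation: a queue whose get() takes the timeout itself, so the timeout never cancels a getter that already holds an item. The names TimeoutQueue, _TIMED_OUT, _deliver and _expire are assumptions made for illustration, and the sketch assumes an unbounded queue used from a single event loop.

```python
import asyncio
from collections import deque

_TIMED_OUT = object()  # sentinel (assumption): the waiter gave up before an item arrived


class TimeoutQueue:
    """Hypothetical unbounded FIFO queue whose get() takes a timeout directly."""

    def __init__(self):
        self._items = deque()    # buffered items
        self._waiters = deque()  # futures of consumers waiting for an item

    def _deliver(self, item, front=False):
        # Hand the item to the first waiter that is still pending, else buffer it.
        while self._waiters:
            waiter = self._waiters.popleft()
            if not waiter.done():
                waiter.set_result(item)
                return
        if front:
            self._items.appendleft(item)
        else:
            self._items.append(item)

    def put_nowait(self, item):
        self._deliver(item)

    async def get(self, timeout):
        if self._items:
            return self._items.popleft()
        loop = asyncio.get_running_loop()
        waiter = loop.create_future()
        self._waiters.append(waiter)
        # The timer resolves the future with a sentinel instead of cancelling anything.
        timer = loop.call_later(timeout, self._expire, waiter)
        try:
            result = await waiter
        except asyncio.CancelledError:
            # Cancelled from outside: give back an item that was already delivered.
            if waiter.done() and not waiter.cancelled():
                if waiter.result() is not _TIMED_OUT:
                    self._deliver(waiter.result(), front=True)
            raise
        finally:
            timer.cancel()
            if waiter in self._waiters:
                self._waiters.remove(waiter)
        if result is _TIMED_OUT:
            raise asyncio.TimeoutError
        return result

    @staticmethod
    def _expire(waiter):
        if not waiter.done():
            waiter.set_result(_TIMED_OUT)


async def demo():
    q = TimeoutQueue()
    q.put_nowait("a")
    first = await q.get(0.05)      # served from the buffer
    try:
        await q.get(0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True           # nothing arrived in time
    # Cancel a consumer right after an item was handed to it.
    consumer = asyncio.ensure_future(q.get(1.0))
    await asyncio.sleep(0)         # let the consumer start waiting
    q.put_nowait("b")
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    survivor = await q.get(0.05)   # "b" was put back rather than lost
    return first, timed_out, survivor
```

Because an item and the timeout sentinel both arrive through the same future, a waiter receives exactly one of them. The sketch leaves out maxsize, task_done() and join(), and it makes no claim about how asyncio.Queue itself behaves.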
@gvanrossum I don't think I understand this answer. Haven't we established that the generic timeout primitive
If I understand you right, "defensive way" here means that users should defend themselves against the possibility that |
No, my understanding is that the last time this construct was reported to lose items was in 2015, and both issues look closed to me, so no, I don't believe it has been established that this construct can still lose items. I invite you to look at the source code for asyncio/queues.py and asyncio/timeouts.py and demonstrate a race condition. (Note that the implementation of |
Documentation
Please add the following text (or some equivalent) to the documentation of asyncio.Queue:
The construct asyncio.wait_for(queue.get(), timeout) is guaranteed to be free of race conditions. It will not lose queue items.
Explanation: According to the public documentation, this construct is allowed to lose queue items: The documentation states that wait_for uses cancellation which is implemented by raising a CancelledError exception. Depending on the implementation, this may happen in the call queue.get() after an item has been removed from the queue. This would cause the extracted queue item to be lost.
According to the current documentation, this race condition is allowed: Such a race condition would be surprising for most programmers --- but it would not violate the letter of the documentation. Hence, judging only the documented features, a responsible programmer will hesitate to use this construct in production settings where losing queue items is not acceptable.
This issue has been reported and addressed at least twice:
Based on the comments in these two bug reports it seems to me that the current implementation avoids this race condition. However, this is not clearly spelled out as a commitment for future versions.
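For reference, the construct in question typically appears in code like the sketch below. It is illustrative only and is not taken from either bug report; the helper name get_with_timeout and the choice to return None on timeout are assumptions.

```python
import asyncio


async def get_with_timeout(queue: asyncio.Queue, timeout: float):
    """Return the next item, or None if nothing arrives within `timeout` seconds."""
    try:
        # On timeout, wait_for cancels the inner queue.get() call.
        return await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return None


async def main():
    queue = asyncio.Queue()
    queue.put_nowait("job-1")
    first = await get_with_timeout(queue, 0.05)   # an item is available
    second = await get_with_timeout(queue, 0.05)  # queue is empty, so this times out
    return first, second, queue.qsize()


result = asyncio.run(main())
```

The question raised in this issue is what happens in the narrow window where the timeout expires just as an item becomes available. The sketch does not exercise that window. Returning None also makes a timeout indistinguishable from a queued None, so real code may prefer to let the TimeoutError propagate.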
Conclusion
At the current state of affairs, a responsible programmer will find no assurance in the documentation of whether the construct asyncio.wait_for(queue.get(), timeout) can safely be used in production environments. He will need to spend much time (I spent more than two hours) searching for bug reports such as those listed above and will still be left with doubts about whether he can safely use the simple construct or still needs to implement a safe queue-with-timeout-in-get.
If the Python implementation promises to avoid this race condition --- please state it clearly in the documentation.