-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[Coroutines] Support @llvm.coro.await_suspend
actually
#64945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-coroutines |
Sorry, maybe i dont fully understand problem described in #56301 , but it seems common problem is that all inlined functions locals for some reason becames part of coroutine frame, while they are not. If it is problem may we somehow mark coroutine-used locals/args/unnamed temp values(including awaiters) at frontend and then on middle end they will use coroutine frame as storage and others just use stack? |
… regressions The fix we sent for llvm/llvm-project#56301 may bring performance regressions. But we didn't mention it in the ReleaseNotes so that users may get confused. e.g, llvm/llvm-project#64933. So this patch mentions the possible side effect and the potential solutions in llvm/llvm-project#64945 to avoid misunderstandings.
It may not be easy to explain in several words. But generally it is bad/annoying/inefficient to do analysis jobs in the frontend. And I know (and all the reviewers and commentors know) the method will bring some bad codes. But the correctness is better than the performance. BTW, although we had a long term solution plan to fix the issue, I would expect that there are still some performance regressions if your await_suspend is super complex. |
Oh, wait. Maybe it is a idea to not do anything if the await_suspend function is marked always_suspend by the user explicitly. I'll do this. |
I was talking about this: For example we have code: my_coroutine c() {
foo(...);
co_await <expr>;
bar(args...);
} Here compiler must be sure, that no matter what 'foo' and 'bar' do and return, coroutine frame must never store anything which is not used in <expr> But as i understand after inlining 'foo' and 'bar' clang do not sees difference between local coroutine variable and something created as local in inlined function. I saw this on clang 14/15 when repeating function call was increasing coroutine frame without real reasion
And problem with #56301 coro handle returned from 'Register' and stored as local variable in 'final_suspend' stored on coroutine frame. If clang 'splits' coroutine only on clang-IR stage, may be introduce tags like 'call in the coroutine' which after inline will be like: inlined_call_start:
/* something */
inlined_call_end: |
… regressions The fix we sent for llvm/llvm-project#56301 may bring performance regressions. But we didn't mention it in the ReleaseNotes so that users may get confused. e.g, llvm/llvm-project#64933. So this patch mentions the possible side effect and the potential solutions in llvm/llvm-project#64945 to avoid misunderstandings.
…ecified as always_inline already Address #64933 and partially #64945. After c467245, we will add a noinline attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions. Obviously, this decision will bring some performance regressions. And people may complain about this while the long term solution may not be available soon. In such cases, it is better to provide a solution for the users who met the regression surprisingly. Also it is natural to not prevent the inlining if the function is marked as always_inline by the users already.
@kelbon BTW, I don't know why I can't receive the mail from your comment. And your comment about inlining tags is relatively implemented as lifetime intrinsics in LLVM IR... But things in the middle end transformation are pretty complex to handle and explain... the variables in the frontend may get moved and merged or optimized away. And the many optimization are not teached about the idea of coroutines well. Then this is the root problem and we can't a lot things for that at once. |
Close #65054 The direct issue is still the call to coroutine_handle<>::address() after await_suspend(). Without optimizations, the current logic will put the temporary result of await_suspend() to the coroutine frame since the middle end feel the temporary is escaped from coroutine_handle<>::address. To fix this fundamentally, we should wrap the whole logic about await-suspend into a standalone function. See #64945 And as a short-term workaround, we probably can mark coroutine_handle<>::address() as always-inline so that the temporary result may not be thought to be escaped then it won't be put on the coroutine frame. Although it looks dirty, it is probably do-able since the compiler are allowed to do special tricks to standard library components.
Hello! Even though I'm new to LLVM, I'd like to work on this issue. I already have a draft implementation, that seems to work. It would be great if somebody was willing to review it and provide guidance so it could be completed. |
Thanks! I'll try to look into it later this week. |
Implement `llvm.coro.await.suspend` intrinsics, to deal with performance regression after prohibiting `.await_suspend` inlining, as suggested in #64945. Actually, there are three new intrinsics, which directly correspond to each of three forms of `await_suspend`: ``` void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction) i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction) ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction) ``` There are three different versions instead of one, because in `bool` case it's result is used for resuming via a branch, and in `coroutine_handle` case exceptions from `await_suspend` are handled in the coroutine, and exceptions from the subsequent `.resume()` are propagated to the caller. Await-suspend block is simplified down to intrinsic calls only, for example for symmetric transfer: ``` %id = call token @llvm.coro.save(ptr null) %handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction) call void @llvm.coro.resume(%handle) %result = call i8 @llvm.coro.suspend(token %id, i1 false) switch i8 %result, ... ``` All await-suspend logic is moved out into a wrapper function, generated for each suspension point. The signature of the function is `<type> wrapperFunction(ptr %awaiter, ptr %frame)` where `<type>` is one of `void` `i1` or `ptr`, depending on the return type of `await_suspend`. Intrinsic calls are lowered during `CoroSplit` pass, right after the split. Because I'm new to LLVM, I'm not sure if the helper function generation, calls to them and lowering are implemented in the right way, especially with regard to various metadata and attributes, i. e. for TBAA. All things that seemed questionable are marked with `FIXME` comments. There is another detail: in case of symmetric transfer raw pointer to the frame of coroutine, that should be resumed, is returned from the helper function and a direct call to `@llvm.coro.resume` is generated. C++ standard demands, that `.resume()` method is evaluated. Not sure how important is this, because code has been generated in the same way before, sans helper function.
This should be implemented by f786881 |
@llvm/issue-subscribers-clang-frontend Author: Chuanqi Xu (ChuanqiXu9)
After https://github.com/llvm/llvm-project/commit/c4672454743e942f148a1aff1e809dae73e464f6, we will add a `noinline` attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.
Obviously, this decision will bring some performance regressions. e.g., #64933 The suggested long term solution is to introduce the new intrinisic:
Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase. The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend. |
@llvm/issue-subscribers-clang-codegen Author: Chuanqi Xu (ChuanqiXu9)
After https://github.com/llvm/llvm-project/commit/c4672454743e942f148a1aff1e809dae73e464f6, we will add a `noinline` attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.
Obviously, this decision will bring some performance regressions. e.g., #64933 The suggested long term solution is to introduce the new intrinisic:
Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase. The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend. |
After c467245, we will add a
noinline
attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.Obviously, this decision will bring some performance regressions. e.g., #64933
The suggested long term solution is to introduce the new intrinisic:
Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase.
The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend.
The text was updated successfully, but these errors were encountered: