Skip to content

[Coroutines] Support @llvm.coro.await_suspend actually #64945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ChuanqiXu9 opened this issue Aug 24, 2023 · 11 comments
Closed

[Coroutines] Support @llvm.coro.await_suspend actually #64945

ChuanqiXu9 opened this issue Aug 24, 2023 · 11 comments
Assignees
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" coroutines C++20 coroutines missed-optimization

Comments

@ChuanqiXu9
Copy link
Member

After c467245, we will add a noinline attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.

Obviously, this decision will bring some performance regressions. e.g., #64933

The suggested long term solution is to introduce the new intrinisic:

call @llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                                      ptr @awaitSuspendFn)

Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase.

The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend.

@llvmbot
Copy link
Member

llvmbot commented Aug 24, 2023

@llvm/issue-subscribers-coroutines

ChuanqiXu9 added a commit that referenced this issue Aug 24, 2023
… regressions

The fix we sent for #56301
may bring performance regressions. But we didn't mention it in the
ReleaseNotes so that users may get confused. e.g,
#64933. So this patch
mentions the possible side effect and the potential solutions in
#64945 to avoid
misunderstandings.
@kelbon
Copy link
Contributor

kelbon commented Aug 24, 2023

Sorry, maybe i dont fully understand problem described in #56301 , but it seems common problem is that all inlined functions locals for some reason becames part of coroutine frame, while they are not.

If it is problem may we somehow mark coroutine-used locals/args/unnamed temp values(including awaiters) at frontend and then on middle end they will use coroutine frame as storage and others just use stack?

llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 25, 2023
… regressions

The fix we sent for llvm/llvm-project#56301
may bring performance regressions. But we didn't mention it in the
ReleaseNotes so that users may get confused. e.g,
llvm/llvm-project#64933. So this patch
mentions the possible side effect and the potential solutions in
llvm/llvm-project#64945 to avoid
misunderstandings.
@ChuanqiXu9
Copy link
Member Author

Sorry, maybe i dont fully understand problem described in #56301 , but it seems common problem is that all inlined functions locals for some reason becames part of coroutine frame, while they are not.

If it is problem may we somehow mark coroutine-used locals/args/unnamed temp values(including awaiters) at frontend and then on middle end they will use coroutine frame as storage and others just use stack?

It may not be easy to explain in several words. But generally it is bad/annoying/inefficient to do analysis jobs in the frontend.

And I know (and all the reviewers and commentors know) the method will bring some bad codes. But the correctness is better than the performance. BTW, although we had a long term solution plan to fix the issue, I would expect that there are still some performance regressions if your await_suspend is super complex.

@ChuanqiXu9
Copy link
Member Author

Oh, wait. Maybe it is a idea to not do anything if the await_suspend function is marked always_suspend by the user explicitly. I'll do this.

@kelbon
Copy link
Contributor

kelbon commented Aug 25, 2023

Oh, wait. Maybe it is a idea to not do anything if the await_suspend function is marked always_suspend by the user explicitly. I'll do this.

I was talking about this:
#56301 (comment)

For example we have code:

my_coroutine c() {
  foo(...);
  co_await <expr>;
  bar(args...);
}

Here compiler must be sure, that no matter what 'foo' and 'bar' do and return, coroutine frame must never store anything which is not used in <expr>

But as i understand after inlining 'foo' and 'bar' clang do not sees difference between local coroutine variable and something created as local in inlined function. I saw this on clang 14/15 when repeating function call was increasing coroutine frame without real reasion

co_await <expr> turns to

  • create awaiter (stored in frame)
  • just call functions (await_ready, await_suspend, await_resume ...), all what they will produce must not be stored in coroutine frame

And problem with #56301 coro handle returned from 'Register' and stored as local variable in 'final_suspend' stored on coroutine frame.

If clang 'splits' coroutine only on clang-IR stage, may be introduce tags like 'call in the coroutine' which after inline will be like:

inlined_call_start:
/* something */
inlined_call_end:

tru pushed a commit to llvm/llvm-project-release-prs that referenced this issue Aug 27, 2023
… regressions

The fix we sent for llvm/llvm-project#56301
may bring performance regressions. But we didn't mention it in the
ReleaseNotes so that users may get confused. e.g,
llvm/llvm-project#64933. So this patch
mentions the possible side effect and the potential solutions in
llvm/llvm-project#64945 to avoid
misunderstandings.
ChuanqiXu9 added a commit that referenced this issue Aug 28, 2023
…ecified as always_inline already

Address #64933 and partially
#64945.

After c467245, we will add a noinline attribute to the await_suspend
member function of an awaiter if the awaiter has any non static member
functions.

Obviously, this decision will bring some performance regressions. And
people may complain about this while the long term solution may not be
available soon. In such cases, it is better to provide a solution for
the users who met the regression surprisingly.

Also it is natural to not prevent the inlining if the function is marked
as always_inline by the users already.
@ChuanqiXu9
Copy link
Member Author

@kelbon BTW, I don't know why I can't receive the mail from your comment. And your comment about inlining tags is relatively implemented as lifetime intrinsics in LLVM IR... But things in the middle end transformation are pretty complex to handle and explain... the variables in the frontend may get moved and merged or optimized away. And the many optimization are not teached about the idea of coroutines well. Then this is the root problem and we can't a lot things for that at once.

ChuanqiXu9 added a commit that referenced this issue Aug 29, 2023
Close #65054

The direct issue is still the call to coroutine_handle<>::address()
after await_suspend(). Without optimizations, the current logic will put
the temporary result of await_suspend() to the coroutine frame since the
middle end feel the temporary is escaped from
coroutine_handle<>::address. To fix this fundamentally, we should wrap
the whole logic about await-suspend into a standalone function. See
#64945

And as a short-term workaround, we probably can mark
coroutine_handle<>::address() as always-inline so that the temporary
result may not be thought to be escaped then it won't be put on the
coroutine frame. Although it looks dirty, it is probably do-able since
the compiler are allowed to do special tricks to standard library
components.
@fpasserby
Copy link
Contributor

Hello! Even though I'm new to LLVM, I'd like to work on this issue. I already have a draft implementation, that seems to work. It would be great if somebody was willing to review it and provide guidance so it could be completed.
Would you be interested?

@ChuanqiXu9
Copy link
Member Author

Hello! Even though I'm new to LLVM, I'd like to work on this issue. I already have a draft implementation, that seems to work. It would be great if somebody was willing to review it and provide guidance so it could be completed. Would you be interested?

Thanks! I'll try to look into it later this week.

ChuanqiXu9 pushed a commit that referenced this issue Mar 11, 2024
Implement `llvm.coro.await.suspend` intrinsics, to deal with performance
regression after prohibiting `.await_suspend` inlining, as suggested in
#64945.
Actually, there are three new intrinsics, which directly correspond to
each of three forms of `await_suspend`:
```
void llvm.coro.await.suspend.void(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
i1 llvm.coro.await.suspend.bool(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
ptr llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
```
There are three different versions instead of one, because in `bool`
case it's result is used for resuming via a branch, and in
`coroutine_handle` case exceptions from `await_suspend` are handled in
the coroutine, and exceptions from the subsequent `.resume()` are
propagated to the caller.

Await-suspend block is simplified down to intrinsic calls only, for
example for symmetric transfer:
```
%id = call token @llvm.coro.save(ptr null)
%handle = call ptr @llvm.coro.await.suspend.handle(ptr %awaiter, ptr %frame, ptr @wrapperFunction)
call void @llvm.coro.resume(%handle)
%result = call i8 @llvm.coro.suspend(token %id, i1 false)
switch i8 %result, ...
```
All await-suspend logic is moved out into a wrapper function, generated
for each suspension point.
The signature of the function is `<type> wrapperFunction(ptr %awaiter,
ptr %frame)` where `<type>` is one of `void` `i1` or `ptr`, depending on
the return type of `await_suspend`.
Intrinsic calls are lowered during `CoroSplit` pass, right after the
split.

Because I'm new to LLVM, I'm not sure if the helper function generation,
calls to them and lowering are implemented in the right way, especially
with regard to various metadata and attributes, i. e. for TBAA. All
things that seemed questionable are marked with `FIXME` comments.

There is another detail: in case of symmetric transfer raw pointer to
the frame of coroutine, that should be resumed, is returned from the
helper function and a direct call to `@llvm.coro.resume` is generated.
C++ standard demands, that `.resume()` method is evaluated. Not sure how
important is this, because code has been generated in the same way
before, sans helper function.
@ChuanqiXu9
Copy link
Member Author

This should be implemented by f786881

@EugeneZelenko EugeneZelenko added clang:frontend Language frontend issues, e.g. anything involving "Sema" and removed llvm:optimizations labels Nov 6, 2024
@EugeneZelenko EugeneZelenko added the clang:codegen IR generation bugs: mangling, exceptions, etc. label Nov 6, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/issue-subscribers-clang-frontend

Author: Chuanqi Xu (ChuanqiXu9)

After https://github.com/llvm/llvm-project/commit/c4672454743e942f148a1aff1e809dae73e464f6, we will add a `noinline` attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.

Obviously, this decision will bring some performance regressions. e.g., #64933

The suggested long term solution is to introduce the new intrinisic:

call @<!-- -->llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                                      ptr @<!-- -->awaitSuspendFn)

Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase.

The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend.

@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2024

@llvm/issue-subscribers-clang-codegen

Author: Chuanqi Xu (ChuanqiXu9)

After https://github.com/llvm/llvm-project/commit/c4672454743e942f148a1aff1e809dae73e464f6, we will add a `noinline` attribute to the await_suspend member function of an awaiter if the awaiter has any non static member functions.

Obviously, this decision will bring some performance regressions. e.g., #64933

The suggested long term solution is to introduce the new intrinisic:

call @<!-- -->llvm.coro.await_suspend(ptr %awaiter, ptr %handle,
                                                      ptr @<!-- -->awaitSuspendFn)

Then in the middle end, we can perform analysis the awaitSuspendFn to see if the coroutine handle escapes or not. If it won't escape, we can replace above intrinsic to the direct call. Otherwise, we will only convert them in the CoroSplit phase.

The key point of the solution is that it is much easier and better to do analysis in the middle end instead of the frontend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" coroutines C++20 coroutines missed-optimization
Projects
Status: Done
Development

No branches or pull requests

5 participants