
[AMDGPU] Wrong O0 codegen for workgroup id x + control flow #61083


Open
jdoerfert opened this issue Mar 1, 2023 · 11 comments

Comments

@jdoerfert
Member

While trying to narrow down the OpenMP AMDGPU bugs that make no sense (unrelated changes cause them to pass or fail), I stumbled upon something I believe is broken. I have a runnable reproducer, but it is a little tricky to set up (it uses the JIT to splice in the IR). The description below should be enough to show the problem, at least to someone who actually understands AMDGCN.

The attached zip file contains a good.ll and a broken.ll; the respective .s files were produced with llc -O0.
In my experiments, good.ll does not run into the trap, while broken.ll does.
The trap should not execute, assuming I did not break anything during the manual reduction.
The initial code asserted that workgroup.id.x < workgroup.size.x.
However, when I store away the latter, I get 0 in the broken version and 256 in the good version.

I think the underlying problem is value propagation along the control-flow edges.
If I store %i15 (workgroup.size.x) in %bb I get 256; if I do it in %bb194 I get 0, the same value that triggers the trap in the broken case.
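
For readers without the zip handy, here is a minimal sketch of the pattern being described; it is a hypothetical reconstruction, not the attached IR. It reads workgroup.id.x and workgroup.size.x and traps if id < size does not hold. The kernel name, the output argument, and the use of the dispatch packet to obtain the workgroup size are assumptions; the real good.ll/broken.ll may read the size differently.

```llvm
; Sketch of the failing pattern (hypothetical reconstruction, not the attached IR).
declare i32 @llvm.amdgcn.workgroup.id.x()
declare ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
declare void @llvm.trap()

define amdgpu_kernel void @check(ptr addrspace(1) %out) {
bb:
  %id = call i32 @llvm.amdgcn.workgroup.id.x()
  %dp = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
  ; workgroup_size_x lives at byte offset 4 of the HSA dispatch packet
  %gep = getelementptr i8, ptr addrspace(4) %dp, i64 4
  %size16 = load i16, ptr addrspace(4) %gep, align 4
  %i15 = zext i16 %size16 to i32          ; corresponds to %i15 (workgroup.size.x) above
  %ok = icmp ult i32 %id, %i15
  br i1 %ok, label %bb194, label %trap

trap:                                     ; should never execute: id < size must always hold
  call void @llvm.trap()
  unreachable

bb194:
  ; storing %i15 here yields 0 in the broken version, 256 in the good one
  store i32 %i15, ptr addrspace(1) %out
  ret void
}
```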

@arsenm @nhaehnle Help appreciated.

amdgpu_backend_bug.zip

@llvmbot
Member

llvmbot commented Mar 1, 2023

@llvm/issue-subscribers-backend-amdgpu

@jdoerfert
Member Author

I checked; the LLVM-IR looks good until the very end. I am unable to debug this one :(

@arsenm
Contributor

arsenm commented Mar 2, 2023

The broken version has a spill and a reload; the "good" version has the spill but no matching reload.

@arsenm
Contributor

arsenm commented Mar 2, 2023

This is because the second (VGPR) run of fast regalloc picks the wrong point at the top of the block to insert the VGPR reload. The first (SGPR) regalloc run does find the correct prolog insertion point, but since it has live-in SGPRs used in the prolog, it inserts their restores there. The prolog-identifying code then does not skip over these newly inserted instructions in the second (VGPR) run, so the VGPR reload gets inserted before exec is written.

@arsenm
Contributor

arsenm commented Mar 4, 2023

Changing this to true should fix/work around this

@arsenm
Contributor

arsenm commented Mar 4, 2023

The lit test update for making this always split was not as bad as I thought. The only uncovered issue I see is that one WQM GlobalISel test fails with a LiveVariables update verifier error.

@arsenm
Contributor

arsenm commented Mar 5, 2023

https://reviews.llvm.org/D145323 fixes the uncovered LiveVariables error

@arsenm
Contributor

arsenm commented Mar 5, 2023

@arsenm
Contributor

arsenm commented Jun 28, 2023

The patch still breaks Blender, even after https://reviews.llvm.org/D153859 and https://reviews.llvm.org/D153877 fix the machine verifier failures in it.

@cdevadas
Collaborator

Can you check if the following commit fixes the wrong codegen at -O0?
a0eb6b8

@arsenm
Contributor

arsenm commented Mar 29, 2024

This is most likely the issue that #86012 should fix
