[AMDGPU] Wrong O0 codegen for workgroup id x + control flow #61083
@llvm/issue-subscribers-backend-amdgpu

I checked; the LLVM IR looks good until the very end. I was unable to debug this one :(
The broken version has a spill and reload, and the "good" version has the spill but no matching reload.
This is because the second (VGPR) run of the fast register allocator picks the wrong point to insert the VGPR reload at the top of the block. The first (SGPR) regalloc run does find the correct prolog insertion point, but since it has live-in SGPRs used in the prolog, it inserts restores there. In the second VGPR run, the prolog-identifying code then does not skip over these newly inserted instructions, so the VGPR spill gets inserted before exec is written.
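Schematically, the misplacement described above looks roughly like this (a hypothetical AMDGPU assembly sketch, not taken from the attached reproducer; registers and offsets are made up):

```asm
; broken: fast regalloc places the whole-wave VGPR spill before the
; block prolog's exec manipulation, so it runs under a partial exec mask
  buffer_store_dword v0, off, s[0:3], 0 offset:4 ; spill of inactive lanes is lost
  s_or_saveexec_b64 s[4:5], -1                   ; exec written only *after* the spill

; expected: exec is forced to all-ones first, so the spill covers every lane
  s_or_saveexec_b64 s[4:5], -1
  buffer_store_dword v0, off, s[0:3], 0 offset:4
  s_mov_b64 exec, s[4:5]
```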
Changing this to true should fix/work around this |
The lit test update for making this always split was not as bad as I thought. The only uncovered issue I see is that one WQM GlobalISel test fails with a LiveVariables update verifier error.
https://reviews.llvm.org/D145323 fixes the uncovered LiveVariables error |
The patch still breaks Blender, even after https://reviews.llvm.org/D153859 and https://reviews.llvm.org/D153877 fix the machine verifier failures in it.
Can you check if the following commit fixes the wrong codegen at -O0? |
This is most likely the issue that #86012 should fix |
I tried to narrow down what the OpenMP AMDGPU bugs are all about, the ones that do not make sense: unrelated changes cause them to pass or fail. In the process, I stumbled upon something interesting that I think is broken. I have a runnable reproducer, but it's a little tricky (I use the JIT to splice in the IR). Here is what I hope should suffice to see the problem, at least for someone who actually understands AMDGCN.
In the attached zip file are a good.ll and a broken.ll. I got the respective .s files with `llc -O0`. In my experiments, good.ll will not run into the trap, while broken.ll will. The trap should not execute, assuming I didn't break anything during the manual reduction.
The initial code asserted that `workgroup.id.x < workgroup.size.x`. However, when I store away the latter in the broken version, I get 0; in the good version I get 256. I think the underlying problem is some value propagation along the control edges.
If I store `%i15` (workgroup.size.x) in `%bb`, I get 256; if I do it in `%bb194`, I get 0, the same value that triggers my trap in the broken case.

@arsenm @nhaehnle Help appreciated.
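The assertion pattern described above can be sketched in LLVM IR roughly as follows (a hypothetical minimal sketch, not the attached reduction; reading workgroup_size_x from the HSA dispatch packet is one common way to obtain workgroup.size.x):

```llvm
; Hypothetical reduction: trap if workgroup.id.x >= workgroup.size.x.
define amdgpu_kernel void @check() {
bb:
  %id = call i32 @llvm.amdgcn.workgroup.id.x()
  %dispatch = call ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
  ; workgroup_size_x lives at byte offset 4 of the HSA kernel dispatch packet
  %gep = getelementptr i8, ptr addrspace(4) %dispatch, i64 4
  %size16 = load i16, ptr addrspace(4) %gep, align 4
  %size = zext i16 %size16 to i32
  %ok = icmp ult i32 %id, %size
  br i1 %ok, label %cont, label %trap

trap:               ; should be unreachable: id.x < size.x always holds
  call void @llvm.trap()
  unreachable

cont:
  ret void
}

declare i32 @llvm.amdgcn.workgroup.id.x()
declare ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
declare void @llvm.trap()
```

Compiled with `llc -O0` as in the report, the trap block should never execute; the symptom described is that the stored size reads back as 0 in the broken version instead of the expected 256.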
amdgpu_backend_bug.zip