Description
I tried to narrow down the OpenMP AMDGPU bugs that do not make sense, the ones where unrelated changes cause a test to pass or fail. Along the way I stumbled upon something interesting that I think is broken. I have a runnable reproducer, but it is a little tricky (I use the JIT to splice in the IR). What follows should, I hope, be enough for someone who actually understands AMDGCN to see the problem.
The attached zip file contains good.ll and broken.ll. I generated the corresponding .s files with llc -O0.
In my experiments, good.ll does not run into the trap, but broken.ll does.
The trap should never execute, assuming I did not break anything during the manual reduction.
The initial code asserted that workgroup.id.x < workgroup.size.x.
However, when I store the latter (workgroup.size.x) to memory, I get 0 in the broken version and 256 in the good version.
I suspect the underlying problem is incorrect value propagation along control-flow edges.
If I store %i15 (workgroup.size.x) in %bb, I get 256; if I store it in %bb194, I get 0, the same value that triggers my trap in the broken case.
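Why a size of 0 is enough to explain the trap: the check workgroup.id.x < workgroup.size.x can never hold when the size is 0, because no non-negative workgroup id is smaller than 0. Every workgroup therefore hits the trap. With the expected size of 256, the check only fails for ids of 256 or more. The following is a minimal host-side Python model of that check. It is only a sketch: the helper names and the launch of 4 workgroups are hypothetical and not taken from the reproducer. Only the sizes 256 and 0 come from the observations above.

```python
# Hypothetical model of the reduced kernel's assertion. The trap fires when
# workgroup.id.x < workgroup.size.x does NOT hold. The sizes 256 (good.ll)
# and 0 (broken.ll) are the observed values; the launch size is assumed.

def trap_fires(group_id_x, group_size_x):
    """Return True if the assertion group_id_x < group_size_x fails."""
    return not (group_id_x < group_size_x)


def trapping_ids(group_size_x, num_groups):
    """List the workgroup ids in [0, num_groups) that would hit the trap."""
    return [gid for gid in range(num_groups) if trap_fires(gid, group_size_x)]


if __name__ == "__main__":
    # Assumed (hypothetical) launch of 4 workgroups.
    print("good   (size 256):", trapping_ids(256, 4))
    print("broken (size 0):  ", trapping_ids(0, 4))
```

With the correct size, no workgroup traps. With the value 0 observed in %bb194, all of them do, which matches the trap seen with broken.ll.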