-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Potential Miscompiles with RegUnits-based MachineLICM liveness calculation #96146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-backend-aarch64 Author: Pierre van Houtryve (Pierre-vh)
After https://github.com//pull/94608 and https://github.com//pull/95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.
This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile. The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively. I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost. One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru |
Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue.
Preserved masks should also really be in terms of regunits, not registers |
This wasn't just AArch64, I saw this affect X86 as well. On Windows, XMM6-XMM15 are callee-saved, but #95746 resulted in YMM9 being moved out of a loop that contained function calls. The XMM9-part of YMM9 is callee-saved, but the rest of YMM9 is not. #95926 appears to work around / fix this for X86 as well, thanks for the quick action. Adding a comment to make sure this is known so it can be taken into account for follow up work. |
@llvm/issue-subscribers-backend-x86 Author: Pierre van Houtryve (Pierre-vh)
After https://github.com//pull/94608 and https://github.com//pull/95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.
This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile. The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively. I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost. One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative: #95926 |
FWIW, I tried adding an extra regunitt in the AArch case, but it causes big changes to codegen. I think it changes regalloc somehow - perhaps allocation order changes? I will likely need the help of some people really familiar with LLVM RegAlloc infra to make it work. |
The allocation order is explicit and per register class. The order shouldn't have changed from adding a new unit |
Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue
I have a suggestion to deal with it, while doing codegening the regsiters, we can check whether explicitSubreg covers entire register based on size. If it is not, if somehow we can check for exactly for what location & size its missing, creating the addtional regUnit for them. For example, in X86, in RAX (64-bit), the explicitSubregs is {EAX}. Since EAX covers only lower 32 bits can be checked somehow using SubRegIndex info. we can emit additional regUnit for uncovered parts of RAX at tablegening time. It will save as making changes for each tablegen for multiple targets, having such scenario where subregs & superregs cannot be distinguished using solely regunits, due to the fact the suprresg is not fully covered by subregs! How does it sounds? Can it have some issues as I quite recently delved into this concept of register tableGen. CC : @Pierre-vh , @jayfoad , @arsenm |
Adding to previous suggestion, even I have seen some superregs which is exactly composed of 1 subregs that exactly covers it entirely. For example, in RISCV, x0_w (32bit) is subreg of x0(also 32bit). So, why don't make it as an alias rather than subreg since now they both as aliases would have different regUnit list. As this case apparently seems me only that goes against the suggested solution above. |
That is difficult to implement because the tablegen definition of a register does not include its size. |
What about the RegisterClass, they maintain the size. If somehow attach the regClass details on registers (which are respectively contained in the respective regClass), is it feasible then?
This can be taken care by SubRegIdx(stores both size & relative location) associated with each subreg of superreg. |
Registers can belong to multiple register classes, which may not have the same size |
There might be a couple of different options for fixing this:
|
Can anyone tell why the field
Also, what root registers(leaf regs) it would have as in real there is no subreg covering it (since CoveredBySubRegs = 0). As for lanemask, cannot subregIndices help in that?? |
Uh oh!
There was an error while loading. Please reload this page.
After #94608 and #95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.
This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile.
The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively.
I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.
One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru
The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative: #95926 - so this is not an urgent fix needed, but it's a sign of something wrong with reg units and I think it needs attention.
The text was updated successfully, but these errors were encountered: