Skip to content

Potential Miscompiles with RegUnits-based MachineLICM liveness calculation #96146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Pierre-vh opened this issue Jun 20, 2024 · 15 comments
Open

Comments

@Pierre-vh
Copy link
Contributor

Pierre-vh commented Jun 20, 2024

After #94608 and #95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.

This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile.

The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively.

I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.

One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru

The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative: #95926 - so this is not an urgent fix needed, but it's a sign of something wrong with reg units and I think it needs attention.

@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Pierre van Houtryve (Pierre-vh)

After https://github.com//pull/94608 and https://github.com//pull/95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.

This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile.

The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively.

I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.

One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru

@Pierre-vh
Copy link
Contributor Author

Pierre-vh added a commit to Pierre-vh/llvm-project that referenced this issue Jun 20, 2024
Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU.

llvm#96146 opened to investigate the issue.
Pierre-vh added a commit that referenced this issue Jun 20, 2024
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
#96146 opened to investigate the issue
@jayfoad
Copy link
Contributor

jayfoad commented Jun 20, 2024

@arsenm
Copy link
Contributor

arsenm commented Jun 20, 2024

Preserved masks should also really be in terms of regunits, not registers

@hvdijk
Copy link
Contributor

hvdijk commented Jun 20, 2024

This wasn't just AArch64, I saw this affect X86 as well. On Windows, XMM6-XMM15 are callee-saved, but #95746 resulted in YMM9 being moved out of a loop that contained function calls. The XMM9-part of YMM9 is callee-saved, but the rest of YMM9 is not. #95926 appears to work around / fix this for X86 as well, thanks for the quick action. Adding a comment to make sure this is known so it can be taken into account for follow up work.

@Pierre-vh Pierre-vh changed the title AArch64 Miscompile with RegUnits-based MachineLICM liveness calculation Potential Miscompiles with RegUnits-based MachineLICM liveness calculation Jun 20, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2024

@llvm/issue-subscribers-backend-x86

Author: Pierre van Houtryve (Pierre-vh)

After https://github.com//pull/94608 and https://github.com//pull/95746, some code can miscompile in AArch64 because Qn and Dn registers both only have Bn registers as their regunits, and nothing else.

This means that the when a regmask marks Dn as being preserved across a call, Qn is also preserved if we analyze liveness using register units. It's actually not preserved and it's the source of the miscompile.

The easy solution would be to just revert the patches, but I would like to avoid that outcome as RU-based liveness analysis is much faster, and MachineLICM was extremely expensive on AMDGPU prior to these patches due to how it used RegAliasIterator intensively.

I would like to first discuss other possibilities to sort this out. Ideally, Q registers would have something to represent the upper 64 bits that can be lost.

One option would be to add a fake high 64 register in TableGen that can't be selected by regalloc. Another option, which I tried in this branch, is to add another regunit to Q registers, but it seems to cause a lot of changes in codegen that I can't quite understand yet https://github.com/Pierre-vh/llvm-project/tree/rfc-self-ru

The miscompile has been fixed on trunk by making MachineLICM's handling of CSR regmasks overly conservative: #95926

@Pierre-vh
Copy link
Contributor Author

FWIW, I tried adding an extra regunitt in the AArch case, but it causes big changes to codegen. I think it changes regalloc somehow - perhaps allocation order changes? I will likely need the help of some people really familiar with LLVM RegAlloc infra to make it work.

@arsenm
Copy link
Contributor

arsenm commented Jul 3, 2024

FWIW, I tried adding an extra regunitt in the AArch case, but it causes big changes to codegen. I think it changes regalloc somehow - perhaps allocation order changes?

The allocation order is explicit and per register class. The order shouldn't have changed from adding a new unit

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this issue Jul 9, 2024
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
llvm#96146 opened to investigate the issue
@vg0204
Copy link
Contributor

vg0204 commented May 20, 2025

I have a suggestion to deal with it, while doing codegening the regsiters, we can check whether explicitSubreg covers entire register based on size. If it is not, if somehow we can check for exactly for what location & size its missing, creating the addtional regUnit for them.

For example, in X86, in RAX (64-bit), the explicitSubregs is {EAX}. Since EAX covers only lower 32 bits can be checked somehow using SubRegIndex info. we can emit additional regUnit for uncovered parts of RAX at tablegening time. It will save as making changes for each tablegen for multiple targets, having such scenario where subregs & superregs cannot be distinguished using solely regunits, due to the fact the suprresg is not fully covered by subregs!

How does it sounds? Can it have some issues as I quite recently delved into this concept of register tableGen.

CC : @Pierre-vh , @jayfoad , @arsenm

@vg0204
Copy link
Contributor

vg0204 commented May 20, 2025

Adding to previous suggestion, even I have seen some superregs which is exactly composed of 1 subregs that exactly covers it entirely. For example, in RISCV, x0_w (32bit) is subreg of x0(also 32bit). So, why don't make it as an alias rather than subreg since now they both as aliases would have different regUnit list. As this case apparently seems me only that goes against the suggested solution above.

@jayfoad
Copy link
Contributor

jayfoad commented May 21, 2025

I have a suggestion to deal with it, while doing codegening the regsiters, we can check whether explicitSubreg covers entire register based on size. If it is not, if somehow we can check for exactly for what location & size its missing, creating the addtional regUnit for them.

That is difficult to implement because the tablegen definition of a register does not include its size.

@vg0204
Copy link
Contributor

vg0204 commented May 26, 2025

I have a suggestion to deal with it, while doing codegening the regsiters, we can check whether explicitSubreg covers entire register based on size. If it is not, if somehow we can check for exactly for what location & size its missing, creating the addtional regUnit for them.

That is difficult to implement because the tablegen definition of a register does not include its size.

What about the RegisterClass, they maintain the size. If somehow attach the regClass details on registers (which are respectively contained in the respective regClass), is it feasible then?

somehow we can check for exactly for what location & size its missing

This can be taken care by SubRegIdx(stores both size & relative location) associated with each subreg of superreg.

@arsenm
Copy link
Contributor

arsenm commented May 26, 2025

What about the RegisterClass, they maintain the size. If somehow attach the regClass details on registers (which are respectively contained in the respective regClass), is it feasible then?

Registers can belong to multiple register classes, which may not have the same size

@jayfoad
Copy link
Contributor

jayfoad commented May 27, 2025

There might be a couple of different options for fixing this:

  1. Define artificial subregs for each affected target, like 318c69d did for AArch64, until we get to a point where CoveredBySubRegs = 1 is true everywhere and we can remove support for CoveredBySubRegs = 0. Or:
  2. Change tablegen so that for any register that has subregs but CoveredBySubRegs = 0, we generate an extra regunit to represent the un-covered part(s). I'm not sure what lanemask this regunit would have.

@vg0204
Copy link
Contributor

vg0204 commented May 29, 2025

Define artificial subregs for each affected target, like 318c69d did for AArch64, until we get to a point where CoveredBySubRegs = 1 is true everywhere and we can remove support for CoveredBySubRegs = 0

Can anyone tell why the field CoveredBySubRegs comes into picture at first & how it contributes into tablegen for registers?

Change tablegen so that for any register that has subregs but CoveredBySubRegs = 0, we generate an extra regunit to represent the un-covered part(s). I'm not sure what lanemask this regunit would have.

Also, what root registers(leaf regs) it would have as in real there is no subreg covering it (since CoveredBySubRegs = 0). As for lanemask, cannot subregIndices help in that??

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants