X86: Delete MMX types/intrinsics from LLVM IR/backends #98272

Open
3 of 6 tasks
jyknight opened this issue Jul 10, 2024 · 3 comments

@jyknight
Member

jyknight commented Jul 10, 2024

This issue is about removing IR and SelectionDAG/Codegen support. Assembler support should remain.

Previous discussion on Discourse: "Proposal to remove MMX support".

The idea is, at the LLVM level, to keep only minimal support for the inline-asm "y" constraint and to remove as much of the rest as possible.

Overall plan:

  1. Delete all support for 3DNow! (PR #96246, "Remove support for 3DNow!, both intrinsics and builtins.")
  2. Delete the x86_mmx type from IR. This can be done by using (at the IR level only) a standard vector type, <1 x i64>, instead. Notably, Clang already uses <1 x i64> for everything except where required to interface with MMX intrinsics and inline asm. The conversion for those interfaces can be pushed down into SelectionDAG instead. (PR #98505, "Remove the x86_mmx IR type."; PR #100646, "Cleanup x86_mmx after removing IR type")
  3. Migrate Clang-side MMX builtins to be backed by SSE2 instead of MMX, so they can continue to work without requiring the MMX intrinsic functions in IR (and as a bonus, they'll likely get faster, too). (bug #41665, "[X86] Implement MMX intrinsics with SSE equivalents"; PR #96540, "Clang: convert __m64 intrinsics to unconditionally use SSE2 instead of MMX.")
  4. Delete the IR-side MMX intrinsic functions.
    • Open question: how to deal with bitcode backwards compatibility for these? Maybe simplest to convert the intrinsics to inline asm in autoupgrade.
  5. Delete as much of the remaining X86 backend code related to MMX as possible, without breaking inline-asm "y" constraints.
  6. Decide whether to do anything about inserting EMMS instructions (bug #41664, "[X86] Add pass to insert EMMS/FEMMS instructions to separate MMX and X87 states"). At this point, it would be relevant only to inline asm. So we could potentially insert such an instruction directly after every inline-asm statement that has MMX register inputs, outputs, or clobbers. Or we could also (continue to) not bother.
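
For context on what has to keep working, here is a minimal sketch of inline asm that uses the "y" constraint. It assumes an x86 target and a GCC-compatible compiler (GCC or Clang); the helper name `paddw_via_inline_asm` is hypothetical and is not taken from LLVM or its tests.

```c
#include <mmintrin.h> /* __m64, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustration only): adds four 16-bit lanes with
   PADDW, with both operands placed in MMX registers by the "y" constraint. */
uint64_t paddw_via_inline_asm(uint64_t x, uint64_t y) {
    __m64 a, b;
    uint64_t result;

    /* memcpy avoids relying on 64-bit-only conversion intrinsics. */
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    /* AT&T syntax: %0 += %1, lane by lane. "+y" makes `a` read-write. */
    __asm__("paddw %1, %0" : "+y"(a) : "y"(b));

    memcpy(&result, &a, sizeof result);
    _mm_empty(); /* EMMS: mark the x87 register stack as empty again */
    return result;
}
```

Operands constrained with "y" live in MMX registers no matter how the `__m64` intrinsics themselves are implemented, which is why the EMMS question in step 6 would still apply to code like this.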

@efriedma-quic @phoebewang @topperc @RKSimon

@llvmbot
Member

llvmbot commented Jul 10, 2024

@llvm/issue-subscribers-backend-x86

@jyknight jyknight self-assigned this Jul 10, 2024
jyknight added a commit that referenced this issue Jul 16, 2024
This set of instructions was only supported by AMD chips starting in
the K6-2 (introduced 1998), and before the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).

This is being done as a precursor to the general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.

(Clang half originally uploaded in https://reviews.llvm.org/D94213)

Works towards issue #41665 and issue #98272.
@frobtech
Contributor

The current state of things seems to be that the -mno-3dnow switch is still accepted by the Clang driver, but using it causes these messages:

'-3dnow' is not a recognized feature for this target (ignoring feature)
'-3dnowa' is not a recognized feature for this target (ignoring feature)

@RKSimon
Collaborator

RKSimon commented Jul 17, 2024

@frobtech #99352 should address this

jyknight added a commit that referenced this issue Jul 24, 2024
… of MMX. (#96540)

The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, 20 years ago.

Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but without
SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released
1997-1999), as well as AMD K6 and K7 series chips of around the same
timeframe. Targeting these older CPUs remains supported -- simply
without the ability to use MMX compiler intrinsics.
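
One way code that still targets such CPUs might cope is sketched below. This is an assumption about user code, not something taken from the change itself: it relies on the `__SSE2__` macro that GCC-compatible compilers predefine when SSE2 is enabled, and the function name `add_u16x4` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <mmintrin.h> /* __m64 intrinsics, usable when SSE2 is enabled */
#endif

/* Hypothetical helper: lane-wise 16-bit add over a packed 64-bit value. */
uint64_t add_u16x4(uint64_t x, uint64_t y) {
#if defined(__SSE2__)
    __m64 a, b, sum;
    uint64_t result;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    sum = _mm_add_pi16(a, b);
    memcpy(&result, &sum, sizeof result);
    _mm_empty(); /* still required by compilers that map __m64 to MMX */
    return result;
#else
    /* Scalar fallback for targets without SSE2 (or non-x86 targets). */
    uint64_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint16_t xs = (uint16_t)(x >> (16 * lane));
        uint16_t ys = (uint16_t)(y >> (16 * lane));
        result |= (uint64_t)(uint16_t)(xs + ys) << (16 * lane);
    }
    return result;
#endif
}
```

Both paths wrap within each 16-bit lane, so callers see the same result whichever branch is compiled.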

Migrating away from the use of MMX registers also fixes a rather
non-obvious requirement. The long-standing programming model for these
MMX intrinsics requires that the programmer be aware of the x87/MMX
mode-switching semantics, and manually call `_mm_empty()` between using
any MMX instruction and any x87 FPU instruction. If you neglect to, then
every future x87 operation will return a NaN result. This requirement is
not at all obvious to users of these intrinsic functions, and it
causes bugs that are very difficult to detect.

Worse, even if the user did write code that correctly calls
`_mm_empty()` in the right places, LLVM may sometimes reorder x87 and
MMX operations around each other, unaware of this mode-switching issue.

Eliminating the use of MMX registers eliminates this problem.
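
To make the old requirement concrete, here is a minimal sketch of the pattern programmers had to follow. It assumes an x86 target with `<mmintrin.h>` available; the function name `half_of_lane_sum` is hypothetical. On x86-64 the `double` arithmetic normally uses SSE rather than x87, so the hazard described above mainly affects 32-bit x87 builds and `long double` code.

```c
#include <mmintrin.h> /* _mm_set_pi16, _mm_add_pi16, _mm_empty */
#include <string.h>

/* Hypothetical helper: MMX-style integer work followed by floating point. */
double half_of_lane_sum(void) {
    /* _mm_set_pi16 takes lanes from highest to lowest. */
    __m64 a = _mm_set_pi16(4, 3, 2, 1);
    __m64 b = _mm_set_pi16(40, 30, 20, 10);
    __m64 sum = _mm_add_pi16(a, b); /* lanes 0..3: 11, 22, 33, 44 */

    short lanes[4];
    memcpy(lanes, &sum, sizeof lanes);

    /* Leave MMX mode before any floating-point code runs. When __m64
       operations are backed by MMX registers, omitting this call is the
       bug class described above. */
    _mm_empty();

    double total = 0.0;
    for (int i = 0; i < 4; ++i)
        total += lanes[i];
    return total * 0.5;
}
```

With SSE2-backed `__m64` intrinsics the call to `_mm_empty()` is no longer needed for correctness, but keeping it keeps the code portable to compilers that still use MMX registers.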

This change also deletes the now-unnecessary MMX `__builtin_ia32_*`
functions from Clang. Only 3 MMX-related builtins remain in use --
`__builtin_ia32_emms`, used by `_mm_empty`, and
`__builtin_ia32_vec_{ext,set}_v4hi`, used by `_mm_extract_pi16` and
`_mm_insert_pi16`. Note particularly that the latter two lower to
generic, non-MMX, IR. Support for the LLVM intrinsics underlying these
removed builtins still remains, for the moment.
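
As a usage illustration for the two lane-access intrinsics named above, here is a minimal sketch. It assumes an x86 target with SSE enabled (both intrinsics are declared in `<xmmintrin.h>`); the function name `replace_and_read_lane2` is hypothetical.

```c
#include <xmmintrin.h> /* _mm_insert_pi16, _mm_extract_pi16; includes mmintrin.h */

/* Hypothetical helper: overwrite 16-bit lane 2 of a __m64 and read it back.
   The lane index must be a compile-time constant. */
int replace_and_read_lane2(int new_value) {
    __m64 v = _mm_set_pi16(4, 3, 2, 1); /* lanes 0..3 = 1, 2, 3, 4 */
    v = _mm_insert_pi16(v, new_value, 2);
    int lane = _mm_extract_pi16(v, 2);
    _mm_empty(); /* harmless with SSE2-backed __m64; needed with MMX-backed */
    return lane;
}
```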

The file `clang/www/builtins.py` has been updated with mappings from the
newly-removed `__builtin_ia32` functions to the still-supported
equivalents in `mmintrin.h`.

(Originally uploaded at https://reviews.llvm.org/D86855 and
https://reviews.llvm.org/D94252)

Fixes issue #41665
Works towards #98272
jyknight added a commit that referenced this issue Jul 25, 2024
The `x86_mmx` IR type is now translated to `<1 x i64>`, which allows the
removal of a bunch of special casing.

This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of being passed in MMX registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the `x86_mmx` type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.

This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.

Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it now
generates the intrinsics with the type `MVT::v1i64` instead of
`MVT::x86mmx`. This must be corrected before DAG LegalizeTypes runs, so
the X86 backend fixes them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)

Works towards issue #98272.
jyknight added a commit that referenced this issue Jul 28, 2024
After #98505, the textual IR keyword `x86_mmx` was temporarily made to
parse as `<1 x i64>`, so as not to require a lot of test update noise.

This completes the removal of the type by removing the `x86_mmx` keyword
from the IR parser and making the (now no-op) test updates via
`sed -i 's/\bx86_mmx\b/<1 x i64>/g' $(git grep -l x86_mmx llvm/test/)`.
Resulting bitcasts from `<1 x i64>` to itself were then manually deleted.

Changes to llvm/test/Bitcode/compatibility-$VERSION.ll were reverted, as
they're intended to be equivalent to the .bc file, if parsed by old
LLVM, so shouldn't be updated.

A few tests were removed, as they're no longer testing anything, in the
following files:
- llvm/test/Transforms/GlobalOpt/x86_mmx_load.ll
- llvm/test/Transforms/InstCombine/cast.ll
- llvm/test/Transforms/InstSimplify/ConstProp/gep-zeroinit-vector.ll

Works towards issue #98272.