[X86] Implement MMX intrinsics with SSE equivalents #41665

Closed
RKSimon opened this issue Jun 19, 2019 · 3 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

@RKSimon
Collaborator

RKSimon commented Jun 19, 2019

Bugzilla Link 42320
Version trunk
OS Windows NT
CC @topperc,@efriedma-quic,@jyknight,@RKSimon,@rotateright

Extended Description

Similar to what's been proposed recently for gcc, we should investigate promoting MMX intrinsics to SSE equivalents:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00061.html

This would probably be best handled in CGBuiltin.cpp (replacing the MMX builtins with SSE equivalents), although some could probably be done in the headers as well, behind a suitable define.

NOTE: This will cause a high number of subvector insertions/extractions, we might need some mechanism to reduce this even without optimizations.
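A minimal sketch of the header-level option, assuming an x86 target with SSE2 enabled (any x86-64 build qualifies). The name `add_pi16_via_sse2` is hypothetical, and the four 16-bit lanes travel in a `uint64_t` rather than an `__m64` so that no MMX register is involved. The 64-bit load and store around the single SSE2 add are the subvector insertion and extraction the note above is concerned about.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical stand-in for _mm_add_pi16: four 16-bit lanes packed in a
   uint64_t (lane 0 in the low bits), added with an SSE2 instruction. */
static inline uint64_t add_pi16_via_sse2(uint64_t a, uint64_t b)
{
    /* Subvector insertion: a 64-bit load into the low half of an XMM
       register; the upper half is zeroed. */
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);

    /* The SSE2 add works on 8 lanes; only the low 4 are meaningful here. */
    __m128i sum = _mm_add_epi16(va, vb);

    /* Subvector extraction: store the low 64 bits back out. */
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, sum);
    return r;
}
```

Lane 0 sits in the low 16 bits, so `add_pi16_via_sse2(0xFFFF, 1)` wraps that lane to 0 instead of carrying into lane 1.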

@efriedma-quic
Collaborator

NOTE: This will cause a high number of subvector insertions/extractions, we might need some mechanism to reduce this even without optimizations.

If we switch to widening 64-bit vectors by default, instead of promoting them, the conversions would be free, so it wouldn't really matter. Otherwise, yes, this could get messy; we might need special "fake-MMX" intrinsics.
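To make the distinction concrete, here is a rough C analogy (not LLVM's actual legalization code), assuming SSE2 intrinsics are available. A 64-bit `v4i16` value can either be widened (kept as 16-bit lanes in the low half of a 128-bit `v8i16` register) or promoted (each lane extended to 32 bits, giving `v4i32`). All four helper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Widening: 4 x i16 -> low half of 8 x i16. One 64-bit move in. */
static __m128i widen_v4i16(uint64_t v)
{
    return _mm_loadl_epi64((const __m128i *)&v);
}

/* ...and one 64-bit move out. */
static uint64_t unwiden_v4i16(__m128i x)
{
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, x);
    return r;
}

/* Promotion: 4 x i16 -> 4 x i32, each lane sign-extended. */
static __m128i promote_v4i16(uint64_t v)
{
    __m128i x = _mm_loadl_epi64((const __m128i *)&v);
    /* Duplicate each i16 into both halves of an i32 lane, then shift the
       high copy down arithmetically to sign-extend it. */
    return _mm_srai_epi32(_mm_unpacklo_epi16(x, x), 16);
}

/* Coming back needs a truncation: re-sign-extend the low 16 bits of each
   lane so the saturating pack cannot clamp, then pack 4 x i32 -> 4 x i16. */
static uint64_t unpromote_v4i32(__m128i x)
{
    x = _mm_srai_epi32(_mm_slli_epi32(x, 16), 16);
    uint64_t r;
    _mm_storel_epi64((__m128i *)&r, _mm_packs_epi32(x, x));
    return r;
}
```

Widening costs one 64-bit move in each direction, while promotion needs an unpack and a shift going in and a shift pair plus a pack coming out; that per-conversion work is what would make heavy subvector traffic expensive under promotion.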

@jyknight
Member

jyknight commented Jan 9, 2021

Being implemented with:
https://reviews.llvm.org/D86855:
Convert __m64 intrinsics to unconditionally use SSE2 instead of MMX instructions

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
jyknight added a commit to jyknight/llvm-project that referenced this issue Jun 20, 2024
The 3DNow! instruction set was only supported by AMD chips from the
K6-2 (introduced 1998) up to, but not including, the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).

This is being done as a predecessor towards general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.

Support for the underlying LLVM intrinsics remains, for the
moment. They will be removed in a future patch.

(Originally uploaded in https://reviews.llvm.org/D94213)

Works towards issue llvm#41665.
jyknight added a commit to jyknight/llvm-project that referenced this issue Jun 20, 2024
jyknight added a commit to jyknight/llvm-project that referenced this issue Jun 24, 2024
… of MMX instructions.

The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, more than 20 years ago.

Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but
without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III
(released 1997-1999), as well as AMD K6 and K7 series chips of around
the same timeframe. (Note that targeting these older CPUs remains
supported, simply without the ability to use MMX compiler intrinsics.)

Migrating away from the use of MMX also fixes a rather non-obvious
requirement for users of the intrinsics API. The long-standing
programming model for MMX requires that the programmer be aware of the
x87/MMX mode-switching semantics, and manually call _mm_empty()
between using any MMX instruction and any x87 FPU instruction. If you
neglect to, then every future x87 operation will return a NaN
result. This requirement is not at all obvious to users of these
intrinsics, and causes bugs that are very difficult to detect.

Additionally, in some circumstances LLVM may reorder x87 and MMX
operations around each other, unaware of this mode-switching
issue. So, even inserting _mm_empty() calls appropriately will not
always guarantee correct operation.

Eliminating the use of MMX instructions fixes both these latter
issues.

Works towards issue llvm#41665.
jyknight added a commit that referenced this issue Jul 16, 2024
The 3DNow! instruction set was only supported by AMD chips from the
K6-2 (introduced 1998) up to, but not including, the "Bulldozer" family
(2011). They were never much used, as they were effectively superseded
by the more-widely-implemented SSE (first implemented on the AMD side
in Athlon XP in 2001).

This is being done as a predecessor towards general removal of MMX
register usage. Since there is almost no usage of the 3DNow!
intrinsics, and no modern hardware even implements them, simple
removal seems like the best option.

(Clang half originally uploaded in https://reviews.llvm.org/D94213)

Works towards issue #41665 and issue #98272.
jyknight added a commit that referenced this issue Jul 24, 2024
… of MMX. (#96540)

The MMX instruction set is legacy, and the SSE2 variants are in every
way superior, when they are available -- and they have been available
since the Pentium 4 was released, more than 20 years ago.

Therefore, we are switching the "MMX" intrinsics to depend on SSE2,
unconditionally. This change entirely drops the ability to generate
vectorized code using compiler intrinsics for chips with MMX but without
SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released
1997-1999), as well as AMD K6 and K7 series chips of around the same
timeframe. Targeting these older CPUs remains supported -- simply
without the ability to use MMX compiler intrinsics.
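Code that still has to run on those pre-SSE2 CPUs needs a path that does not use the intrinsics at all. One common option, shown here as an illustration rather than anything the patch provides, is a SWAR (SIMD within a register) version of the 4 x 16-bit wrapping add that `_mm_add_pi16` performs, operating on a plain `uint64_t`. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar fallback for a 4 x 16-bit wrapping add. Lanes are
   packed in a uint64_t with lane 0 in the low bits. */
static uint64_t add_pi16_fallback(uint64_t a, uint64_t b)
{
    /* Bit 15 of every lane. */
    const uint64_t top = 0x8000800080008000ULL;
    /* With each lane's top bit cleared, a lane sum is at most
       0x7FFF + 0x7FFF = 0xFFFE, so no carry crosses into the next lane. */
    uint64_t sum = (a & ~top) + (b & ~top);
    /* Fix up each lane's top bit: carry out of bit 14, XOR a15, XOR b15.
       Any carry out of bit 15 is dropped, giving 16-bit wraparound. */
    return sum ^ ((a ^ b) & top);
}
```

The masking keeps each lane's carry from spilling into its neighbour, so with this packing the result matches a lane-wise 16-bit add.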

Migrating away from the use of MMX registers also fixes a rather
non-obvious requirement. The long-standing programming model for these
MMX intrinsics requires that the programmer be aware of the x87/MMX
mode-switching semantics, and manually call `_mm_empty()` between using
any MMX instruction and any x87 FPU instruction. If you neglect to, then
every future x87 operation will return a NaN result. This requirement is
not at all obvious to users of these intrinsic functions, and causes
bugs that are very difficult to detect.
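With a compiler that still emits real MMX instructions for these intrinsics, the rule looks roughly like this in practice. `sum_then_scale` is a hypothetical example; it assumes an x86 target where `long double` arithmetic is done on the x87 FPU (as with GCC and Clang on x86 Linux).

```c
#include <mmintrin.h> /* MMX intrinsics, including _mm_empty */

/* Adds two 16-bit values in lane 0 of an __m64, then continues with x87
   arithmetic on the result. */
static long double sum_then_scale(short a0, short b0, long double scale)
{
    __m64 a = _mm_set_pi16(0, 0, 0, a0); /* arguments run lane 3 .. lane 0 */
    __m64 b = _mm_set_pi16(0, 0, 0, b0);
    __m64 s = _mm_add_pi16(a, b);

    /* Low 32 bits of the result; lane 0 is in the low 16 of those. */
    short lane0 = (short)_mm_cvtsi64_si32(s);

    /* Under the classic MMX model, EMMS must run before any x87
       instruction that follows MMX code. */
    _mm_empty();

    return (long double)lane0 * scale;
}
```

Dropping the `_mm_empty()` call is the mistake described above: on such a compiler, the `long double` multiply that follows may produce NaN.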

Worse, even if the user did write code that correctly calls
`_mm_empty()` in the right places, LLVM may sometimes reorder x87 and
MMX operations around each other, unaware of this mode-switching issue.

Eliminating the use of MMX registers eliminates this problem.

This change also deletes the now-unnecessary MMX `__builtin_ia32_*`
functions from Clang. Only 3 MMX-related builtins remain in use --
`__builtin_ia32_emms`, used by `_mm_empty`, and
`__builtin_ia32_vec_{ext,set}_v4hi`, used by `_mm_extract_pi16` and
`_mm_insert_pi16`. Note particularly that the latter two lower to
generic, non-MMX, IR. Support for the LLVM intrinsics underlying these
removed builtins still remains, for the moment.
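For reference, this is how the two remaining lane-access intrinsics behave from the user's side, assuming `xmmintrin.h` and an SSE-capable x86 target. `insert_and_read` is a hypothetical helper; note that the lane index must be a compile-time constant and that `_mm_extract_pi16` zero-extends the 16-bit lane to `int`.

```c
#include <xmmintrin.h> /* _mm_insert_pi16 / _mm_extract_pi16 (SSE) */

/* Overwrites lane 2 of a 4 x i16 vector with the low 16 bits of `value`
   and reads all four lanes back into out[0..3]. */
static void insert_and_read(int value, int out[4])
{
    __m64 v = _mm_set_pi16(4, 3, 2, 1); /* arguments run lane 3 .. lane 0 */
    v = _mm_insert_pi16(v, value, 2);   /* lane index is an immediate */
    out[0] = _mm_extract_pi16(v, 0);
    out[1] = _mm_extract_pi16(v, 1);
    out[2] = _mm_extract_pi16(v, 2);
    out[3] = _mm_extract_pi16(v, 3);
    _mm_empty(); /* keeps x87 usable where MMX registers were used */
}
```

Inserting `-1` and extracting it again yields `0xFFFF`, not `-1`, because of the zero extension.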

The file `clang/www/builtins.py` has been updated with mappings from the
newly-removed `__builtin_ia32` functions to the still-supported
equivalents in `mmintrin.h`.
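As an illustration of that kind of mapping, assume code that called one of the removed builtins directly. `__builtin_ia32_psubusb` was the builtin behind `_mm_subs_pu8` (per-byte unsigned saturating subtract), and the portable replacement is the intrinsic itself. The helper `subs_pu8_bytes` is hypothetical; `clang/www/builtins.py` remains the authoritative list of mappings.

```c
#include <mmintrin.h> /* _mm_subs_pu8, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Before (depends on a builtin that Clang no longer provides):
       __m64 r = (__m64)__builtin_ia32_psubusb((__v8qi)a, (__v8qi)b);
   After: the documented intrinsic from mmintrin.h. */
static void subs_pu8_bytes(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    __m64 ma, mb, mr;
    memcpy(&ma, a, sizeof ma);
    memcpy(&mb, b, sizeof mb);
    mr = _mm_subs_pu8(ma, mb); /* per byte: a - b, clamped at 0 */
    memcpy(out, &mr, sizeof mr);
    _mm_empty();               /* keeps x87 usable where MMX registers were used */
}
```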

(Originally uploaded at https://reviews.llvm.org/D86855 and
https://reviews.llvm.org/D94252)

Fixes issue #41665
Works towards #98272
yuxuanchen1997 pushed a commit that referenced this issue Jul 25, 2024
yuxuanchen1997 pushed a commit that referenced this issue Jul 25, 2024
… of MMX. (#96540)
Differential Revision: https://phabricator.intern.facebook.com/D60250580
@jyknight
Member

Fixed via the above PRs; MMX intrinsics now use SSE2.
