Skip to content

[CIR][CodeGen] Support for _mm_prefetch #1560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2,452 commits into
base: main
Choose a base branch
from

Conversation

shrikardongre
Copy link
Contributor

After the new rebase had some difficulties to hard rebase locally , my apologies for that .
this is how I think i can avoid the fall over to the clflush case in clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp .

The generated CIR is

!s32i = !cir.int<s, 32>
!s8i = !cir.int<s, 8>
!void = !cir.void
#fn_attr = #cir<extra({inline = #cir.inline<no>, optnone = #cir.optnone, uwtable = #cir.uwtable<async>})>
module @"/home/shrikardongre/clangir/clang/test/CIR/CodeGen/X86/new.cc" attributes {cir.lang = #cir.lang<cxx>, cir.sob = #cir.signed_overflow_behavior<undefined>, cir.triple = "x86_64-unknown-linux-gnu", cir.type_size_info = #cir.type_size_info<char = 8, int = 32, size_t = 64>, cir.uwtable = #cir.uwtable<async>, dlti.dl_spec = #dlti.dl_spec<!llvm.ptr<270> = dense<32> : vector<4xi64>, i64 = dense<64> : vector<2xi64>, i128 = dense<128> : vector<2xi64>, !llvm.ptr<271> = dense<32> : vector<4xi64>, f80 = dense<128> : vector<2xi64>, f64 = dense<64> : vector<2xi64>, f128 = dense<128> : vector<2xi64>, !llvm.ptr<272> = dense<64> : vector<4xi64>, i16 = dense<16> : vector<2xi64>, i32 = dense<32> : vector<2xi64>, f16 = dense<16> : vector<2xi64>, !llvm.ptr = dense<64> : vector<4xi64>, i1 = dense<8> : vector<2xi64>, i8 = dense<8> : vector<2xi64>, "dlti.stack_alignment" = 128 : i64, "dlti.endianness" = "little", "dlti.mangling_mode" = "e">} {
  cir.func @main() -> !s32i extra(#fn_attr) {
    %0 = cir.alloca !s32i, !cir.ptr<!s32i>, ["__retval"] {alignment = 4 : i64} loc(#loc2)
    %1 = cir.alloca !cir.ptr<!s8i>, !cir.ptr<!cir.ptr<!s8i>>, ["p"] {alignment = 8 : i64} loc(#loc9)
    %2 = cir.load %1 : !cir.ptr<!cir.ptr<!s8i>>, !cir.ptr<!s8i> loc(#loc5)
    %3 = cir.const #cir.int<0> : !s32i loc(#loc6)
    %4 = cir.cast(bitcast, %2 : !cir.ptr<!s8i>), !cir.ptr<!void> loc(#loc7)
    %5 = cir.prefetch(%4 : !cir.ptr<!void>) locality(0) read loc(#loc7)
    %6 = cir.load %0 : !cir.ptr<!s32i>, !s32i loc(#loc2)
    cir.return %6 : !s32i loc(#loc2)
  } loc(#loc8)
} loc(#loc)
#loc = loc("/home/shrikardongre/clangir/clang/test/CIR/CodeGen/X86/new.cc":0:0)
#loc1 = loc("new.cc":1:1)
#loc2 = loc("new.cc":4:1)
#loc3 = loc("new.cc":2:5)
#loc4 = loc("new.cc":2:17)clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
#loc5 = loc("new.cc":3:18)
#loc6 = loc("new.cc":3:20)
#loc7 = loc("new.cc":3:5)
#loc8 = loc(fused[#loc1, #loc2])
#loc9 = loc(fused[#loc3, #loc4])

for the file ,

int main(){
    char const *p;
    _mm_prefetch(p,0);
}

however I had a doubt on wether
//CIR: {{%.*}}= cir.prefetch({{%.*}} : !cir.ptr<!void>) locality(0) read
in the test file is correct as if nothing is provided initially it should be write if I am not wrong, please giude
Thanks :) .

ghehg and others added 30 commits April 9, 2025 11:14
There are two sets of intrinsics regarding Min and Max operations for
floating points

[Maximum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmaximum-llvmmaximumop)
vs
[Maxnum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmaxnum-llvmmaxnumop)

[Minimum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrminimum-llvmminimumop)
vs
[Minnum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrminnum-llvmminnumop)

[The difference is whether NaN should be propagated when one of the
inputs is
NaN](https://llvm.org/docs/LangRef.html#llvm-maximumnum-intrinsic)
Maxnum and Minnum would return number if one of inputs is NaN, and the
other is a number,
But 
Maximum and Minimum would return NaN (propagation of NaN)

And they are resolved to different ASM such as
[FMAX](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMAX--vector---Floating-point-Maximum--vector--?lang=en)
vs
[FMAXNM](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMAXNM--vector---Floating-point-Maximum-Number--vector--?lang=en)

Both have user cases, we already implemented Maxnum and Minnum
But Maximum and Minimum has user cases in [neon intrinsic
](https://developer.arm.com/architectures/instruction-sets/intrinsics/vmax_f32
)
and [__builtin_elementwise_maximum
](https://github.com/llvm/clangir/blob/a989ecb2c55da1fe28e4072c31af025cba6c4f0f/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp#L53)
…lvm#1235)

Use iterator to visit std::initializer_list field reduce the readability
…d neon_vaddvq_f64 (llvm#1238)

[Neon intrinsic
definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddv_f32).
They are vector across operation which LLVM doesn't currently have a
generic intrinsic about it. As a side note for brainstorm, it might be
worth in the future for CIR to introduce Vector Across type operations
even though LLVM dialect doesn't have it yet. This would help to expose
opt opportunities.
E.g. a very trivial constant fold can happen if we are adding across a
constant vector.
…#1239)

This implementation is different from OG in the sense we chose to use
CIR op which eventually lowers to generic LLVM intrinsics instead of
llvm.aarch64.neon intrinsics
But down to the ASM level, [they are identical
](https://godbolt.org/z/Gbbos9z6Y).
…vm#1242)

This patch follows
llvm#1220 (comment) by
augmenting `CIR_Type` with a new field, `tbaaName`. Specifically, it
enables TableGen support for the `-gen-cir-tbaa-name-lowering` option,
allowing for the generation of `getTBAAName` functions based on the
`tbaaName`. This enhancement enables us to replace the hardcoded TBAA
names in the `getTypeName` function with the newly generated
`getTBAAName`.
This PR adds a bitcast when we rewrite globals type. Previously we just
set a new type and it worked.
But recently I started to test ClangIR with CSmith in order to find some
run time bugs and faced with the next problem.

```
typedef struct {
    int x : 15;   
    uint8_t y;
} S;

S g = { -12, 254};

int main() {    
    printf("%d\n", g.y);
    return 0;
}

```
The output for this program is  ... 127 but not 254!
The reason is that first global var is created with the type of struct
`S`, then `get_member` operation is generated with index `1`
and then after, the type of the global is rewritten - I assume because
of the anon struct created on the right side in the initialization.
But the `get_member` operation still wants to access to the field at
index `1` and get a wrong byte.
If we change the `y` type to `int` we will fail on the verification
stage. But in the example above it's a run time error!

This is why I suggest to add a bitcast once we have to rewrite the
global type.
We figure it would be nice to have a common place with all our known
crashes that is tracked by git and is actively verified whether or not
we can now support the crashes by lit. It can act as our source of truth
for known failures and also being potential good first tasks for new
developers.

Add a simple test case of a known crash that involves copying a struct
in a catch.

Reviewers: smeenai, bcardosolopes

Reviewed By: bcardosolopes

Pull Request: llvm#1243
…#1247)

Basically that is - the return value for `=` operator for bitfield
assignment is wrong now. For example, the next function returns `7` for
3 bit bit field, though it should be `-1`:
```
int get_a(T *t) {
  return (t->a = 7);
}
```

This PR fix it. Actually, the bug was in the lowering - the integer cast
is applied in the wrong place (in comparison with the original codegen).
This PR changes changes the lowering of `cir.bool` to `i1` in both
DorectToLLVM and ThroughMLIR. This dramatically simplifies the lowering
logic of most operations and the lowered code itself as it naturally
uses `i1` for anything boolean.

The change involves separating between type lowering when scalars are
involved and when memory is involved. This is a pattern that was
inspired by clang's codegen which directly emits `i1` from the AST
without intermediate higher level representation like CIR has.

This also paves the way to more complex lowerings that are implemented
in clang codegen through the three primitives added here: `Convert Type
For Memory`, `Emit For Memory` and `Emit To Memory`. They are used in
clang for non-trivial types like bitints but also extensible vectors.
Close llvm#1185

The patch itself seems slightly ad-hoc. As the issue tracked by
llvm#1178, the fundamental solution
may be to introduce two type systems to solve the inconsistent semantics
for Union between LLVM IR and CIR. This will be great to handle other
inconsistent semantics between LLVM IR and CIR if any.

Back to the patch itself, although the code looks not good initially to
me too. But I feel it may be a good workaround since
clang/test/CIR/Lowering/union-array.c is an example for array of unions
directly and clang/test/CIR/Lowering/nested-union-array.c gives an
example for array of unions indirectly (array of structs which contain
unions). So I feel we've already covered all the cases.

And generally it should be good to use some simple and solid workaround
before we introduce the formal full solution.
…16` and `vdivh_f16` (llvm#1258)

Lowering:

- `vaddh_f16`
- `vsubh_f16`
- `vmulh_f16`
- `vdivh_f16`
…llvm#1265)

[Neon
definiton](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vmaxv_s8)
[OG
implementation](https://github.com/llvm/clangir/blob/04d7dcfb2582753f3eccbf01ec900d60297cbf4b/clang/lib/CodeGen/CGBuiltin.cpp#L13202)
Implementation in this PR is different from OG as 
1. avoided code duplication by extracting out the common pattern
2. avoided using i32 as return type of the intrinsic call, so eliminated
the need for casting result of the intrinsic call. This way of OG's
implementation is quite unnecessary IMHO, this is MAX, not ADD or MUL.
After all, using the expected type as return type of intrinsic call
produces [the same ASM code](https://godbolt.org/z/3nKG7fxPb).
I continue to use `csmith` and catch run time bags. Now it's time to fix
the layout for the const structs.

There is a divergence between const structs generated by CIR and the
original codegen. And this PR makes one more step to eliminate it. There
are cases where the extra padding is required - and here is a fix for
some of them. I did not write extra tests, since the fixes in the
existing already covers the code I added. The point is that now the
layout for all of these structs in the LLVM IR with and without CIR is
the same.
Class `CIRGenFunction` contained three identical functions that
converted from a Clang AST type (`clang::QualType`) to a ClangIR type
(`mlir::Type`): `convertType`, `ConvertType`, and `getCIRType`. This
embarrassment of duplication needed to be fixed, along with cleaning up
other functions that convert from Clang types to ClangIR types.

The three functions `CIRGenFunction::ConvertType`,
`CIRGenFunction::convertType`, and `CIRGenFunction::getCIRType` were
combined into a single function `CIRGenFunction::convertType`. Other
functions were renamed as follows:
- `CIRGenTypes::ConvertType` to `CIRGenTypes::convertType`
- `CIRGenTypes::ConvertFunctionTypeInternal` to
`CIRGenTypes::convertFunctionTypeInternal`
- `CIRGenModule::getCIRType` to `CIRGenModule::convertType`
- `ConstExprEmitter::ConvertType` to `ConstExprEmitter::convertType`
- `ScalarExprEmitter::ConvertType` to `ScalarExprEmitter::convertType`

Many cases of `getTypes().convertType(t)` and
`getTypes().convertTypeForMem(t)` were changed to just `convertType(t)`
and `convertTypeForMem(t)`, respectively, because the forwarding
functions in `CIRGenModule` and `CIRGenFunction` make the explicit call
to `getTypes()` unnecessary.
Reland previously reverted attempt now that this passes ASANified `ninja
check-clang-cir`.

Original message:
We are missing cleanups all around, more incremental progress towards fixing
that. This is supposed to be NFC intended, but we have to start changing some
bits in order to properly match cleanup bits in OG.

Start tagging places with more MissingFeatures to allow us to incrementally
improve the situation.
…#1262)

This patch adds support for the following GCC function attributes:

  - `__attribute__((const))`
  - `__attribute__((pure))`

The side effect information is attached to the call operations during
CIRGen. During LLVM lowering, these information is consumed to further
emit appropriate LLVM metadata on LLVM call instructions.
AdUhTkJm and others added 18 commits April 9, 2025 15:49
This also removes some unused `#include`s.

I choose to allow lowering to LLVM dialect when we're lowering CIR to
MLIR core dialects, because some operations don't have their
counterparts in these dialects (for example, `UnreachableOp ->
llvm.unreachable` and `LLVMIntrinsicCallOp -> cir.llvm.intr.xxx`). I
don't think we can delay them to the MLIR->LLVM pass as it seems we
assume all CIR operations have been lowered after CIR->MLIR conversion.

Co-authored-by: Yue Huang <[email protected]>
Update getZeroInitAttr to cover more FP types, Backport from
(llvm/llvm-project#133100)
Adds implementation for TanOp's lowering via ThroughMLIR.
Upstream commit changed behavior
llvm/llvm-project@ff585fe
I tested in original clang and it produced the same IR with clangir.
Related: llvm#1497
…nmvq_f32 (llvm#1546)

ower vminnmv_f32, vminnmvq_f64 and vminnmvq_f32
…llvm#1547)

When a pointer variable is initialized with a null pointer, the AST
contains a NullToPointer cast. If we just emit a null pointer of the
correct type, we will miss any side effects that occur in the
initializer. This change adds code to emit the initializer expression if
it is not a simple constant.

This results in an extra null pointer constant being emitted when the
expression has side effects, but the constant emitted while visiting the
expression does not have the correct type, so the alternative would be
to capture the emitted constant and bitcast it to the correct type. An
extra constant seems less intrusive than an unnecessary bitcast.
llvm#1552)

These entries exist in OG but are not present in CIR codegen.
Added:
- `cos`
- `floor`
- `round`
- `rint`
- `nearbyint`
- `sin`
- `sqrt`
- `tan`
- `trunc`
During review of a patch for upstreaming the cir.struct type support,
Erich Keane observed that the code we use for creating our type name for
structures with templates was likely to be error prone. He recommended
using QualType::print with the appropriate printing policy instead. This
change does that.

Erich also pointed out that RecordDecls always have a DeclContext so a
few other lines could be eliminated where that was checked.
A lot of the unary math op lowerings follow the same template -- we can
templatize this to remove redundant code and make things a little more
neater. (Similar to what we do
[here](https://github.com/llvm/clangir/blob/e4b8a48fb4d9a72a85e38f5439bcfb0673b4bea2/clang/lib/CIR/Lowering/ThroughMLIR/LowerCIRToMLIR.cpp#L502))

I've checked all existing LIT tests via `ninja clang-check-cir` and they
seem to be passing fine.
@shrikardongre
Copy link
Contributor Author

#1522 earlier conversation .

@bcardosolopes
Copy link
Member

Please fix the test failures

@shrikardongre
Copy link
Contributor Author

Hello , sorry for the delay in reply I would like to highlight the doubts and errors that I am facing :

  1. PrefetchOp does not return anything , so if I try to modify the CIROps.td like in this PR the existing implementation fails .

  2. I tried to imitate the existing implementation of _builtin_prefetch() in the file clang/lib/CIR/CodeGen/CIRGenBuiltin.cpp

image

Doubt : Why the Address hasnt been casted to a void* because ,
image
if we see my modified implementation above I did not cast the Address and ignoring the return part for now ,
and then i get the error :

image

Doubt whenever i try to change the value of second argument in the test file nothing really happens the locality changes as the value changes and the IsWrite remains on read only . So there must be a flaw in the logic that I am not able to identify .

Any help is appreciated . Trying my best to solve this with all the resources I have Thank you 😄

@bcardosolopes
Copy link
Member

my modified implementation above I did not cast the Address

Looks like you need to add a cast

So there must be a flaw in the logic that I am not able to identify

I suggest you run the compiler under a debugger, so you can inspect values and better try to understand what's going on

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.