Skip to content

Conversation

frabert
Copy link
Collaborator

@frabert frabert commented Jun 5, 2025

This is a hack in order to make the lencod test compile

NOTE(ekilmer, June 9): Let's hold off on merging this until we understand the consequences of how this affects static analysis results. We can skip the affected tests in the meantime.

FantasqueX and others added 30 commits April 9, 2025 14:53
This patch introduces support for pointer TBAA, which can be enabled
using the `-fpointer-tbaa` flag. By default, this feature is now
enabled.

To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`,
and `tbaa-struct.cpp`) have been updated to include the
`-fno-pointer-tbaa` flag.

Related Pull Requests of OG:
- llvm#76612
- llvm#116991
This implements the missing feature `cir::setTargetAttributes`.

Although other targets might also need attributes, this PR focuses on
the CUDA-specific ones. For CUDA kernels (on device side, not stubs),
they must have a calling convention of `ptx_kernel`. It is added here.

CUDA kernels, as well as global variables, also involves lots of NVVM
metadata, which is intended to be dealt with at the same place. It's
marked with a new missing feature here.
This PR implements \_\_constant\_\_ variables.

llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, 

~~This PR depends on llvm#1445~~
This is part 2 of CUDA lowering. Still more to come!

This PR generates `__cuda_register_globals` for functions only, without
touching variables.

It also fixes two discrepancies mentioned in Part 1, namely:
- Now CIR will not generate registration code if there's nothing to
register;
- `__cuda_fatbin_wrapper` now becomes a constant.
This PR deals with several issues currently present in CUDA CodeGen.
Each of them requires only a few lines to fix, so they're combined in a
single PR.

**Bug 1.**

Suppose we write
```cpp
__global__ void kernel(int a, int b);
```

Then when we call this kernel with `cudaLaunchKernel`, the 4th argument
to that function is something of the form `void *kernel_args[2] = {&a,
&b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that
doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr],
i32 1`. This means there must be an extra GEP as compared to OG.

In CIR, it means we must add an `array_to_ptrdecay` cast before trying
to accessing the array elements. I missed that out in llvm#1332 .

**Bug 2.**

We missed a load instruction for 6th argument to `cudaLaunchKernel`.
It's added back in this PR.

**Bug 3.** 

When we launch a kernel, we first retrieve the return value of
`__cudaPopCallConfiguration`. If it's zero, then the call succeeds and
we should proceed to call the device stub. In llvm#1348 we did exactly the
opposite, calling the device stub only if it's not zero. It's fixed
here.

**Issue 4.**

CallConvLowering is required to make `cudaLaunchKernel` correct. The
codepath is unblocked by adding a `getIndirectResult` at the same place
as OG does -- the function is already implemented so we can just call
it.


After this (and other pending PRs), CIR is now able to compile real CUDA
programs. There are still missing features, which will be followed up
later.
This is Part 3 of registration function generation.

This generates `__cuda_module_dtor`. It cannot be placed in global dtors
list, as treating it as a normal destructor will result in double-free
in recent CUDA versions (see comments in OG). Rather, the function is
passed as callback of `atexit`, which is called at the end of
`__cuda_module_ctor`.
Traditional clang implementation:
https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632

The problem here is that `__builtin_clz` allows undefined result, while
`__lzcnt` doesn't. As a result, I have to create a new CIR for
`__lzcnt`. Since the return type of those two builtin differs, I decided
to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp`
to inherit from it.

I would like to hear your suggestions. C.c. @Lancern
This PR adds support for compiling builtin variables like `threadIdx`
down to the appropriate intrinsic.

---------

Co-authored-by: Aidan Wong <[email protected]>
Co-authored-by: anominos <[email protected]>
I have now fixed the test. Earlier I made some commits with other
changes because we were testing something on my fork. This should be
resolved now
CIR is currently ignoring the `signext` and `zeroext` for function
arguments and return types produced by CallConvLowering.

This PR lowers them to LLVM IR.
I realized I committed a new file with CRLF before. Really sorry about
that >_<

Related: llvm/clangir#1404
The choice of adding a separate file imitates that of OG.
This PR removes a useless argument `convertToInt` and removes hardcoded
`Sint32Type`.

I realized I committed a new file with CRLF before. Really sorry about
that >_<
There are some subtleties here.

This is the code in OG:
```cpp
// note: this is different from default ABI
if (!RetTy->isScalarType())
  return ABIArgInfo::getDirect();
```
which says we should return structs directly. It's correct, has have the
same behaviour as `nvcc`, and it obeys the PTX ABI as well.
The comment dates back to 2013 (see [this
commit](llvm@f9329ff)
-- it didn't provide any explanation either), so I believe it's
outdated. I didn't include this comment in the PR.
…lvm#1486)

The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces
surrounding `{{.*}}`. This patch removes the first space to fix the
failure.
…1487)

This PR resolves an assertion failure in
`CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying
to emit a vtable entry to a virtual function whose type includes a
pointer-to-member-function.
Jezurko and others added 22 commits May 23, 2025 14:18
…vm#135772) (#36)

Add APIntParameter with custom implementation for comparison and use it
in llvm.constant_range attribute. This is necessary because the default
equality operator of APInt asserts when the bit widths of the compared
APInts differ. The comparison is used by StorageUniquer when hashes of
two ranges with different bit widths collide.

Co-authored-by: Robert Konicar <[email protected]>
… circular dependency (#37)

* [CIR-link] move link interface implementation into dialect to prevent circular dependency

* expose interfaces

* expose conlict resolution

* [CIR] Initialize flag for target dialect (#38)

---------

Co-authored-by: 2over12 <[email protected]>
Co-authored-by: Andrew Pan <[email protected]>
This is a workaround for importing multiple modules into a single
context. The downside of this solution is that it accepts invalid IRs
where there is a redefinition of the named type, but solving that in the
parser is not possible.
* [CIR-link] Refactor nest/updateState

* Formatting
…ll (llvm#135895) (#40)

LLVM IR currently [accepts](https://godbolt.org/z/nqnEsW1ja):
```
define void @incompatible_call_and_callee_types() {
  call void @callee(i64 0)
  ret void
}

define void @callee({ptr, i64}, i32) {
  ret void
}
```

This currently fails to import. Even though these constructs are
dangerous and probably indicate some ODR violation (or optimization
bug), they are "valid" and should be imported into LLVM IR dialect. This
PR implements that by using an indirect call to represent it.
Translation already works nicely and outputs the same source llvm IR
file.

The error is now a warning, the tests in
`mlir/test/Target/LLVMIR/Import/import-failure.ll` already use `CHECK`
lines, so no need to add extra diagnostic tests.

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
…op linkage fix (#42)

* [MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (llvm#133176)

Currently only ctor/dtor list and their priorities are supported. This
PR adds support for the missing data field.

Few implementation notes:
- The assembly printer has a fixed form because previous `attr_dict`
will sort the dict by key name, making global_dtor and global_ctor
differ in the order of printed arguments.
- LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd
have to create a region to use the default operation conversion from
`ptr null`, which is silly given that the field only support null or a
symbol.

* [MLIR][LLVM] Add weak_odr to allowed linkage for alias (llvm#132840)

I missed this when originally introduced the feature (note the verifier
message already contains it), this fixes a small bug.

---------

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
…lvm#138986) (#43)

`GlobalOp` was parsing `thread_local` after `unnamed_addr`, but printing in the reverse order.

While here, make `AliasOp` match the same behavior and share common parts of global and alias printing.

Co-authored-by: Bruno Cardoso Lopes <[email protected]>
@frabert frabert marked this pull request as ready for review June 5, 2025 12:09
@frabert frabert marked this pull request as draft June 5, 2025 12:23
@frabert
Copy link
Collaborator Author

frabert commented Jun 5, 2025

Arrrgh found some regressions

EDIT: it was a misunderstanding on my part

@frabert
Copy link
Collaborator Author

frabert commented Jun 5, 2025

This is a (partial?) list of failing builds when passing -k 0 using Henrik's branch against instafix-llvm main:

FAILED: MultiSource/Applications/ALAC/decode/alacconvert-decode
FAILED: MultiSource/Applications/ALAC/encode/alacconvert-encode
FAILED: MultiSource/Applications/hbd/hbd
FAILED: MultiSource/Applications/hexxagon/hexxagon
FAILED: MultiSource/Applications/JM/lencod/lencod
FAILED: MultiSource/Applications/kimwitu++/kc
FAILED: MultiSource/Applications/lambda-0.1.3/lambda
FAILED: MultiSource/Applications/minisat/minisat
FAILED: MultiSource/Applications/sgefa/sgefa
FAILED: MultiSource/Applications/spiff/spiff
FAILED: MultiSource/Applications/sqlite3/sqlite3
FAILED: MultiSource/Benchmarks/7zip/7zip-benchmark
FAILED: MultiSource/Benchmarks/Bullet/bullet
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT
FAILED: MultiSource/Benchmarks/Fhourstones/fhourstones
FAILED: MultiSource/Benchmarks/MallocBench/cfrac/cfrac
FAILED: MultiSource/Benchmarks/MallocBench/espresso/espresso
FAILED: MultiSource/Benchmarks/MallocBench/gs/gs
FAILED: MultiSource/Benchmarks/McCat/08-main/main
FAILED: MultiSource/Benchmarks/McCat/09-vor/vor
FAILED: MultiSource/Benchmarks/mediabench/g721/g721encode/encode
FAILED: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath
FAILED: MultiSource/Benchmarks/Olden/voronoi/voronoi
FAILED: MultiSource/Benchmarks/PAQ8p/paq8p
FAILED: MultiSource/Benchmarks/Prolangs-C++/city/city
FAILED: MultiSource/Benchmarks/Prolangs-C++/employ/employ
FAILED: MultiSource/Benchmarks/Prolangs-C++/life/life
FAILED: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean
FAILED: MultiSource/Benchmarks/Prolangs-C++/primes/primes
FAILED: MultiSource/Benchmarks/Prolangs-C++/simul/simul
FAILED: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
FAILED: tools/not

This is the same list, but against my instafix-llvm branch:

FAILED: MultiSource/Applications/ALAC/decode/alacconvert-decode
FAILED: MultiSource/Applications/ALAC/encode/alacconvert-encode
FAILED: MultiSource/Applications/hbd/hbd
FAILED: MultiSource/Applications/hexxagon/hexxagon
FAILED: MultiSource/Applications/kimwitu++/kc
FAILED: MultiSource/Applications/lambda-0.1.3/lambda
FAILED: MultiSource/Applications/minisat/minisat
FAILED: MultiSource/Applications/sqlite3/sqlite3
FAILED: MultiSource/Benchmarks/7zip/7zip-benchmark
FAILED: MultiSource/Benchmarks/Bullet/bullet
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT
FAILED: MultiSource/Benchmarks/McCat/08-main/main
FAILED: MultiSource/Benchmarks/McCat/09-vor/vor
FAILED: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath
FAILED: MultiSource/Benchmarks/Olden/voronoi/voronoi
FAILED: MultiSource/Benchmarks/PAQ8p/paq8p
FAILED: MultiSource/Benchmarks/Prolangs-C++/city/city
FAILED: MultiSource/Benchmarks/Prolangs-C++/employ/employ
FAILED: MultiSource/Benchmarks/Prolangs-C++/life/life
FAILED: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean
FAILED: MultiSource/Benchmarks/Prolangs-C++/primes/primes
FAILED: MultiSource/Benchmarks/Prolangs-C++/simul/simul
FAILED: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
FAILED: tools/not

These tests fail to build on main but succeed on my branch:

FAILED: MultiSource/Applications/JM/lencod/lencod
FAILED: MultiSource/Applications/sgefa/sgefa
FAILED: MultiSource/Applications/spiff/spiff
FAILED: MultiSource/Benchmarks/Fhourstones/fhourstones
FAILED: MultiSource/Benchmarks/MallocBench/cfrac/cfrac
FAILED: MultiSource/Benchmarks/MallocBench/espresso/espresso
FAILED: MultiSource/Benchmarks/MallocBench/gs/gs
FAILED: MultiSource/Benchmarks/mediabench/g721/g721encode/encode

No packages fail to build on my branch but succeed on main AFAICT

@frabert frabert marked this pull request as ready for review June 5, 2025 14:19
ekilmer added a commit to trail-of-forks/instafix-llvm-test-suite that referenced this pull request Jun 6, 2025
trailofbits/instafix-llvm#44 is an attempted workaround
for some ill-formed C programs that we still need to support building, but it
will cause way more LLVM indirect calls to be generated, and calls to be
generated with the wrong number of arguments against the original declaration

Essentially, the issue is that some programs do

```c
int foo() { ... }
```

Then somewhere else:

```c
extern int foo(int bar);

foo(0);
```
ekilmer added a commit to trail-of-forks/instafix-llvm-test-suite that referenced this pull request Jun 6, 2025
trailofbits/instafix-llvm#44 is an attempted workaround
for some ill-formed C programs that we still need to support building, but it
will cause way more LLVM indirect calls to be generated, and calls to be
generated with the wrong number of arguments against the original declaration

Essentially, the issue is that some programs do

```c
int foo() { ... }
```

Then somewhere else:

```c
extern int foo(int bar);

foo(0);
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.