Prefer emitting indirect calls if possible #44

frabert · 2025-06-05T12:07:24Z

This is a hack in order to make the lencod test compile

NOTE(ekilmer, June 9): Let's hold off on merging this until we understand the consequences of how this affects static analysis results. We can skip the affected tests in the meantime.

This resolve issue llvm#1442 .

Lower vcagtd_f64

Lower neon vaddlvq_s32

Lower neon vaddlv_u32

This patch introduces support for pointer TBAA, which can be enabled using the `-fpointer-tbaa` flag. By default, this feature is now enabled. To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`, and `tbaa-struct.cpp`) have been updated to include the `-fno-pointer-tbaa` flag. Related Pull Requests of OG: - llvm#76612 - llvm#116991

Similar PR: llvm/clangir#1434

Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.

Lower neon vcages_f32

This implements the missing feature `cir::setTargetAttributes`. Although other targets might also need attributes, this PR focuses on the CUDA-specific ones. For CUDA kernels (on device side, not stubs), they must have a calling convention of `ptx_kernel`. It is added here. CUDA kernels, as well as global variables, also involves lots of NVVM metadata, which is intended to be dealt with at the same place. It's marked with a new missing feature here.

Lower neon vmaxv_f32

This PR implements \_\_constant\_\_ variables. llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, ~~This PR depends on llvm#1445~~

This is part 2 of CUDA lowering. Still more to come! This PR generates `__cuda_register_globals` for functions only, without touching variables. It also fixes two discrepancies mentioned in Part 1, namely: - Now CIR will not generate registration code if there's nothing to register; - `__cuda_fatbin_wrapper` now becomes a constant.

This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR. **Bug 1.** Suppose we write ```cpp __global__ void kernel(int a, int b); ``` Then when we call this kernel with `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr], i32 1`. This means there must be an extra GEP as compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before trying to accessing the array elements. I missed that out in llvm#1332 . **Bug 2.** We missed a load instruction for 6th argument to `cudaLaunchKernel`. It's added back in this PR. **Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, then the call succeeds and we should proceed to call the device stub. In llvm#1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here. **Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` at the same place as OG does -- the function is already implemented so we can just call it. After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.

Lower neon vaddlv_s32

This is Part 3 of registration function generation. This generates `__cuda_module_dtor`. It cannot be placed in global dtors list, as treating it as a normal destructor will result in double-free in recent CUDA versions (see comments in OG). Rather, the function is passed as callback of `atexit`, which is called at the end of `__cuda_module_ctor`.

@Lancern

Traditional clang implementation: https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632 The problem here is that `__builtin_clz` allows undefined result, while `__lzcnt` doesn't. As a result, I have to create a new CIR for `__lzcnt`. Since the return type of those two builtin differs, I decided to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp` to inherit from it. I would like to hear your suggestions. C.c. @Lancern

This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <[email protected]> Co-authored-by: anominos <[email protected]>

I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now

CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.

I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm/clangir#1404

The choice of adding a separate file imitates that of OG.

This PR removes a useless argument `convertToInt` and removes hardcoded `Sint32Type`. I realized I committed a new file with CRLF before. Really sorry about that >_<

There are some subtleties here. This is the code in OG: ```cpp // note: this is different from default ABI if (!RetTy->isScalarType()) return ABIArgInfo::getDirect(); ``` which says we should return structs directly. It's correct, has have the same behaviour as `nvcc`, and it obeys the PTX ABI as well. The comment dates back to 2013 (see [this commit](llvm@f9329ff) -- it didn't provide any explanation either), so I believe it's outdated. I didn't include this comment in the PR.

…lvm#1486) The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.

…1487) This PR resolves an assertion failure in `CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying to emit a vtable entry to a virtual function whose type includes a pointer-to-member-function.

Lower neon vabsd_s64

…vm#135772) (#36) Add APIntParameter with custom implementation for comparison and use it in llvm.constant_range attribute. This is necessary because the default equality operator of APInt asserts when the bit widths of the compared APInts differ. The comparison is used by StorageUniquer when hashes of two ranges with different bit widths collide. Co-authored-by: Robert Konicar <[email protected]>

… circular dependency (#37) * [CIR-link] move link interface implementation into dialect to prevent circular dependency * expose interfaces * expose conlict resolution * [CIR] Initialize flag for target dialect (#38) --------- Co-authored-by: 2over12 <[email protected]> Co-authored-by: Andrew Pan <[email protected]>

This is a workaround for importing multiple modules into a single context. The downside of this solution is that it accepts invalid IRs where there is a redefinition of the named type, but solving that in the parser is not possible.

* [CIR-link] Refactor nest/updateState * Formatting

…39)

@callee

…ll (llvm#135895) (#40) LLVM IR currently [accepts](https://godbolt.org/z/nqnEsW1ja): ``` define void @incompatible_call_and_callee_types() { call void @callee(i64 0) ret void } define void @callee({ptr, i64}, i32) { ret void } ``` This currently fails to import. Even though these constructs are dangerous and probably indicate some ODR violation (or optimization bug), they are "valid" and should be imported into LLVM IR dialect. This PR implements that by using an indirect call to represent it. Translation already works nicely and outputs the same source llvm IR file. The error is now a warning, the tests in `mlir/test/Target/LLVMIR/Import/import-failure.ll` already use `CHECK` lines, so no need to add extra diagnostic tests. Co-authored-by: Bruno Cardoso Lopes <[email protected]>

…op linkage fix (#42) * [MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (llvm#133176) Currently only ctor/dtor list and their priorities are supported. This PR adds support for the missing data field. Few implementation notes: - The assembly printer has a fixed form because previous `attr_dict` will sort the dict by key name, making global_dtor and global_ctor differ in the order of printed arguments. - LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd have to create a region to use the default operation conversion from `ptr null`, which is silly given that the field only support null or a symbol. * [MLIR][LLVM] Add weak_odr to allowed linkage for alias (llvm#132840) I missed this when originally introduced the feature (note the verifier message already contains it), this fixes a small bug. --------- Co-authored-by: Bruno Cardoso Lopes <[email protected]>

…lvm#138986) (#43) `GlobalOp` was parsing `thread_local` after `unnamed_addr`, but printing in the reverse order. While here, make `AliasOp` match the same behavior and share common parts of global and alias printing. Co-authored-by: Bruno Cardoso Lopes <[email protected]>

frabert · 2025-06-05T12:23:31Z

~~Arrrgh found some regressions~~

EDIT: it was a misunderstanding on my part

frabert · 2025-06-05T13:46:55Z

This is a (partial?) list of failing builds when passing -k 0 using Henrik's branch against instafix-llvm main:

FAILED: MultiSource/Applications/ALAC/decode/alacconvert-decode
FAILED: MultiSource/Applications/ALAC/encode/alacconvert-encode
FAILED: MultiSource/Applications/hbd/hbd
FAILED: MultiSource/Applications/hexxagon/hexxagon
FAILED: MultiSource/Applications/JM/lencod/lencod
FAILED: MultiSource/Applications/kimwitu++/kc
FAILED: MultiSource/Applications/lambda-0.1.3/lambda
FAILED: MultiSource/Applications/minisat/minisat
FAILED: MultiSource/Applications/sgefa/sgefa
FAILED: MultiSource/Applications/spiff/spiff
FAILED: MultiSource/Applications/sqlite3/sqlite3
FAILED: MultiSource/Benchmarks/7zip/7zip-benchmark
FAILED: MultiSource/Benchmarks/Bullet/bullet
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT
FAILED: MultiSource/Benchmarks/Fhourstones/fhourstones
FAILED: MultiSource/Benchmarks/MallocBench/cfrac/cfrac
FAILED: MultiSource/Benchmarks/MallocBench/espresso/espresso
FAILED: MultiSource/Benchmarks/MallocBench/gs/gs
FAILED: MultiSource/Benchmarks/McCat/08-main/main
FAILED: MultiSource/Benchmarks/McCat/09-vor/vor
FAILED: MultiSource/Benchmarks/mediabench/g721/g721encode/encode
FAILED: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath
FAILED: MultiSource/Benchmarks/Olden/voronoi/voronoi
FAILED: MultiSource/Benchmarks/PAQ8p/paq8p
FAILED: MultiSource/Benchmarks/Prolangs-C++/city/city
FAILED: MultiSource/Benchmarks/Prolangs-C++/employ/employ
FAILED: MultiSource/Benchmarks/Prolangs-C++/life/life
FAILED: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean
FAILED: MultiSource/Benchmarks/Prolangs-C++/primes/primes
FAILED: MultiSource/Benchmarks/Prolangs-C++/simul/simul
FAILED: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
FAILED: tools/not

This is the same list, but against my instafix-llvm branch:

FAILED: MultiSource/Applications/ALAC/decode/alacconvert-decode
FAILED: MultiSource/Applications/ALAC/encode/alacconvert-encode
FAILED: MultiSource/Applications/hbd/hbd
FAILED: MultiSource/Applications/hexxagon/hexxagon
FAILED: MultiSource/Applications/kimwitu++/kc
FAILED: MultiSource/Applications/lambda-0.1.3/lambda
FAILED: MultiSource/Applications/minisat/minisat
FAILED: MultiSource/Applications/sqlite3/sqlite3
FAILED: MultiSource/Benchmarks/7zip/7zip-benchmark
FAILED: MultiSource/Benchmarks/Bullet/bullet
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/rsbench
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HACCKernels/HACCKernels
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG/HPCCG
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE
FAILED: MultiSource/Benchmarks/DOE-ProxyApps-C++/PENNANT/PENNANT
FAILED: MultiSource/Benchmarks/McCat/08-main/main
FAILED: MultiSource/Benchmarks/McCat/09-vor/vor
FAILED: MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath
FAILED: MultiSource/Benchmarks/Olden/voronoi/voronoi
FAILED: MultiSource/Benchmarks/PAQ8p/paq8p
FAILED: MultiSource/Benchmarks/Prolangs-C++/city/city
FAILED: MultiSource/Benchmarks/Prolangs-C++/employ/employ
FAILED: MultiSource/Benchmarks/Prolangs-C++/life/life
FAILED: MultiSource/Benchmarks/Prolangs-C++/ocean/ocean
FAILED: MultiSource/Benchmarks/Prolangs-C++/primes/primes
FAILED: MultiSource/Benchmarks/Prolangs-C++/simul/simul
FAILED: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
FAILED: tools/not

These tests fail to build on main but succeed on my branch:

FAILED: MultiSource/Applications/JM/lencod/lencod
FAILED: MultiSource/Applications/sgefa/sgefa
FAILED: MultiSource/Applications/spiff/spiff
FAILED: MultiSource/Benchmarks/Fhourstones/fhourstones
FAILED: MultiSource/Benchmarks/MallocBench/cfrac/cfrac
FAILED: MultiSource/Benchmarks/MallocBench/espresso/espresso
FAILED: MultiSource/Benchmarks/MallocBench/gs/gs
FAILED: MultiSource/Benchmarks/mediabench/g721/g721encode/encode

No packages fail to build on my branch but succeed on main AFAICT

trailofbits/instafix-llvm#44 is an attempted workaround for some ill-formed C programs that we still need to support building, but it will cause way more LLVM indirect calls to be generated, and calls to be generated with the wrong number of arguments against the original declaration Essentially, the issue is that some programs do ```c int foo() { ... } ``` Then somewhere else: ```c extern int foo(int bar); foo(0); ```

FantasqueX and others added 30 commits April 9, 2025 14:53

[CIR][CIRGen][Neon] Make vrndns emit RoundEvenOp directly (llvm#1434)

0aa4f77

[CIR][CIRGen][TBAA] Add support for BitInt (llvm#1443)

1093db8

[CIR] Fix attributes lowering for GlobalOp (llvm#1447)

da94019

This resolve issue llvm#1442 .

[CIR][CIRGen][Builtin][Neon] Lower vcagtd_f64 (llvm#1448)

887c6d1

Lower vcagtd_f64

[CIR][CIRGen][Builtin][Neon] Lower neon vaddlvq_s32 (llvm#1450)

d9f3e99

Lower neon vaddlvq_s32

[CIR][CIRGen][Builtin][Neon] Lower neon vaddlv_u32 (llvm#1451)

74b72f2

Lower neon vaddlv_u32

[CIR][CIRGen][Neon] Make vrnda emit RoundOp directly (llvm#1453)

7e7a821

Similar PR: llvm/clangir#1434

[CIR][CUDA] Support for built-in CUDA surface type (llvm#1455)

01c27e9

[CIR][CUDA] Fix address space values for NVPTX (llvm#1445)

01a1748

Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.

[CIR][CIRGen][Builtin][Neon] Lower neon vcages_f32 (llvm#1449)

f26fff3

Lower neon vcages_f32

[CIR][CIRGen][Builtin][Neon] Lower neon vmaxv_f32 (llvm#1460)

5f03b07

Lower neon vmaxv_f32

[CIR][CUDA] implement cuda constant variables (llvm#1444)

cadb738

This PR implements \_\_constant\_\_ variables. llvm#1438 only implements \_\_device\_\_ and \_\_shared\_\_ variables, ~~This PR depends on llvm#1445~~

[CIR][CIRGen][Builtin][Neon] Lower neon vaddlv_s32 (llvm#1464)

49383ee

Lower neon vaddlv_s32

[CIR][CUDA] Support builtin CUDA variables (llvm#1458)

fe2206a

This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <[email protected]> Co-authored-by: anominos <[email protected]>

[CIR][CUDA] Support for inbuilt texture types (llvm#1469)

bf52b48

I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now

[CIR] Lower signext and zeroext attributes (llvm#1473)

9b0b837

CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.

[CIR][CIRGen][TBAA] Add support for vtable pointer (llvm#1463)

f896192

[CIR][CIRGen][builtin][X86] handle _mm_lfence (llvm#1474)

6e31c6e

I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm/clangir#1404

[CIR][CUDA] Support device-side printf (llvm#1475)

0aef9ed

The choice of adding a separate file imitates that of OG.

[CIR][CIRGen][builtin] handle __popcnt (llvm#1479)

867d736

This PR removes a useless argument `convertToInt` and removes hardcoded `Sint32Type`. I realized I committed a new file with CRLF before. Really sorry about that >_<

[CIR][NFC] Fix test failures caused by double spaces in check line (l…

9e8806a

…lvm#1486) The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.

[CIR][CIRGen][Builtin][Neon] Lower neon vabsd_s64 (llvm#1489)

bce7507

Lower neon vabsd_s64

Jezurko and others added 22 commits May 23, 2025 14:18

[MLIR][mlir-link] Import tests from llvm-link.

150b445

[MLIR][mlir-link] Implement visibility resolution

2ee602b

[MLIR][mlir-link] Enable visibility tests

c1aa300

[CIR][cir-link] Setup CIRLinkerInterface

6f26790

[CIR][cir-link] Setup cir-link tool

fa66586

[MLIR][mlir-link] Refactor out SymbolAttrLinkerInterface

9087ead

[MLIR][mlir-link] Extract common llvm linker interface

32f426d

[CIR][cir-link] Implement CIRSymbolLinkerInterface

23c5102

[CIR][cir-link] Add simple cir function linker test

e4199ab

[MLIR][mlir-link] Add adapted tests for unnamed_addr

eca2073

[MLIR][mlir-link] Implement unnamed_addr support

0e7fc25

[CIR][cir-link] Add stubs for UnnamedAddr

3b4a775

[CIR][cir-link] Add missing MLIRLinker dependency to cmake

b6a8051

[CIR-link] Refactor nest/updateState (#35)

535025d

* [CIR-link] Refactor nest/updateState * Formatting

[MLIR][mlir-link] Properly resolve LLVM dialect bitwidth for linkage (#…

48f3f84

…39)

[LLVMIR][HACK] Prefer emitting indirect calls if possible

cea283a

frabert marked this pull request as ready for review June 5, 2025 12:09

frabert marked this pull request as draft June 5, 2025 12:23

frabert marked this pull request as ready for review June 5, 2025 14:19

frabert force-pushed the main branch from fe615c5 to cae0585 Compare September 8, 2025 16:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Prefer emitting indirect calls if possible #44

Prefer emitting indirect calls if possible #44

Uh oh!

frabert commented Jun 5, 2025 •

edited by ekilmer

Loading

Uh oh!

frabert commented Jun 5, 2025 •

edited

Loading

Uh oh!

frabert commented Jun 5, 2025 •

edited

Loading

Uh oh!

Uh oh!

Prefer emitting indirect calls if possible #44

Are you sure you want to change the base?

Prefer emitting indirect calls if possible #44

Uh oh!

Conversation

frabert commented Jun 5, 2025 • edited by ekilmer Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frabert commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

frabert commented Jun 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

frabert commented Jun 5, 2025 •

edited by ekilmer

Loading

frabert commented Jun 5, 2025 •

edited

Loading

frabert commented Jun 5, 2025 •

edited

Loading