forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
Prefer emitting indirect calls if possible #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
frabert
wants to merge
2,610
commits into
main
Choose a base branch
from
frabert/emit-indirect-calls
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This resolve issue llvm#1442 .
Lower vcagtd_f64
Lower neon vaddlvq_s32
Lower neon vaddlv_u32
This patch introduces support for pointer TBAA, which can be enabled using the `-fpointer-tbaa` flag. By default, this feature is now enabled. To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`, and `tbaa-struct.cpp`) have been updated to include the `-fno-pointer-tbaa` flag. Related Pull Requests of OG: - llvm#76612 - llvm#116991
Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.
Lower neon vcages_f32
This implements the missing feature `cir::setTargetAttributes`. Although other targets might also need attributes, this PR focuses on the CUDA-specific ones. For CUDA kernels (on device side, not stubs), they must have a calling convention of `ptx_kernel`. It is added here. CUDA kernels, as well as global variables, also involves lots of NVVM metadata, which is intended to be dealt with at the same place. It's marked with a new missing feature here.
Lower neon vmaxv_f32
This is part 2 of CUDA lowering. Still more to come! This PR generates `__cuda_register_globals` for functions only, without touching variables. It also fixes two discrepancies mentioned in Part 1, namely: - Now CIR will not generate registration code if there's nothing to register; - `__cuda_fatbin_wrapper` now becomes a constant.
This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR. **Bug 1.** Suppose we write ```cpp __global__ void kernel(int a, int b); ``` Then when we call this kernel with `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr], i32 1`. This means there must be an extra GEP as compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before trying to accessing the array elements. I missed that out in llvm#1332 . **Bug 2.** We missed a load instruction for 6th argument to `cudaLaunchKernel`. It's added back in this PR. **Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, then the call succeeds and we should proceed to call the device stub. In llvm#1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here. **Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` at the same place as OG does -- the function is already implemented so we can just call it. After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.
Lower neon vaddlv_s32
This is Part 3 of registration function generation. This generates `__cuda_module_dtor`. It cannot be placed in global dtors list, as treating it as a normal destructor will result in double-free in recent CUDA versions (see comments in OG). Rather, the function is passed as callback of `atexit`, which is called at the end of `__cuda_module_ctor`.
Traditional clang implementation: https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632 The problem here is that `__builtin_clz` allows undefined result, while `__lzcnt` doesn't. As a result, I have to create a new CIR for `__lzcnt`. Since the return type of those two builtin differs, I decided to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp` to inherit from it. I would like to hear your suggestions. C.c. @Lancern
This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <[email protected]> Co-authored-by: anominos <[email protected]>
I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now
CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.
I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm/clangir#1404
The choice of adding a separate file imitates that of OG.
There are some subtleties here. This is the code in OG: ```cpp // note: this is different from default ABI if (!RetTy->isScalarType()) return ABIArgInfo::getDirect(); ``` which says we should return structs directly. It's correct, has have the same behaviour as `nvcc`, and it obeys the PTX ABI as well. The comment dates back to 2013 (see [this commit](llvm@f9329ff) -- it didn't provide any explanation either), so I believe it's outdated. I didn't include this comment in the PR.
…lvm#1486) The pattern `call {{.*}} i32` mismatches `call i32` due to double spaces surrounding `{{.*}}`. This patch removes the first space to fix the failure.
…1487) This PR resolves an assertion failure in `CIRGenTypes::isFuncParamTypeConvertible`, which is involved when trying to emit a vtable entry to a virtual function whose type includes a pointer-to-member-function.
Lower neon vabsd_s64
…vm#135772) (#36) Add APIntParameter with custom implementation for comparison and use it in llvm.constant_range attribute. This is necessary because the default equality operator of APInt asserts when the bit widths of the compared APInts differ. The comparison is used by StorageUniquer when hashes of two ranges with different bit widths collide. Co-authored-by: Robert Konicar <[email protected]>
… circular dependency (#37) * [CIR-link] move link interface implementation into dialect to prevent circular dependency * expose interfaces * expose conlict resolution * [CIR] Initialize flag for target dialect (#38) --------- Co-authored-by: 2over12 <[email protected]> Co-authored-by: Andrew Pan <[email protected]>
This is a workaround for importing multiple modules into a single context. The downside of this solution is that it accepts invalid IRs where there is a redefinition of the named type, but solving that in the parser is not possible.
* [CIR-link] Refactor nest/updateState * Formatting
…ll (llvm#135895) (#40) LLVM IR currently [accepts](https://godbolt.org/z/nqnEsW1ja): ``` define void @incompatible_call_and_callee_types() { call void @callee(i64 0) ret void } define void @callee({ptr, i64}, i32) { ret void } ``` This currently fails to import. Even though these constructs are dangerous and probably indicate some ODR violation (or optimization bug), they are "valid" and should be imported into LLVM IR dialect. This PR implements that by using an indirect call to represent it. Translation already works nicely and outputs the same source llvm IR file. The error is now a warning, the tests in `mlir/test/Target/LLVMIR/Import/import-failure.ll` already use `CHECK` lines, so no need to add extra diagnostic tests. Co-authored-by: Bruno Cardoso Lopes <[email protected]>
…op linkage fix (#42) * [MLIR][LLVMIR] Add support for the full form of global_{ctor,dtor} (llvm#133176) Currently only ctor/dtor list and their priorities are supported. This PR adds support for the missing data field. Few implementation notes: - The assembly printer has a fixed form because previous `attr_dict` will sort the dict by key name, making global_dtor and global_ctor differ in the order of printed arguments. - LLVM's `ptr null` is being converted to `#llvm.zero` otherwise we'd have to create a region to use the default operation conversion from `ptr null`, which is silly given that the field only support null or a symbol. * [MLIR][LLVM] Add weak_odr to allowed linkage for alias (llvm#132840) I missed this when originally introduced the feature (note the verifier message already contains it), this fixes a small bug. --------- Co-authored-by: Bruno Cardoso Lopes <[email protected]>
…lvm#138986) (#43) `GlobalOp` was parsing `thread_local` after `unnamed_addr`, but printing in the reverse order. While here, make `AliasOp` match the same behavior and share common parts of global and alias printing. Co-authored-by: Bruno Cardoso Lopes <[email protected]>
EDIT: it was a misunderstanding on my part |
This is a (partial?) list of failing builds when passing -k 0 using Henrik's branch against instafix-llvm main:
This is the same list, but against my instafix-llvm branch:
These tests fail to build on main but succeed on my branch:
No packages fail to build on my branch but succeed on main AFAICT |
ekilmer
added a commit
to trail-of-forks/instafix-llvm-test-suite
that referenced
this pull request
Jun 6, 2025
trailofbits/instafix-llvm#44 is an attempted workaround for some ill-formed C programs that we still need to support building, but it will cause way more LLVM indirect calls to be generated, and calls to be generated with the wrong number of arguments against the original declaration Essentially, the issue is that some programs do ```c int foo() { ... } ``` Then somewhere else: ```c extern int foo(int bar); foo(0); ```
ekilmer
added a commit
to trail-of-forks/instafix-llvm-test-suite
that referenced
this pull request
Jun 6, 2025
trailofbits/instafix-llvm#44 is an attempted workaround for some ill-formed C programs that we still need to support building, but it will cause way more LLVM indirect calls to be generated, and calls to be generated with the wrong number of arguments against the original declaration Essentially, the issue is that some programs do ```c int foo() { ... } ``` Then somewhere else: ```c extern int foo(int bar); foo(0); ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a hack in order to make the
lencod
test compileNOTE(ekilmer, June 9): Let's hold off on merging this until we understand the consequences of how this affects static analysis results. We can skip the affected tests in the meantime.