@pull pull bot commented Oct 2, 2025

See Commits and Changes for more details.


topperc and others added 28 commits October 9, 2025 14:16
…ynthesis (#162576)

Don't emit a warning when an Objective-C property is defined using copy
or strong semantics.
When `--use-old-text` fails, we emit all code meant for the original
`.text` section into the new section. This can produce more bytes than a
run without `--use-old-text`, especially under `--lite`. As a result,
`--use-old-text` yields a larger binary rather than a smaller one, which
can confuse the user.

Add more information to the warning, including a recommendation to
rebuild without `--use-old-text` for a smaller binary size.
…-around-statements check (#162698)

The check 'readability-braces-around-statements' does offer fixes!
Observed in a GCC-produced binary. Emit a warning for the user.

Test Plan: added bolt/test/X86/fragment-alias.s
…is unreachable (#162677)

Fixes #162585.

#161000 changed `br i1 true, label %if, label %else` to `br label %if`,
so we should remove one more incoming value.
Removed all the caching maps (BB, Inst) in `Embedder`, as we don't want
to cache embeddings in general. Our earlier experiments on Symbolic
embeddings show that recomputing embeddings is cheaper than cache
lookups.

OTOH, Flow-Aware embeddings would benefit from instruction-level
caching, as computing the embedding for an instruction depends on
the embeddings of other instructions in the function. So, the
instruction-embedding caching logic is retained only for the Flow-Aware
computation. This also necessitates an `invalidate` method that cleans
up the cache when the embeddings become invalid due to transformations.
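The cache-plus-`invalidate` arrangement described above can be sketched as follows. This is an illustrative Python model, not the IR2Vec API: the class name, the dependency map, and the 0.5 mixing weight are all invented for the sketch.

```python
# Hypothetical sketch of instruction-level embedding caching with an
# invalidate() hook, in the spirit of the Flow-Aware scheme described
# above. Names and structure are illustrative only.
class FlowAwareEmbedder:
    def __init__(self, base_embeddings):
        # base_embeddings: dict mapping instruction id -> list[float]
        self.base = base_embeddings
        self._cache = {}

    def embed(self, inst, deps):
        # deps: ids of instructions whose embeddings feed into `inst`;
        # caching pays off because these lookups are recursive.
        if inst in self._cache:
            return self._cache[inst]
        vec = list(self.base[inst])
        for d in deps.get(inst, []):
            for i, v in enumerate(self.embed(d, deps)):
                vec[i] += 0.5 * v  # illustrative flow-aware mixing weight
        self._cache[inst] = vec
        return vec

    def invalidate(self):
        # Must be called when transformations make cached embeddings stale.
        self._cache.clear()
```

The key point the sketch captures is that a cache is only sound as long as the IR is unchanged, hence the explicit `invalidate` entry point.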
…#162526)

For a pattern like this:

    Pat<(MyOp $x, $x),
        (...),
        [(MyCheck $x)]>;

The old implementation generates:

    Pat<(MyOp $x0, $x1),
        (...),
        [(MyCheck $x0),
         ($x0 == $x1)]>;

This is not very straightforward, because the $x name appears in the
source pattern; it implies that the equality check is performed as part
of source-pattern matching.

This commit moves the equality checks before the other constraints,
i.e.:

    Pat<(MyOp $x0, $x1),
        (...),
        [($x0 == $x1),
         (MyCheck $x0)]>;
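The ordering above can be modeled with a tiny matcher sketch. This is illustrative Python, not TableGen: a predicate like `MyCheck $x` is only meaningful once we know both occurrences of `$x` bound to the same value, so equality checks run first.

```python
# Illustrative sketch of constraint ordering: operand-equality checks
# (e.g. $x0 == $x1) run before the remaining predicate constraints
# (e.g. MyCheck $x0). Names and structure are invented for the sketch.
def match(operands, equalities, predicates):
    # equalities: pairs of operand indices that must bind to the same value
    # predicates: (operand index, callable) pairs checked afterwards
    for i, j in equalities:          # ($x0 == $x1) first
        if operands[i] is not operands[j]:
            return False
    for idx, pred in predicates:     # then (MyCheck $x0)
        if not pred(operands[idx]):
            return False
    return True
```

With this ordering, a predicate never sees operands whose assumed equality has not been established.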
…r-like type (#162030)

Verify that the operation passed to resize_and_overwrite returns an
integer-like type, matching the behavior of other standard library
implementations like GCC's libstdc++.

Fixes #160577
Based on review feedback in #160026.

This makes the substitution much clearer now that there is no
documentation around %T.

---------

Co-authored-by: Louis Dionne <[email protected]>
This patch introduces some missing s.barrier instructions in the ROCDL
dialect handling named barriers

Specifically:
```
@llvm.amdgcn.s.barrier.init - s_barrier_init
@llvm.amdgcn.s.barrier.join - s_barrier_join
@llvm.amdgcn.s.barrier.leave - s_barrier_leave
@llvm.amdgcn.s.barrier.signal.isfirst - s_barrier_signal_isfirst
@llvm.amdgcn.s.get.barrier.state - s_get_barrier_state
```
In GCC 11 we get an error on a `using Req = Req` statement. This
renames the types in JSONTransportTest from `Req` to `Request`,
`Evt` to `Event`, and `Resp` to `Response`.
…ance (#162399)

sifive-x390 and sifive-x280 both share the SiFive7 scheduling model, yet
the former has limited FP64 vector performance. Right now we account
for it by instantiating two separate scheduling models (throttled vs.
non-throttled) from the base SiFive7 model. However, this approach
(which is also used for other performance features like fast vrgather in
SiFive7) does not scale if we add more of these performance features in
the future -- the number of scheduling models will simply become
unmanageable.

The new solution I've been working on is to let a _single_ scheduling
model be configured by subtarget features for performance
characteristics like these, so that we no longer need to create those
derived models. This patch creates the subtarget feature that'll
ultimately replace the `isFP64Throttled` knob in the SiFive7 scheduling
model mentioned earlier.
There will be a follow-up patch to integrate this into the scheduling
model.
In "Debugging C++ Coroutines", we provide a gdb script to aid with
debugging C++ coroutines in gdb. This commit updates said script to make
it easier to use and more robust.

The commit contains the following user-facing changes:
* `show-coro-frame` was replaced by a pretty-printer for
  `std::coroutine_handle`. This is much easier to use than a custom
  command since it works out-of-the-box with `p` and in my IDE's variable
  view (tested using VS-Code)
* the new `get_coro_{frame,promise}` functions can be called from
  expressions to access nested members. Example: `p
  get_coro_promise(fib.coro_hdl)->current_state`
* `async-bt` was replaced by a frame filter. This way, the builtin `bt`
  command directly shows all the async coroutine frames.

Under the covers, the script became more robust:
* For devirtualization, we now look up the `__coro_frame` variable in
  the resume function instead of relying on the `.coro_frame_ty` naming
  convention. Thereby, devirtualization works slightly better also on
  gcc-compiled binaries (however, there is still more work to be done).
* We use the LLVM-generated `__coro_resume_<N>` labels to get the exact
  line at which a coroutine was suspended.
* The continuation handle is now looked up by name instead of via
  dereferencing a calculated pointer. Thereby, the script should be
  simpler to adjust for various coroutine libraries without requiring
  pointer arithmetic hacks.

Other sections of the documentation were adjusted accordingly to reflect
the newly added features of the gdb script.
…h subtarget feature (#162400)

This patch teaches the SiFive7 scheduling model to configure / toggle
the throttled FP64 vector feature with a subtarget feature rather than
a hard-coded TableGen parameter, which inevitably forces us to
instantiate a new scheduling model for every performance feature like
this.
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.

The bytes of the functions were not included in the yaml or .json
binary.  The unwind falls back to using the ABI plugin default
unwind plans.  We have two armv7 ABIs - the Darwin ABI, which always
uses r7 as the frame pointer, and the AAPCS ABI, which uses r11.
In reality, armv7 code uses r11 in arm mode and r7 in thumb mode.  But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb).  And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.

The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.

In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec.  And we never re-evaluate the ABI
once it is set, in a Process.  When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.

I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped.  The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.

[ updated this commit to keep the @skipIfRemote on the API test
because two remote CI bots are failing for reasons I don't quite
see. ]
Releases of Ubuntu that do not support the GNU hash style
have long been unsupported.
This patch adds support for fneg/fabs operations. For other bit
manipulation operations (select/copysign), we don't need new APIs.
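The reason fneg/fabs fit the bit-manipulation framing can be shown on the IEEE-754 double encoding: fneg flips the sign bit, fabs clears it. A minimal Python sketch, using `struct` to reinterpret the float's bits:

```python
import struct

# Sketch of fneg/fabs as pure bit manipulation on the IEEE-754 double
# representation: fneg flips the sign bit, fabs clears it.
SIGN_BIT = 1 << 63

def double_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fneg(x: float) -> float:
    return bits_to_double(double_to_bits(x) ^ SIGN_BIT)   # flip sign bit

def fabs(x: float) -> float:
    return bits_to_double(double_to_bits(x) & ~SIGN_BIT)  # clear sign bit
```

Because neither operation touches the exponent or mantissa, no new arithmetic APIs are needed for them.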
Avoids exposing the implementation detail of uintptr_t to
the constructor.

This is a replacement of b738f63
which avoids needing tablegen to know the underlying storage type.
Make sure to apply the option+number-of-registers logic from
the selection pattern.
These patterns are for setcc with scalar result type and vector operands
or shifts with vector result and scalar shift amount.
The shift amount may have a different scalar size than the result, but
they should have the same number of elements, or they should both
be scalar.
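The element-count constraint stated above can be sketched as a small legality check. This is an illustrative Python model, not the actual TableGen predicate; `None` stands in for a scalar type.

```python
# Hypothetical legality check mirroring the constraint above: the shift
# amount may differ from the result in scalar size, but the two must
# agree on element count, or both must be scalar.
def shift_types_compatible(result_elems, amount_elems):
    # `None` denotes a scalar type; an int is a vector element count.
    if result_elems is None and amount_elems is None:
        return True                          # both scalar
    if result_elems is not None and amount_elems is not None:
        return result_elems == amount_elems  # same element count
    return False                             # scalar/vector mix rejected
```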
…161809)

Check RegisterClassInfo if any registers of the new class are
actually available for use. Currently AMDGPU overrides shouldCoalesce
to avoid this situation. The target hook does not have access to the
dynamic register class counts, but ideally the target hook would
only be used for profitability concerns.

The new test doesn't change, due to the AMDGPU shouldCoalesce override,
but would be unallocatable if we dropped the override and switched
to the default implementation. The existing limit-coalesce.mir already
tests the behavior of this override, but it's too conservative and
isn't checking the case where the new class is unallocatable. Add
this check so it can be relaxed.
…162714)

This renames some attribute list related functions, to make callers
think about whether they want to append or prepend to the list, instead
of defaulting to prepending which is often not the desired behaviour
(for the cases where it matters, sometimes we're just adding to an empty
list). Then it adjusts some of these calls to append where they were
previously prepending. This has the effect of making
`err_attributes_are_not_compatible` consistent in emitting diagnostics
as `<new-attr> and <existing-attr> are not compatible`, regardless of
the syntax used to apply the attributes.
alexey-bataev and others added 30 commits October 12, 2025 10:28
…tions

If the non-commutative user has several same operands and at least one
of them (but not the first) is copyable, need to consider this
opportunity when calculating the number of dependencies. Otherwise, the
schedule bundle might be not scheduled correctly and cause a compiler
crash

Fixes #162925
…#163017)

Follow-up as suggested in
#162617.

Just use an APInt for DividesBy, as the existing code already operates
on APInt and thus handles the case of DividesBy being 1.

PR: #163017
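The point about DividesBy == 1 needing no special case can be seen with arbitrary-precision integers (the Python analogue of APInt). A hedged sketch; the function name and rounding operation are illustrative, not the actual ConstraintElimination code:

```python
# Sketch: representing a known divisor as a plain arbitrary-precision
# integer (the APInt analogue) means DividesBy == 1 needs no special
# case -- every value is trivially a multiple of 1.
def adjust_to_multiple(value: int, divides_by: int) -> int:
    # Round `value` down to the nearest multiple of `divides_by`.
    return value - value % divides_by
```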
Corrects the spelling of 'IsGlobaLinkage' to 'IsGlobalLinkage' in
XCOFF-related code, comments, and tests across the codebase.
Currently we cannot vectorize loops with latch blocks terminated by a
switch. In the future this could be handled by materializing appropriate
compares.

Fixes #156894.
… is PHI

Need to insert the vector value for the postponed gather/buildvector
node after all uses not only if the vector value of the user node is a
phi, but also if the user node itself is a PHI node, which may produce a
vector phi + shuffle.

Fixes #162799
Generally G_UADDE, G_UADDO, G_USUBE, G_USUBO are used together and it
was enough to simply define EFLAGS. But if extractvalue is used, we end
up with a copy of EFLAGS into GPR.

Always generate SETB instruction to put the carry bit on GPR and CMP to
set the carry bit back. It gives the correct lowering in all the cases.

Closes #120029
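Why the carry bit matters as a first-class value can be modeled in a few lines. This is an illustrative Python sketch of G_UADDO-style semantics, not the X86 lowering itself; the width is an arbitrary choice:

```python
# A small model of why the carry must be materializable: for
# G_UADDO-style ops, the carry-out of an unsigned N-bit add is itself a
# 0/1 value that may be consumed in a GPR (e.g. via extractvalue).
def uaddo(a: int, b: int, bits: int = 64):
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, int(total > mask)  # (result, carry-out)
```

SETB is what turns the hardware carry flag into such a 0/1 GPR value, and CMP restores the flag when the carry is consumed again.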
Otherwise debug info is stripped, which influences the language of the
current frame.

Also, set an explicit breakpoint because Windows seems not to obey the
debugtrap.

Log from failing test on Windows:
```
(lldb) command source -s 0 'lit-lldb-init-quiet'
Executing commands in 'D:\test\lit-lldb-init-quiet'.
(lldb) command source -C --silent-run true lit-lldb-init
(lldb) target create "main.out"
Current executable set to 'D:\test\main.out' (x86_64).
(lldb) settings set interpreter.stop-command-source-on-error false
(lldb) command source -s 0 'with-target.input'
Executing commands in 'D:\test\with-target.input'.
(lldb) expr blah
            ^
            error: use of undeclared identifier 'blah'
note: Falling back to default language. Ran expression as 'Objective C++'.
(lldb) run
Process 29404 launched: 'D:\test\main.out' (x86_64)
Process 29404 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff7b3df7189
    frame #0: 0x00007ff7b3df718a main.out
->  0x7ff7b3df718a: xorl   %eax, %eax
    0x7ff7b3df718c: popq   %rcx
    0x7ff7b3df718d: retq
    0x7ff7b3df718e: int3
(lldb) expr blah
            ^
            error: use of undeclared identifier 'blah'
note: Falling back to default language. Ran expression as 'Objective C++'.
(lldb) expr -l objc -- blah
                       ^
                       error: use of undeclared identifier 'blah'
note: Expression evaluation in pure Objective-C not supported. Ran expression as 'Objective C++'.
(lldb) expr -l c -- blah
                    ^
                    error: use of undeclared identifier 'blah'
note: Expression evaluation in pure C not supported. Ran expression as 'ISO C++'.
```
Add a unittest for `DataBreakpointInfoArguments`.
…tializers (#163005)

`UnwrappedLineParser::parseBracedList` had no
explicit handling for the `requires` keyword, so it would just call
`nextToken()` instead of properly parsing the `requires` expression.

This fix adds a case for `tok::kw_requires` in `parseBracedList`,
calling `parseRequiresExpression` to handle it correctly, matching the
existing behavior in `parseParens`.

Fixes #162984.
…163114)

This commit renames the "finalize" operation to "initialize", and
"deallocate" to "deinitialize".

The new names are chosen to better fit the point of view of the
ORC-runtime and executor-process: After memory is *reserved* it can be
*initialized* with some content, and *deinitialized* to return that
memory to the reserved region.

This seems more understandable to me than the original scheme, which
named these operations after the controller-side JITLinkMemoryManager
operations that they partially implemented. I.e.
SimpleNativeMemoryMap::finalize implemented the final step of
JITLinkMemoryManager::finalize, initializing the memory in the executor;
and SimpleNativeMemoryMap::deallocate implemented the final step of
JITLinkMemoryManager::deallocate, running dealloc actions and releasing
the finalized region.

The proper way to think of the relationship between these operations now
is that:

1. The final step of finalization is to initialize the memory in the
executor.

2. The final step of deallocation is to deinitialize the memory in the
executor.
We can add 's' or 'u' before hexadecimal constants to denote their
signedness.

See https://llvm.org/docs/LangRef.html#simple-constants for reference.
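A small sketch of how such a prefix could be interpreted. This is an illustrative Python parser, not LLVM's lexer; the two's-complement reinterpretation width is a parameter here for demonstration:

```python
# Hedged sketch of parsing hex constants with an optional 's'/'u'
# signedness prefix, in the spirit of the u0x.../s0x... syntax above.
def parse_hex_constant(text: str, bits: int) -> int:
    signed = text.startswith("s")
    if text[0] in "su":
        text = text[1:]            # strip the signedness prefix
    value = int(text, 16)          # expects a 0x... literal
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits         # reinterpret as negative (two's complement)
    return value
```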
…163043)

This ensures that the diff does not include any branches on main that
are not in the current user's branch. We can add this to the command
now that --diff_from_common_commit (or at least the fixed version) has
landed in a release (21.1.1).
The Unsupported case is special: it doesn't have an entry in the
vector and is directly emitted as the 0 case. This should be
harmless as it is, but could break if the right number of new
libcalls is added.
FWIW, this [[nodiscard]] led to the discovery of #161625.
[[fallthrough]] is now part of C++17, so we don't need to use
LLVM_FALLTHROUGH.
[[fallthrough]] is now part of C++17, so we don't need to use
LLVM_FALLTHROUGH.
)

llvm::to_underlying, forward ported from C++23, conveniently packages
static_cast and std::underlying_type_t like so:

  static_cast<std::underlying_type_t<EnumTy>>(E)
llvm::to_underlying, forward ported from C++23, conveniently packages
static_cast and std::underlying_type_t like so:

  static_cast<std::underlying_type_t<EnumTy>>(E)
Identified with bugprone-unused-local-non-trivial-variable.
Identified with bugprone-unused-local-non-trivial-variable.
This would have failed during compilation after generation; trim and
use raw string literals to avoid such failures.

There are probably a few more places where similar failures could
occur, but this was an unexpected failure a user ran into.
… patterns (#163080)

This is a follow-up PR for #162699.

Currently, in the function where we define rewrite patterns, the `op` we
receive is of type `ir.Operation` rather than a specific `OpView` type
(such as `arith.AddIOp`). This means we can't conveniently access
certain parts of the operation -- for example, we need to use
`op.operands[0]` instead of `op.lhs`. The following example code
illustrates this situation.

```python
def to_muli(op, rewriter):
  # op is typed ir.Operation instead of arith.AddIOp
  pass

patterns.add(arith.AddIOp, to_muli)
```

In this PR, we convert the operation to its corresponding `OpView`
subclass before invoking the rewrite pattern callback, making it much
easier to write patterns.

---------

Co-authored-by: Maksim Levental <[email protected]>
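The conversion step can be illustrated without MLIR installed. This is a self-contained Python sketch of the dispatch idea only; the `Operation`/`AddOpView` classes and the registry shape are invented for the sketch and do not reflect the real bindings' types:

```python
# Illustrative (non-MLIR) sketch of the dispatch change: before invoking
# a registered callback, the generic operation is converted to the
# specific "view" class it was registered under, so the callback sees
# typed accessors (op.lhs) rather than raw operand lists.
class Operation:
    def __init__(self, name, operands):
        self.name, self.operands = name, operands

class AddOpView(Operation):
    OP_NAME = "arith.addi"
    @property
    def lhs(self): return self.operands[0]
    @property
    def rhs(self): return self.operands[1]

def dispatch(op, registry):
    view_cls, callback = registry[op.name]
    typed = view_cls(op.name, op.operands)  # convert before the callback
    return callback(typed)

registry = {AddOpView.OP_NAME: (AddOpView, lambda op: (op.lhs, op.rhs))}
```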
…2547)

SparseForwardDataFlowAnalysis, whose comments specify that StateT
must subclass AbstractSparseLattice, also places a static assert
in the class itself.

This commit adds the same missing assert for
SparseBackwardDataFlowAnalysis.
…write patterns in bindings (#163123)

The MLIR Python bindings now support defining new passes, new rewrite
patterns (through either `RewritePatternSet` or `PDLModule`), as well as
new dialects using the IRDL bindings. Adding a dedicated section to
document these features would make it easier for users to discover and
understand the full capabilities of the Python bindings.
Add patterns which reduce or operations to register sequences when
combining i16 values into i32. This removes many intermediate VGPRs and
reduces register pressure.
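The combine being matched can be shown arithmetically. A minimal Python sketch of the packing the `or` computes; the function name is invented, and the register-sequence replacement itself is backend machinery not shown here:

```python
# Sketch of the pattern's input: (hi << 16) | lo packs two i16 values
# into one i32. The backend can express the same result as a register
# sequence, placing each half directly into a subregister instead of
# materializing the shift and the or in VGPRs.
def pack_i16_pair(lo: int, hi: int) -> int:
    assert 0 <= lo < (1 << 16) and 0 <= hi < (1 << 16)
    return (hi << 16) | lo
```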