@pull pull bot commented Oct 2, 2025

See Commits and Changes for more details.


topperc and others added 28 commits October 9, 2025 14:16
…ynthesis (#162576)

Don't emit a warning when an Objective-C property is defined using copy
or strong semantics.
When `--use-old-text` fails, we emit all code meant for the original
`.text` section into the new section. This can produce more bytes than a
run without `--use-old-text`, especially under `--lite`. As a result,
`--use-old-text` yields a larger binary rather than a smaller one, which
can confuse the user.

Add more information to the warning, including a recommendation to
rebuild without `--use-old-text` for a smaller binary size.
…-around-statements check (#162698)

The check 'readability-braces-around-statements' does offer fixes!
Observed in a GCC-produced binary. Emit a warning for the user.

Test Plan: added bolt/test/X86/fragment-alias.s
…is unreachable (#162677)

Fixes #162585.

#161000 changed `br i1 true, label %if, label %else` to `br label %if`,
so we should remove one more incoming value.
Removed all the caching maps (BB, Inst) in `Embedder`, as we don't want
to cache embeddings in general. Our earlier experiments on Symbolic
embeddings show that recomputing embeddings is cheaper than cache
lookups.

OTOH, Flow-Aware embeddings would benefit from instruction-level
caching, as computing the embedding for an instruction depends on
the embeddings of other instructions in the function. So, the
instruction-embedding caching logic is retained only for the Flow-Aware
computation. This also necessitates an `invalidate` method that cleans
up the cache when the embeddings become invalid due to transformations.
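The cache-plus-`invalidate` arrangement described above can be sketched as follows. This is an illustrative Python model, not the IR2Vec API: the class name, the dependency map, and the 0.5 mixing weight are all invented for the sketch.

```python
# Hypothetical sketch of instruction-level embedding caching with an
# invalidate() hook, in the spirit of the Flow-Aware scheme described
# above. Names and structure are illustrative only.
class FlowAwareEmbedder:
    def __init__(self, base_embeddings):
        # base_embeddings: dict mapping instruction id -> list[float]
        self.base = base_embeddings
        self._cache = {}

    def embed(self, inst, deps):
        # deps: ids of instructions whose embeddings feed into `inst`;
        # caching pays off because these lookups are recursive.
        if inst in self._cache:
            return self._cache[inst]
        vec = list(self.base[inst])
        for d in deps.get(inst, []):
            for i, v in enumerate(self.embed(d, deps)):
                vec[i] += 0.5 * v  # illustrative flow-aware mixing weight
        self._cache[inst] = vec
        return vec

    def invalidate(self):
        # Must be called when transformations make cached embeddings stale.
        self._cache.clear()
```

The key point the sketch captures is that a cache is only sound as long as the IR is unchanged, hence the explicit `invalidate` entry point.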
…#162526)

For a pattern like this:

    Pat<(MyOp $x, $x),
        (...),
        [(MyCheck $x)]>;

The old implementation generates:

    Pat<(MyOp $x0, $x1),
        (...),
        [(MyCheck $x0),
         ($x0 == $x1)]>;

This is not very straightforward, because the $x name appears in the
source pattern; it implies that the equality check is performed as part
of source-pattern matching.

This commit moves the equality checks before the other constraints,
i.e.:

    Pat<(MyOp $x0, $x1),
        (...),
        [($x0 == $x1),
         (MyCheck $x0)]>;
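The ordering above can be modeled with a tiny matcher sketch. This is illustrative Python, not TableGen: a predicate like `MyCheck $x` is only meaningful once we know both occurrences of `$x` bound to the same value, so equality checks run first.

```python
# Illustrative sketch of constraint ordering: operand-equality checks
# (e.g. $x0 == $x1) run before the remaining predicate constraints
# (e.g. MyCheck $x0). Names and structure are invented for the sketch.
def match(operands, equalities, predicates):
    # equalities: pairs of operand indices that must bind to the same value
    # predicates: (operand index, callable) pairs checked afterwards
    for i, j in equalities:          # ($x0 == $x1) first
        if operands[i] is not operands[j]:
            return False
    for idx, pred in predicates:     # then (MyCheck $x0)
        if not pred(operands[idx]):
            return False
    return True
```

With this ordering, a predicate never sees operands whose assumed equality has not been established.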
…r-like type (#162030)

Verify that the operation passed to resize_and_overwrite returns an
integer-like type, matching the behavior of other standard library
implementations like GCC's libstdc++.

Fixes #160577
Based on review feedback in #160026.

This makes the substitution much clearer now that there is no
documentation around %T.

---------

Co-authored-by: Louis Dionne <[email protected]>
This patch introduces some missing s.barrier instructions in the ROCDL
dialect handling named barriers

Specifically:
```
@llvm.amdgcn.s.barrier.init - s_barrier_init
@llvm.amdgcn.s.barrier.join - s_barrier_join
@llvm.amdgcn.s.barrier.leave - s_barrier_leave
@llvm.amdgcn.s.barrier.signal.isfirst - s_barrier_signal_isfirst
@llvm.amdgcn.s.get.barrier.state - s_get_barrier_state
```
In GCC 11 we get an error on a `using Req = Req` statement. This
renames the types in JSONTransportTest from `Req` to `Request`,
`Evt` to `Event`, and `Resp` to `Response`.
…ance (#162399)

sifive-x390 and sifive-x280 both share the SiFive7 scheduling model, yet
the former has limited FP64 vector performance. Right now we account
for it by instantiating two separate scheduling models (throttled vs.
non-throttled) from the base SiFive7 model. However, this approach
(which is also used for other performance features like fast vrgather in
SiFive7) does not scale if we add more of these performance features in
the future -- the number of scheduling models will simply become
unmanageable.

The new solution I've been working on is to let a _single_ scheduling
model be configured by subtarget features for performance
characteristics like these, so that we no longer need to create those
derived models. This patch creates the subtarget feature that'll
ultimately replace the `isFP64Throttled` knob in the SiFive7 scheduling
model mentioned earlier.
There will be a follow-up patch to integrate this into the scheduling
model.
In "Debugging C++ Coroutines", we provide a gdb script to aid with
debugging C++ coroutines in gdb. This commit updates said script to make
it easier to use and more robust.

The commit contains the following user-facing changes:
* `show-coro-frame` was replaced by a pretty-printer for
  `std::coroutine_handle`. This is much easier to use than a custom
  command since it works out-of-the-box with `p` and in my IDE's variable
  view (tested using VS-Code)
* the new `get_coro_{frame,promise}` functions can be called from
  expressions to access nested members. Example: `p
  get_coro_promise(fib.coro_hdl)->current_state`
* `async-bt` was replaced by a frame filter. This way, the builtin `bt`
  command directly shows all the async coroutine frames.

Under the covers, the script became more robust:
* For devirtualization, we now look up the `__coro_frame` variable in
  the resume function instead of relying on the `.coro_frame_ty` naming
  convention. Thereby, devirtualization works slightly better also on
  gcc-compiled binaries (however, there is still more work to be done).
* We use the LLVM-generated `__coro_resume_<N>` labels to get the exact
  line at which a coroutine was suspended.
* The continuation handle is now looked up by name instead of via
  dereferencing a calculated pointer. Thereby, the script should be
  simpler to adjust for various coroutine libraries without requiring
  pointer arithmetic hacks.

Other sections of the documentation were adjusted accordingly to reflect
the newly added features of the gdb script.
…h subtarget feature (#162400)

This patch teaches the SiFive7 scheduling model to configure / toggle
the throttled FP64 vector feature with a subtarget feature rather than
a hard-coded TableGen parameter, which inevitably forces us to
instantiate a new scheduling model for every performance feature like
this.
This test, with a corefile created via yaml2macho-core plus an
ObjectFileJSON binary with symbol addresses and ranges, was failing
on some machines/CI because the wrong ABI was being picked.

The bytes of the functions were not included in the yaml or .json
binary.  The unwind falls back to using the ABI plugin default
unwind plans.  We have two armv7 ABIs - the Darwin ABI, which always
uses r7 as the frame pointer, and the AAPCS ABI, which uses r11.
In reality, armv7 code uses r11 in arm mode and r7 in thumb mode.  But
the ABI ArchDefaultUnwindPlan doesn't have any access to the Target's
ArchSpec or Process register state, to determine the correct processor
state (arm or thumb).  And in fact, on Cortex-M targets, the
instructions are always thumb, so the arch default unwind plan
(hardcoded r11) is always wrong.

The corefile doesn't specify a vendor/os, only a cpu.
The object file json specifies the armv7m-apple-* triple, which will
select the correct ABI plugin, and the test runs.

In some cases, it looks like the Process ABI was fetched after
opening the corefile, but before the binary.json was loaded and
corrected the Target's ArchSpec.  And we never re-evaluate the ABI
once it is set, in a Process.  When we picked the AAPCS armv7 ABI,
we would try to use r11 as frame pointer, and the unwind would stop
after one stack frame.

I'm stepping around this problem by (1) adding the register bytes of
the prologues of every test function in the backtrace, and (2)
shortening the function ranges (in binary.json) to specify that the
functions are all just long enough for the prologue where execution
is stopped.  The instruction emulation plugin will fail if it can't
get all of the bytes from the function instructions, so I hacked
the function sizes in the .json to cover the prologue plus one and
changed the addresses in the backtrace to fit within those ranges.

[ updated this commit to keep the @skipIfRemote on the API test
because two remote CI bots are failing for reasons I don't quite
see. ]
Releases of Ubuntu that do not support the GNU hash style
have long been unsupported.
This patch adds support for fneg/fabs operations. For other bit
manipulation operations (select/copysign), we don't need new APIs.
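The reason fneg/fabs fit the bit-manipulation framing can be shown on the IEEE-754 double encoding: fneg flips the sign bit, fabs clears it. A minimal Python sketch, using `struct` to reinterpret the float's bits:

```python
import struct

# Sketch of fneg/fabs as pure bit manipulation on the IEEE-754 double
# representation: fneg flips the sign bit, fabs clears it.
SIGN_BIT = 1 << 63

def double_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fneg(x: float) -> float:
    return bits_to_double(double_to_bits(x) ^ SIGN_BIT)   # flip sign bit

def fabs(x: float) -> float:
    return bits_to_double(double_to_bits(x) & ~SIGN_BIT)  # clear sign bit
```

Because neither operation touches the exponent or mantissa, no new arithmetic APIs are needed for them.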
Avoids exposing the implementation detail of uintptr_t to
the constructor.

This is a replacement of b738f63
which avoids needing tablegen to know the underlying storage type.
Make sure to apply the option+number-of-registers logic from
the selection pattern.
These patterns are for setcc with scalar result type and vector operands
or shifts with vector result and scalar shift amount.
The shift amount may have a different scalar size than the result, but
they should have the same number of elements, or they should both
be scalar.
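The element-count constraint stated above can be sketched as a small legality check. This is an illustrative Python model, not the actual TableGen predicate; `None` stands in for a scalar type.

```python
# Hypothetical legality check mirroring the constraint above: the shift
# amount may differ from the result in scalar size, but the two must
# agree on element count, or both must be scalar.
def shift_types_compatible(result_elems, amount_elems):
    # `None` denotes a scalar type; an int is a vector element count.
    if result_elems is None and amount_elems is None:
        return True                          # both scalar
    if result_elems is not None and amount_elems is not None:
        return result_elems == amount_elems  # same element count
    return False                             # scalar/vector mix rejected
```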
…161809)

Check RegisterClassInfo if any registers of the new class are
actually available for use. Currently AMDGPU overrides shouldCoalesce
to avoid this situation. The target hook does not have access to the
dynamic register class counts, but ideally the target hook would
only be used for profitability concerns.

The new test doesn't change, due to the AMDGPU shouldCoalesce override,
but would be unallocatable if we dropped the override and switched
to the default implementation. The existing limit-coalesce.mir already
tests the behavior of this override, but it's too conservative and
isn't checking the case where the new class is unallocatable. Add
this check so it can be relaxed.
…162714)

This renames some attribute list related functions, to make callers
think about whether they want to append or prepend to the list, instead
of defaulting to prepending which is often not the desired behaviour
(for the cases where it matters, sometimes we're just adding to an empty
list). Then it adjusts some of these calls to append where they were
previously prepending. This has the effect of making
`err_attributes_are_not_compatible` consistent in emitting diagnostics
as `<new-attr> and <existing-attr> are not compatible`, regardless of
the syntax used to apply the attributes.
alexey-bataev and others added 30 commits October 12, 2025 10:28
…tions

If the non-commutative user has several same operands and at least one
of them (but not the first) is copyable, need to consider this
opportunity when calculating the number of dependencies. Otherwise, the
schedule bundle might be not scheduled correctly and cause a compiler
crash

Fixes #162925
…#163017)

Follow-up as suggested in
#162617.

Just use an APInt for DividesBy, as the existing code already operates
on APInt and thus handles the case of DividesBy being 1.

PR: #163017
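The point about DividesBy == 1 needing no special case can be seen with arbitrary-precision integers (the Python analogue of APInt). A hedged sketch; the function name and rounding operation are illustrative, not the actual ConstraintElimination code:

```python
# Sketch: representing a known divisor as a plain arbitrary-precision
# integer (the APInt analogue) means DividesBy == 1 needs no special
# case -- every value is trivially a multiple of 1.
def adjust_to_multiple(value: int, divides_by: int) -> int:
    # Round `value` down to the nearest multiple of `divides_by`.
    return value - value % divides_by
```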
Corrects the spelling of 'IsGlobaLinkage' to 'IsGlobalLinkage' in
XCOFF-related code, comments, and tests across the codebase.
Currently we cannot vectorize loops with latch blocks terminated by a
switch. In the future this could be handled by materializing appropriate
compares.

Fixes #156894.
… is PHI

Need to insert the vector value for the postponed gather/buildvector
node after all uses not only if the vector value of the user node is a
phi, but also if the user node itself is a PHI node, which may produce a
vector phi + shuffle.

Fixes #162799
Generally G_UADDE, G_UADDO, G_USUBE, G_USUBO are used together and it
was enough to simply define EFLAGS. But if extractvalue is used, we end
up with a copy of EFLAGS into GPR.

Always generate SETB instruction to put the carry bit on GPR and CMP to
set the carry bit back. It gives the correct lowering in all the cases.

Closes #120029
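Why the carry bit matters as a first-class value can be modeled in a few lines. This is an illustrative Python sketch of G_UADDO-style semantics, not the X86 lowering itself; the width is an arbitrary choice:

```python
# A small model of why the carry must be materializable: for
# G_UADDO-style ops, the carry-out of an unsigned N-bit add is itself a
# 0/1 value that may be consumed in a GPR (e.g. via extractvalue).
def uaddo(a: int, b: int, bits: int = 64):
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, int(total > mask)  # (result, carry-out)
```

SETB is what turns the hardware carry flag into such a 0/1 GPR value, and CMP restores the flag when the carry is consumed again.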
Otherwise debug info is stripped, which influences the language of the
current frame.

Also, set an explicit breakpoint because Windows seems not to obey the
debugtrap.

Log from failing test on Windows:
```
(lldb) command source -s 0 'lit-lldb-init-quiet'
Executing commands in 'D:\test\lit-lldb-init-quiet'.
(lldb) command source -C --silent-run true lit-lldb-init
(lldb) target create "main.out"
Current executable set to 'D:\test\main.out' (x86_64).
(lldb) settings set interpreter.stop-command-source-on-error false
(lldb) command source -s 0 'with-target.input'
Executing commands in 'D:\test\with-target.input'.
(lldb) expr blah
            ^
            error: use of undeclared identifier 'blah'
note: Falling back to default language. Ran expression as 'Objective C++'.
(lldb) run
Process 29404 launched: 'D:\test\main.out' (x86_64)
Process 29404 stopped
* thread #1, stop reason = Exception 0x80000003 encountered at address 0x7ff7b3df7189
    frame #0: 0x00007ff7b3df718a main.out
->  0x7ff7b3df718a: xorl   %eax, %eax
    0x7ff7b3df718c: popq   %rcx
    0x7ff7b3df718d: retq
    0x7ff7b3df718e: int3
(lldb) expr blah
            ^
            error: use of undeclared identifier 'blah'
note: Falling back to default language. Ran expression as 'Objective C++'.
(lldb) expr -l objc -- blah
                       ^
                       error: use of undeclared identifier 'blah'
note: Expression evaluation in pure Objective-C not supported. Ran expression as 'Objective C++'.
(lldb) expr -l c -- blah
                    ^
                    error: use of undeclared identifier 'blah'
note: Expression evaluation in pure C not supported. Ran expression as 'ISO C++'.
```
Add a unittest for `DataBreakpointInfoArguments`.
…tializers (#163005)

`UnwrappedLineParser::parseBracedList` had no
explicit handling for the `requires` keyword, so it would just call
`nextToken()` instead of properly parsing the `requires` expression.

This fix adds a case for `tok::kw_requires` in `parseBracedList`,
calling `parseRequiresExpression` to handle it correctly, matching the
existing behavior in `parseParens`.

Fixes #162984.
…163114)

This commit renames the "finalize" operation to "initialize", and
"deallocate" to "deinitialize".

The new names are chosen to better fit the point of view of the
ORC-runtime and executor-process: After memory is *reserved* it can be
*initialized* with some content, and *deinitialized* to return that
memory to the reserved region.

This seems more understandable to me than the original scheme, which
named these operations after the controller-side JITLinkMemoryManager
operations that they partially implemented. I.e.
SimpleNativeMemoryMap::finalize implemented the final step of
JITLinkMemoryManager::finalize, initializing the memory in the executor;
and SimpleNativeMemoryMap::deallocate implemented the final step of
JITLinkMemoryManager::deallocate, running dealloc actions and releasing
the finalized region.

The proper way to think of the relationship between these operations now
is that:

1. The final step of finalization is to initialize the memory in the
executor.

2. The final step of deallocation is to deinitialize the memory in the
executor.
We can add 's' or 'u' before hexadecimal constants to denote their
signedness.

See https://llvm.org/docs/LangRef.html#simple-constants for reference.
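A small sketch of how such a prefix could be interpreted. This is an illustrative Python parser, not LLVM's lexer; the two's-complement reinterpretation width is a parameter here for demonstration:

```python
# Hedged sketch of parsing hex constants with an optional 's'/'u'
# signedness prefix, in the spirit of the u0x.../s0x... syntax above.
def parse_hex_constant(text: str, bits: int) -> int:
    signed = text.startswith("s")
    if text[0] in "su":
        text = text[1:]            # strip the signedness prefix
    value = int(text, 16)          # expects a 0x... literal
    if signed and value >= 1 << (bits - 1):
        value -= 1 << bits         # reinterpret as negative (two's complement)
    return value
```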
…163043)

This ensures that the diff does not include any branches on main that
are not in the current user's branch. We can add this to the command
now that --diff_from_common_commit (or at least the fixed version) has
landed in a release (21.1.1).
The Unsupported case is special: it doesn't have an entry in the
vector and is directly emitted as the 0 case. This should be
harmless as it is, but could break if the right number of new
libcalls is added.
FWIW, this [[nodiscard]] led to the discovery of #161625.
[[fallthrough]] is now part of C++17, so we don't need to use
LLVM_FALLTHROUGH.
[[fallthrough]] is now part of C++17, so we don't need to use
LLVM_FALLTHROUGH.
)

llvm::to_underlying, forward ported from C++23, conveniently packages
static_cast and std::underlying_type_t like so:

  static_cast<std::underlying_type_t<EnumTy>>(E)
llvm::to_underlying, forward ported from C++23, conveniently packages
static_cast and std::underlying_type_t like so:

  static_cast<std::underlying_type_t<EnumTy>>(E)
Identified with bugprone-unused-local-non-trivial-variable.
Identified with bugprone-unused-local-non-trivial-variable.
This would have failed during compilation after generation; trim and
use raw string literals to avoid such failures.

There are probably a few more places where similar failures could
occur, but this was an unexpected failure a user ran into.
… patterns (#163080)

This is a follow-up PR for #162699.

Currently, in the function where we define rewrite patterns, the `op` we
receive is of type `ir.Operation` rather than a specific `OpView` type
(such as `arith.AddIOp`). This means we can't conveniently access
certain parts of the operation -- for example, we need to use
`op.operands[0]` instead of `op.lhs`. The following example code
illustrates this situation.

```python
def to_muli(op, rewriter):
  # op is typed ir.Operation instead of arith.AddIOp
  pass

patterns.add(arith.AddIOp, to_muli)
```

In this PR, we convert the operation to its corresponding `OpView`
subclass before invoking the rewrite pattern callback, making it much
easier to write patterns.

---------

Co-authored-by: Maksim Levental <[email protected]>
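The conversion step can be illustrated without MLIR installed. This is a self-contained Python sketch of the dispatch idea only; the `Operation`/`AddOpView` classes and the registry shape are invented for the sketch and do not reflect the real bindings' types:

```python
# Illustrative (non-MLIR) sketch of the dispatch change: before invoking
# a registered callback, the generic operation is converted to the
# specific "view" class it was registered under, so the callback sees
# typed accessors (op.lhs) rather than raw operand lists.
class Operation:
    def __init__(self, name, operands):
        self.name, self.operands = name, operands

class AddOpView(Operation):
    OP_NAME = "arith.addi"
    @property
    def lhs(self): return self.operands[0]
    @property
    def rhs(self): return self.operands[1]

def dispatch(op, registry):
    view_cls, callback = registry[op.name]
    typed = view_cls(op.name, op.operands)  # convert before the callback
    return callback(typed)

registry = {AddOpView.OP_NAME: (AddOpView, lambda op: (op.lhs, op.rhs))}
```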
…2547)

SparseForwardDataFlowAnalysis, whose comments specify that StateT
must subclass AbstractSparseLattice, also places a static assert
in the class itself.

This commit adds the same missing assert for
SparseBackwardDataFlowAnalysis.
…write patterns in bindings (#163123)

The MLIR Python bindings now support defining new passes, new rewrite
patterns (through either `RewritePatternSet` or `PDLModule`), as well as
new dialects using the IRDL bindings. Adding a dedicated section to
document these features would make it easier for users to discover and
understand the full capabilities of the Python bindings.
Add patterns which reduce or operations to register sequences when
combining i16 values into i32. This removes many intermediate VGPRs and
reduces register pressure.
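The combine being matched can be shown arithmetically. A minimal Python sketch of the packing the `or` computes; the function name is invented, and the register-sequence replacement itself is backend machinery not shown here:

```python
# Sketch of the pattern's input: (hi << 16) | lo packs two i16 values
# into one i32. The backend can express the same result as a register
# sequence, placing each half directly into a subregister instead of
# materializing the shift and the or in VGPRs.
def pack_i16_pair(lo: int, hi: int) -> int:
    assert 0 <= lo < (1 << 16) and 0 <= hi < (1 << 16)
    return (hi << 16) | lo
```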