folkertdev
Contributor

tracking issue: #44930
split out from: #144549

The `va_list` is created in the compiler itself when the variable argument list `...` is desugared, so the lifetime end is not inserted automatically. The value can't outlive the function in which it was created, so it is correct to end the lifetime here. Ending the lifetime explicitly also appears to give slightly better codegen in #144549.

I also included a little drive-by improvement to not cast pointers to integers and back again.
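
As a user-level illustration of what avoiding a pointer-to-integer round trip can look like, here is a minimal sketch that rounds a pointer up to an alignment in two ways. This is not the code from the PR (the PR changes the compiler's code generation, not library code); the function names are hypothetical, and the assumption is only that the pattern being avoided is "cast to integer, do arithmetic, cast back".

```rust
/// Rounds `ptr` up to `align` by casting to an integer and back.
/// The final integer-to-pointer cast discards the link to the original pointer.
fn align_up_round_trip(ptr: *const u8, align: usize) -> *const u8 {
    assert!(align.is_power_of_two());
    let addr = ptr as usize;
    ((addr + align - 1) & !(align - 1)) as *const u8
}

/// Rounds `ptr` up to `align` by computing the padding as an integer and
/// then offsetting the original pointer, so no integer is turned back into
/// a pointer.
fn align_up_offset(ptr: *const u8, align: usize) -> *const u8 {
    assert!(align.is_power_of_two());
    let addr = ptr as usize; // one-way cast, used only for arithmetic
    let padding = ((addr + align - 1) & !(align - 1)) - addr;
    ptr.wrapping_add(padding)
}

fn main() {
    let buf = [0u8; 32];
    let unaligned = buf.as_ptr().wrapping_add(1);

    let a = align_up_round_trip(unaligned, 8);
    let b = align_up_offset(unaligned, 8);

    // Both variants compute the same address; only the second one derives
    // the result from the original pointer.
    assert_eq!(a as usize, b as usize);
    assert_eq!(b as usize % 8, 0);
    println!("padding added: {}", b as usize - unaligned as usize);
}
```

Both functions return the same address. The difference is that the second one keeps the result derived from the input pointer, which tends to be easier for an optimizer to reason about.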

r? codegen

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 31, 2025
@rustbot
Collaborator

rustbot commented Aug 31, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@saethlin
Member

saethlin commented Sep 1, 2025

Interesting. I'm surprised LLVM needs the lifetime marker.

@bors r+ rollup=iffy
(If this becomes part of a rollup that has a perf change, this PR looks like a candidate to investigate, but I am decently confident it has no compile time impact)

@bors
Collaborator

bors commented Sep 1, 2025

📌 Commit 213bb87 has been approved by saethlin

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 1, 2025
@bors
Collaborator

bors commented Sep 2, 2025

⌛ Testing commit 213bb87 with merge 05abce5...

@bors
Collaborator

bors commented Sep 2, 2025

☀️ Test successful - checks-actions
Approved by: saethlin
Pushing 05abce5 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 2, 2025
@bors bors merged commit 05abce5 into rust-lang:master Sep 2, 2025
11 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Sep 2, 2025
Contributor

github-actions bot commented Sep 2, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 75ee9ff (parent) -> 05abce5 (this PR)

Test differences

2 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 05abce5d058db0de3abd10f32f1a442d0f699b30 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-linux: 5962.3s -> 8553.5s (43.5%)
  2. dist-x86_64-apple: 6325.8s -> 7118.1s (12.5%)
  3. dist-riscv64-linux: 4681.4s -> 5117.9s (9.3%)
  4. dist-aarch64-apple: 5995.6s -> 6531.7s (8.9%)
  5. dist-aarch64-msvc: 5847.5s -> 5394.4s (-7.7%)
  6. aarch64-apple: 6075.6s -> 5659.6s (-6.8%)
  7. x86_64-gnu-llvm-19-1: 3773.5s -> 3553.7s (-5.8%)
  8. dist-i586-gnu-i586-i686-musl: 5754.0s -> 5433.9s (-5.6%)
  9. i686-gnu-2: 6094.7s -> 5761.9s (-5.5%)
  10. dist-various-1: 4421.6s -> 4192.5s (-5.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (05abce5): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary -4.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.8% | [-4.8%, -4.8%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 467.688s -> 467.236s (-0.10%)
Artifact size: 388.42 MiB -> 388.42 MiB (-0.00%)

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 8, 2025
match clang's `va_arg` assembly on arm targets

tracking issue: rust-lang#44930

For this example

```rust
#![feature(c_variadic)]

#[unsafe(no_mangle)]
unsafe extern "C" fn variadic(a: f64, mut args: ...) -> f64 {
    let b = args.arg::<f64>();
    let c = args.arg::<f64>();

    a + b + c
}
```

We currently generate (via llvm):

```asm
variadic:
    sub     sp, sp, #12
    stmib   sp, {r2, r3}
    vmov    d0, r0, r1
    add     r0, sp, #4
    vldr    d1, [sp, #4]
    add     r0, r0, #15
    bic     r0, r0, #7
    vadd.f64        d0, d0, d1
    add     r1, r0, #8
    str     r1, [sp]
    vldr    d1, [r0]
    vadd.f64        d0, d0, d1
    vmov    r0, r1, d0
    add     sp, sp, #12
    bx      lr
```

LLVM is not doing a good job. In fact, it's well known that LLVM's implementation of `va_arg` is kind of bad, and we implement it ourselves (based on clang) for many targets already. For arm, our own `emit_ptr_va_arg` saves 3 instructions.

Next, it turns out it's important for LLVM to explicitly start and end the lifetime of the `va_list`. In rust-lang#146059 I already end the lifetime, but when looking at this again, I noticed that it is important to also start it, see https://godbolt.org/z/EGqvKTTsK: failing to explicitly start the lifetime uses an extra register.

So, the combination of `emit_ptr_va_arg` with starting/ending the lifetime makes rustc emit exactly the instructions that clang generates:

```asm
variadic:
    sub     sp, sp, #12
    stmib   sp, {r2, r3}
    vmov    d16, r0, r1
    vldr    d17, [sp, #4]
    vadd.f64        d16, d16, d17
    vldr    d17, [sp, #12]
    vadd.f64        d16, d16, d17
    vmov    r0, r1, d16
    add     sp, sp, #12
    bx      lr
```

The arguments to `emit_ptr_va_arg` are based on [the clang implementation](https://github.com/llvm/llvm-project/blob/03dc2a41f3d9a500e47b513de5c5008c06860d65/clang/lib/CodeGen/Targets/ARM.cpp#L798-L844).
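
To make the pointer-style `va_arg` idea concrete, here is a toy model in plain Rust of what such an implementation conceptually does for each argument: round the cursor up to the argument's alignment, load the value, then bump the cursor past it. This is not rustc's `emit_ptr_va_arg` and not clang's code. `VaCursor`, `take`, and `SLOT_SIZE` are hypothetical names, and the 4-byte slot size, the honouring of higher alignments, and the little-endian layout are assumptions made for the sketch. Offsets into a byte slice stand in for addresses, and the argument area is assumed to start at an 8-byte-aligned address.

```rust
/// Assumed slot size in bytes: each argument occupies a multiple of this.
const SLOT_SIZE: usize = 4;

/// Toy stand-in for a pointer-style `va_list`: a cursor over the bytes of
/// the variadic argument area.
struct VaCursor<'a> {
    area: &'a [u8],
    pos: usize,
}

/// Rounds `value` up to the next multiple of `align` (a power of two).
fn round_up(value: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

impl<'a> VaCursor<'a> {
    fn new(area: &'a [u8]) -> Self {
        VaCursor { area, pos: 0 }
    }

    /// Returns the bytes of the next argument and advances the cursor.
    fn take(&mut self, size: usize, align: usize) -> &'a [u8] {
        // Align the cursor for this argument (never below the slot size).
        let start = round_up(self.pos, align.max(SLOT_SIZE));
        let end = start + size;
        // Bump the cursor to the next slot boundary after the argument.
        self.pos = round_up(end, SLOT_SIZE);
        let area = self.area;
        &area[start..end]
    }

    fn arg_f64(&mut self) -> f64 {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(self.take(8, 8));
        f64::from_le_bytes(bytes)
    }

    fn arg_i32(&mut self) -> i32 {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(self.take(4, 4));
        i32::from_le_bytes(bytes)
    }
}

fn main() {
    // Model the example above: two variadic f64 arguments stored back to back.
    let mut area = Vec::new();
    area.extend_from_slice(&2.0f64.to_le_bytes());
    area.extend_from_slice(&3.5f64.to_le_bytes());

    let mut args = VaCursor::new(&area);
    let a = 1.0;
    let b = args.arg_f64();
    let c = args.arg_f64();
    println!("{}", a + b + c); // prints "6.5"
}
```

When the cursor is already suitably aligned, as in the example, the rounding step is a no-op and each read reduces to a load at a fixed offset, which is consistent with the two `vldr` instructions at `[sp, #4]` and `[sp, #12]` above.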

r? `@workingjubilee` (I can re-roll if your queue is too full, but you do seem like the right person here)
bors added a commit that referenced this pull request Sep 12, 2025
match clang's `va_arg` assembly on arm targets

try-job: armhf-gnu
Zalathar added a commit to Zalathar/rust that referenced this pull request Sep 12, 2025
match clang's `va_arg` assembly on arm targets

Zalathar added a commit to Zalathar/rust that referenced this pull request Sep 12, 2025
match clang's `va_arg` assembly on arm targets

rust-timer added a commit that referenced this pull request Sep 12, 2025
Rollup merge of #144549 - folkertdev:va-arg-arm, r=saethlin

match clang's `va_arg` assembly on arm targets

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Sep 13, 2025
match clang's `va_arg` assembly on arm targets
