Skip to content

Enabling LTO causes miscompilation on x86_64-apple-darwin #141306

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
briansmith opened this issue May 20, 2025 · 17 comments · May be fixed by #142447
Open

Enabling LTO causes miscompilation on x86_64-apple-darwin #141306

briansmith opened this issue May 20, 2025 · 17 comments · May be fixed by #142447
Labels
C-bug Category: This is a bug. C-external-bug Category: issue that is caused by bugs in software beyond our control llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes O-macos Operating system: macOS P-critical Critical priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@briansmith
Copy link
Contributor

briansmith commented May 20, 2025

This is from https://github.com/briansmith/ring 's CI, testing the tip of the main branch. The CI passes on 1.86 and fails on multiple targets when building with 1.87. In at least one case, building and linking passes but the test fails, indicating potentially the code (computing HMAC) or the test of the code is being miscompiled.

Unfortunately, I have limited access to systems that would help me reduce these today. I have noticed another project has run into a similar linker error recently. I apologize in advance for such a poor bug report. I hope I, or ideally others, could fill in the details, as I have limited availability today.

x86_64-apple-darwin

https://github.com/briansmith/ring/actions/runs/15143648178/job/42573984175

running 2 tests
test hmac_debug ... ok
test hmac_tests ... FAILED

failures:

---- hmac_tests stdout ----

thread 'hmac_tests' panicked at tests/hmac_tests.rs:83:9:
assertion `left == right` failed
  left: true
 right: false
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(The test is checking whether the output of the HMAC computation is correct.)

@briansmith briansmith added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels May 20, 2025
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels May 20, 2025
@briansmith
Copy link
Contributor Author

See tikv/tikv#18465 for another project that seems to have encountered the same error with the same unrecognized relocation type error on ARM64 Linux, using nightly-2025-04-04.

@briansmith
Copy link
Contributor Author

The problem seems to be caused by LTO, as when I remove "lto = true" from the release profile in Cargo.toml, things are working. See https://github.com/briansmith/ring/actions/runs/15145798156.

@briansmith briansmith changed the title Failures to link due to unknown relocation type in switch tables on multiple targets; possible miscompile When LTO is enabled, miscomputes (x86_64-apple-darwin, at least) and failures to link due to unknown relocation type in switch tables (AArch64 Linux/Android, at least) May 20, 2025
@briansmith briansmith changed the title When LTO is enabled, miscomputes (x86_64-apple-darwin, at least) and failures to link due to unknown relocation type in switch tables (AArch64 Linux/Android, at least) When LTO is enabled, miscompiles (x86_64-apple-darwin, at least) and failures to link due to unknown relocation type in switch tables (AArch64 Linux/Android, at least) May 20, 2025
briansmith added a commit to briansmith/ring that referenced this issue May 20, 2025
Rust 1.87 seems to miscompile the tests and/or *ring* itself when LTO
is enabled. See rust-lang/rust#141306.
@lqd
Copy link
Member

lqd commented May 20, 2025

Can you bisect the PR causing this with cargo-bisect-rustc? Thanks!

@briansmith
Copy link
Contributor Author

Can you bisect the PR causing this with cargo-bisect-rustc? Thanks!

If you are asking me, then no. It is very unlikely I will get to it, though in normal circumstances I would try.

@lqd
Copy link
Member

lqd commented May 20, 2025

The bisection should be helpful regardless of who's doing it, it doesn't have to be you, no worries.

@lqd lqd added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 20, 2025
briansmith added a commit to briansmith/ring that referenced this issue May 20, 2025
Rust 1.87 seems to miscompile the tests and/or *ring* itself when LTO
is enabled. See rust-lang/rust#141306.
@apiraino
Copy link
Contributor

Assigning priority (discussion on Zulip).

@rustbot label -I-prioritize +P-critical +T-compiler

@rustbot rustbot added P-critical Critical priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels May 21, 2025
@briansmith
Copy link
Contributor Author

My guess is that, for the x86_64-apple-darwin miscompile, there may be two issues: #140686, which we believe to be a linker bug (not sure if that is a bug in a very old linker, or a bug that also affects the linker that is used b by default in macos-13 GitHub runners, which are x86-64) and a codegen change that causes the linker bug to be hit more often. If so, reducing and bisecting might be more difficult than typical.

@saethlin saethlin added A-linkage Area: linking into static, shared libraries and binaries and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 22, 2025
@jieyouxu jieyouxu added the E-help-wanted Call for participation: Help is requested to fix this issue. label May 22, 2025
ti-chi-bot bot pushed a commit to tikv/tikv that referenced this issue May 27, 2025
close #18465

downgrade rust toolchain to fix arm64 build(It's a workaround of rust bug rust-lang/rust#141306)

Signed-off-by: glorv <[email protected]>
@dianqk
Copy link
Member

dianqk commented May 29, 2025

A minimum reproduction: https://rust.godbolt.org/z/vd4nTc3oj

@rustbot label -E-needs-mcve

@justsmth
Copy link

As another data point, aws-lc-rs is seeing the same runtime failure on x86_64-apple-darwin on (essentially) that same test: https://github.com/aws/aws-lc-rs/actions/runs/15308838405/job/43076967009?pr=816#step:7:648

test hmac_debug ... ok
test hmac_traits ... ok
test hmac_tests ... FAILED
test hmac_thread_safeness ... ok

failures:

---- hmac_tests stdout ----

thread 'hmac_tests' panicked at aws-lc-rs/tests/hmac_test.rs:64:9:
assertion `left == right` failed
  left: true
 right: false

However, we're not seeing any failures during linking on the other platforms.

@workingjubilee workingjubilee added E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example and removed S-has-mcve Status: A Minimal Complete and Verifiable Example has been found for this issue A-linkage Area: linking into static, shared libraries and binaries labels May 29, 2025
@workingjubilee workingjubilee changed the title When LTO is enabled, miscompiles (x86_64-apple-darwin, at least) and failures to link due to unknown relocation type in switch tables (AArch64 Linux/Android, at least) Enabling LTO causes miscompilation on x86_64-apple-darwin May 29, 2025
@workingjubilee
Copy link
Member

I have split out the linkage issue to #141737

This issue only tracks the miscompilation now.

@workingjubilee
Copy link
Member

workingjubilee commented May 29, 2025

We investigated this during the T-compiler meeting on 2025-05-29 and determined

@workingjubilee
Copy link
Member

Let this be a reminder to everyone that every miscompilation issue is its own unique, sparkling star, and it is possible for two such stars, alike in dignity yet still standing in distinction, to fall from the heavenly tapestry of fate and strike the Rust toolchain at about the same time.

@dianqk
Copy link
Member

dianqk commented May 31, 2025

Edit: All issues are temporarily resolved by llvm/llvm-project#142304, which will be included in the next LLVM update.

I have documented everything about #140686, #141306 and #141737 on https://github.com/dianqk/rust-140686-141306-141737. @apiraino We can mention it at the next meeting.

For #140686, I don't have any suggestion for cross-building for macOS on Linux. :
For #141306 (this one), P-critical probably can be removed after the next meeting, since Xcode 15.3 has fixed it.
For #141737, I have request reverting llvm/llvm-project#72584: llvm/llvm-project#72584 (comment).

@dianqk dianqk added C-external-bug Category: issue that is caused by bugs in software beyond our control O-macos Operating system: macOS and removed E-help-wanted Call for participation: Help is requested to fix this issue. E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 31, 2025
@nikic
Copy link
Contributor

nikic commented May 31, 2025

@dianqk Are all issues here related to relative lookup tables?

@dianqk
Copy link
Member

dianqk commented May 31, 2025

@dianqk Are all issues here related to relative lookup tables?

Yes, I didn't double-check #141306, but it's using the same minimal reproduction as #141737

@briansmith
Copy link
Contributor Author

For #141306 (this one), P-critical can be removed after the next meeting, since upgrading to Xcode 15.3 is easy!

Is it easy? Many people are using macos-13 runners to ensure that cargo test runs their tests on macOS on native x86_64 CPUs. Last time I checked, macos-13 GitHub Runner images are the latest macOS x86_64 runners. And according to the documentation, 15.2 is the latest version installed on those runners. So it may not be "easy."

Also, Xcode 15.2 is the default Xcode version on those runners.

If we don't want to maintain compatibility with Xcode 15.2, that's fine. But, instead of people's code silently being miscompiled, we should at a minimum ensure that the build stops with an error when Xcode 15.2 (and earlier?) is used, or that LTO isn't used with its linker, or maybe something else. Anything other than silently miscompiling their code.

@dianqk
Copy link
Member

dianqk commented Jun 1, 2025

Is it easy? Many people are using macos-13 runners to ensure that cargo test runs their tests on macOS on native x86_64 CPUs. Last time I checked, macos-13 GitHub Runner images are the latest macOS x86_64 runners. And according to the documentation, 15.2 is the latest version installed on those runners. So it may not be "easy."

Yes, you are right. Xcode 15.2 requires macOS 13.5+, but Xcode 15.3 requires macOS 14.0+.

dianqk added a commit to llvm/llvm-project that referenced this issue Jun 1, 2025
…REL relocations (#142304)

Follow
#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Jun 1, 2025
…ating GOTPCREL relocations (#142304)

Follow
llvm/llvm-project#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.
dianqk added a commit to dianqk/llvm-project that referenced this issue Jun 1, 2025
…REL relocations (llvm#142304)

Follow
llvm#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.

(cherry picked from commit aa09dbb)
tstellar pushed a commit to dianqk/llvm-project that referenced this issue Jun 3, 2025
…REL relocations (llvm#142304)

Follow
llvm#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.

(cherry picked from commit aa09dbb)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this issue Jun 3, 2025
…ating GOTPCREL relocations (#142304)

Follow
llvm/llvm-project#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.

(cherry picked from commit aa09dbb)
@madsmtm madsmtm added the llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes label Jun 4, 2025
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this issue Jun 12, 2025
…REL relocations (llvm#142304)

Follow
llvm#72584 (comment),
the patch will drop the `unnamed_addr` attribute when generating
relative lookup tables. I'm not very confident about this patch, but it
does resolve rust-lang/rust#140686,
rust-lang/rust#141306 and
rust-lang/rust#141737.

But I don't think this will result in worse problems.

> LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if x is an
unnamed_addr function. However, it does not provide this guarantee for a
constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a
constant. ([‘llvm.load.relative’
Intrinsic](https://llvm.org/docs/LangRef.html#id2592))

This is my concern. I'm not sure how unnamed_addr provides this
guarantee, and I haven't found any test cases.
@dianqk dianqk linked a pull request Jun 13, 2025 that will close this issue
bors added a commit that referenced this issue Jun 13, 2025
Update to LLVM 20.1.7

Closes #141306, closes #140686, closes #141737, closes #140933.
bors added a commit that referenced this issue Jun 14, 2025
Update to LLVM 20.1.7

Closes #141306, closes #140686, closes #141737, closes #140933.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. C-external-bug Category: issue that is caused by bugs in software beyond our control llvm-fixed-upstream Issue expected to be fixed by the next major LLVM upgrade, or backported fixes O-macos Operating system: macOS P-critical Critical priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.