Skip to content

[SPIRV] Investigate usage of report_fatal_error in SPIRVInstructionSelector #124045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 of 4 tasks
inbelic opened this issue Jan 23, 2025 · 10 comments
Closed
1 of 4 tasks
Assignees
Labels
backend:SPIR-V crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

@inbelic
Copy link
Contributor

inbelic commented Jan 23, 2025

Throughout SPIRVInstructionSelector there are many uses of report_fatal_error to indicate various errors, however, none of these have any associated testing.

As part of #123853, this pattern was propagated to report an error and a test was added to ensure the correct error message was received. This caused the hwasan build bot to fail.

It is assumed that all of the other report_fatal_error instances would cause a similar failure if executed on the build-bot, or at the least, the other instance that outputs a diagnostics message printing out the current instruction here.

This issue is to track an investigation of which uses of report_fatal_error cause the sanitization build-bot to fail, and apply a fix correspondingly.

Of the top of my head some potential things to check are:
We don't seem to return false after reporting a fatal error
We have allocated a diag message to output through a locally constructed output_stream.

AC:

@EugeneZelenko EugeneZelenko added crash Prefer [crash-on-valid] or [crash-on-invalid] backend:SPIR-V and removed new issue labels Jan 23, 2025
inbelic added a commit that referenced this issue Jan 23, 2025
Reverts #123853

The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the pr. This has caused a build-bot breakage, and the work to solve
the underlying issue is tracked here:
#124045. We can re-apply this
commit when the underlying issue is resolved.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Jan 23, 2025
…on"" (#124046)

Reverts llvm/llvm-project#123853

The introduction of `reflect-error.ll` surfaced a bug with the use of
`report_fatal_error` in `SPIRVInstructionSelector` that was propagated
into the pr. This has caused a build-bot breakage, and the work to solve
the underlying issue is tracked here:
llvm/llvm-project#124045. We can re-apply this
commit when the underlying issue is resolved.
@Icohedron
Copy link
Contributor

I will investigate this issue

@damyanp
Copy link
Contributor

damyanp commented Feb 5, 2025

@Icohedron
Copy link
Contributor

I have successfully reproduced the error on a local machine.

The tests for report_fatal_error messages can be modified to succeed under hwasan using one of two methods:

  1. Using not --crash in the RUN lines of the test and changing report_fatal_error(..., /*GenCrashDiag=*/false) to report_fatal_error(..., /*GenCrashDiag=*/true) is one way to modify the test so that it passses hwasan.

  2. Defining HWASAN_OPTIONS="abort_on_error=0" in the environment variables before running the test.

Case 2 is interesting because, according to buildbot_functions.sh in the llvm-zorg repo, hwasan is the only sanitizer that defines HWASAN_OPTIONS="abort_on_error=1" in environment variables.

@Icohedron
Copy link
Contributor

Running the reflect-error.ll test by itself produces an interesting error upon terminating:

> /workspace/llvm-project/build/bin/llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown /workspace/llvm-project/llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll -o /dev/null
LLVM ERROR: %6:id(<2 x s64>) = G_INTRINSIC intrinsic(@llvm.spv.reflect), %0:vfid(<2 x s64>), %1:vfid(<2 x s64>) is only supported with the GLSL extended instruction set.

pure virtual method called
terminate called without an active exception
Aborted

Running it through lldb, the backtrace is as follows:

(lldb) bt
* thread #1, name = 'llc', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7a10794 libc.so.6`__pthread_kill_implementation + 340
    frame #1: 0x0000fffff79bb15c libc.so.6`raise + 28
    frame #2: 0x0000fffff79a5a00 libc.so.6`abort + 244
    frame #3: 0x0000aaaaace540d4 llc`__sanitizer::Abort() + 20
    frame #4: 0x0000aaaaace51b50 llc`__sanitizer::Die() + 80
    frame #5: 0x0000aaaaace6866c llc`__lsan::CheckForLeaks() + 676
    frame #6: 0x0000aaaaace686d4 llc`__lsan::DoLeakCheck() + 104
    frame #7: 0x0000fffff79bd9d8 libc.so.6`__cxa_finalize + 472
    frame #8: 0x0000aaaaace1a3bc llc`__do_global_dtors_aux + 48
    frame #9: 0x0000fffff7fbede8 ld-linux-aarch64.so.1`_dl_call_fini + 104
    frame #10: 0x0000fffff7fc1c00 ld-linux-aarch64.so.1`_dl_fini + 416
    frame #11: 0x0000fffff79bdfe8 libc.so.6`__run_exit_handlers + 328
    frame #12: 0x0000fffff79be0c4 libc.so.6`exit + 28
    frame #13: 0x0000aaaab077e07c llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=fa
    frame #14: 0x0000aaaab077de24 llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=<u
    frame #15: 0x0000aaaaad142dec llc`(anonymous namespace)::SPIRVInstructionSelector::selectExtInst
    frame #16: 0x0000aaaaad121e88 llc`(anonymous namespace)::SPIRVInstructionSelector::selectIntrinsic(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:3040:12
    frame #17: 0x0000aaaaad0faea4 llc`(anonymous namespace)::SPIRVInstructionSelector::select(llvm::MachineInstr&) [inlined] (anonymous namespace)::SPIRVInstructionSelector::spvSelect(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:550:12
    frame #18: 0x0000aaaaad0fae68 llc`(anonymous namespace)::SPIRVInstructionSelector::select(this=<unavailable>, I=<unavailable>) at SPIRVInstructionSelector.cpp:494:7
    frame #19: 0x0000aaaab13555b4 llc`llvm::InstructionSelect::selectMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:216:14
    frame #20: 0x0000aaaab1354464 llc`llvm::InstructionSelect::runOnMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:156:10
    frame #21: 0x0000aaaaae556328 llc`llvm::MachineFunctionPass::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at MachineFunctionPass.cpp:94:13
    frame #22: 0x0000aaaaaf17ad38 llc`llvm::FPPassManager::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at LegacyPassManager.cpp:1406:27
    frame #23: 0x0000aaaaaf18c8dc llc`llvm::FPPassManager::runOnModule(this=0xa000ed2dfffe0540, M=<unavailable>) at LegacyPassManager.cpp:1452:16
    frame #24: 0x0000aaaaaf17bc70 llc`llvm::legacy::PassManagerImpl::run(llvm::Module&) [inlined] (anonymous namespace)::MPPassManager::runOnModule(this=<unavailable>, M=<unavailable>) at LegacyPassManager.cpp:1521:27
    frame #25: 0x0000aaaaaf17ba24 llc`llvm::legacy::PassManagerImpl::run(this=<unavailable>, M=0xfc00ed6dfffe0e00) at LegacyPassManager.cpp:539:44
    frame #26: 0x0000aaaaace7a494 llc`compileModule(argv=0x0000ffffffffc028, Context=0x4800ffffffffbcb0) at llc.cpp:751:8
    frame #27: 0x0000aaaaace750d4 llc`main(argc=7, argv=0x0000ffffffffc028) at llc.cpp:411:22
    frame #28: 0x0000fffff79a62b4 libc.so.6`__libc_start_call_main + 116
    frame #29: 0x0000fffff79a6398 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
    frame #30: 0x0000aaaaace1a2f0 llc`_start + 48

@Icohedron
Copy link
Contributor

Icohedron commented Feb 19, 2025

Running the reflect-error.ll test by itself produces an interesting error upon terminating:

> /workspace/llvm-project/build/bin/llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown /workspace/llvm-project/llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll -o /dev/null
LLVM ERROR: %6:id(<2 x s64>) = G_INTRINSIC intrinsic(@llvm.spv.reflect), %0:vfid(<2 x s64>), %1:vfid(<2 x s64>) is only supported with the GLSL extended instruction set.

pure virtual method called
terminate called without an active exception
Aborted

Running it through lldb, the backtrace is as follows:

(lldb) bt
* thread #1, name = 'llc', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7a10794 libc.so.6`__pthread_kill_implementation + 340
    frame #1: 0x0000fffff79bb15c libc.so.6`raise + 28
    frame #2: 0x0000fffff79a5a00 libc.so.6`abort + 244
    frame #3: 0x0000aaaaace540d4 llc`__sanitizer::Abort() + 20
    frame #4: 0x0000aaaaace51b50 llc`__sanitizer::Die() + 80
    frame #5: 0x0000aaaaace6866c llc`__lsan::CheckForLeaks() + 676
    frame #6: 0x0000aaaaace686d4 llc`__lsan::DoLeakCheck() + 104
    frame #7: 0x0000fffff79bd9d8 libc.so.6`__cxa_finalize + 472
    frame #8: 0x0000aaaaace1a3bc llc`__do_global_dtors_aux + 48
    frame #9: 0x0000fffff7fbede8 ld-linux-aarch64.so.1`_dl_call_fini + 104
    frame #10: 0x0000fffff7fc1c00 ld-linux-aarch64.so.1`_dl_fini + 416
    frame #11: 0x0000fffff79bdfe8 libc.so.6`__run_exit_handlers + 328
    frame #12: 0x0000fffff79be0c4 libc.so.6`exit + 28
    frame #13: 0x0000aaaab077e07c llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=fa
    frame #14: 0x0000aaaab077de24 llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=<u
    frame #15: 0x0000aaaaad142dec llc`(anonymous namespace)::SPIRVInstructionSelector::selectExtInst
    frame #16: 0x0000aaaaad121e88 llc`(anonymous namespace)::SPIRVInstructionSelector::selectIntrinsic(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:3040:12
    frame #17: 0x0000aaaaad0faea4 llc`(anonymous namespace)::SPIRVInstructionSelector::select(llvm::MachineInstr&) [inlined] (anonymous namespace)::SPIRVInstructionSelector::spvSelect(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:550:12
    frame #18: 0x0000aaaaad0fae68 llc`(anonymous namespace)::SPIRVInstructionSelector::select(this=<unavailable>, I=<unavailable>) at SPIRVInstructionSelector.cpp:494:7
    frame #19: 0x0000aaaab13555b4 llc`llvm::InstructionSelect::selectMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:216:14
    frame #20: 0x0000aaaab1354464 llc`llvm::InstructionSelect::runOnMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:156:10
    frame #21: 0x0000aaaaae556328 llc`llvm::MachineFunctionPass::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at MachineFunctionPass.cpp:94:13
    frame #22: 0x0000aaaaaf17ad38 llc`llvm::FPPassManager::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at LegacyPassManager.cpp:1406:27
    frame #23: 0x0000aaaaaf18c8dc llc`llvm::FPPassManager::runOnModule(this=0xa000ed2dfffe0540, M=<unavailable>) at LegacyPassManager.cpp:1452:16
    frame #24: 0x0000aaaaaf17bc70 llc`llvm::legacy::PassManagerImpl::run(llvm::Module&) [inlined] (anonymous namespace)::MPPassManager::runOnModule(this=<unavailable>, M=<unavailable>) at LegacyPassManager.cpp:1521:27
    frame #25: 0x0000aaaaaf17ba24 llc`llvm::legacy::PassManagerImpl::run(this=<unavailable>, M=0xfc00ed6dfffe0e00) at LegacyPassManager.cpp:539:44
    frame #26: 0x0000aaaaace7a494 llc`compileModule(argv=0x0000ffffffffc028, Context=0x4800ffffffffbcb0) at llc.cpp:751:8
    frame #27: 0x0000aaaaace750d4 llc`main(argc=7, argv=0x0000ffffffffc028) at llc.cpp:411:22
    frame #28: 0x0000fffff79a62b4 libc.so.6`__libc_start_call_main + 116
    frame #29: 0x0000fffff79a6398 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
    frame #30: 0x0000aaaaace1a2f0 llc`_start + 48

The "pure virtual method called" error does not appear when -o /dev/null is replaced with -o -.

The reflect-error.ll test passes by making the same modifications to the RUN lines. So this is another method by which the test can be made to pass under hwasan.

@farzonl
Copy link
Member

farzonl commented Feb 19, 2025

terminate called without an active exception
Sounds like a threading issue causing a nondeterministic exit. Might just be because aborts don't call destructors. But I bet you could cause this same problem with exit(1) not just abort()

The pure virtual method called is probably because SPIRVInstructionSelector::select is a virtual method.

@Icohedron
Copy link
Contributor

terminate called without an active exception Sounds like a threading issue causing a nondeterministic exit. Might just be because aborts don't call destructors. But I bet you could cause this same problem with exit(1) not just abort()

The pure virtual method called is probably because SPIRVInstructionSelector::select is a virtual method.

The pure virtual method called terminate called without an active exception was produced with report_fatal_error(..., false) which is an exit(1). I haven't yet been able to produce the same error with an abort().

@farzonl
Copy link
Member

farzonl commented Feb 20, 2025

produced with report_fatal_error(..., false) which is an exit(1)

Ah that makes sense looks like someone fixed a variation of this bug like so 4fed39d

@Icohedron
Copy link
Contributor

Icohedron commented Feb 25, 2025

Running the reflect-error.ll test by itself produces an interesting error upon terminating:

> /workspace/llvm-project/build/bin/llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown /workspace/llvm-project/llvm/test/CodeGen/SPIRV/opencl/reflect-error.ll -o /dev/null
LLVM ERROR: %6:id(<2 x s64>) = G_INTRINSIC intrinsic(@llvm.spv.reflect), %0:vfid(<2 x s64>), %1:vfid(<2 x s64>) is only supported with the GLSL extended instruction set.

pure virtual method called
terminate called without an active exception
Aborted

Running it through lldb, the backtrace is as follows:

(lldb) bt
* thread #1, name = 'llc', stop reason = signal SIGABRT
  * frame #0: 0x0000fffff7a10794 libc.so.6`__pthread_kill_implementation + 340
    frame #1: 0x0000fffff79bb15c libc.so.6`raise + 28
    frame #2: 0x0000fffff79a5a00 libc.so.6`abort + 244
    frame #3: 0x0000aaaaace540d4 llc`__sanitizer::Abort() + 20
    frame #4: 0x0000aaaaace51b50 llc`__sanitizer::Die() + 80
    frame #5: 0x0000aaaaace6866c llc`__lsan::CheckForLeaks() + 676
    frame #6: 0x0000aaaaace686d4 llc`__lsan::DoLeakCheck() + 104
    frame #7: 0x0000fffff79bd9d8 libc.so.6`__cxa_finalize + 472
    frame #8: 0x0000aaaaace1a3bc llc`__do_global_dtors_aux + 48
    frame #9: 0x0000fffff7fbede8 ld-linux-aarch64.so.1`_dl_call_fini + 104
    frame #10: 0x0000fffff7fc1c00 ld-linux-aarch64.so.1`_dl_fini + 416
    frame #11: 0x0000fffff79bdfe8 libc.so.6`__run_exit_handlers + 328
    frame #12: 0x0000fffff79be0c4 libc.so.6`exit + 28
    frame #13: 0x0000aaaab077e07c llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=fa
    frame #14: 0x0000aaaab077de24 llc`llvm::report_fatal_error(Reason=<unavailable>, GenCrashDiag=<u
    frame #15: 0x0000aaaaad142dec llc`(anonymous namespace)::SPIRVInstructionSelector::selectExtInst
    frame #16: 0x0000aaaaad121e88 llc`(anonymous namespace)::SPIRVInstructionSelector::selectIntrinsic(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:3040:12
    frame #17: 0x0000aaaaad0faea4 llc`(anonymous namespace)::SPIRVInstructionSelector::select(llvm::MachineInstr&) [inlined] (anonymous namespace)::SPIRVInstructionSelector::spvSelect(this=0x3e00ed9dfffe0000, ResVReg=(Reg = 2147483654), ResType=0xad00edfdfffe87d0, I=0xad00edfdfffe8f98) const at SPIRVInstructionSelector.cpp:550:12
    frame #18: 0x0000aaaaad0fae68 llc`(anonymous namespace)::SPIRVInstructionSelector::select(this=<unavailable>, I=<unavailable>) at SPIRVInstructionSelector.cpp:494:7
    frame #19: 0x0000aaaab13555b4 llc`llvm::InstructionSelect::selectMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:216:14
    frame #20: 0x0000aaaab1354464 llc`llvm::InstructionSelect::runOnMachineFunction(this=0x0300ec5dfffe0480, MF=0x4e00ed8dffff3100) at InstructionSelect.cpp:156:10
    frame #21: 0x0000aaaaae556328 llc`llvm::MachineFunctionPass::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at MachineFunctionPass.cpp:94:13
    frame #22: 0x0000aaaaaf17ad38 llc`llvm::FPPassManager::runOnFunction(this=<unavailable>, F=0xd900ec8dfffe0098) at LegacyPassManager.cpp:1406:27
    frame #23: 0x0000aaaaaf18c8dc llc`llvm::FPPassManager::runOnModule(this=0xa000ed2dfffe0540, M=<unavailable>) at LegacyPassManager.cpp:1452:16
    frame #24: 0x0000aaaaaf17bc70 llc`llvm::legacy::PassManagerImpl::run(llvm::Module&) [inlined] (anonymous namespace)::MPPassManager::runOnModule(this=<unavailable>, M=<unavailable>) at LegacyPassManager.cpp:1521:27
    frame #25: 0x0000aaaaaf17ba24 llc`llvm::legacy::PassManagerImpl::run(this=<unavailable>, M=0xfc00ed6dfffe0e00) at LegacyPassManager.cpp:539:44
    frame #26: 0x0000aaaaace7a494 llc`compileModule(argv=0x0000ffffffffc028, Context=0x4800ffffffffbcb0) at llc.cpp:751:8
    frame #27: 0x0000aaaaace750d4 llc`main(argc=7, argv=0x0000ffffffffc028) at llc.cpp:411:22
    frame #28: 0x0000fffff79a62b4 libc.so.6`__libc_start_call_main + 116
    frame #29: 0x0000fffff79a6398 libc.so.6`__libc_start_main@@GLIBC_2.34 + 152
    frame #30: 0x0000aaaaace1a2f0 llc`_start + 48

The "pure virtual method called" error does not appear when -o /dev/null is replaced with -o -.

The reflect-error.ll test passes by making the same modifications to the RUN lines. So this is another method by which the test can be made to pass under hwasan.

This bug has been fixed recently. See #125599 (comment) and #125599 (comment)
The test no longer fails with -o /dev/null both locally and on the aarch64 hwasan buildbot

@Icohedron
Copy link
Contributor

Icohedron commented Mar 5, 2025

Closing this issue since the issue has been fixed (see above comment) and there are actually some other tests that exist which check for report_fatal_error from this file (such as vararr.ll)

@github-project-automation github-project-automation bot moved this from Active to Closed in HLSL Support Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SPIR-V crash Prefer [crash-on-valid] or [crash-on-invalid]
Projects
Status: Closed
Development

No branches or pull requests

6 participants