Skip to content

Clang -fms-hotpatch elides memory allocation calls #76879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tycho opened this issue Jan 3, 2024 · 6 comments · Fixed by #77245
Closed

Clang -fms-hotpatch elides memory allocation calls #76879

tycho opened this issue Jan 3, 2024 · 6 comments · Fixed by #77245
Assignees
Labels
llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@tycho
Copy link
Member

tycho commented Jan 3, 2024

I built my application using clang-cl with optimizations enabled and the /hotpatch flag (to try out Live++). It compiled and linked successfully, but when I try to run it, the very first memory allocation hits an int 3 instruction. It seems all the memory allocation functions (default malloc, default new/new[], custom malloc/new, etc) are just a nop followed by several int 3s. It seems to have assumed it's unreachable code or something?

For example, a somewhat reduced test case (and here's a Godbolt link):

typedef unsigned long long size_t;

extern "C" {
    void* mi_new(size_t size);
}

void *mi_new_test(size_t count)
{
    return mi_new(count);
}

void *builtin_malloc_test(size_t count)
{
    return __builtin_malloc(count);
}

When buit with -O0 -target x86_64-pc-windows-msvc19.38.33133 -fms-hotpatch, it yields fairly reasonable (if unoptimized) code:

"?mi_new_test@@YAPEAX_K@Z":             # @"?mi_new_test@@YAPEAX_K@Z"
        sub     rsp, 40
        mov     qword ptr [rsp + 32], rcx
        mov     rcx, qword ptr [rsp + 32]
        call    mi_new
        nop
        add     rsp, 40
        ret
"?builtin_malloc_test@@YAPEAX_K@Z":     # @"?builtin_malloc_test@@YAPEAX_K@Z"
        sub     rsp, 40
        mov     qword ptr [rsp + 32], rcx
        mov     rcx, qword ptr [rsp + 32]
        call    malloc
        nop
        add     rsp, 40
        ret

When -O2 (or indeed any -O flag higher than 0) is used, it emits this:

"?mi_new_test@@YAPEAX_K@Z":             # @"?mi_new_test@@YAPEAX_K@Z"
        xchg    ax, ax

"?builtin_malloc_test@@YAPEAX_K@Z":     # @"?builtin_malloc_test@@YAPEAX_K@Z"
        xchg    ax, ax

What happened here?

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Jan 3, 2024
@EugeneZelenko EugeneZelenko added question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! llvm:optimizations and removed clang Clang issues not falling into any other category labels Jan 3, 2024
@tycho
Copy link
Member Author

tycho commented Jan 4, 2024

@EugeneZelenko Why the "question" tag? This is clearly a bug somehow.

@tycho
Copy link
Member Author

tycho commented Jan 4, 2024

Interesting, the LLVM IR appears fine:

define dso_local noundef ptr @"?mi_new_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr {
entry:
  %call = tail call ptr @mi_new(i64 noundef %count)
  ret ptr %call
}

declare dso_local ptr @mi_new(i64 noundef) local_unnamed_addr #1

define dso_local noalias noundef ptr @"?builtin_malloc_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr {
entry:
  %call = tail call ptr @malloc(i64 noundef %count) #5
  ret ptr %call
}

declare dso_local noalias noundef ptr @malloc(i64 noundef) local_unnamed_addr #3

declare void @llvm.dbg.value(metadata, metadata, metadata) #4

And if you use aarch64-pc-windows-msvc it generates branch instructions to the appropriate functions.

"?mi_new_test@@YAPEAX_K@Z":             // @"?mi_new_test@@YAPEAX_K@Z"
        b       mi_new
"?builtin_malloc_test@@YAPEAX_K@Z":     // @"?builtin_malloc_test@@YAPEAX_K@Z"
        b       malloc

So this is a bug somewhere in the X86 backend.

@tycho
Copy link
Member Author

tycho commented Jan 4, 2024

Also adding -fno-optimize-sibling-calls appears to unbreak the memory allocation functions (I'll retest my app with that...). So tail-call optimization + MS hotpatch on x86 is somehow broken then?

@shafik
Copy link
Collaborator

shafik commented Jan 4, 2024

Looks like we have a crash bug if we run w/ assertions turned on: https://gcc.godbolt.org/z/6e1cWP3qG

Assertion:

clang++: /root/llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:446:
void llvm::MachineFunction::deleteMachineInstr(llvm::MachineInstr*):
Assertion `(!MI->isCandidateForCallSiteEntry() || !CallSitesInfo.contains(MI)) && "Call site info was not updated!"' failed.

Backtrace:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -mllvm --x86-asm-syntax=intel -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -O2 -fms-hotpatch <source>
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '<source>'.
4.	Running pass 'Implement the 'patchable-function' attribute' on function '@_Z11mi_new_testy'
 #0 0x00000000037d89e8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x37d89e8)
 #1 0x00000000037d66cc llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x37d66cc)
 #2 0x000000000371ef18 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x00007f1bbd242520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x00007f1bbd2969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x00007f1bbd242476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x00007f1bbd2287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x00007f1bbd22871b (/lib/x86_64-linux-gnu/libc.so.6+0x2871b)
 #8 0x00007f1bbd239e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
 #9 0x0000000002bbddcc llvm::MachineFunction::deleteMachineInstr(llvm::MachineInstr*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x2bbddcc)
#10 0x0000000002bd5caa llvm::MachineInstr::eraseFromParent() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x2bd5caa)
#11 0x0000000002cd581b (anonymous namespace)::PatchableFunction::runOnMachineFunction(llvm::MachineFunction&) PatchableFunction.cpp:0:0
#12 0x0000000002bcbee1 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.0) MachineFunctionPass.cpp:0:0
#13 0x000000000318b599 llvm::FPPassManager::runOnFunction(llvm::Function&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x318b599)
#14 0x000000000318b7d1 llvm::FPPassManager::runOnModule(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x318b7d1)
#15 0x000000000318c012 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x318c012)
#16 0x0000000003a51276 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) BackendUtil.cpp:0:0
#17 0x0000000003a51601 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x3a51601)
#18 0x000000000405456c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x405456c)
#19 0x0000000005fc9449 clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x5fc9449)
#20 0x0000000004053908 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4053908)
#21 0x00000000042c0a29 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x42c0a29)
#22 0x000000000423fa7e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x423fa7e)
#23 0x00000000043a032e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x43a032e)
#24 0x0000000000c05686 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xc05686)
#25 0x0000000000bfcf4a ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#26 0x0000000004094a49 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
#27 0x000000000371f3c4 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x371f3c4)
#28 0x000000000409503f clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
#29 0x000000000405d1a5 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x405d1a5)
#30 0x000000000405dc0d clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x405dc0d)
#31 0x0000000004065b05 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0x4065b05)
#32 0x0000000000c02b0c clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xc02b0c)
#33 0x0000000000afb801 main (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xafb801)
#34 0x00007f1bbd229d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
#35 0x00007f1bbd229e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
#36 0x0000000000bfca2e _start (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+++0xbfca2e)
clang++: error: clang frontend command failed with exit code 134 (use -v to see invocation)
Compiler returned: 134

@shafik
Copy link
Collaborator

shafik commented Jan 4, 2024

CC @lhames

@aganea
Copy link
Member

aganea commented Jan 7, 2024

Please see #77245 for a fix.

When -O2 (or indeed any -O flag higher than 0) is used, it emits this:

"?mi_new_test@@YAPEAX_K@Z":             # @"?mi_new_test@@YAPEAX_K@Z"
        xchg    ax, ax

"?builtin_malloc_test@@YAPEAX_K@Z":     # @"?builtin_malloc_test@@YAPEAX_K@Z"
        xchg    ax, ax

What happened here?

The tail call pseudo instruction isn't handled directly by the encoder, it needs translation first. We used that to determine the size of the instruction and see if we needed to insert NOPs or not. Turned out the encoded size was 0 thus we only emitted NOPs and nothing else.

And if you use aarch64-pc-windows-msvc it generates branch instructions to the appropriate functions.

"?mi_new_test@@YAPEAX_K@Z":             // @"?mi_new_test@@YAPEAX_K@Z"
        b       mi_new
"?builtin_malloc_test@@YAPEAX_K@Z":     // @"?builtin_malloc_test@@YAPEAX_K@Z"
        b       malloc

So this is a bug somewhere in the X86 backend.

Indeed. ARM64 doesn't go through the PatchableFunction pass since instructions are already fixed size and jumps can be patched atomically.

Looks like we have a crash bug if we run w/ assertions turned on: https://gcc.godbolt.org/z/6e1cWP3qG

Fixed that too, thanks for pointing it out!

aganea added a commit that referenced this issue Jan 22, 2024
…77245)

Previously, tail jump pseudo-opcodes were skipped by the
`encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`.
This caused emission of a 2-byte NOP and dropping of the tail jump.

With this PR, we change `PATCHABLE_OP` to not wrap the first
`MachineInstr` anymore, but inserting itself before,
leaving the instruction unaltered. At lowering time in `X86AsmPrinter`,
we now "look ahead" for the next non-pseudo `MachineInstr` and
lower+encode it, to inspect its size. If the size is below what
`PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing. That
way, now the first `MachineInstr` is always lowered as usual even if
`"patchable-function"="prologue-short-redirect"` is used.

Fixes #76879,
#76958 and
#59039
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants