-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[CodeGen][X86] Fix lowering of tailcalls when -ms-hotpatch
is used
#77245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, tail jump pseudo-opcodes were skipped by the `encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`. This caused emission of a 2-byte NOP and dropping of the tail jump Also, the `PatchableFunction` pass didn't properly update the call site info for the existing MachineInstr, after wrapping it into a `TargetOpcode::PATCHABLE_OP`.
@llvm/pr-subscribers-llvm-support @llvm/pr-subscribers-backend-x86 Author: Alexandre Ganea (aganea) ChangesPreviously, tail jump pseudo-opcodes were skipped by the Also, the Full diff: https://github.com/llvm/llvm-project/pull/77245.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/PatchableFunction.cpp b/llvm/lib/CodeGen/PatchableFunction.cpp
index 9449f143366f0f..768e8dff2efc24 100644
--- a/llvm/lib/CodeGen/PatchableFunction.cpp
+++ b/llvm/lib/CodeGen/PatchableFunction.cpp
@@ -87,6 +87,9 @@ bool PatchableFunction::runOnMachineFunction(MachineFunction &MF) {
for (auto &MO : FirstActualI->operands())
MIB.add(MO);
+ if (FirstActualI->shouldUpdateCallSiteInfo())
+ MF.eraseCallSiteInfo(&*FirstActualI);
+
FirstActualI->eraseFromParent();
MF.ensureAlignment(Align(16));
return true;
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index e1a67f61e76640..93aa0fc23eb9f2 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -959,7 +959,8 @@ void X86AsmPrinter::LowerPATCHABLE_OP(const MachineInstr &MI,
bool EmptyInst = (Opcode == TargetOpcode::PATCHABLE_OP);
MCInst MCI;
- MCI.setOpcode(Opcode);
+ // Make sure below we don't encode pseudo tailcalls.
+ MCI.setOpcode(convertTailJumpOpcode(Opcode));
for (auto &MO : drop_begin(MI.operands(), 2))
if (auto MaybeOperand = MCIL.LowerMachineOperand(&MI, MO))
MCI.addOperand(*MaybeOperand);
@@ -994,8 +995,11 @@ void X86AsmPrinter::LowerPATCHABLE_OP(const MachineInstr &MI,
(void)NopSize;
}
}
- if (!EmptyInst)
+ if (!EmptyInst) {
+ if (Opcode != convertTailJumpOpcode(Opcode))
+ OutStreamer->AddComment("TAILCALL");
OutStreamer->emitInstruction(MCI, getSubtargetInfo());
+ }
}
// Lower a stackmap of the form:
diff --git a/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll b/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll
new file mode 100644
index 00000000000000..ab3ff5852a74b0
--- /dev/null
+++ b/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll
@@ -0,0 +1,69 @@
+; RUN: llc -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECK
+
+; CHECK: "?mi_new_test@@YAPEAX_K@Z":
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: jmp mi_new
+
+; CHECK: "?builtin_malloc_test@@YAPEAX_K@Z":
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: jmp malloc
+
+; // Built with: clang-cl.exe /c patchable-prologue-tailcall.cpp /O2 /hotpatch -Xclang -emit-llvm
+;
+; typedef unsigned long long size_t;
+;
+; extern "C" {
+; void* mi_new(size_t size);
+; }
+;
+; void *mi_new_test(size_t count)
+; {
+; return mi_new(count);
+; }
+;
+; void *builtin_malloc_test(size_t count)
+; {
+; return __builtin_malloc(count);
+; }
+
+; ModuleID = 'patchable-prologue-tailcall.cpp'
+source_filename = "patchable-prologue-tailcall.cpp"
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-windows-msvc19.38.33133"
+
+; Function Attrs: mustprogress nounwind sspstrong uwtable
+define dso_local noundef ptr @"?mi_new_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr #0 {
+entry:
+ %call = tail call ptr @mi_new(i64 noundef %count) #4
+ ret ptr %call
+}
+
+declare dso_local ptr @mi_new(i64 noundef) local_unnamed_addr #1
+
+; Function Attrs: mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable
+define dso_local noalias noundef ptr @"?builtin_malloc_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr #2 {
+entry:
+ %call = tail call ptr @malloc(i64 noundef %count) #4
+ ret ptr %call
+}
+
+; Function Attrs: mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite)
+declare dso_local noalias noundef ptr @malloc(i64 noundef) local_unnamed_addr #3
+
+attributes #0 = { mustprogress nounwind sspstrong uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #2 = { mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #3 = { mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite) "alloc-family"="malloc" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #4 = { nounwind }
+
+!llvm.linker.options = !{!0, !1}
+!llvm.module.flags = !{!2, !3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = !{!"/DEFAULTLIB:libcmt.lib"}
+!1 = !{!"/DEFAULTLIB:oldnames.lib"}
+!2 = !{i32 1, !"wchar_size", i32 2}
+!3 = !{i32 8, !"PIC Level", i32 2}
+!4 = !{i32 7, !"uwtable", i32 2}
+!5 = !{i32 1, !"MaxTLSAlign", i32 65536}
+!6 = !{!"clang version 18.0.0git (https://github.com/llvm/llvm-project.git 0ccf9e29e9626a3a6813b35608c8959178c50a6f)"}
|
; ModuleID = 'patchable-prologue-tailcall.cpp' | ||
source_filename = "patchable-prologue-tailcall.cpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete?
attributes #0 = { mustprogress nounwind sspstrong uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #2 = { mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } | ||
attributes #3 = { mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite) "alloc-family"="malloc" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the attribute "patchable-function"="prologue-short-redirect"
only?
!3 = !{i32 8, !"PIC Level", i32 2} | ||
!4 = !{i32 7, !"uwtable", i32 2} | ||
!5 = !{i32 1, !"MaxTLSAlign", i32 65536} | ||
!6 = !{!"clang version 18.0.0git (https://github.com/llvm/llvm-project.git 0ccf9e29e9626a3a6813b35608c8959178c50a6f)"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove !0, !1, !6?
if (!EmptyInst) | ||
if (!EmptyInst) { | ||
if (Opcode != convertTailJumpOpcode(Opcode)) | ||
OutStreamer->AddComment("TAILCALL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is not tested at all?
@@ -87,6 +87,9 @@ bool PatchableFunction::runOnMachineFunction(MachineFunction &MF) { | |||
for (auto &MO : FirstActualI->operands()) | |||
MIB.add(MO); | |||
|
|||
if (FirstActualI->shouldUpdateCallSiteInfo()) | |||
MF.eraseCallSiteInfo(&*FirstActualI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need this. According to the description above, PatchableFunction
needs the first instruction is two-byte long.
http://xxeo.com/single-byte-or-small-x86-opcodes
One-byte instruction is rare in X86. TailCall
can be transformed to either a call
or a jmp
, which is always longer than 2 bytes. I think we can just skip emitting patchable-op if the first instruction is a TailCall
.
I remember trying a similar approach: Making the pass skip the functions for which the first instruction's One one hand, I think it fixed the issue for the repro provided in the 2 issues, so it could be considered a workaround (and it is not worse than before this code change). However I don't think it should close #59039, because of the remaining design issues:
What do you think? |
@sylvain-audi For 2, instruction length is only determined before emitting. I think it's impossible to know that in |
Instead of `PATCHABLE_OP` wrapping the first `MachineInstr`, it now inserts itself before it, leaving the instruction unaltered. At lowering time in `X86AsmPrinter`, we now "look ahead" for the next non-pseudo `MachineInstr` and lower+encode it, to inspect its size. If the size is below what `PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing
So I did something different to address the points raised. Instead of If we wanted in a future PR we could change the attribute Also, I've removed the |
To keep the optimization, is there a (clean) way we could "tell" the |
I was under the impression that was initially a low hanging fruit optimization, but I'm not sure it's required ( @sanjoy ). I'll look into what you suggest if that's required. |
Hi @aganea - I touched this code 7+ years back and I don't remember any details, so unfortunately I don't have any useful feedback or suggestions on this PR. |
One interesting approach that goes along your lines would be to do like x64 MSVC does: always generate a 2-byte instruction at the start of the function, with no consideration to |
Another thing that @sylvain-audi raised in a offline discussion, is that |
@MolecularMatters would you be able please to run this patch through the Live++ tests, see if that doesn't break other cases? |
I'm the original reporter to @MolecularMatters about the issue, and I've been running with the Live++ build flags in my project for the past 12 hours with the changes in this PR applied on top of 8fb8ad6 and it seems to be fine now without needing |
@aganea I think you addressed all my concerns in your latest update. |
@aganea @tycho @sylvain-audi If somebody could make a clang/lld build with the patch available for me, I'd be happy to run this against the Live++ test suite. |
@MolecularMatters I sent you a build with this PR modification, based on 17.0.6. |
@MolecularMatters Did you have a chance to test this PR? I’d like to merge it before the branch cutoff. Thanks! |
@aganea Thanks for reminding me! |
Thanks! To all other reviewers: is this good to go? Can anyone approve please? |
Ping! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some suggestion
As suggested |
@sylvain-audi Would you prefer that we leave #59039 open? |
No, you can close it in my opinion, thanks! |
Previously, tail jump pseudo-opcodes were skipped by the
encodeInstruction()
call insideX86AsmPrinter::LowerPATCHABLE_OP
. This caused emission of a 2-byte NOP and dropping of the tail jump.With this PR, we change
PATCHABLE_OP
to not wrap the firstMachineInstr
anymore, but inserting itself before the instruction, leaving the instruction unaltered. At lowering time inX86AsmPrinter
, we now "look ahead" for the next non-pseudoMachineInstr
and lower+encode it, to inspect its size. If the size is below whatPATCHABLE_OP
expects, it inserts NOPs; otherwise it does nothing. That way, now the firstMachineInstr
is always lowered as usual even if"patchable-function"="prologue-short-redirect"
is used.Fixes #76879, #76958 and #59039