Skip to content

Conversation

dpaoliello
Copy link
Contributor

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.

@llvmbot llvmbot added llvm:mc Machine (object) code llvm:SelectionDAG SelectionDAGISel as well labels Jan 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2025

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-platform-windows

Author: Daniel Paoliello (dpaoliello)

Changes

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.


Patch is 28.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121516.diff

20 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+18)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+14)
  • (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+5)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+3)
  • (modified) llvm/include/llvm/MC/MCWinCOFFObjectWriter.h (+2)
  • (modified) llvm/include/llvm/MC/MCWinCOFFStreamer.h (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+3)
  • (modified) llvm/lib/MC/MCAsmStreamer.cpp (+9)
  • (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+5)
  • (modified) llvm/lib/MC/MCParser/COFFAsmParser.cpp (+21)
  • (modified) llvm/lib/MC/MCStreamer.cpp (+2)
  • (modified) llvm/lib/MC/MCWinCOFFStreamer.cpp (+9)
  • (modified) llvm/lib/MC/WinCOFFObjectWriter.cpp (+78)
  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+42)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-4)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization-nocalls.ll (+18)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization.ll (+36)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization-no-section.s (+9)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization.s (+62)
  • (added) llvm/test/MC/COFF/win-import-call-optimization-not-supported.s (+13)
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index d696add8a1af53..520f1745de2979 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -354,6 +354,11 @@ class LLVM_ABI MachineFunction {
   /// a table of valid targets for Windows EHCont Guard.
   std::vector<MCSymbol *> CatchretTargets;
 
+  /// Mapping of call instruction to the global value and target flags that it
+  /// calls, if applicable.
+  DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
+      CalledGlobalsMap;
+
   /// \name Exception Handling
   /// \{
 
@@ -1182,6 +1187,19 @@ class LLVM_ABI MachineFunction {
     CatchretTargets.push_back(Target);
   }
 
+  /// Tries to get the global and target flags for a call site, if the
+  /// instruction is a call to a global.
+  std::pair<const GlobalValue *, unsigned>
+  tryGetCalledGlobal(const MachineInstr *MI) const {
+    return CalledGlobalsMap.lookup(MI);
+  }
+
+  /// Notes the global and target flags for a call site.
+  void addCalledGlobal(const MachineInstr *MI,
+                       std::pair<const GlobalValue *, unsigned> Details) {
+    CalledGlobalsMap.insert({MI, Details});
+  }
+
   /// \name Exception Handling
   /// \{
 
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ff7caec41855fd..b31ad11c3ee0ee 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -293,6 +293,7 @@ class SelectionDAG {
     MDNode *HeapAllocSite = nullptr;
     MDNode *PCSections = nullptr;
     MDNode *MMRA = nullptr;
+    std::pair<const GlobalValue *, unsigned> CalledGlobal{};
     bool NoMerge = false;
   };
   /// Out-of-line extra information for SDNodes.
@@ -2373,6 +2374,19 @@ class SelectionDAG {
     auto It = SDEI.find(Node);
     return It != SDEI.end() ? It->second.MMRA : nullptr;
   }
+  /// Set CalledGlobal to be associated with Node.
+  void addCalledGlobal(const SDNode *Node, const GlobalValue *GV,
+                       unsigned OpFlags) {
+    SDEI[Node].CalledGlobal = {GV, OpFlags};
+  }
+  /// Return CalledGlobal associated with Node, or a nullopt if none exists.
+  std::optional<std::pair<const GlobalValue *, unsigned>>
+  getCalledGlobal(const SDNode *Node) {
+    auto I = SDEI.find(Node);
+    return I != SDEI.end()
+               ? std::make_optional(std::move(I->second).CalledGlobal)
+               : std::nullopt;
+  }
   /// Set NoMergeSiteInfo to be associated with Node if NoMerge is true.
   void addNoMergeSiteInfo(const SDNode *Node, bool NoMerge) {
     if (NoMerge)
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index e2a2c84e47910b..fb575fe721015c 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -73,6 +73,10 @@ class MCObjectFileInfo {
   /// to emit them into.
   MCSection *CompactUnwindSection = nullptr;
 
+  /// If import call optimization is supported by the target, this is the
+  /// section to emit import call data to.
+  MCSection *ImportCallSection = nullptr;
+
   // Dwarf sections for debug info.  If a target supports debug info, these must
   // be set.
   MCSection *DwarfAbbrevSection = nullptr;
@@ -269,6 +273,7 @@ class MCObjectFileInfo {
   MCSection *getBSSSection() const { return BSSSection; }
   MCSection *getReadOnlySection() const { return ReadOnlySection; }
   MCSection *getLSDASection() const { return LSDASection; }
+  MCSection *getImportCallSection() const { return ImportCallSection; }
   MCSection *getCompactUnwindSection() const { return CompactUnwindSection; }
   MCSection *getDwarfAbbrevSection() const { return DwarfAbbrevSection; }
   MCSection *getDwarfInfoSection() const { return DwarfInfoSection; }
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 21da4dac4872b4..c82ce4428ed09c 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -569,6 +569,9 @@ class MCStreamer {
   /// \param Symbol - Symbol the image relative relocation should point to.
   virtual void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset);
 
+  /// Emits an import call directive, used to build the import call table.
+  virtual void emitCOFFImpCall(MCSymbol const *Symbol);
+
   /// Emits an lcomm directive with XCOFF csect information.
   ///
   /// \param LabelSym - Label on the block of storage.
diff --git a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
index a4ede61e45099d..00a132706879f2 100644
--- a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
@@ -72,6 +72,8 @@ class WinCOFFObjectWriter final : public MCObjectWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue) override;
   uint64_t writeObject(MCAssembler &Asm) override;
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const;
 };
 
 /// Construct a new Win COFF writer instance.
diff --git a/llvm/include/llvm/MC/MCWinCOFFStreamer.h b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
index 5c39d80538944b..2318d1b8e0a223 100644
--- a/llvm/include/llvm/MC/MCWinCOFFStreamer.h
+++ b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
@@ -58,6 +58,7 @@ class MCWinCOFFStreamer : public MCObjectStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
   void emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 26fc75c0578ec2..6744e7cd2ecfcf 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -908,6 +908,9 @@ EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
         It->setMMRAMetadata(MF, MMRA);
     }
 
+    if (auto CalledGlobal = DAG->getCalledGlobal(Node))
+      MF.addCalledGlobal(MI, *CalledGlobal);
+
     return MI;
   };
 
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 01fe11ed205017..01903e2c1335b9 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -209,6 +209,7 @@ class MCAsmStreamer final : public MCStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitXCOFFLocalCommonSymbol(MCSymbol *LabelSym, uint64_t Size,
                                   MCSymbol *CsectSym, Align Alignment) override;
   void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol,
@@ -893,6 +894,14 @@ void MCAsmStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+  OS << "\t.impcall\t";
+  Symbol->print(OS, MAI);
+  EmitEOL();
+}
+
 // We need an XCOFF-specific version of this directive as the AIX syntax
 // requires a QualName argument identifying the csect name and storage mapping
 // class to appear before the alignment if we are specifying it.
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index f37e138edc36b1..150e38a94db6a6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -596,6 +596,11 @@ void MCObjectFileInfo::initCOFFMCObjectFileInfo(const Triple &T) {
                                           COFF::IMAGE_SCN_MEM_READ);
   }
 
+  if (T.getArch() == Triple::aarch64) {
+    ImportCallSection =
+        Ctx->getCOFFSection(".impcall", COFF::IMAGE_SCN_LNK_INFO);
+  }
+
   // Debug info.
   COFFDebugSymbolsSection =
       Ctx->getCOFFSection(".debug$S", (COFF::IMAGE_SCN_MEM_DISCARDABLE |
diff --git a/llvm/lib/MC/MCParser/COFFAsmParser.cpp b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
index 4d95a720852835..48f03ec0d3a847 100644
--- a/llvm/lib/MC/MCParser/COFFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
@@ -12,6 +12,7 @@
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCDirectives.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCParser/MCAsmLexer.h"
 #include "llvm/MC/MCParser/MCAsmParserExtension.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -70,6 +71,7 @@ class COFFAsmParser : public MCAsmParserExtension {
     addDirectiveHandler<&COFFAsmParser::parseDirectiveSymbolAttribute>(
         ".weak_anti_dep");
     addDirectiveHandler<&COFFAsmParser::parseDirectiveCGProfile>(".cg_profile");
+    addDirectiveHandler<&COFFAsmParser::parseDirectiveImpCall>(".impcall");
 
     // Win64 EH directives.
     addDirectiveHandler<&COFFAsmParser::parseSEHDirectiveStartProc>(
@@ -126,6 +128,7 @@ class COFFAsmParser : public MCAsmParserExtension {
   bool parseDirectiveLinkOnce(StringRef, SMLoc);
   bool parseDirectiveRVA(StringRef, SMLoc);
   bool parseDirectiveCGProfile(StringRef, SMLoc);
+  bool parseDirectiveImpCall(StringRef, SMLoc);
 
   // Win64 EH directives.
   bool parseSEHDirectiveStartProc(StringRef, SMLoc);
@@ -577,6 +580,24 @@ bool COFFAsmParser::parseDirectiveSymIdx(StringRef, SMLoc) {
   return false;
 }
 
+bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
+  if (!getContext().getObjectFileInfo()->getImportCallSection())
+    return TokError("target doesn't have an import call section");
+
+  StringRef SymbolID;
+  if (getParser().parseIdentifier(SymbolID))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("unexpected token in directive");
+
+  MCSymbol *Symbol = getContext().getOrCreateSymbol(SymbolID);
+
+  Lex();
+  getStreamer().emitCOFFImpCall(Symbol);
+  return false;
+}
+
 /// ::= [ identifier ]
 bool COFFAsmParser::parseCOMDATType(COFF::COMDATType &Type) {
   StringRef TypeId = getTok().getIdentifier();
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index ccf65df150e786..ee26fc07313f18 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1023,6 +1023,8 @@ void MCStreamer::emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) {}
 
 void MCStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {}
 
+void MCStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {}
+
 /// EmitRawText - If this file is backed by an assembly streamer, this dumps
 /// the specified string in the output .s file.  This capability is
 /// indicated by the hasRawTextSupport() predicate.
diff --git a/llvm/lib/MC/MCWinCOFFStreamer.cpp b/llvm/lib/MC/MCWinCOFFStreamer.cpp
index 395d4db3103d78..71e31f3288c001 100644
--- a/llvm/lib/MC/MCWinCOFFStreamer.cpp
+++ b/llvm/lib/MC/MCWinCOFFStreamer.cpp
@@ -280,6 +280,15 @@ void MCWinCOFFStreamer::emitCOFFImgRel32(const MCSymbol *Symbol,
   DF->appendContents(4, 0);
 }
 
+void MCWinCOFFStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+
+  auto *DF = getOrCreateDataFragment();
+  getAssembler().registerSymbol(*Symbol);
+  getWriter().recordImportCall(*DF, Symbol);
+}
+
 void MCWinCOFFStreamer::emitCommonSymbol(MCSymbol *S, uint64_t Size,
                                          Align ByteAlignment) {
   auto *Symbol = cast<MCSymbolCOFF>(S);
diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp
index 09d2b08e43050f..527464fa54ce02 100644
--- a/llvm/lib/MC/WinCOFFObjectWriter.cpp
+++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
 #include "llvm/MC/MCFragment.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCSection.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -147,6 +148,13 @@ class llvm::WinCOFFWriter {
   bool UseBigObj;
   bool UseOffsetLabels = false;
 
+  struct ImportCall {
+    unsigned CallsiteOffset;
+    const MCSymbol *CalledSymbol;
+  };
+  using importcall_map = MapVector<MCSection *, std::vector<ImportCall>>;
+  importcall_map SectionToImportCallsMap;
+
 public:
   enum DwoMode {
     AllSections,
@@ -163,6 +171,11 @@ class llvm::WinCOFFWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue);
   uint64_t writeObject(MCAssembler &Asm);
+  void generateAArch64ImportCallSection(llvm::MCAssembler &Asm);
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const {
+    return !SectionToImportCallsMap.empty();
+  }
 
 private:
   COFFSymbol *createSymbol(StringRef Name);
@@ -1097,6 +1110,17 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
     }
   }
 
+  // Create the contents of the import call section.
+  if (hasRecordedImportCalls()) {
+    switch (Asm.getContext().getTargetTriple().getArch()) {
+    case Triple::aarch64:
+      generateAArch64ImportCallSection(Asm);
+      break;
+    default:
+      llvm_unreachable("unsupported architecture for import call section");
+    }
+  }
+
   assignFileOffsets(Asm);
 
   // MS LINK expects to be able to use this timestamp to implement their
@@ -1143,6 +1167,51 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
   return W.OS.tell() - StartOffset;
 }
 
+void llvm::WinCOFFWriter::generateAArch64ImportCallSection(
+    llvm::MCAssembler &Asm) {
+  auto *ImpCallSection =
+      Asm.getContext().getObjectFileInfo()->getImportCallSection();
+
+  if (!SectionMap.contains(ImpCallSection)) {
+    Asm.getContext().reportError(SMLoc(),
+                                 ".impcall directives were used, but no "
+                                 "existing .impcall section exists");
+    return;
+  }
+
+  auto *Frag = cast<MCDataFragment>(ImpCallSection->curFragList()->Head);
+  raw_svector_ostream OS(Frag->getContents());
+
+  // Layout of this section is:
+  // Per section that contains calls to imported functions:
+  //  uint32_t SectionSize: Size in bytes for information in this section.
+  //  uint32_t Section Number
+  //  Per call to imported function in section:
+  //    uint32_t Kind: the kind of imported function.
+  //    uint32_t BranchOffset: the offset of the branch instruction in its
+  //                            parent section.
+  //    uint32_t TargetSymbolId: the symbol id of the called function.
+
+  // Per section that contained eligible targets...
+  for (auto &[Section, Targets] : SectionToImportCallsMap) {
+    unsigned SectionSize = sizeof(uint32_t) * (2 + 3 * Targets.size());
+    support::endian::write(OS, SectionSize, W.Endian);
+    support::endian::write(OS, SectionMap.at(Section)->Number, W.Endian);
+    for (auto &[BranchOffset, TargetSymbol] : Targets) {
+      // Kind is always IMAGE_REL_ARM64_DYNAMIC_IMPORT_CALL (0x13).
+      support::endian::write(OS, 0x13, W.Endian);
+      support::endian::write(OS, BranchOffset, W.Endian);
+      support::endian::write(OS, TargetSymbol->getIndex(), W.Endian);
+    }
+  }
+}
+
+void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
+                                     const MCSymbol *Symbol) {
+  auto &SectionData = SectionToImportCallsMap[FB.getParent()];
+  SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
+}
+
 //------------------------------------------------------------------------------
 // WinCOFFObjectWriter class implementation
 
@@ -1194,6 +1263,15 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm) {
   return TotalSize;
 }
 
+void WinCOFFObjectWriter::recordImportCall(const MCDataFragment &FB,
+                                           const MCSymbol *Symbol) {
+  ObjWriter->recordImportCall(FB, Symbol);
+}
+
+bool WinCOFFObjectWriter::hasRecordedImportCalls() const {
+  return ObjWriter->hasRecordedImportCalls();
+}
+
 MCWinCOFFObjectTargetWriter::MCWinCOFFObjectTargetWriter(unsigned Machine_)
     : Machine(Machine_) {}
 
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 9bec782ca8ce97..4c03f876465051 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -77,6 +77,11 @@ static cl::opt<PtrauthCheckMode> PtrauthAuthChecks(
     cl::desc("Check pointer authentication auth/resign failures"),
     cl::init(Default));
 
+static cl::opt<bool> EnableImportCallOptimization(
+    "aarch64-win-import-call-optimization", cl::Hidden,
+    cl::desc("Enable import call optimization for AArch64 Windows"),
+    cl::init(false));
+
 #define DEBUG_TYPE "asm-printer"
 
 namespace {
@@ -293,6 +298,11 @@ class AArch64AsmPrinter : public AsmPrinter {
                               MCSymbol *LazyPointer) override;
   void emitMachOIFuncStubHelperBody(Module &M, const GlobalIFunc &GI,
                                     MCSymbol *LazyPointer) override;
+
+  /// Checks if this instruction is part of a sequence that is eligle for import
+  /// call optimization and, if so, records it to be emitted in the import call
+  /// section.
+  void recordIfImportCall(const MachineInstr *BranchInst);
 };
 
 } // end anonymous namespace
@@ -921,6 +931,15 @@ void AArch64AsmPrinter::emitEndOfAsmFile(Module &M) {
   // Emit stack and fault map information.
   FM.serializeToFaultMapSection();
 
+  // If import call optimization is enabled, emit the appropriate section.
+  // We do this whether or not we recorded any import calls.
+  if (EnableImportCallOptimization && TT.isOSBinFormatCOFF()) {
+    OutStreamer->switchSection(getObjFileLowering().getImportCallSection());
+
+    // Section always starts with some magic.
+    constexpr char ImpCallMagic[12] = "Imp_Call_V1";
+    OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
+  }
 }
 
 void AArch64AsmPrinter::emitLOHs() {
@@ -2693,6 +2712,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   case AArch64::TCRETURNrinotx16:
   case AArch64::TCRETURNriALL: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
@@ -2702,6 +2722,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   }
   case AArch64::TCRETURNdi: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
@@ -3035,6 +3056,14 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     TS->emitARM64WinCFISaveAnyRegQPX(MI->getOperand(0).getImm(),
                                      -MI->getOperand(2).getImm());
     return;
+
+  case AArch64::BLR:
+  case AArch64::BR:
+    recordIfImportCall(MI);
+    MCInst TmpInst;
+    MCInstLowering.Lower(MI, TmpInst);
+    EmitToStreamer(*OutStreamer, TmpInst);
+    return;
   }
 
   // Finally, do the automated lowerings for everything else.
@@ -3043,6 +3072,19 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, TmpInst);
 }
 
+void AArch64AsmPrinter::recordIfImportCall(
+    const llvm::MachineInstr *BranchInst) {
+  if (!EnableImportCallOptimization ||
+      !TM.getTargetTriple().isOSBinFormatCOFF())
+    return;
+
+  auto [GV, OpFlags] = BranchInst->getMF()->tryGetCalledGlobal(BranchInst);
+  if (GV && GV->hasDLLImportStorageClass()) {
+    OutStreamer->emitCOFFImpCall(
+        MCInstLowering.GetGlobalValueSymbol(GV, OpFlags));
+  }
+}
+
 void AArch64AsmPrinter::emitMachOIFuncStubBody(Module &M, const GlobalIFunc &GI,
                                                MCSymbol *LazyPointer) {
   // _ifunc:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 070163a5fb29...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Daniel Paoliello (dpaoliello)

Changes

This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC /d2ImportCallOptimization flag).

Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement /d2GuardRetpoline.

The change to the obj file is to add a new .impcall section with the following layout:

  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.

NOTE: If the import call optimization feature is enabled, then the .impcall section must be emitted, even if there are no calls to imported functions.

The implementation is split across a few parts of LLVM:

  • During AArch64 instruction selection, the GlobalValue for each call to a global is recorded into the Extra Information for that node.
  • During lowering to machine instructions, the called global value for each call is noted in its containing MachineFunction.
  • During AArch64 asm printing, if the import call optimization feature is enabled:
    • A (new) .impcall directive is emitted for each call to an imported function.
    • The .impcall section is emitted with its magic header (but is not filled in).
  • During COFF object writing, the .impcall section is filled in based on each .impcall directive that were encountered.

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a MachineFunctionPass (as suggested in on the forums) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as ADRP + LDR (to load the symbol) then a BR (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the ADRP + LDR and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.


Patch is 28.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121516.diff

20 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+18)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAG.h (+14)
  • (modified) llvm/include/llvm/MC/MCObjectFileInfo.h (+5)
  • (modified) llvm/include/llvm/MC/MCStreamer.h (+3)
  • (modified) llvm/include/llvm/MC/MCWinCOFFObjectWriter.h (+2)
  • (modified) llvm/include/llvm/MC/MCWinCOFFStreamer.h (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp (+3)
  • (modified) llvm/lib/MC/MCAsmStreamer.cpp (+9)
  • (modified) llvm/lib/MC/MCObjectFileInfo.cpp (+5)
  • (modified) llvm/lib/MC/MCParser/COFFAsmParser.cpp (+21)
  • (modified) llvm/lib/MC/MCStreamer.cpp (+2)
  • (modified) llvm/lib/MC/MCWinCOFFStreamer.cpp (+9)
  • (modified) llvm/lib/MC/WinCOFFObjectWriter.cpp (+78)
  • (modified) llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp (+42)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+8-4)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization-nocalls.ll (+18)
  • (added) llvm/test/CodeGen/AArch64/win-import-call-optimization.ll (+36)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization-no-section.s (+9)
  • (added) llvm/test/MC/AArch64/win-import-call-optimization.s (+62)
  • (added) llvm/test/MC/COFF/win-import-call-optimization-not-supported.s (+13)
diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index d696add8a1af53..520f1745de2979 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -354,6 +354,11 @@ class LLVM_ABI MachineFunction {
   /// a table of valid targets for Windows EHCont Guard.
   std::vector<MCSymbol *> CatchretTargets;
 
+  /// Mapping of call instruction to the global value and target flags that it
+  /// calls, if applicable.
+  DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
+      CalledGlobalsMap;
+
   /// \name Exception Handling
   /// \{
 
@@ -1182,6 +1187,19 @@ class LLVM_ABI MachineFunction {
     CatchretTargets.push_back(Target);
   }
 
+  /// Tries to get the global and target flags for a call site, if the
+  /// instruction is a call to a global.
+  std::pair<const GlobalValue *, unsigned>
+  tryGetCalledGlobal(const MachineInstr *MI) const {
+    return CalledGlobalsMap.lookup(MI);
+  }
+
+  /// Notes the global and target flags for a call site.
+  void addCalledGlobal(const MachineInstr *MI,
+                       std::pair<const GlobalValue *, unsigned> Details) {
+    CalledGlobalsMap.insert({MI, Details});
+  }
+
   /// \name Exception Handling
   /// \{
 
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index ff7caec41855fd..b31ad11c3ee0ee 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -293,6 +293,7 @@ class SelectionDAG {
     MDNode *HeapAllocSite = nullptr;
     MDNode *PCSections = nullptr;
     MDNode *MMRA = nullptr;
+    std::pair<const GlobalValue *, unsigned> CalledGlobal{};
     bool NoMerge = false;
   };
   /// Out-of-line extra information for SDNodes.
@@ -2373,6 +2374,19 @@ class SelectionDAG {
     auto It = SDEI.find(Node);
     return It != SDEI.end() ? It->second.MMRA : nullptr;
   }
+  /// Set CalledGlobal to be associated with Node.
+  void addCalledGlobal(const SDNode *Node, const GlobalValue *GV,
+                       unsigned OpFlags) {
+    SDEI[Node].CalledGlobal = {GV, OpFlags};
+  }
+  /// Return CalledGlobal associated with Node, or a nullopt if none exists.
+  std::optional<std::pair<const GlobalValue *, unsigned>>
+  getCalledGlobal(const SDNode *Node) {
+    auto I = SDEI.find(Node);
+    return I != SDEI.end()
+               ? std::make_optional(std::move(I->second).CalledGlobal)
+               : std::nullopt;
+  }
   /// Set NoMergeSiteInfo to be associated with Node if NoMerge is true.
   void addNoMergeSiteInfo(const SDNode *Node, bool NoMerge) {
     if (NoMerge)
diff --git a/llvm/include/llvm/MC/MCObjectFileInfo.h b/llvm/include/llvm/MC/MCObjectFileInfo.h
index e2a2c84e47910b..fb575fe721015c 100644
--- a/llvm/include/llvm/MC/MCObjectFileInfo.h
+++ b/llvm/include/llvm/MC/MCObjectFileInfo.h
@@ -73,6 +73,10 @@ class MCObjectFileInfo {
   /// to emit them into.
   MCSection *CompactUnwindSection = nullptr;
 
+  /// If import call optimization is supported by the target, this is the
+  /// section to emit import call data to.
+  MCSection *ImportCallSection = nullptr;
+
   // Dwarf sections for debug info.  If a target supports debug info, these must
   // be set.
   MCSection *DwarfAbbrevSection = nullptr;
@@ -269,6 +273,7 @@ class MCObjectFileInfo {
   MCSection *getBSSSection() const { return BSSSection; }
   MCSection *getReadOnlySection() const { return ReadOnlySection; }
   MCSection *getLSDASection() const { return LSDASection; }
+  MCSection *getImportCallSection() const { return ImportCallSection; }
   MCSection *getCompactUnwindSection() const { return CompactUnwindSection; }
   MCSection *getDwarfAbbrevSection() const { return DwarfAbbrevSection; }
   MCSection *getDwarfInfoSection() const { return DwarfInfoSection; }
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 21da4dac4872b4..c82ce4428ed09c 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -569,6 +569,9 @@ class MCStreamer {
   /// \param Symbol - Symbol the image relative relocation should point to.
   virtual void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset);
 
+  /// Emits an import call directive, used to build the import call table.
+  virtual void emitCOFFImpCall(MCSymbol const *Symbol);
+
   /// Emits an lcomm directive with XCOFF csect information.
   ///
   /// \param LabelSym - Label on the block of storage.
diff --git a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
index a4ede61e45099d..00a132706879f2 100644
--- a/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCWinCOFFObjectWriter.h
@@ -72,6 +72,8 @@ class WinCOFFObjectWriter final : public MCObjectWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue) override;
   uint64_t writeObject(MCAssembler &Asm) override;
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const;
 };
 
 /// Construct a new Win COFF writer instance.
diff --git a/llvm/include/llvm/MC/MCWinCOFFStreamer.h b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
index 5c39d80538944b..2318d1b8e0a223 100644
--- a/llvm/include/llvm/MC/MCWinCOFFStreamer.h
+++ b/llvm/include/llvm/MC/MCWinCOFFStreamer.h
@@ -58,6 +58,7 @@ class MCWinCOFFStreamer : public MCObjectStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
                         Align ByteAlignment) override;
   void emitLocalCommonSymbol(MCSymbol *Symbol, uint64_t Size,
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 26fc75c0578ec2..6744e7cd2ecfcf 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -908,6 +908,9 @@ EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
         It->setMMRAMetadata(MF, MMRA);
     }
 
+    if (auto CalledGlobal = DAG->getCalledGlobal(Node))
+      MF.addCalledGlobal(MI, *CalledGlobal);
+
     return MI;
   };
 
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 01fe11ed205017..01903e2c1335b9 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -209,6 +209,7 @@ class MCAsmStreamer final : public MCStreamer {
   void emitCOFFSectionIndex(MCSymbol const *Symbol) override;
   void emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) override;
   void emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) override;
+  void emitCOFFImpCall(MCSymbol const *Symbol) override;
   void emitXCOFFLocalCommonSymbol(MCSymbol *LabelSym, uint64_t Size,
                                   MCSymbol *CsectSym, Align Alignment) override;
   void emitXCOFFSymbolLinkageWithVisibility(MCSymbol *Symbol,
@@ -893,6 +894,14 @@ void MCAsmStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {
   EmitEOL();
 }
 
+void MCAsmStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+  OS << "\t.impcall\t";
+  Symbol->print(OS, MAI);
+  EmitEOL();
+}
+
 // We need an XCOFF-specific version of this directive as the AIX syntax
 // requires a QualName argument identifying the csect name and storage mapping
 // class to appear before the alignment if we are specifying it.
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index f37e138edc36b1..150e38a94db6a6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -596,6 +596,11 @@ void MCObjectFileInfo::initCOFFMCObjectFileInfo(const Triple &T) {
                                           COFF::IMAGE_SCN_MEM_READ);
   }
 
+  if (T.getArch() == Triple::aarch64) {
+    ImportCallSection =
+        Ctx->getCOFFSection(".impcall", COFF::IMAGE_SCN_LNK_INFO);
+  }
+
   // Debug info.
   COFFDebugSymbolsSection =
       Ctx->getCOFFSection(".debug$S", (COFF::IMAGE_SCN_MEM_DISCARDABLE |
diff --git a/llvm/lib/MC/MCParser/COFFAsmParser.cpp b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
index 4d95a720852835..48f03ec0d3a847 100644
--- a/llvm/lib/MC/MCParser/COFFAsmParser.cpp
+++ b/llvm/lib/MC/MCParser/COFFAsmParser.cpp
@@ -12,6 +12,7 @@
 #include "llvm/BinaryFormat/COFF.h"
 #include "llvm/MC/MCContext.h"
 #include "llvm/MC/MCDirectives.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCParser/MCAsmLexer.h"
 #include "llvm/MC/MCParser/MCAsmParserExtension.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -70,6 +71,7 @@ class COFFAsmParser : public MCAsmParserExtension {
     addDirectiveHandler<&COFFAsmParser::parseDirectiveSymbolAttribute>(
         ".weak_anti_dep");
     addDirectiveHandler<&COFFAsmParser::parseDirectiveCGProfile>(".cg_profile");
+    addDirectiveHandler<&COFFAsmParser::parseDirectiveImpCall>(".impcall");
 
     // Win64 EH directives.
     addDirectiveHandler<&COFFAsmParser::parseSEHDirectiveStartProc>(
@@ -126,6 +128,7 @@ class COFFAsmParser : public MCAsmParserExtension {
   bool parseDirectiveLinkOnce(StringRef, SMLoc);
   bool parseDirectiveRVA(StringRef, SMLoc);
   bool parseDirectiveCGProfile(StringRef, SMLoc);
+  bool parseDirectiveImpCall(StringRef, SMLoc);
 
   // Win64 EH directives.
   bool parseSEHDirectiveStartProc(StringRef, SMLoc);
@@ -577,6 +580,24 @@ bool COFFAsmParser::parseDirectiveSymIdx(StringRef, SMLoc) {
   return false;
 }
 
+bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
+  if (!getContext().getObjectFileInfo()->getImportCallSection())
+    return TokError("target doesn't have an import call section");
+
+  StringRef SymbolID;
+  if (getParser().parseIdentifier(SymbolID))
+    return TokError("expected identifier in directive");
+
+  if (getLexer().isNot(AsmToken::EndOfStatement))
+    return TokError("unexpected token in directive");
+
+  MCSymbol *Symbol = getContext().getOrCreateSymbol(SymbolID);
+
+  Lex();
+  getStreamer().emitCOFFImpCall(Symbol);
+  return false;
+}
+
 /// ::= [ identifier ]
 bool COFFAsmParser::parseCOMDATType(COFF::COMDATType &Type) {
   StringRef TypeId = getTok().getIdentifier();
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index ccf65df150e786..ee26fc07313f18 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1023,6 +1023,8 @@ void MCStreamer::emitCOFFSecRel32(MCSymbol const *Symbol, uint64_t Offset) {}
 
 void MCStreamer::emitCOFFImgRel32(MCSymbol const *Symbol, int64_t Offset) {}
 
+void MCStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {}
+
 /// EmitRawText - If this file is backed by an assembly streamer, this dumps
 /// the specified string in the output .s file.  This capability is
 /// indicated by the hasRawTextSupport() predicate.
diff --git a/llvm/lib/MC/MCWinCOFFStreamer.cpp b/llvm/lib/MC/MCWinCOFFStreamer.cpp
index 395d4db3103d78..71e31f3288c001 100644
--- a/llvm/lib/MC/MCWinCOFFStreamer.cpp
+++ b/llvm/lib/MC/MCWinCOFFStreamer.cpp
@@ -280,6 +280,15 @@ void MCWinCOFFStreamer::emitCOFFImgRel32(const MCSymbol *Symbol,
   DF->appendContents(4, 0);
 }
 
+void MCWinCOFFStreamer::emitCOFFImpCall(MCSymbol const *Symbol) {
+  assert(this->getContext().getObjectFileInfo()->getImportCallSection() &&
+         "This target doesn't have a import call section");
+
+  auto *DF = getOrCreateDataFragment();
+  getAssembler().registerSymbol(*Symbol);
+  getWriter().recordImportCall(*DF, Symbol);
+}
+
 void MCWinCOFFStreamer::emitCommonSymbol(MCSymbol *S, uint64_t Size,
                                          Align ByteAlignment) {
   auto *Symbol = cast<MCSymbolCOFF>(S);
diff --git a/llvm/lib/MC/WinCOFFObjectWriter.cpp b/llvm/lib/MC/WinCOFFObjectWriter.cpp
index 09d2b08e43050f..527464fa54ce02 100644
--- a/llvm/lib/MC/WinCOFFObjectWriter.cpp
+++ b/llvm/lib/MC/WinCOFFObjectWriter.cpp
@@ -23,6 +23,7 @@
 #include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCFixup.h"
 #include "llvm/MC/MCFragment.h"
+#include "llvm/MC/MCObjectFileInfo.h"
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCSection.h"
 #include "llvm/MC/MCSectionCOFF.h"
@@ -147,6 +148,13 @@ class llvm::WinCOFFWriter {
   bool UseBigObj;
   bool UseOffsetLabels = false;
 
+  struct ImportCall {
+    unsigned CallsiteOffset;
+    const MCSymbol *CalledSymbol;
+  };
+  using importcall_map = MapVector<MCSection *, std::vector<ImportCall>>;
+  importcall_map SectionToImportCallsMap;
+
 public:
   enum DwoMode {
     AllSections,
@@ -163,6 +171,11 @@ class llvm::WinCOFFWriter {
                         const MCFixup &Fixup, MCValue Target,
                         uint64_t &FixedValue);
   uint64_t writeObject(MCAssembler &Asm);
+  void generateAArch64ImportCallSection(llvm::MCAssembler &Asm);
+  void recordImportCall(const MCDataFragment &FB, const MCSymbol *Symbol);
+  bool hasRecordedImportCalls() const {
+    return !SectionToImportCallsMap.empty();
+  }
 
 private:
   COFFSymbol *createSymbol(StringRef Name);
@@ -1097,6 +1110,17 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
     }
   }
 
+  // Create the contents of the import call section.
+  if (hasRecordedImportCalls()) {
+    switch (Asm.getContext().getTargetTriple().getArch()) {
+    case Triple::aarch64:
+      generateAArch64ImportCallSection(Asm);
+      break;
+    default:
+      llvm_unreachable("unsupported architecture for import call section");
+    }
+  }
+
   assignFileOffsets(Asm);
 
   // MS LINK expects to be able to use this timestamp to implement their
@@ -1143,6 +1167,51 @@ uint64_t WinCOFFWriter::writeObject(MCAssembler &Asm) {
   return W.OS.tell() - StartOffset;
 }
 
+void llvm::WinCOFFWriter::generateAArch64ImportCallSection(
+    llvm::MCAssembler &Asm) {
+  auto *ImpCallSection =
+      Asm.getContext().getObjectFileInfo()->getImportCallSection();
+
+  if (!SectionMap.contains(ImpCallSection)) {
+    Asm.getContext().reportError(SMLoc(),
+                                 ".impcall directives were used, but no "
+                                 "existing .impcall section exists");
+    return;
+  }
+
+  auto *Frag = cast<MCDataFragment>(ImpCallSection->curFragList()->Head);
+  raw_svector_ostream OS(Frag->getContents());
+
+  // Layout of this section is:
+  // Per section that contains calls to imported functions:
+  //  uint32_t SectionSize: Size in bytes for information in this section.
+  //  uint32_t Section Number
+  //  Per call to imported function in section:
+  //    uint32_t Kind: the kind of imported function.
+  //    uint32_t BranchOffset: the offset of the branch instruction in its
+  //                            parent section.
+  //    uint32_t TargetSymbolId: the symbol id of the called function.
+
+  // Per section that contained eligible targets...
+  for (auto &[Section, Targets] : SectionToImportCallsMap) {
+    unsigned SectionSize = sizeof(uint32_t) * (2 + 3 * Targets.size());
+    support::endian::write(OS, SectionSize, W.Endian);
+    support::endian::write(OS, SectionMap.at(Section)->Number, W.Endian);
+    for (auto &[BranchOffset, TargetSymbol] : Targets) {
+      // Kind is always IMAGE_REL_ARM64_DYNAMIC_IMPORT_CALL (0x13).
+      support::endian::write(OS, 0x13, W.Endian);
+      support::endian::write(OS, BranchOffset, W.Endian);
+      support::endian::write(OS, TargetSymbol->getIndex(), W.Endian);
+    }
+  }
+}
+
+void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
+                                     const MCSymbol *Symbol) {
+  auto &SectionData = SectionToImportCallsMap[FB.getParent()];
+  SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
+}
+
 //------------------------------------------------------------------------------
 // WinCOFFObjectWriter class implementation
 
@@ -1194,6 +1263,15 @@ uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm) {
   return TotalSize;
 }
 
+void WinCOFFObjectWriter::recordImportCall(const MCDataFragment &FB,
+                                           const MCSymbol *Symbol) {
+  ObjWriter->recordImportCall(FB, Symbol);
+}
+
+bool WinCOFFObjectWriter::hasRecordedImportCalls() const {
+  return ObjWriter->hasRecordedImportCalls();
+}
+
 MCWinCOFFObjectTargetWriter::MCWinCOFFObjectTargetWriter(unsigned Machine_)
     : Machine(Machine_) {}
 
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 9bec782ca8ce97..4c03f876465051 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -77,6 +77,11 @@ static cl::opt<PtrauthCheckMode> PtrauthAuthChecks(
     cl::desc("Check pointer authentication auth/resign failures"),
     cl::init(Default));
 
+static cl::opt<bool> EnableImportCallOptimization(
+    "aarch64-win-import-call-optimization", cl::Hidden,
+    cl::desc("Enable import call optimization for AArch64 Windows"),
+    cl::init(false));
+
 #define DEBUG_TYPE "asm-printer"
 
 namespace {
@@ -293,6 +298,11 @@ class AArch64AsmPrinter : public AsmPrinter {
                               MCSymbol *LazyPointer) override;
   void emitMachOIFuncStubHelperBody(Module &M, const GlobalIFunc &GI,
                                     MCSymbol *LazyPointer) override;
+
+  /// Checks if this instruction is part of a sequence that is eligle for import
+  /// call optimization and, if so, records it to be emitted in the import call
+  /// section.
+  void recordIfImportCall(const MachineInstr *BranchInst);
 };
 
 } // end anonymous namespace
@@ -921,6 +931,15 @@ void AArch64AsmPrinter::emitEndOfAsmFile(Module &M) {
   // Emit stack and fault map information.
   FM.serializeToFaultMapSection();
 
+  // If import call optimization is enabled, emit the appropriate section.
+  // We do this whether or not we recorded any import calls.
+  if (EnableImportCallOptimization && TT.isOSBinFormatCOFF()) {
+    OutStreamer->switchSection(getObjFileLowering().getImportCallSection());
+
+    // Section always starts with some magic.
+    constexpr char ImpCallMagic[12] = "Imp_Call_V1";
+    OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
+  }
 }
 
 void AArch64AsmPrinter::emitLOHs() {
@@ -2693,6 +2712,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   case AArch64::TCRETURNrinotx16:
   case AArch64::TCRETURNriALL: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCInst TmpInst;
     TmpInst.setOpcode(AArch64::BR);
@@ -2702,6 +2722,7 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   }
   case AArch64::TCRETURNdi: {
     emitPtrauthTailCallHardening(MI);
+    recordIfImportCall(MI);
 
     MCOperand Dest;
     MCInstLowering.lowerOperand(MI->getOperand(0), Dest);
@@ -3035,6 +3056,14 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
     TS->emitARM64WinCFISaveAnyRegQPX(MI->getOperand(0).getImm(),
                                      -MI->getOperand(2).getImm());
     return;
+
+  case AArch64::BLR:
+  case AArch64::BR:
+    recordIfImportCall(MI);
+    MCInst TmpInst;
+    MCInstLowering.Lower(MI, TmpInst);
+    EmitToStreamer(*OutStreamer, TmpInst);
+    return;
   }
 
   // Finally, do the automated lowerings for everything else.
@@ -3043,6 +3072,19 @@ void AArch64AsmPrinter::emitInstruction(const MachineInstr *MI) {
   EmitToStreamer(*OutStreamer, TmpInst);
 }
 
+void AArch64AsmPrinter::recordIfImportCall(
+    const llvm::MachineInstr *BranchInst) {
+  if (!EnableImportCallOptimization ||
+      !TM.getTargetTriple().isOSBinFormatCOFF())
+    return;
+
+  auto [GV, OpFlags] = BranchInst->getMF()->tryGetCalledGlobal(BranchInst);
+  if (GV && GV->hasDLLImportStorageClass()) {
+    OutStreamer->emitCOFFImpCall(
+        MCInstLowering.GetGlobalValueSymbol(GV, OpFlags));
+  }
+}
+
 void AArch64AsmPrinter::emitMachOIFuncStubBody(Module &M, const GlobalIFunc &GI,
                                                MCSymbol *LazyPointer) {
   // _ifunc:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 070163a5fb29...
[truncated]


bool COFFAsmParser::parseDirectiveImpCall(StringRef, SMLoc) {
if (!getContext().getObjectFileInfo()->getImportCallSection())
return TokError("target doesn't have an import call section");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid ' in error messages

// CHECK-NEXT: 0x[[#%x,TCOFFSET - 8]] IMAGE_REL_ARM64_PAGEBASE_REL21 __imp_b ([[#%u,TCSYM]])
// CHECK-NEXT: 0x[[#%x,TCOFFSET - 4]] IMAGE_REL_ARM64_PAGEOFFSET_12L __imp_b ([[#%u,TCSYM]])
// CHECK-NEXT: }
// CHECK-NEXT: ] No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

End of file whitespace error

Comment on lines +588 to +605
if (getParser().parseIdentifier(SymbolID))
return TokError("expected identifier in directive");

if (getLexer().isNot(AsmToken::EndOfStatement))
return TokError("unexpected token in directive");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think these error cases are tested

.impcall __imp_b
br x8

// CHECK: error: .impcall directives were used, but no existing .impcall section exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to include a source location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, no: the .impcall section may appear after the .impcall directives, so I'd need to keep track of the locations of all (or at least one) observed .impcall directives, which is not easy to do with the way asm parsing vs object writing is setup.

@mstorsjo mstorsjo requested review from cjacek and mstorsjo January 3, 2025 10:23
@mstorsjo
Copy link
Member

mstorsjo commented Jan 3, 2025

A couple questions on the mechanism:

  • As the flag /d2ImportCallOptimization is undocumented, I presume that this format is undocumnted too. This covers two separate formats as far as I can see - the .impcall section contents, which this PR generates, and which the linker consumes. This format is mostly a convention - an agreement between compiler and linker; this currently uses the Imp_Call_V1 identifier. And then secondly, the dynamic relocations that the linker generates, which ends up in the final binary, which is consumed by the windows loader. This format isn't touched upon here (as this only covers code generation, not linking). The format of those dynamic relocations is much more fixed, as it's handled by the OS. Although this is only an optimization, so running on older Windows versions which doesn't recognize it, should be fine too?
  • When the linker uses these dynamic relocations, it changes a br or blr indirect branch instruction into a direct b or bl, if the target address is close enough within the address space - right? (The other instructions for loading the address are left untouched, as those instructions can be anywhere detached from the branch, and we can't speculate on whether the register contents is needed elsewhere.)
  • As the 64 bit address space is kinda large, and the b and bl instructions only have a +/- 128 MB range, I guess that this optimization simply can't be applied, if the target is too far away? Intuitively, it feels like this wouldn't end up effective all that often? Then again, I guess most DLLs are loaded close to each other in the virtual memory layout, and the base EXE, with dynamic base enabled (as always on aarch64) also would end up somewhere close.
  • Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?
  • For cases when dll imported functions are called without being marked as dllimport in headers, we jump via a thunk (from the import library, and/or linker generated). Can the same thing be applied to them? Does that require including similar .impcall sections in the import libraries for the thunks? (In the case of lld, it actually doesn't use the import library contents here but just synthesize it on its own - there we could do the same right away without needing to update import libraries with this metadata.)

@dpaoliello
Copy link
Contributor Author

As the flag /d2ImportCallOptimization is undocumented, I presume that this format is undocumnted too. This covers two separate formats as far as I can see - the .impcall section contents, which this PR generates, and which the linker consumes. This format is mostly a convention - an agreement between compiler and linker; this currently uses the Imp_Call_V1 identifier. And then secondly, the dynamic relocations that the linker generates, which ends up in the final binary, which is consumed by the windows loader. This format isn't touched upon here (as this only covers code generation, not linking). The format of those dynamic relocations is much more fixed, as it's handled by the OS. Although this is only an optimization, so running on older Windows versions which doesn't recognize it, should be fine too?

Correct on all accounts:

@dpaoliello
Copy link
Contributor Author

When the linker uses these dynamic relocations, it changes a br or blr indirect branch instruction into a direct b or bl, if the target address is close enough within the address space - right? (The other instructions for loading the address are left untouched, as those instructions can be anywhere detached from the branch, and we can't speculate on whether the register contents is needed elsewhere.)
As the 64 bit address space is kinda large, and the b and bl instructions only have a +/- 128 MB range, I guess that this optimization simply can't be applied, if the target is too far away? Intuitively, it feels like this wouldn't end up effective all that often? Then again, I guess most DLLs are loaded close to each other in the virtual memory layout, and the base EXE, with dynamic base enabled (as always on aarch64) also would end up somewhere close.

The linker doesn't do the rewriting, the kernel-mode loader does, but yes it will rewrite the indirect branches to direct branches. Since it is the loader doing this work, it knows how far away the target address is and, therefore, if the instruction can be rewritten or not. Since this is an optimization, the loader can choose not to rewrite the instruction without affecting correctness.

@dpaoliello
Copy link
Contributor Author

Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?

I don't have the exact numbers, but we did see significant performance improvement within the Windows kernel for some components (which were especially chatty with other .sys files) when MSVC added this optimization. I'll see if I can get one of my colleagues to provide more details.

@dpaoliello
Copy link
Contributor Author

For cases when dll imported functions are called without being marked as dllimport in headers, we jump via a thunk (from the import library, and/or linker generated). Can the same thing be applied to them? Does that require including similar .impcall sections in the import libraries for the thunks? (In the case of lld, it actually doesn't use the import library contents here but just synthesize it on its own - there we could do the same right away without needing to update import libraries with this metadata.)

I'm not 100% sure how MSVC handles this - there are no changes to import libraries (i.e., there's no .impcall section), so I'm guessing it would be done within the linker.

If lld supports generating the Dynamic Value Relocation Table, there's no reason it can't add additional entries in there for other indirect branches to imported functions that it knows about.

@dpaoliello
Copy link
Contributor Author

Is the only thing we gain here, the performance benefits of a direct branch, which is easier for the branch predictor, compared to an indirect branch? While that obviously is better, this feels like a whole lot of extra work and structures, for something that feels like relatively small gains. Is there something else to be gained in relation to this as well that I'm missing (and/or that isn't mentioned yet)? Or is the gains from better branch prediction much bigger than what I'm thinking here?

I don't have the exact numbers, but we did see significant performance improvement within the Windows kernel for some components (which were especially chatty with other .sys files) when MSVC added this optimization. I'll see if I can get one of my colleagues to provide more details.

I have some numbers: when applied to the Arm64 Windows Kernel, we saw a 3% throughput improvement on DiskSpd benchmarks for both SnapDragon 8cx Gen2 and Gen3 devices.

@mstorsjo
Copy link
Member

mstorsjo commented Jan 6, 2025

I have some numbers: when applied to the Arm64 Windows Kernel, we saw a 3% throughput improvement on DiskSpd benchmarks for both SnapDragon 8cx Gen2 and Gen3 devices.

Interesting - that's a massive number a change like this. I'm not too familiar with the structure of the kernel though, I presume that there would need to be a huge number of cross-DLL function calls within the kernel for this change to make any measurable difference.

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mechanism for tracking the call from ISel through the assembler seems generally fine. Alternatively, you could add an operand to call instructions, but I don't see any reason to prefer that.

void WinCOFFWriter::recordImportCall(const MCDataFragment &FB,
const MCSymbol *Symbol) {
auto &SectionData = SectionToImportCallsMap[FB.getParent()];
SectionData.push_back(ImportCall{unsigned(FB.getContents().size()), Symbol});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think unsigned(FB.getContents().size()) actually computes the value you want: a section contains multiple fragments, in general. And some of those fragments have a size that can't actually be computed until we do relaxation. (Relaxation is less common on aarch64 than on x86, but it shows up in a few places... in particular, .p2align .)

Probably need to generate a temporary symbol, then compute the offset between the beginning of the section and the symbol when you generate the section.

(Less importantly, the cast implicitly truncates from 64-bit to 32-bit. Unlikely to come up in valid code, but should print a reasonable error.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - I switched this to using symbols and Asm.getSymbolOffset


// Section always starts with some magic.
constexpr char ImpCallMagic[12] = "Imp_Call_V1";
OutStreamer->emitBytes(StringRef(ImpCallMagic, sizeof(ImpCallMagic)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer to make the assembler generate the magic number automatically, just so it's harder to mess up for people hand-writing asm.

@efriedma-quic
Copy link
Collaborator

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

You could possibly add a directive specifically to emit section numbers... that would allow you to move more of the logic into the AsmPrinter. We already have .secidx; that's not quite what you want here, but maybe to give you some idea what it would look like.

Not sure if that's worth doing.

@dpaoliello
Copy link
Contributor Author

The .impcall section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing).

You could possibly add a directive specifically to emit section numbers... that would allow you to move more of the logic into the AsmPrinter. We already have .secidx; that's not quite what you want here, but maybe to give you some idea what it would look like.

I considered this for an earlier design, but it ended up being very messy: the section numbers are assigned after the sections have been streamed, so if there is a general purpose "insert section number here" mechanism, it needs to be able to go back and rewrite already-streamed sections. The current design avoids that complexity since we only use the section numbers in the .impcall sections which we pre-allocate but then fill in afterwards.

@efriedma-quic
Copy link
Collaborator

it needs to be able to go back and rewrite already-streamed sections

Yes, a relocation (at the assembler level, not in the resulting object). We have infrastructure to do this if you wanted to.

@dpaoliello
Copy link
Contributor Author

Yes, a relocation (at the assembler level, not in the resulting object). We have infrastructure to do this if you wanted to.

Sure, can you point me to the code?

I'd be happy to get rid of the .impcall directive and do most of the work in the asm printer, as it'll make the x86 implementation easier.

@efriedma-quic
Copy link
Collaborator

Internally it should be an MCFixup. I guess you'd add a new fixup kind.

Maybe follow the code for .secidx to get some idea what that would look like. It's not quite the same, because you need to resolve it instead of emitting a relocation, but before that point it's similar.

@dpaoliello
Copy link
Contributor Author

Ok, switched this over to using section number and offset assembly directives, implemented using new MCTargetExpr derived types (which avoided the need to introduce any new fixup kinds and localizes the changes to just the COFF streamer and object writer).

This also meant that writing the .impcall section is now completely contained in the asm printer, avoiding the need for .impcall directives (and the weird failure cases associated with them) and will make the x86 implementation much easier.

/// Mapping of call instruction to the global value and target flags that it
/// calls, if applicable.
DenseMap<const MachineInstr *, std::pair<const GlobalValue *, unsigned>>
CalledGlobalsMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, if we're adding state to MachineFunction, we should add MIR serialization/deserialization (MIRPrinter::print etc.). Hopefully not too hard to implement here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Are there explicit tests for this?
(I ran into some crashes while testing that I fixed, so I know that round-tripping is working...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/CodeGen/MIR is for print/parse tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks! I've added a test.

@dpaoliello dpaoliello force-pushed the impcall branch 3 times, most recently from de6d3da to 2e2c02c Compare January 10, 2025 23:22
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/7294

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28487 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1224403==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x58c3d119d462 bp 0x7ffee661bc70 sp 0x7ffee661bc68
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x58c3d119d461 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x58c3d119d461 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x58c3d1191cfb in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x58c3d11a7c12 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x58c3d11f275e in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x58c3d0db97b1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x58c3d1c78a03 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x58c3d1c8fabe in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x58c3d1c7a5fa in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x58c3d1c7a5fa in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x58c3cbe6bfdc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x58c3cbe6652f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7c418222a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7c418222a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x58c3cbd79da4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4beda4)

0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x58c3cbe59a82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x58c3d368f6bd in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x58c3cc2f4b59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x58c3cc2f4b59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x58c3cc2f4b59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x58c3d0dd4728 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12
Step 10 (stage2/asan_ubsan check) failure: stage2/asan_ubsan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28487 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1224403==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x58c3d119d462 bp 0x7ffee661bc70 sp 0x7ffee661bc68
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x58c3d119d461 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x58c3d119d461 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x58c3d1191cfb in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x58c3d11a7c12 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x58c3d11f275e in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x58c3d0db97b1 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x58c3d1c78a03 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x58c3d1c8fabe in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x58c3d1c7a5fa in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x58c3d1c7a5fa in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x58c3cbe6bfdc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x58c3cbe6652f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7c418222a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7c418222a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x58c3cbd79da4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4beda4)

0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x58c3cbe59a82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x58c3d368f6bd in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x58c3cc2f4b59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x58c3cc2f4b59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x58c3cc2f4b59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x58c3d0dd4728 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot1 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/5179

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28051 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==273570==ERROR: AddressSanitizer: use-after-poison on address 0x78bfbba4dc30 at pc 0x62c4365ab02b bp 0x7ffc42c302b0 sp 0x7ffc42c302a8
READ of size 8 at 0x78bfbba4dc30 thread T0
    #0 0x62c4365ab02a in getParent /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x62c4365ab02a in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x62c43659f0ac in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x62c4365b4c6a in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x62c4365f85f0 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x62c4361cc1c7 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x62c437030bfd in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x62c4370477d1 in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x62c43703222a in runOnModule /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x62c43703222a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x62c4311ba217 in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x62c4311b546c in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7aafbc82a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7aafbc82a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x62c4310be924 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x8823924)

0x78bfbba4dc30 is located 2864 bytes inside of 4096-byte region [0x78bfbba4d100,0x78bfbba4e100)
allocated by thread T0 here:
    #0 0x62c4311a4bb2 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x62c43169acae in Allocate /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0x62c43169acae in StartNewSlab /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0x62c43169acae in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0x62c4361e72af in allocateOperandArray /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0x62c4361e72af in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 88235 tests, 88 workers --
Testing:  0.. 10.. 20.. 
FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (28051 of 88235)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
RUN: at line 2: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==273570==ERROR: AddressSanitizer: use-after-poison on address 0x78bfbba4dc30 at pc 0x62c4365ab02b bp 0x7ffc42c302b0 sp 0x7ffc42c302a8
READ of size 8 at 0x78bfbba4dc30 thread T0
    #0 0x62c4365ab02a in getParent /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x62c4365ab02a in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x62c43659f0ac in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x62c4365b4c6a in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x62c4365f85f0 in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x62c4361cc1c7 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x62c437030bfd in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x62c4370477d1 in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x62c43703222a in runOnModule /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x62c43703222a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x62c4311ba217 in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x62c4311b546c in main /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7aafbc82a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7aafbc82a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x62c4310be924 in _start (/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/llc+0x8823924)

0x78bfbba4dc30 is located 2864 bytes inside of 4096-byte region [0x78bfbba4d100,0x78bfbba4e100)
allocated by thread T0 here:
    #0 0x62c4311a4bb2 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x62c43169acae in Allocate /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #2 0x62c43169acae in StartNewSlab /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #3 0x62c43169acae in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #4 0x62c4361e72af in allocateOperandArray /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #5 0x62c4361e72af in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 12, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vla-2stage running on linaro-g3-03 while building llvm at step 11 "build stage 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/41/builds/4469

Here is the relevant piece of the build log for the reference
Step 11 (build stage 2) failure: 'ninja' (failure)
...
[7996/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/TextDiagnostic.cpp.o
[7997/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/compute-offsets.cpp.o
[7998/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/mod-file.cpp.o
[7999/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/check-omp-structure.cpp.o
[8000/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/openmp-modifiers.cpp.o
[8001/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/BoxValue.cpp.o
[8002/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Complex.cpp.o
[8003/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/data-to-inits.cpp.o
[8004/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/Character.cpp.o
[8005/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/OpenMP/Utils.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/OpenMP/Utils.cpp
Killed
[8006/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/expression.cpp.o
[8007/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/DoLoopHelper.cpp.o
[8008/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertCall.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/ConvertCall.cpp
Killed
[8009/8801] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/AliasAnalysis.cpp.o
[8010/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/CUFCommon.cpp.o
[8011/8801] Building CXX object tools/flang/lib/Optimizer/Analysis/CMakeFiles/FIRAnalysis.dir/TBAAForest.cpp.o
[8012/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/ConvertVariable.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/ConvertVariable.cpp
Killed
[8013/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/FIRBuilder.cpp.o
[8014/8801] Building CXX object tools/flang/lib/Optimizer/Builder/CMakeFiles/FIRBuilder.dir/HLFIRTools.cpp.o
[8015/8801] Building CXX object tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o
FAILED: tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DFLANG_INCLUDE_TESTS=1 -DFLANG_LITTLE_ENDIAN=1 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/flang/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/mlir/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/tools/clang/include -isystem /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/../clang/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG -std=c++17  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o -MF tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o.d -o tools/flang/lib/Lower/CMakeFiles/FortranLower.dir/CallInterface.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/flang/lib/Lower/CallInterface.cpp
Killed
[8016/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/program-tree.cpp.o
[8017/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/FrontendAction.cpp.o
[8018/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/CompilerInstance.cpp.o
[8019/8801] Building CXX object tools/flang/lib/FrontendTool/CMakeFiles/flangFrontendTool.dir/ExecuteCompilerInvocation.cpp.o
[8020/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/pointer-assignment.cpp.o
[8021/8801] Building CXX object tools/flang/lib/Frontend/CMakeFiles/flangFrontend.dir/CompilerInvocation.cpp.o
[8022/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/tools.cpp.o
[8023/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/resolve-labels.cpp.o
[8024/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/unparse-with-symbols.cpp.o
[8025/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/rewrite-directives.cpp.o
[8026/8801] Building CXX object tools/flang/lib/Evaluate/CMakeFiles/FortranEvaluate.dir/fold-integer.cpp.o
[8027/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/rewrite-parse-tree.cpp.o
[8028/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/runtime-type-info.cpp.o
[8029/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/type.cpp.o
[8030/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/scope.cpp.o
[8031/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/symbol.cpp.o
[8032/8801] Building CXX object tools/flang/lib/Semantics/CMakeFiles/FortranSemantics.dir/resolve-names-utils.cpp.o

@fhahn
Copy link
Contributor

fhahn commented Jan 12, 2025

It looks like the change introduced a use-after-poison https://lab.llvm.org/buildbot/#/builders/169/builds/7294/steps/10/logs/stdio

FAIL: LLVM :: CodeGen/AArch64/machine-outliner-throw.ll (1 of 88242)
******************** TEST 'LLVM :: CodeGen/AArch64/machine-outliner-throw.ll' FAILED ********************
Exit Code: 2
Command Output (stderr):
--
RUN: at line 1: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64
RUN: at line 2: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner < /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll | /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/FileCheck /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/CodeGen/AArch64/machine-outliner-throw.ll -check-prefix=TARGET_FEATURES
+ /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc -verify-machineinstrs -enable-machine-outliner -mtriple=aarch64 -stop-after=machine-outliner
=================================================================
==1112042==ERROR: AddressSanitizer: use-after-poison on address 0x52100002dc30 at pc 0x5979d93c2be3 bp 0x7ffd78825d90 sp 0x7ffd78825d88
READ of size 8 at 0x52100002dc30 thread T0
    #0 0x5979d93c2be2 in getParent /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55
    #1 0x5979d93c2be2 in llvm::MIRPrinter::convertCalledGlobals(llvm::yaml::MachineFunction&, llvm::MachineFunction const&, llvm::MachineModuleSlotTracker&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:609:25
    #2 0x5979d93b74ab in llvm::MIRPrinter::print(llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:275:3
    #3 0x5979d93cd392 in llvm::printMIR(llvm::raw_ostream&, llvm::MachineModuleInfo const&, llvm::MachineFunction const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrinter.cpp:1071:11
    #4 0x5979d9417ede in (anonymous namespace)::MIRPrintingPass::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MIRPrintingPass.cpp:65:5
    #5 0x5979d8fdef61 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #6 0x5979d9e9e183 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #7 0x5979d9eb523e in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #8 0x5979d9e9fd7a in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #9 0x5979d9e9fd7a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #10 0x5979d408e2dc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #11 0x5979d408882f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #12 0x7cb6a4a2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #13 0x7cb6a4a2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #14 0x5979d3f9c0a4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4bf0a4)
0x52100002dc30 is located 2864 bytes inside of 4096-byte region [0x52100002d100,0x52100002e100)
allocated by thread T0 here:
    #0 0x5979d407bd82 in operator new(unsigned long, std::align_val_t) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:98:3
    #1 0x5979db8b4e3d in llvm::allocate_buffer(unsigned long, unsigned long) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/MemAlloc.cpp:16:10
    #2 0x5979d4516e59 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/AllocatorBase.h:92:12
    #3 0x5979d4516e59 in StartNewSlab /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:346:42
    #4 0x5979d4516e59 in llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul>::AllocateSlow(unsigned long, unsigned long, llvm::Align) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:202:5
    #5 0x5979d8ff9ed8 in Allocate /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/Allocator.h:216:12
    #6 0x5979d8ff9ed8 in allocate<llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096UL, 4096UL, 128UL> > /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/ArrayRecycler.h:130:38
    #7 0x5979d8ff9ed8 in allocateOperandArray /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:1117:28
    #8 0x5979d8ff9ed8 in llvm::MachineInstr::MachineInstr(llvm::MachineFunction&, llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineInstr.cpp:108:19
    #9 0x5979d8fb74cc in llvm::MachineFunction::CreateMachineInstr(llvm::MCInstrDesc const&, llvm::DebugLoc, bool) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:433:7
    #10 0x5979d45d7a8c in llvm::BuildMI(llvm::MachineFunction&, llvm::MIMetadata const&, llvm::MCInstrDesc const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstrBuilder.h:375:37
    #11 0x5979db2175cd in llvm::InstrEmitter::EmitMachineNode(llvm::SDNode*, bool, bool, llvm::SmallDenseMap<llvm::SDValue, llvm::Register, 16u, llvm::DenseMapInfo<llvm::SDValue, void>, llvm::detail::DenseMapPair<llvm::SDValue, llvm::Register>>&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1060:29
    #12 0x5979db25acab in EmitNode /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h:145:7
    #13 0x5979db25acab in llvm::ScheduleDAGSDNodes::EmitSchedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&)::$_0::operator()(llvm::SDNode*, bool, bool, llvm::SmallDenseMap<llvm::SDValue, llvm::Register, 16u, llvm::DenseMapInfo<llvm::SDValue, void>, llvm::detail::DenseMapPair<llvm::SDValue, llvm::Register>>&) const /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:874:13
    #14 0x5979db2587c3 in llvm::ScheduleDAGSDNodes::EmitSchedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp:965:9
    #15 0x5979db494dbe in llvm::SelectionDAGISel::CodeGenAndEmitDAG() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1151:42
    #16 0x5979db48d112 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1898:7
    #17 0x5979db4848d1 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:615:3
    #18 0x5979db47dd38 in llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:375:20
    #19 0x5979d8fdef61 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:94:13
    #20 0x5979d9e9e183 in llvm::FPPassManager::runOnFunction(llvm::Function&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1406:27
    #21 0x5979d9eb523e in llvm::FPPassManager::runOnModule(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1452:16
    #22 0x5979d9e9fd7a in runOnModule /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1521:27
    #23 0x5979d9e9fd7a in llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:539:44
    #24 0x5979d408e2dc in compileModule(char**, llvm::LLVMContext&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:751:8
    #25 0x5979d408882f in main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:411:22
    #26 0x7cb6a4a2a3b7  (/lib/x86_64-linux-gnu/libc.so.6+0x2a3b7) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #27 0x7cb6a4a2a47a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a47a) (BuildId: 5f3f024b472f38389da3a2f567b3d0eaa8835ca2)
    #28 0x5979d3f9c0a4 in _start (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan_ubsan/bin/llc+0xb4bf0a4)
SUMMARY: AddressSanitizer: use-after-poison /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:347:55 in getParent
Shadow bytes around the buggy address:
  0x52100002d980: 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x52100002da00: f7 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7
  0x52100002da80: f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7 f7 f7 f7
  0x52100002db00: f7 f7 f7 f7 f7 f7 f7 00 00 00 00 00 00 00 00 f7
  0x52100002db80: 00 00 00 00 00 00 00 00 00 f7 00 00 00 00 00 00
=>0x52100002dc00: 00 00 f7 f7 f7 f7[f7]f7 f7 f7 f7 f7 f7 00 00 00
  0x52100002dc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00 00
  0x52100002dd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x52100002dd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 00
  0x52100002de00: 00 00 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7
  0x52100002de80: f7 f7 f7 00 00 00 00 00 00 00 00 f7 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1112042==ABORTING

Please take a look and revert the change if it takes longer to fix.

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

I found the root cause: even though I modeled "Called Globals" after "Call Site Info", I didn't all the move/copy/erase functions that keep the Calle Site Info in-sync with changes to the instructions. It should be an easy fix since I can rename and update the existing functions.

@dpaoliello
Copy link
Contributor Author

It looks like the change introduced a use-after-poison

Investigating...

I found the root cause: even though I modeled "Called Globals" after "Call Site Info", I didn't all the move/copy/erase functions that keep the Calle Site Info in-sync with changes to the instructions. It should be an easy fix since I can rename and update the existing functions.

PR with fix: #122762

kstoimenov added a commit that referenced this pull request Jan 13, 2025
…valent to MSVC /d2ImportCallOptimization) (#121516)"

Breaks sanitizer build: https://lab.llvm.org/buildbot/#/builders/52/builds/5179

This reverts commits:
5ee0a71
d997a72
@kstoimenov
Copy link
Contributor

@dpaoliello please note this was reverted in 2f7ade4.

dpaoliello added a commit to dpaoliello/llvm-project that referenced this pull request Jan 13, 2025
…ivalent to MSVC /d2ImportCallOptimization) (llvm#121516)"

This reverts commit 2f7ade4.
kazutakahirata pushed a commit to kazutakahirata/llvm-project that referenced this pull request Jan 13, 2025
dpaoliello added a commit that referenced this pull request Jan 13, 2025
…ivalent to MSVC /d2ImportCallOptimization) (#121516)" (#122777)

This reverts commit 2f7ade4.

Fix is available in #122762
dpaoliello added a commit that referenced this pull request Jan 13, 2025
#122762)

Fixes the "use after poison" issue introduced by #121516 (see
<#121516 (comment)>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 13, 2025
…ll Site info (#122762)

Fixes the "use after poison" issue introduced by #121516 (see
<llvm/llvm-project#121516 (comment)>).

The root cause of this issue is that #121516 introduced "Called Global"
information for call instructions modeling how "Call Site" info is
stored in the machine function, HOWEVER it didn't copy the
copy/move/erase operations for call site information.

The fix is to rename and update the existing copy/move/erase functions
so they also take care of Called Global info.
dpaoliello added a commit that referenced this pull request Jan 30, 2025
… import call optimization, and remove LLVM flag (#122831)

Switches import call optimization from being enabled by an LLVM flag to
instead using a module attribute, and creates a new Clang flag that will
set that attribute. This addresses the concern raised in the original
PR:
<#121516 (comment)>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<#122762 (review)>
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 30, 2025
…tribute for import call optimization, and remove LLVM flag (#122831)

Switches import call optimization from being enabled by an LLVM flag to
instead using a module attribute, and creates a new Clang flag that will
set that attribute. This addresses the concern raised in the original
PR:
<llvm/llvm-project#121516 (comment)>

This change also only creates the Called Global info if the module
attribute is present, addressing this concern:
<llvm/llvm-project#122762 (review)>
dpaoliello added a commit that referenced this pull request May 20, 2025
…ivalent to MSVC /d2guardretpoline) (#126631)

This is the x64 equivalent of #121516

Since import call optimization was originally [added to x64 Windows to
implement a more efficient retpoline
mitigation](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618)
the section and constant names relating to this all mention "retpoline"
and we need to mark indirect calls, control-flow guard calls and jumps
for jump tables in the section alongside calls to imported functions.

As with the AArch64 feature, this emits a new section into the obj which
is used by the MSVC linker to generate the Dynamic Value Relocation
Table and the section itself does not appear in the final binary.

The Windows Loader requires a specific sequence of instructions be
emitted when this feature is enabled:
* Indirect calls/jumps must have the function pointer to jump to in
`rax`.
* Calls to imported functions must use the `rex` prefix and be followed
by a 5-byte nop.
* Indirect calls must be followed by a 3-byte nop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 llvm:mc Machine (object) code llvm:SelectionDAG SelectionDAGISel as well platform:windows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants