Skip to content

[ms] [llvm-ml] Add support for @CatStr built-in function symbol #130781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 24, 2025

Conversation

ericastor
Copy link
Contributor

@ericastor ericastor commented Mar 11, 2025

MASM supports some built-in macro-type functions.

We start our support for these with @CatStr, one of the more commonly used.

@llvmbot llvmbot added the mc Machine (object) code label Mar 11, 2025
@ericastor ericastor changed the title [ms] [llvm-ml] Add support for @CatStr built-in function symbol [ms] [llvm-ml] Add support for @CatStr built-in function symbol Mar 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 11, 2025

@llvm/pr-subscribers-mc

Author: Eric Astor (ericastor)

Changes

MASM supports some built-in macro-type functions.

We start our support for these with @<!-- -->CatStr, one of the more commonly used.


Full diff: https://github.com/llvm/llvm-project/pull/130781.diff

2 Files Affected:

  • (modified) llvm/lib/MC/MCParser/MasmParser.cpp (+89-9)
  • (modified) llvm/test/tools/llvm-ml/macro_function.asm (+9-3)
diff --git a/llvm/lib/MC/MCParser/MasmParser.cpp b/llvm/lib/MC/MCParser/MasmParser.cpp
index c1a451a7119d5..e6123e6e74e43 100644
--- a/llvm/lib/MC/MCParser/MasmParser.cpp
+++ b/llvm/lib/MC/MCParser/MasmParser.cpp
@@ -778,6 +778,19 @@ class MasmParser : public MCAsmParser {
   std::optional<std::string> evaluateBuiltinTextMacro(BuiltinSymbol Symbol,
                                                       SMLoc StartLoc);
 
+  // Generic (target and platform independent) directive parsing.
+  enum BuiltinFunction {
+    BI_NO_FUNCTION, // Placeholder
+    BI_CATSTR,
+  };
+
+  /// Maps builtin name --> BuiltinFunction enum, for builtins handled by this
+  /// class.
+  StringMap<BuiltinFunction> BuiltinFunctionMap;
+
+  bool evaluateBuiltinMacroFunction(BuiltinFunction Function, StringRef Name,
+                                    std::string &Res);
+
   // ".ascii", ".asciz", ".string"
   bool parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated);
 
@@ -959,7 +972,7 @@ class MasmParser : public MCAsmParser {
   bool parseDirectiveEcho(SMLoc DirectiveLoc);
 
   void initializeDirectiveKindMap();
-  void initializeBuiltinSymbolMap();
+  void initializeBuiltinSymbolMaps();
 };
 
 } // end anonymous namespace
@@ -999,7 +1012,7 @@ MasmParser::MasmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out,
 
   initializeDirectiveKindMap();
   PlatformParser->Initialize(*this);
-  initializeBuiltinSymbolMap();
+  initializeBuiltinSymbolMaps();
 
   NumOfMacroInstantiations = 0;
 }
@@ -1084,15 +1097,25 @@ bool MasmParser::expandMacros() {
   }
 
   std::optional<std::string> ExpandedValue;
-  auto BuiltinIt = BuiltinSymbolMap.find(IDLower);
-  if (BuiltinIt != BuiltinSymbolMap.end()) {
+
+  if (auto BuiltinIt = BuiltinSymbolMap.find(IDLower);
+      BuiltinIt != BuiltinSymbolMap.end()) {
     ExpandedValue =
         evaluateBuiltinTextMacro(BuiltinIt->getValue(), Tok.getLoc());
-  } else {
-    auto VarIt = Variables.find(IDLower);
-    if (VarIt != Variables.end() && VarIt->getValue().IsText) {
-      ExpandedValue = VarIt->getValue().TextValue;
+  } else if (auto BuiltinFuncIt = BuiltinFunctionMap.find(IDLower);
+             BuiltinFuncIt != BuiltinFunctionMap.end()) {
+    StringRef Name;
+    if (parseIdentifier(Name)) {
+      return true;
+    }
+    std::string Res;
+    if (evaluateBuiltinMacroFunction(BuiltinFuncIt->getValue(), Name, Res)) {
+      return true;
     }
+    ExpandedValue = Res;
+  } else if (auto VarIt = Variables.find(IDLower);
+             VarIt != Variables.end() && VarIt->getValue().IsText) {
+    ExpandedValue = VarIt->getValue().TextValue;
   }
 
   if (!ExpandedValue)
@@ -3197,6 +3220,18 @@ bool MasmParser::parseTextItem(std::string &Data) {
         continue;
       }
 
+      // Try to resolve as a built-in macro function
+      auto BuiltinFuncIt = BuiltinFunctionMap.find(ID.lower());
+      if (BuiltinFuncIt != BuiltinFunctionMap.end()) {
+        Data.clear();
+        if (evaluateBuiltinMacroFunction(BuiltinFuncIt->getValue(), ID, Data)) {
+          return true;
+        }
+        ID = StringRef(Data);
+        Expanded = true;
+        continue;
+      }
+
       // Try to resolve as a variable text macro
       auto VarIt = Variables.find(ID.lower());
       if (VarIt != Variables.end()) {
@@ -6204,7 +6239,7 @@ bool MasmParser::parseMSInlineAsm(
   return false;
 }
 
-void MasmParser::initializeBuiltinSymbolMap() {
+void MasmParser::initializeBuiltinSymbolMaps() {
   // Numeric built-ins (supported in all versions)
   BuiltinSymbolMap["@version"] = BI_VERSION;
   BuiltinSymbolMap["@line"] = BI_LINE;
@@ -6216,6 +6251,9 @@ void MasmParser::initializeBuiltinSymbolMap() {
   BuiltinSymbolMap["@filename"] = BI_FILENAME;
   BuiltinSymbolMap["@curseg"] = BI_CURSEG;
 
+  // Function built-ins (supported in all versions)
+  BuiltinFunctionMap["@catstr"] = BI_CATSTR;
+
   // Some built-ins exist only for MASM32 (32-bit x86)
   if (getContext().getSubtargetInfo()->getTargetTriple().getArch() ==
       Triple::x86) {
@@ -6289,6 +6327,48 @@ MasmParser::evaluateBuiltinTextMacro(BuiltinSymbol Symbol, SMLoc StartLoc) {
   llvm_unreachable("unhandled built-in symbol");
 }
 
+bool MasmParser::evaluateBuiltinMacroFunction(BuiltinFunction Function,
+                                              StringRef Name,
+                                              std::string &Res) {
+  if (parseToken(AsmToken::LParen, "invoking macro function '" + Name +
+                                       "' requires arguments in parentheses")) {
+    return true;
+  }
+
+  MCAsmMacroParameters P;
+  switch (Function) {
+  default:
+    return true;
+  case BI_CATSTR:
+    break;
+  }
+  MCAsmMacro M(Name, "", P, {}, true);
+
+  MCAsmMacroArguments A;
+  if (parseMacroArguments(&M, A, AsmToken::RParen) || parseRParen()) {
+    return true;
+  }
+
+  switch (Function) {
+  default:
+    llvm_unreachable("unhandled built-in function");
+  case BI_CATSTR: {
+    for (const MCAsmMacroArgument &Arg : A) {
+      for (const AsmToken &Tok : Arg) {
+        if (Tok.is(AsmToken::String)) {
+          Res.append(Tok.getStringContents());
+        } else {
+          Res.append(Tok.getString());
+        }
+      }
+    }
+    return false;
+  }
+  }
+  llvm_unreachable("unhandled built-in function");
+  return true;
+}
+
 /// Create an MCAsmParser instance.
 MCAsmParser *llvm::createMCMasmParser(SourceMgr &SM, MCContext &C,
                                       MCStreamer &Out, const MCAsmInfo &MAI,
diff --git a/llvm/test/tools/llvm-ml/macro_function.asm b/llvm/test/tools/llvm-ml/macro_function.asm
index c28d7c8c6222c..475c8b52dce79 100644
--- a/llvm/test/tools/llvm-ml/macro_function.asm
+++ b/llvm/test/tools/llvm-ml/macro_function.asm
@@ -103,14 +103,20 @@ expr_recursive_test PROC
   ret
 expr_recursive_test ENDP
 
+expand_as_directive_test @CatStr(P, RO, C)
+; CHECK-LABEL: expand_as_directive_test:
+
+  ret
+expand_as_directive_test ENDP
+
 custom_strcat MACRO arg1, arg2
   EXITM <arg1&arg2>
 ENDM
 
-expand_as_directive_test custom_strcat(P, ROC)
-; CHECK-LABEL: expand_as_directive_test:
+expand_as_directive_custom_test custom_strcat(P, ROC)
+; CHECK-LABEL: expand_as_directive_custom_test:
 
   ret
-expand_as_directive_test ENDP
+expand_as_directive_custom_test ENDP
 
 end

MASM supports some built-in functions, including @catstr to concatenate arbitrary string arguments.
@MisterDA
Copy link

MisterDA commented Mar 19, 2025

With this patch applied, the following macro crashes the assembler:

        index_a = 2
SubstitutionMacro MACRO field:REQ
        EXITM @CatStr(<[r14+>, %(index_&field), <*8]>)
ENDM

        .CODE
        mov SubstitutionMacro(a), r15
        END

It seems that the + and * operands have disappeared. Maybe there's just an off-by-one error, or a token that's been erroneously consumed.

$ llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.asm
Included from <instantiation>:2:
<instantiation>:1:1: error: unable to parse text item in 'EXITM' directive
r14%(my_struct_a)8
^
test.asm:7:13: note: while in macro instantiation
        mov SubstitutionMacro(a), r15
            ^
Assertion failed: (EndStatementAtEOFStack.empty()), function Lex, file MasmParser.cpp, line 1193.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /Users/user/llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.asm
 #0 0x0000000100fca168 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006aa168)
 #1 0x0000000100fc88b0 llvm::sys::RunSignalHandlers() (/Users/user/llvm-project/build/bin/llvm-ml+0x1006a88b0)
 #2 0x0000000100fca8f0 SignalHandler(int, __siginfo*, void*) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006aa8f0)
 #3 0x0000000192b9ade4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4)
 #4 0x0000000192b63f70 (/usr/lib/system/libsystem_pthread.dylib+0x18044bf70)
 #5 0x0000000192a70908 (/usr/lib/system/libsystem_c.dylib+0x180358908)
 #6 0x0000000192a6fc1c (/usr/lib/system/libsystem_c.dylib+0x180357c1c)
 #7 0x0000000100f682a0 (anonymous namespace)::MasmParser::Lex((anonymous namespace)::MasmParser::ExpandKind) (.cold.6) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006482a0)
 #8 0x0000000100f5bd9c (anonymous namespace)::MasmParser::Lex((anonymous namespace)::MasmParser::ExpandKind) (/Users/user/llvm-project/build/bin/llvm-ml+0x10063bd9c)
 #9 0x0000000100ab99a0 (anonymous namespace)::X86AsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::SMLoc, llvm::SmallVectorImpl<std::__1::unique_ptr<llvm::MCParsedAsmOperand, std::__1::default_delete<llvm::MCParsedAsmOperand>>>&) (/Users/user/llvm-project/build/bin/llvm-ml+0x1001999a0)
#10 0x0000000100f50a10 (anonymous namespace)::MasmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) (/Users/user/llvm-project/build/bin/llvm-ml+0x100630a10)
#11 0x0000000100f49c60 (anonymous namespace)::MasmParser::Run(bool, bool) (/Users/user/llvm-project/build/bin/llvm-ml+0x100629c60)
#12 0x0000000100924348 AssembleInput(llvm::StringRef, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions&, llvm::opt::ArgList const&) (/Users/user/llvm-project/build/bin/llvm-ml+0x100004348)
#13 0x0000000100923484 llvm_ml_main(int, char**, llvm::ToolContext const&) (/Users/user/Tarides/llvm-project/build/bin/llvm-ml+0x100003484)
#14 0x0000000100925d54 main (/Users/user/llvm-project/build/bin/llvm-ml+0x100005d54)
#15 0x00000001927e4274
[1]    81581 abort      ~/llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.as

ml64 accepts it:

$ ml64 -nologo -Cp -c -Fo test.obj -W3 test.asm
 Assembling: test.asm
$ objdump -D --section='.text$mn' test.obj

test.obj:     file format pe-x86-64


Disassembly of section .text$mn:

0000000000000000 <.text$mn>:
   0:   4d 89 7e 10             mov    %r15,0x10(%r14)

@ericastor
Copy link
Contributor Author

ericastor commented Mar 24, 2025

@MisterDA:

With this patch applied, the following macro crashes the assembler:

...

This is absolutely correct - but turns out to be an existing issue with our handling of angle-bracket strings in macro calls, which you can see if you use a "custom" CatStr implementation before this change goes in. I'm going to fix this in a separate PR, if that's alright.

@ericastor ericastor merged commit 0ab330b into llvm:main Apr 24, 2025
11 checks passed
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants