Skip to content

[lld-macho] Add flag --keep-icf-stabs to LLD for MachO #93137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 28, 2024

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented May 23, 2024

This change adds the --keep-icf-stabs which, when specified, preserves symbols that were folded by ICF in the binary's stabs entries.
This allows dsymutil to process debug information for the folded symbols.

@alx32 alx32 requested review from int3, ellishg and kyulee-com May 23, 2024 05:21
Copy link

github-actions bot commented May 23, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@alx32 alx32 force-pushed the 02_lld_keep_icf_in_stabs branch from 549c374 to e3033a2 Compare May 23, 2024 05:25
@alx32 alx32 marked this pull request as ready for review May 23, 2024 05:26
@llvmbot
Copy link
Member

llvmbot commented May 23, 2024

@llvm/pr-subscribers-lld

@llvm/pr-subscribers-lld-macho

Author: None (alx32)

Changes

This change adds the --keep-icf-stabs which, when specified, preserves symbols that were folded by ICF in the binary's stabs entries.
This allows dsymutil to process debug information for the folded symbols.


Full diff: https://github.com/llvm/llvm-project/pull/93137.diff

5 Files Affected:

  • (modified) lld/MachO/Config.h (+1)
  • (modified) lld/MachO/Driver.cpp (+1)
  • (modified) lld/MachO/Options.td (+3)
  • (modified) lld/MachO/SyntheticSections.cpp (+10-5)
  • (modified) lld/test/MachO/stabs-icf.s (+27)
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index 7b45f7f4c39a1..96253e15f7eea 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -193,6 +193,7 @@ struct Configuration {
   UndefinedSymbolTreatment undefinedSymbolTreatment =
       UndefinedSymbolTreatment::error;
   ICFLevel icfLevel = ICFLevel::none;
+  bool keepICFStabs = false;
   ObjCStubsMode objcStubsMode = ObjCStubsMode::fast;
   llvm::MachO::HeaderFileType outputType;
   std::vector<llvm::StringRef> systemLibraryRoots;
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index d4d8d53d69eea..4ee6a907b2f46 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1648,6 +1648,7 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
       config->emitChainedFixups || args.hasArg(OPT_init_offsets);
   config->emitRelativeMethodLists = shouldEmitRelativeMethodLists(args);
   config->icfLevel = getICFLevel(args);
+  config->keepICFStabs = args.hasArg(OPT_keep_icf_stabs);
   config->dedupStrings =
       args.hasFlag(OPT_deduplicate_strings, OPT_no_deduplicate_strings, true);
   config->deadStripDuplicates = args.hasArg(OPT_dead_strip_duplicates);
diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 11458d92b3abe..0ad487437a554 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -85,6 +85,9 @@ def icf_eq: Joined<["--"], "icf=">,
     HelpText<"Set level for identical code folding (default: none)">,
     MetaVarName<"[none,safe,all]">,
     Group<grp_lld>;
+def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">,
+    HelpText<"For symbols that were folded by ICF - retain them as stabs entries">,
+    Group<grp_lld>;
 def lto_O: Joined<["--"], "lto-O">,
     HelpText<"Set optimization level for LTO (default: 2)">,
     MetaVarName<"<opt-level>">,
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 29070810bb049..b3fe223938bf5 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1220,15 +1220,18 @@ void SymtabSection::emitStabs() {
         continue;
 
       // Constant-folded symbols go in the executable's symbol table, but don't
-      // get a stabs entry.
-      if (defined->wasIdenticalCodeFolded)
+      // get a stabs entry unless --keep-icf-stabs flag is specified
+      if (!config->keepICFStabs && defined->wasIdenticalCodeFolded)
         continue;
 
       ObjFile *file = defined->getObjectFile();
       if (!file || !file->compileUnit)
         continue;
 
-      symbolsNeedingStabs.emplace_back(defined, defined->isec()->getFile()->id);
+      // We use 'originalIsec' to get the file id of the symbol since 'isec()'
+      // might point to the merged ICF symbol's file
+      symbolsNeedingStabs.emplace_back(defined,
+                                       defined->originalIsec->getFile()->id);
     }
   }
 
@@ -1243,7 +1246,9 @@ void SymtabSection::emitStabs() {
   InputFile *lastFile = nullptr;
   for (SortingPair &pair : symbolsNeedingStabs) {
     Defined *defined = pair.first;
-    InputSection *isec = defined->isec();
+    // We use 'originalIsec' of the symbol since we care about the actual origin
+    // of the symbol, not the canonical location returned by `isec()`.
+    InputSection *isec = defined->originalIsec;
     ObjFile *file = cast<ObjFile>(isec->getFile());
 
     if (lastFile == nullptr || lastFile != file) {
@@ -1256,7 +1261,7 @@ void SymtabSection::emitStabs() {
     }
 
     StabsEntry symStab;
-    symStab.sect = defined->isec()->parent->index;
+    symStab.sect = isec->parent->index;
     symStab.strx = stringTableSection.addString(defined->getName());
     symStab.value = defined->getVA();
 
diff --git a/lld/test/MachO/stabs-icf.s b/lld/test/MachO/stabs-icf.s
index 99d0871ce4d2c..5f8449809ddd1 100644
--- a/lld/test/MachO/stabs-icf.s
+++ b/lld/test/MachO/stabs-icf.s
@@ -4,6 +4,9 @@
 # RUN: %lld -lSystem --icf=all %t.o -o %t
 # RUN: dsymutil -s %t | FileCheck %s -DDIR=%t -DSRC_PATH=%t.o
 
+# RUN: %lld -lSystem --icf=all %t.o -o %t_icf_stabs --keep-icf-stabs
+# RUN: dsymutil -s %t_icf_stabs | FileCheck %s -DDIR=%t_icf_stabs -DSRC_PATH=%t.o --check-prefixes=ICF_STABS
+
 ## This should include no N_FUN entry for _baz (which is ICF'd into _bar),
 ## but it does include a SECT EXT entry.
 ## NOTE: We do not omit the N_FUN entry for _bar even though it is of size zero.
@@ -27,6 +30,30 @@
 # CHECK-DAG:  (       {{.*}}) {{[0-9]+}}                 0100   0000000000000000   'dyld_stub_binder'
 # CHECK-EMPTY:
 
+
+# ICF_STABS:      (N_SO         ) 00      0000   0000000000000000  '/tmp{{[/\\]}}test.cpp'
+# ICF_STABS-NEXT: (N_OSO        ) 03      0001   {{.*}} '[[SRC_PATH]]'
+# ICF_STABS-NEXT: (N_FUN        ) 01      0000   [[#%.16x,MAIN:]]  '_main'
+# ICF_STABS-NEXT: (N_FUN        ) 00      0000   000000000000000b{{$}}
+# ICF_STABS-NEXT: (N_FUN        ) 01      0000   [[#%.16x,BAR:]]   '_bar'
+# ICF_STABS-NEXT: (N_FUN        ) 00      0000   0000000000000000{{$}}
+# ICF_STABS-NEXT: (N_FUN        ) 01      0000   [[#BAR]]          '_bar2'
+# ICF_STABS-NEXT: (N_FUN        ) 00      0000   0000000000000001{{$}}
+# ICF_STABS-NEXT: (N_FUN        ) 01      0000   [[#BAR]]          '_baz'
+# ICF_STABS-NEXT: (N_FUN        ) 00      0000   0000000000000000{{$}}
+# ICF_STABS-NEXT: (N_FUN        ) 01      0000   [[#BAR]]          '_baz2'
+# ICF_STABS-NEXT: (N_FUN        ) 00      0000   0000000000000001{{$}}
+# ICF_STABS-NEXT: (N_SO         ) 01      0000   0000000000000000{{$}}
+# ICF_STABS-DAG:  (     SECT EXT) 01      0000   [[#MAIN]]         '_main'
+# ICF_STABS-DAG:  (     SECT EXT) 01      0000   [[#BAR]]          '_bar'
+# ICF_STABS-DAG:  (     SECT EXT) 01      0000   [[#BAR]]          '_bar2'
+# ICF_STABS-DAG:  (     SECT EXT) 01      0000   [[#BAR]]          '_baz'
+# ICF_STABS-DAG:  (     SECT EXT) 01      0000   [[#BAR]]          '_baz2'
+# ICF_STABS-DAG:  (       {{.*}}) {{[0-9]+}}                 0010   {{[0-9a-f]+}}      '__mh_execute_header'
+# ICF_STABS-DAG:  (       {{.*}}) {{[0-9]+}}                 0100   0000000000000000   'dyld_stub_binder'
+# ICF_STABS-EMPTY:
+
+
 .text
 .globl _bar, _bar2, _baz, _baz2, _main
 

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, potentially does it increase the final binary size (even after strip) if we use this flag?

@@ -85,6 +85,9 @@ def icf_eq: Joined<["--"], "icf=">,
HelpText<"Set level for identical code folding (default: none)">,
MetaVarName<"[none,safe,all]">,
Group<grp_lld>;
def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stabs seems confusing as it appears a single word, but in fact it stands for symbol table?
Can we say keep-icf-symbol or something? Likewise, I'd use more descriptive word in the following, instead of using stabs entries below.

Copy link
Contributor Author

@alx32 alx32 May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag name comes from the entries that are created - STAB entries. Not sure of the origin of the name. @int3 - do you know the meaning of 'STABS' here ? Some code here:

StabsEntry stab(N_SO);
and some docs here: https://wiki.dwarfstd.org/Apple's_%22Lazy%22_DWARF_Scheme.md

It is not referring to the symbol table itself - the symbol table will (I think always) already contain the names of the symbols as they might need to be exported, etc.
It only affects STABS entries which are for debugging only.
I am not sure of the best name but keep-icf-symbol suggests that we would otherwise erase it from the symbol table - which is not the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears a known common word for debug info. From your link, I'm seeing this.

we read the debug map (the stabs entries) out of the executable and create an address translation map for each .o file specifying what address the symbols landed at.

So, I think you can just expand the HelpText a bit on this flag.

@alx32
Copy link
Contributor Author

alx32 commented May 23, 2024

So, potentially does it increase the final binary size (even after strip) if we use this flag?

This flag will never increase binary size of a stripped executable - the stabs entries are for debugging only so when removing debug info from the binary the additional info will go away.

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@alx32 alx32 merged commit 5eea4f4 into llvm:main May 28, 2024
5 checks passed
alx32 added a commit that referenced this pull request Nov 14, 2024
…nces (#110192)

**Summary**

This patch introduces a new compiler option `-mllvm
-emit-func-debug-line-table-offsets` that enables the emission of
per-function line table offsets and end sequences in DWARF debug
information. This enhancement allows tools and debuggers to accurately
attribute line number information to their corresponding functions, even
in scenarios where functions are merged or share the same address space
due to optimizations like Identical Code Folding (ICF) in the linker.

**Background**
RFC: [New DWARF Attribute for Symbolication of Merged
Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434)

Previous similar PR:
[#93137](#93137) – This PR was
very similar to the current one but at the time, the assembler had no
support for emitting labels within the line table. That support was
added in PR [#99710](#99710) -
and in this PR we use some of the support added in the assembler PR.

In the current implementation, Clang generates line information in the
`debug_line` section without directly associating line entries with
their originating `DW_TAG_subprogram` DIEs. This can lead to issues when
post-compilation optimizations merge functions, resulting in overlapping
address ranges and ambiguous line information.

For example, when functions are merged by ICF in LLD, multiple functions
may end up sharing the same address range. Without explicit linkage
between functions and their line entries, tools cannot accurately
attribute line information to the correct function, adversely affecting
debugging and call stack resolution.


**Implementation Details**
To address the above issue, the patch makes the following key changes:

**`DW_AT_LLVM_stmt_sequence` Attribute**: Introduces a new LLVM-specific
attribute `DW_AT_LLVM_stmt_sequence` to each `DW_TAG_subprogram` DIE.
This attribute holds a label pointing to the offset in the line table
where the function's line entries begin.

**End-of-Sequence Markers**: Emits an explicit DW_LNE_end_sequence after
each function's line entries in the line table. This marks the end of
the line information for that function, ensuring that line entries are
correctly delimited.

**Assembler and Streamer Modifications**: Modifies the MCStreamer and
related classes to support emitting the necessary labels and tracking
the current function's line entries. A new flag
GenerateFuncLineTableOffsets is added to control this behavior.

**Compiler Option**: Introduces the `-mllvm
-emit-func-debug-line-table-offsets` option to enable this
functionality, allowing users to opt-in as needed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants