-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[lld-macho] Add flag --keep-icf-stabs to LLD for MachO #93137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
549c374
to
e3033a2
Compare
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-macho Author: None (alx32) ChangesThis change adds the Full diff: https://github.com/llvm/llvm-project/pull/93137.diff 5 Files Affected:
diff --git a/lld/MachO/Config.h b/lld/MachO/Config.h
index 7b45f7f4c39a1..96253e15f7eea 100644
--- a/lld/MachO/Config.h
+++ b/lld/MachO/Config.h
@@ -193,6 +193,7 @@ struct Configuration {
UndefinedSymbolTreatment undefinedSymbolTreatment =
UndefinedSymbolTreatment::error;
ICFLevel icfLevel = ICFLevel::none;
+ bool keepICFStabs = false;
ObjCStubsMode objcStubsMode = ObjCStubsMode::fast;
llvm::MachO::HeaderFileType outputType;
std::vector<llvm::StringRef> systemLibraryRoots;
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index d4d8d53d69eea..4ee6a907b2f46 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -1648,6 +1648,7 @@ bool link(ArrayRef<const char *> argsArr, llvm::raw_ostream &stdoutOS,
config->emitChainedFixups || args.hasArg(OPT_init_offsets);
config->emitRelativeMethodLists = shouldEmitRelativeMethodLists(args);
config->icfLevel = getICFLevel(args);
+ config->keepICFStabs = args.hasArg(OPT_keep_icf_stabs);
config->dedupStrings =
args.hasFlag(OPT_deduplicate_strings, OPT_no_deduplicate_strings, true);
config->deadStripDuplicates = args.hasArg(OPT_dead_strip_duplicates);
diff --git a/lld/MachO/Options.td b/lld/MachO/Options.td
index 11458d92b3abe..0ad487437a554 100644
--- a/lld/MachO/Options.td
+++ b/lld/MachO/Options.td
@@ -85,6 +85,9 @@ def icf_eq: Joined<["--"], "icf=">,
HelpText<"Set level for identical code folding (default: none)">,
MetaVarName<"[none,safe,all]">,
Group<grp_lld>;
+def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">,
+ HelpText<"For symbols that were folded by ICF - retain them as stabs entries">,
+ Group<grp_lld>;
def lto_O: Joined<["--"], "lto-O">,
HelpText<"Set optimization level for LTO (default: 2)">,
MetaVarName<"<opt-level>">,
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 29070810bb049..b3fe223938bf5 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1220,15 +1220,18 @@ void SymtabSection::emitStabs() {
continue;
// Constant-folded symbols go in the executable's symbol table, but don't
- // get a stabs entry.
- if (defined->wasIdenticalCodeFolded)
+ // get a stabs entry unless --keep-icf-stabs flag is specified
+ if (!config->keepICFStabs && defined->wasIdenticalCodeFolded)
continue;
ObjFile *file = defined->getObjectFile();
if (!file || !file->compileUnit)
continue;
- symbolsNeedingStabs.emplace_back(defined, defined->isec()->getFile()->id);
+ // We use 'originalIsec' to get the file id of the symbol since 'isec()'
+ // might point to the merged ICF symbol's file
+ symbolsNeedingStabs.emplace_back(defined,
+ defined->originalIsec->getFile()->id);
}
}
@@ -1243,7 +1246,9 @@ void SymtabSection::emitStabs() {
InputFile *lastFile = nullptr;
for (SortingPair &pair : symbolsNeedingStabs) {
Defined *defined = pair.first;
- InputSection *isec = defined->isec();
+ // We use 'originalIsec' of the symbol since we care about the actual origin
+ // of the symbol, not the canonical location returned by `isec()`.
+ InputSection *isec = defined->originalIsec;
ObjFile *file = cast<ObjFile>(isec->getFile());
if (lastFile == nullptr || lastFile != file) {
@@ -1256,7 +1261,7 @@ void SymtabSection::emitStabs() {
}
StabsEntry symStab;
- symStab.sect = defined->isec()->parent->index;
+ symStab.sect = isec->parent->index;
symStab.strx = stringTableSection.addString(defined->getName());
symStab.value = defined->getVA();
diff --git a/lld/test/MachO/stabs-icf.s b/lld/test/MachO/stabs-icf.s
index 99d0871ce4d2c..5f8449809ddd1 100644
--- a/lld/test/MachO/stabs-icf.s
+++ b/lld/test/MachO/stabs-icf.s
@@ -4,6 +4,9 @@
# RUN: %lld -lSystem --icf=all %t.o -o %t
# RUN: dsymutil -s %t | FileCheck %s -DDIR=%t -DSRC_PATH=%t.o
+# RUN: %lld -lSystem --icf=all %t.o -o %t_icf_stabs --keep-icf-stabs
+# RUN: dsymutil -s %t_icf_stabs | FileCheck %s -DDIR=%t_icf_stabs -DSRC_PATH=%t.o --check-prefixes=ICF_STABS
+
## This should include no N_FUN entry for _baz (which is ICF'd into _bar),
## but it does include a SECT EXT entry.
## NOTE: We do not omit the N_FUN entry for _bar even though it is of size zero.
@@ -27,6 +30,30 @@
# CHECK-DAG: ( {{.*}}) {{[0-9]+}} 0100 0000000000000000 'dyld_stub_binder'
# CHECK-EMPTY:
+
+# ICF_STABS: (N_SO ) 00 0000 0000000000000000 '/tmp{{[/\\]}}test.cpp'
+# ICF_STABS-NEXT: (N_OSO ) 03 0001 {{.*}} '[[SRC_PATH]]'
+# ICF_STABS-NEXT: (N_FUN ) 01 0000 [[#%.16x,MAIN:]] '_main'
+# ICF_STABS-NEXT: (N_FUN ) 00 0000 000000000000000b{{$}}
+# ICF_STABS-NEXT: (N_FUN ) 01 0000 [[#%.16x,BAR:]] '_bar'
+# ICF_STABS-NEXT: (N_FUN ) 00 0000 0000000000000000{{$}}
+# ICF_STABS-NEXT: (N_FUN ) 01 0000 [[#BAR]] '_bar2'
+# ICF_STABS-NEXT: (N_FUN ) 00 0000 0000000000000001{{$}}
+# ICF_STABS-NEXT: (N_FUN ) 01 0000 [[#BAR]] '_baz'
+# ICF_STABS-NEXT: (N_FUN ) 00 0000 0000000000000000{{$}}
+# ICF_STABS-NEXT: (N_FUN ) 01 0000 [[#BAR]] '_baz2'
+# ICF_STABS-NEXT: (N_FUN ) 00 0000 0000000000000001{{$}}
+# ICF_STABS-NEXT: (N_SO ) 01 0000 0000000000000000{{$}}
+# ICF_STABS-DAG: ( SECT EXT) 01 0000 [[#MAIN]] '_main'
+# ICF_STABS-DAG: ( SECT EXT) 01 0000 [[#BAR]] '_bar'
+# ICF_STABS-DAG: ( SECT EXT) 01 0000 [[#BAR]] '_bar2'
+# ICF_STABS-DAG: ( SECT EXT) 01 0000 [[#BAR]] '_baz'
+# ICF_STABS-DAG: ( SECT EXT) 01 0000 [[#BAR]] '_baz2'
+# ICF_STABS-DAG: ( {{.*}}) {{[0-9]+}} 0010 {{[0-9a-f]+}} '__mh_execute_header'
+# ICF_STABS-DAG: ( {{.*}}) {{[0-9]+}} 0100 0000000000000000 'dyld_stub_binder'
+# ICF_STABS-EMPTY:
+
+
.text
.globl _bar, _bar2, _baz, _baz2, _main
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, potentially does it increase the final binary size (even after strip) if we use this flag?
@@ -85,6 +85,9 @@ def icf_eq: Joined<["--"], "icf=">, | |||
HelpText<"Set level for identical code folding (default: none)">, | |||
MetaVarName<"[none,safe,all]">, | |||
Group<grp_lld>; | |||
def keep_icf_stabs: Joined<["--"], "keep-icf-stabs">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stabs
seems confusing as it appears a single word, but in fact it stands for symbol table
?
Can we say keep-icf-symbol
or something? Likewise, I'd use more descriptive word in the following, instead of using stabs entries
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag name comes from the entries that are created - STAB entries. Not sure of the origin of the name. @int3 - do you know the meaning of 'STABS' here ? Some code here:
llvm-project/lld/MachO/SyntheticSections.cpp
Line 1159 in e3033a2
StabsEntry stab(N_SO); |
It is not referring to the symbol table itself - the symbol table will (I think always) already contain the names of the symbols as they might need to be exported, etc.
It only affects STABS entries which are for debugging only.
I am not sure of the best name but keep-icf-symbol
suggests that we would otherwise erase it from the symbol table - which is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears a known common word for debug info. From your link, I'm seeing this.
we read the debug map (the stabs entries) out of the executable and create an address translation map for each .o file specifying what address the symbols landed at.
So, I think you can just expand the HelpText a bit on this flag.
This flag will never increase binary size of a stripped executable - the stabs entries are for debugging only so when removing debug info from the binary the additional info will go away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
…nces (#110192) **Summary** This patch introduces a new compiler option `-mllvm -emit-func-debug-line-table-offsets` that enables the emission of per-function line table offsets and end sequences in DWARF debug information. This enhancement allows tools and debuggers to accurately attribute line number information to their corresponding functions, even in scenarios where functions are merged or share the same address space due to optimizations like Identical Code Folding (ICF) in the linker. **Background** RFC: [New DWARF Attribute for Symbolication of Merged Functions](https://discourse.llvm.org/t/rfc-new-dwarf-attribute-for-symbolication-of-merged-functions/79434) Previous similar PR: [#93137](#93137) – This PR was very similar to the current one but at the time, the assembler had no support for emitting labels within the line table. That support was added in PR [#99710](#99710) - and in this PR we use some of the support added in the assembler PR. In the current implementation, Clang generates line information in the `debug_line` section without directly associating line entries with their originating `DW_TAG_subprogram` DIEs. This can lead to issues when post-compilation optimizations merge functions, resulting in overlapping address ranges and ambiguous line information. For example, when functions are merged by ICF in LLD, multiple functions may end up sharing the same address range. Without explicit linkage between functions and their line entries, tools cannot accurately attribute line information to the correct function, adversely affecting debugging and call stack resolution. **Implementation Details** To address the above issue, the patch makes the following key changes: **`DW_AT_LLVM_stmt_sequence` Attribute**: Introduces a new LLVM-specific attribute `DW_AT_LLVM_stmt_sequence` to each `DW_TAG_subprogram` DIE. This attribute holds a label pointing to the offset in the line table where the function's line entries begin. **End-of-Sequence Markers**: Emits an explicit DW_LNE_end_sequence after each function's line entries in the line table. This marks the end of the line information for that function, ensuring that line entries are correctly delimited. **Assembler and Streamer Modifications**: Modifies the MCStreamer and related classes to support emitting the necessary labels and tracking the current function's line entries. A new flag GenerateFuncLineTableOffsets is added to control this behavior. **Compiler Option**: Introduces the `-mllvm -emit-func-debug-line-table-offsets` option to enable this functionality, allowing users to opt-in as needed.
This change adds the
--keep-icf-stabs
which, when specified, preserves symbols that were folded by ICF in the binary's stabs entries.This allows
dsymutil
to process debug information for the folded symbols.