-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[ELF] Fix unnecessary inclusion of unreferenced provide symbols #84512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-lld @llvm/pr-subscribers-lld-elf Author: None (partaror) ChangesPreviously, linker was unnecessarily including a PROVIDE symbol which was referenced by another unused PROVIDE symbol. For example, if a linker script contained the below code and 'not_used_sym' provide symbol is not included, then linker was still unnecessarily including 'foo' PROVIDE symbol because it was referenced by 'not_used_sym'. This commit fixes this behavior. PROVIDE(not_used_sym = foo) This commit fixes this behavior by using dfs-like algorithm to find all the symbols referenced in provide expressions of included provide symbols. Closes #74771 Full diff: https://github.com/llvm/llvm-project/pull/84512.diff 4 Files Affected:
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 24faa1753f1e3d..fff81d17755945 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -2659,6 +2659,56 @@ static void postParseObjectFile(ELFFileBase *file) {
}
}
+// Returns true if the provide symbol should be added to the link.
+bool shouldAddProvideSym(StringRef symName) {
+ Symbol *b = symtab.find(symName);
+ return b && !b->isDefined() && !b->isCommon();
+}
+
+// Add symbols referred by the provide symbol to the symbol table.
+// This function must only be called for provide symbols that should be added
+// to the link.
+static void
+addProvideSymReferences(StringRef provideSym,
+ llvm::StringSet<> &addedRefsFromProvideSym) {
+
+ if (addedRefsFromProvideSym.count(provideSym))
+ return;
+ assert((shouldAddProvideSym(provideSym) || !symtab.find(provideSym)) &&
+ "This function must only be called for provide symbols that should be "
+ "added to the link.");
+ addedRefsFromProvideSym.insert(provideSym);
+ for (StringRef name : script->provideMap[provideSym].keys()) {
+ script->referencedSymbols.push_back(name);
+ if (script->provideMap.count(name) &&
+ (shouldAddProvideSym(name) || !symtab.find(name)) &&
+ !addedRefsFromProvideSym.count(name))
+ addProvideSymReferences(name, addedRefsFromProvideSym);
+ }
+}
+
+// Add symbols that are referenced in the linker script.
+// Symbols referenced in a PROVIDE command are only added to the symbol table if
+// the PROVIDE command actually provides the symbol.
+static void addScriptReferencedSymbols() {
+ // Keeps track of references from which PROVIDE symbols have been added to the
+ // symbol table.
+ llvm::StringSet<> addedRefsFromProvideSym;
+ for (StringRef provideSym : script->provideMap.keys()) {
+ if (!addedRefsFromProvideSym.count(provideSym) &&
+ shouldAddProvideSym(provideSym))
+ addProvideSymReferences(provideSym, addedRefsFromProvideSym);
+ }
+
+ // Some symbols (such as __ehdr_start) are defined lazily only when there
+ // are undefined symbols for them, so we add these to trigger that logic.
+ for (StringRef name : script->referencedSymbols) {
+ Symbol *sym = addUnusedUndefined(name);
+ sym->isUsedInRegularObj = true;
+ sym->referenced = true;
+ }
+}
+
// Do actual linking. Note that when this function is called,
// all linker scripts have already been parsed.
void LinkerDriver::link(opt::InputArgList &args) {
@@ -2735,13 +2785,7 @@ void LinkerDriver::link(opt::InputArgList &args) {
config->hasDynSymTab =
!ctx.sharedFiles.empty() || config->isPic || config->exportDynamic;
- // Some symbols (such as __ehdr_start) are defined lazily only when there
- // are undefined symbols for them, so we add these to trigger that logic.
- for (StringRef name : script->referencedSymbols) {
- Symbol *sym = addUnusedUndefined(name);
- sym->isUsedInRegularObj = true;
- sym->referenced = true;
- }
+ addScriptReferencedSymbols();
// Prevent LTO from removing any definition referenced by -u.
for (StringRef name : config->undefined)
diff --git a/lld/ELF/LinkerScript.h b/lld/ELF/LinkerScript.h
index 18eaf58b785e37..9cd2d3f273d70f 100644
--- a/lld/ELF/LinkerScript.h
+++ b/lld/ELF/LinkerScript.h
@@ -16,7 +16,9 @@
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/StringSet.h"
#include "llvm/Support/Compiler.h"
#include <cstddef>
#include <cstdint>
@@ -379,6 +381,14 @@ class LinkerScript final {
// Sections that will be warned/errored by --orphan-handling.
SmallVector<const InputSectionBase *, 0> orphanSections;
+
+ // Stores the mapping: provide symbol -> symbols referred in the provide
+ // expression. For example, if the PROVIDE command is:
+ //
+ // PROVIDE(v = a + b + c);
+ //
+ // then provideMap should contain the mapping: 'v' -> ['a', 'b', 'c']
+ llvm::StringMap<llvm::StringSet<>> provideMap;
};
LLVM_LIBRARY_VISIBILITY extern std::unique_ptr<LinkerScript> script;
diff --git a/lld/ELF/ScriptParser.cpp b/lld/ELF/ScriptParser.cpp
index 3bb1de99480f30..e9953f44156e31 100644
--- a/lld/ELF/ScriptParser.cpp
+++ b/lld/ELF/ScriptParser.cpp
@@ -36,6 +36,7 @@
#include "llvm/Support/TimeProfiler.h"
#include <cassert>
#include <limits>
+#include <optional>
#include <vector>
using namespace llvm;
@@ -138,6 +139,10 @@ class ScriptParser final : ScriptLexer {
// A set to detect an INCLUDE() cycle.
StringSet<> seen;
+
+ // If we are currently parsing a PROVIDE|PROVIDE_HIDDEN command,
+ // then this member is set to the provide symbol name.
+ std::optional<llvm::StringRef> activeProvideSym;
};
} // namespace
@@ -1055,6 +1060,9 @@ SymbolAssignment *ScriptParser::readProvideHidden(bool provide, bool hidden) {
;
return nullptr;
}
+ llvm::SaveAndRestore saveActiveProvideSym(activeProvideSym);
+ if (provide)
+ activeProvideSym = name;
SymbolAssignment *cmd = readSymbolAssignment(name);
cmd->provide = provide;
cmd->hidden = hidden;
@@ -1570,7 +1578,10 @@ Expr ScriptParser::readPrimary() {
tok = unquote(tok);
else if (!isValidSymbolName(tok))
setError("malformed number: " + tok);
- script->referencedSymbols.push_back(tok);
+ if (activeProvideSym)
+ script->provideMap[activeProvideSym.value()].insert(tok);
+ else
+ script->referencedSymbols.push_back(tok);
return [=] { return script->getSymbolValue(tok, location); };
}
diff --git a/lld/test/ELF/linkerscript/symbolreferenced.s b/lld/test/ELF/linkerscript/symbolreferenced.s
index ba7a7721ea9697..587446b1dcb52a 100644
--- a/lld/test/ELF/linkerscript/symbolreferenced.s
+++ b/lld/test/ELF/linkerscript/symbolreferenced.s
@@ -23,9 +23,16 @@
# CHECK: 0000000000001000 a f1
# CHECK-NEXT: 0000000000001000 A f2
-# CHECK-NEXT: 0000000000001000 a g1
-# CHECK-NEXT: 0000000000001000 A g2
+# CHECK-NEXT: 0000000000001000 A f3
+# CHECK-NEXT: 0000000000001000 A f4
+# CHECK-NEXT: 0000000000001000 A f5
+# CHECK-NEXT: 0000000000001000 A f6
+# CHECK-NEXT: 0000000000001000 A f7
# CHECK-NEXT: 0000000000001000 A newsym
+# CHECK-NOT: g1
+# CHECK-NOT: g2
+# CHECK-NOT: unused
+# CHECK-NOT: another_unused
# RUN: not ld.lld -T chain2.t a.o 2>&1 | FileCheck %s --check-prefix=ERR --implicit-check-not=error:
# ERR-COUNT-3: error: chain2.t:1: symbol not found: undef
@@ -40,13 +47,19 @@ patatino:
movl newsym, %eax
#--- chain.t
-PROVIDE(f2 = 0x1000);
+PROVIDE(f7 = 0x1000);
+PROVIDE(f5 = f6);
+PROVIDE(f6 = f7);
+PROVIDE(f4 = f5);
+PROVIDE(f3 = f4);
+PROVIDE(f2 = f3);
PROVIDE_HIDDEN(f1 = f2);
PROVIDE(newsym = f1);
PROVIDE(g2 = 0x1000);
PROVIDE_HIDDEN(g1 = g2);
PROVIDE(unused = g1);
+PROVIDE_HIDDEN(another_unused = g1);
#--- chain2.t
PROVIDE(f2 = undef);
|
1b58d0d
to
51c2939
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch. The code structure looks really good.
lld/ELF/Driver.cpp
Outdated
// Keeps track of references from which PROVIDE symbols have been added to the | ||
// symbol table. | ||
llvm::StringSet<> addedRefsFromProvideSym; | ||
for (StringRef provideSym : script->provideMap.keys()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://llvm.org/docs/CodingStandards.html#beware-of-non-determinism-due-to-ordering-of-pointers
Consider MapVector<StringRef, SmallVector<StringRef,0> >
(the inner data structure doesn't have to be a Set, which is expensive)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. I did not think about non-determinism previously. I have now changed the code to use MapVector<StringRef, SmallVector<StringRef, 0>>
.
I want to add one thing: Set is relatively more expensive than vector, however, it can help in avoiding more expensive computations. For example, currently, we store symbols referenced in the linker script in SmallVector<llvm::StringRef, 0> LinkerScript::referencedSymbols;
. We have to do some processing on each symbol present in this list, such as, adding the symbol to the symbol table. If a symbol S
is referenced 10 times in the linker script, then it will be present in referencedSymbols
10 times as well and therefore it will be processed 10 times. In such a case, would it be better to use Set instead of a vector?
51c2939
to
7928b14
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
7928b14
to
535ab47
Compare
Gentle ping for review @MaskRay |
lld/ELF/SymbolTable.cpp
Outdated
|
||
Symbol *SymbolTable::addUnusedUndefined(StringRef name, uint8_t binding) { | ||
return addSymbol(Undefined{ctx.internalFile, name, binding, STV_DEFAULT, 0}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure EOL at file end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching it. Should we add insertNewlineAtEOF option to .clang-format
?
lld/ELF/LinkerScript.cpp
Outdated
static void | ||
addProvideSymReferences(StringRef provideSym, | ||
llvm::StringSet<> &addedRefsFromProvideSym) { | ||
assert(LinkerScript::shouldAddProvideSym(provideSym) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can define the static function as a lambda in addScriptReferencedSymbolsToSymTable
instead so that you can just reference the local variable addedRefsFromProvideSym
.
Then this assert doesn't seem that useful since the code is very short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This static function is recursive. Lambdas cannot be recursive without using more complicated mechanism such as std::function
or adding an explicit parameter that would refer to the lambda in the lambda call operator. These techniques are explained here: Recursive lambda functions in C++14
Please give your suggestion regarding what should we do here. std::function
has performance cost due to the use of type-erasure and passing the lambda to the lambda call (for ex, L(L, x, ...)
) leads to (slightly) ugly code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am writing a version that uses a stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://github.com/MaskRay/llvm-project/tree/my-provide
Feel free to cherry pick it into this patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for providing the stack-based implementation. I have cherry-picked your commit into this patch.
34a77c8
to
16fad83
Compare
Previously, linker was unnecessarily including a PROVIDE symbol which was referenced by another unused PROVIDE symbol. For example, if a linker script contained the below code and 'not_used_sym' provide symbol is not included, then linker was still unnecessarily including 'foo' PROVIDE symbol because it was referenced by 'not_used_sym'. This commit fixes this behavior. PROVIDE(not_used_sym = foo) PROVIDE(foo = 0x1000) This commit fixes this behavior by using dfs-like algorithm to find all the symbols referenced in provide expressions of included provide symbols. This commit also fixes the issue of unused section not being garbage-collected if a symbol of the section is referenced by an unused PROVIDE symbol. Closes llvm#74771 Closes llvm#84730 Co-authored-by: Fangrui Song <[email protected]>
16fad83
to
65fd9d7
Compare
Congratulations on the first patch:) Very well prepared. I am landing this patch manually as you have a hidden email address. This repository prefers a non-hidden email address |
Landed as ebb326a |
Thank you :).
Thank you for informing about this. I will fix the hidden-email issue. |
Hi @partaror , I think this commit has caused some link failures in edge cases. I have two related examples here which didn't fail before this commit landed, and do fail immediately after it. Input file
Input file
Input file
Link script
Link script
Build commands:
Both links fail with errors like this:
My analysis: In both of these links, the entry point is in the object file In the In the |
edit: Thanks for the report. I am testing a patch. |
When archive member extraction involving ENTRY happens after `addScriptReferencedSymbolsToSymTable`, `addScriptReferencedSymbolsToSymTable` may fail to define some PROVIDE symbols used by ENTRY. This is an edge case that regressed after llvm#84512. (The interaction with PROVIDE and ENTRY-in-archive was not considered before). Fixes: ebb326a
@statham-arm Created #87530 . Will it address your issue? |
Hi @statham-arm Thank you for reporting the issue and detailed analysis. @MaskRay has the fix ready. Thank you @MaskRay for the quick resolution. |
When archive member extraction involving ENTRY happens after `addScriptReferencedSymbolsToSymTable`, `addScriptReferencedSymbolsToSymTable` may fail to define some PROVIDE symbols used by ENTRY. This is an edge case that regressed after #84512. (The interaction with PROVIDE and ENTRY-in-archive was not considered before). While here, also ensure that --undefined-glob extracted object files are parsed before `addScriptReferencedSymbolsToSymTable`. Fixes: ebb326a Pull Request: #87530
Previously, linker was unnecessarily including a PROVIDE symbol which was referenced by another unused PROVIDE symbol. For example, if a linker script contained the below code and 'not_used_sym' provide symbol is not included, then linker was still unnecessarily including 'foo' PROVIDE symbol because it was referenced by 'not_used_sym'. This commit fixes this behavior.
PROVIDE(not_used_sym = foo)
PROVIDE(foo = 0x1000)
This commit fixes this behavior by using dfs-like algorithm to find all the symbols referenced in provide expressions of included provide symbols.
This commit also fixes the issue of unused section not being garbage-collected if a symbol of the section is referenced by an unused PROVIDE symbol.
Closes #74771
Closes #84730