-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[Object][Archive][NFC] Create all symbolic files objects before calculating offsets. #85229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…lating offsets. This is refactoring preparing to move UseECMap computation to the archive writer. We currently require writeArchive caller to pass that. This is not practical for llvm-ar, which currently interprets at most one passed object. For a reliable UseECMap, we need to interpret all symbolic objects: we may have, for example, a list of x86_64 files followed by aarch64 file, which indicates that we should use EC map for x86_64 objects. This commit interprets symbolic files in a separated pass, which will be a convenient place to implement UseECMap computation in the follow up. I think it also makes accessing the next member for AIX big archive offset computation a bit easier.
@llvm/pr-subscribers-llvm-binary-utilities Author: Jacek Caban (cjacek) ChangesThis is refactoring preparing to move UseECMap computation to the archive writer. We currently require writeArchive caller to pass that. This is not practical for llvm-ar, which currently interprets at most one passed object. For a reliable UseECMap, we need to interpret all symbolic objects: we may have, for example, a list of x86_64 files followed by aarch64 file, which indicates that we should use EC map for x86_64 objects. This commit interprets symbolic files in a separated pass, which will be a convenient place to implement UseECMap computation in the follow up. I think it also makes accessing the next member for AIX big archive offset computation a bit easier. Full diff: https://github.com/llvm/llvm-project/pull/85229.diff 1 Files Affected:
diff --git a/llvm/lib/Object/ArchiveWriter.cpp b/llvm/lib/Object/ArchiveWriter.cpp
index aa57e55de70c8d..97d0e4f75be8e0 100644
--- a/llvm/lib/Object/ArchiveWriter.cpp
+++ b/llvm/lib/Object/ArchiveWriter.cpp
@@ -795,23 +795,31 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
Entry.second = Entry.second > 1 ? 1 : 0;
}
+ std::vector<std::unique_ptr<SymbolicFile>> SymFiles;
+
+ if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) {
+ for (const NewArchiveMember &M : NewMembers) {
+ Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
+ getSymbolicFile(M.Buf->getMemBufferRef(), Context);
+ if (!SymFileOrErr)
+ return createFileError(M.MemberName, SymFileOrErr.takeError());
+ SymFiles.push_back(std::move(*SymFileOrErr));
+ }
+ }
+
// The big archive format needs to know the offset of the previous member
// header.
uint64_t PrevOffset = 0;
uint64_t NextMemHeadPadSize = 0;
- std::unique_ptr<SymbolicFile> CurSymFile;
- std::unique_ptr<SymbolicFile> NextSymFile;
- uint16_t Index = 0;
- for (auto M = NewMembers.begin(); M < NewMembers.end(); ++M) {
+ for (uint32_t Index = 0; Index < NewMembers.size(); ++Index) {
+ const NewArchiveMember *M = &NewMembers[Index];
std::string Header;
raw_string_ostream Out(Header);
MemoryBufferRef Buf = M->Buf->getMemBufferRef();
StringRef Data = Thin ? "" : Buf.getBuffer();
- Index++;
-
// ld64 expects the members to be 8-byte aligned for 64-bit content and at
// least 4-byte aligned for 32-bit content. Opt for the larger encoding
// uniformly. This matches the behaviour with cctools and ensures that ld64
@@ -837,29 +845,9 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
std::move(StringMsg), object::object_error::parse_failed);
}
- if (NeedSymbols != SymtabWritingMode::NoSymtab || isAIXBigArchive(Kind)) {
- auto SetNextSymFile = [&NextSymFile,
- &Context](MemoryBufferRef Buf,
- StringRef MemberName) -> Error {
- Expected<std::unique_ptr<SymbolicFile>> SymFileOrErr =
- getSymbolicFile(Buf, Context);
- if (!SymFileOrErr)
- return createFileError(MemberName, SymFileOrErr.takeError());
- NextSymFile = std::move(*SymFileOrErr);
- return Error::success();
- };
-
- if (M == NewMembers.begin())
- if (Error Err = SetNextSymFile(Buf, M->MemberName))
- return std::move(Err);
-
- CurSymFile = std::move(NextSymFile);
-
- if ((M + 1) != NewMembers.end())
- if (Error Err = SetNextSymFile((M + 1)->Buf->getMemBufferRef(),
- (M + 1)->MemberName))
- return std::move(Err);
- }
+ std::unique_ptr<SymbolicFile> CurSymFile;
+ if (!SymFiles.empty())
+ CurSymFile = std::move(SymFiles[Index]);
// In the big archive file format, we need to calculate and include the next
// member offset and previous member offset in the file member header.
@@ -880,13 +868,13 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
// If there is another member file after this, we need to calculate the
// padding before the header.
- if ((M + 1) != NewMembers.end()) {
- uint64_t OffsetToNextMemData = NextOffset +
- sizeof(object::BigArMemHdrType) +
- alignTo((M + 1)->MemberName.size(), 2);
+ if (Index + 1 != SymFiles.size()) {
+ uint64_t OffsetToNextMemData =
+ NextOffset + sizeof(object::BigArMemHdrType) +
+ alignTo(NewMembers[Index + 1].MemberName.size(), 2);
NextMemHeadPadSize =
alignToPowerOf2(OffsetToNextMemData,
- getMemberAlignment(NextSymFile.get())) -
+ getMemberAlignment(SymFiles[Index + 1].get())) -
OffsetToNextMemData;
NextOffset += NextMemHeadPadSize;
}
@@ -902,7 +890,7 @@ computeMemberData(raw_ostream &StringTable, raw_ostream &SymNames,
std::vector<unsigned> Symbols;
if (NeedSymbols != SymtabWritingMode::NoSymtab) {
Expected<std::vector<unsigned>> SymbolsOrErr =
- getSymbols(CurSymFile.get(), Index, SymNames, SymMap);
+ getSymbols(CurSymFile.get(), Index + 1, SymNames, SymMap);
if (!SymbolsOrErr)
return createFileError(M->MemberName, SymbolsOrErr.takeError());
Symbols = std::move(*SymbolsOrErr);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really great cleanup IMHO, I found the NextSymFile
thing very confusing when dealing with this code.
I think this PR could have done with some eyes from the AIX developer (@diggerlin, I think) and myself as people previously involved (and not all that long ago) with this area of code. Taking a look at the change, I'm a little concerned about the potential impact on memory usage of this version of the PR. Previously, the code would only need one symbolic file to be in memory at a time, but now it requires all of them. I don't have any numbers, but I wouldn't be surprised in a large archive with many members with lots of symbols (not an unusual thing) if this change has drastically increased memory usage. I think it would be a good idea to do some performance analysis of the before and after of this change for particularly large archives. |
Please note that symbolic file is stored in |
Oh, you're right. I looked through this code at least 3 times before posting my comment, looking for anything like this and still didn't see it. Sorry for the noise. |
This is refactoring preparing to move UseECMap computation to the archive writer. We currently require writeArchive caller to pass that. This is not practical for llvm-ar, which currently interprets at most one passed object. For a reliable UseECMap, we need to interpret all symbolic objects: we may have, for example, a list of x86_64 files followed by aarch64 file, which indicates that we should use EC map for x86_64 objects.
This commit interprets symbolic files in a separated pass, which will be a convenient place to implement UseECMap computation in the follow up. I think it also makes accessing the next member for AIX big archive offset computation a bit easier.