-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[Modules] No transitive source location change #86912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ChuanqiXu9
merged 1 commit into
llvm:main
from
ChuanqiXu9:no-transitve-source-location-change
Apr 30, 2024
+269
−162
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,28 +6,33 @@ | |
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// Source locations are stored pervasively in the AST, making up a third of | ||
// the size of typical serialized files. Storing them efficiently is important. | ||
// We wish to encode the SourceLocation from other module file not dependent | ||
// on the other module file. So that the source location changes from other | ||
// module file may not affect the contents of the current module file. Then the | ||
// users don't need to recompile the whole project due to a new line in a module | ||
// unit in the root of the dependency graph. | ||
// | ||
// We use integers optimized by VBR-encoding, because: | ||
// - when abbreviations cannot be used, VBR6 encoding is our only choice | ||
// - in the worst case a SourceLocation can be ~any 32-bit number, but in | ||
// practice they are highly predictable | ||
// To achieve this, we need to encode the index of the module file into the | ||
// encoding of the source location. The encoding of the source location may be: | ||
// | ||
// We encode the integer so that likely values encode as small numbers that | ||
// turn into few VBR chunks: | ||
// - the invalid sentinel location is a very common value: it encodes as 0 | ||
// - the "macro or not" bit is stored at the bottom of the integer | ||
// (rather than at the top, as in memory), so macro locations can have | ||
// small representations. | ||
// - related locations (e.g. of a left and right paren pair) are usually | ||
// similar, so when encoding a sequence of locations we store only | ||
// differences between successive elements. | ||
// |-----------------------|-----------------------| | ||
// | A | B | C | | ||
// | ||
// * A: 32 bit. The index of the module file in the module manager + 1. The +1 | ||
// here is necessary since we wish 0 stands for the current module file. | ||
// * B: 31 bit. The offset of the source location to the module file containing | ||
// it. | ||
// * C: The macro bit. We rotate it to the lowest bit so that we can save some | ||
// space in case the index of the module file is 0. | ||
// | ||
// Specially, if the index of the module file is 0, we allow to encode a | ||
// sequence of locations we store only differences between successive elements. | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include <climits> | ||
#include "clang/Basic/SourceLocation.h" | ||
#include "llvm/Support/MathExtras.h" | ||
#include <climits> | ||
|
||
#ifndef LLVM_CLANG_SERIALIZATION_SOURCELOCATIONENCODING_H | ||
#define LLVM_CLANG_SERIALIZATION_SOURCELOCATIONENCODING_H | ||
|
@@ -52,9 +57,13 @@ class SourceLocationEncoding { | |
friend SourceLocationSequence; | ||
|
||
public: | ||
static uint64_t encode(SourceLocation Loc, | ||
SourceLocationSequence * = nullptr); | ||
static SourceLocation decode(uint64_t, SourceLocationSequence * = nullptr); | ||
using RawLocEncoding = uint64_t; | ||
|
||
static RawLocEncoding encode(SourceLocation Loc, UIntTy BaseOffset, | ||
unsigned BaseModuleFileIndex, | ||
SourceLocationSequence * = nullptr); | ||
static std::pair<SourceLocation, unsigned> | ||
decode(RawLocEncoding, SourceLocationSequence * = nullptr); | ||
}; | ||
|
||
/// Serialized encoding of a sequence of SourceLocations. | ||
|
@@ -149,14 +158,44 @@ class SourceLocationSequence::State { | |
operator SourceLocationSequence *() { return &Seq; } | ||
}; | ||
|
||
inline uint64_t SourceLocationEncoding::encode(SourceLocation Loc, | ||
SourceLocationSequence *Seq) { | ||
return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); | ||
inline SourceLocationEncoding::RawLocEncoding | ||
SourceLocationEncoding::encode(SourceLocation Loc, UIntTy BaseOffset, | ||
unsigned BaseModuleFileIndex, | ||
SourceLocationSequence *Seq) { | ||
// If the source location is a local source location, we can try to optimize | ||
// the similar sequences to only record the differences. | ||
if (!BaseOffset) | ||
return Seq ? Seq->encode(Loc) : encodeRaw(Loc.getRawEncoding()); | ||
|
||
if (Loc.isInvalid()) | ||
return 0; | ||
|
||
// Otherwise, the higher bits are used to store the module file index, | ||
// so it is meaningless to optimize the source locations into small | ||
// integers. Let's try to always use the raw encodings. | ||
assert(Loc.getOffset() >= BaseOffset); | ||
Loc = Loc.getLocWithOffset(-BaseOffset); | ||
RawLocEncoding Encoded = encodeRaw(Loc.getRawEncoding()); | ||
|
||
// 16 bits should be sufficient to store the module file index. | ||
assert(BaseModuleFileIndex < (1 << 16)); | ||
Encoded |= (RawLocEncoding)BaseModuleFileIndex << 32; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Were you going to change this to only reserve 16 bits for module index? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now I assert it should be less than |
||
return Encoded; | ||
} | ||
inline SourceLocation | ||
SourceLocationEncoding::decode(uint64_t Encoded, SourceLocationSequence *Seq) { | ||
return Seq ? Seq->decode(Encoded) | ||
: SourceLocation::getFromRawEncoding(decodeRaw(Encoded)); | ||
inline std::pair<SourceLocation, unsigned> | ||
SourceLocationEncoding::decode(RawLocEncoding Encoded, | ||
SourceLocationSequence *Seq) { | ||
unsigned ModuleFileIndex = Encoded >> 32; | ||
|
||
if (!ModuleFileIndex) | ||
return {Seq ? Seq->decode(Encoded) | ||
: SourceLocation::getFromRawEncoding(decodeRaw(Encoded)), | ||
ModuleFileIndex}; | ||
|
||
Encoded &= llvm::maskTrailingOnes<RawLocEncoding>(32); | ||
SourceLocation Loc = SourceLocation::getFromRawEncoding(decodeRaw(Encoded)); | ||
|
||
return {Loc, ModuleFileIndex}; | ||
} | ||
|
||
} // namespace clang | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the -2 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot
- 2
in Serialization and SourceManager. I just tried to follow that. Sadly I didn't find clear comments explaining this. By reading the codes, I think it is trying to skip 2 dummy SLocEntries (the invalid source location and invalid expansion location, seeSourceManager::clearIDTables()
) since they are not sensitive to the context.