-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[serialization] No transitive type change #92511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-modules Author: Chuanqi Xu (ChuanqiXu9) ChangesFollowing of #92085. motivationThe motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details. And this will be the last patch of the exampleAfter this patch, finally we are able to cut of unnecessary change of types. For example,
The BMI of implementation detailsThe key idea of this patch is similar with the previous patches: extend the 32bits type ID to 64bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent on the imported modules. But there are two differences from the previous patches:
The first difference make we have some more slightly complex bit operations. And the second difference makes the patch much simpler than the previous ones. Patch is 28.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92511.diff 11 Files Affected:
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 1fd482b5aff0e..486d5f4042c61 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -26,6 +26,7 @@
#include "clang/Serialization/SourceLocationEncoding.h"
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/Bitstream/BitCodes.h"
+#include "llvm/Support/MathExtras.h"
#include <cassert>
#include <cstdint>
@@ -70,38 +71,53 @@ using DeclID = DeclIDBase::DeclID;
/// An ID number that refers to a type in an AST file.
///
-/// The ID of a type is partitioned into two parts: the lower
+/// The ID of a type is partitioned into three parts:
+/// - the lower
/// three bits are used to store the const/volatile/restrict
-/// qualifiers (as with QualType) and the upper bits provide a
-/// type index. The type index values are partitioned into two
+/// qualifiers (as with QualType).
+/// - the upper 29 bits provide a type index in the corresponding
+/// module file.
+/// - the upper 32 bits provide a module file index.
+///
+/// The type index values are partitioned into two
/// sets. The values below NUM_PREDEF_TYPE_IDs are predefined type
/// IDs (based on the PREDEF_TYPE_*_ID constants), with 0 as a
/// placeholder for "no type". Values from NUM_PREDEF_TYPE_IDs are
/// other types that have serialized representations.
-using TypeID = uint32_t;
+using TypeID = uint64_t;
/// A type index; the type ID with the qualifier bits removed.
+/// Keep structure alignment 32-bit since the blob is assumed as 32-bit
+/// aligned.
class TypeIdx {
+ uint32_t ModuleFileIndex = 0;
uint32_t Idx = 0;
public:
TypeIdx() = default;
- explicit TypeIdx(uint32_t index) : Idx(index) {}
+ explicit TypeIdx(uint32_t Idx) : ModuleFileIndex(0), Idx(Idx) {}
+
+ explicit TypeIdx(uint32_t ModuleFileIdx, uint32_t Idx)
+ : ModuleFileIndex(ModuleFileIdx), Idx(Idx) {}
+
+ uint32_t getModuleFileIndex() const { return ModuleFileIndex; }
- uint32_t getIndex() const { return Idx; }
+ uint64_t getValue() const { return ((uint64_t)ModuleFileIndex << 32) | Idx; }
TypeID asTypeID(unsigned FastQuals) const {
if (Idx == uint32_t(-1))
return TypeID(-1);
- return (Idx << Qualifiers::FastWidth) | FastQuals;
+ unsigned Index = (Idx << Qualifiers::FastWidth) | FastQuals;
+ return ((uint64_t)ModuleFileIndex << 32) | Index;
}
static TypeIdx fromTypeID(TypeID ID) {
if (ID == TypeID(-1))
return TypeIdx(-1);
- return TypeIdx(ID >> Qualifiers::FastWidth);
+ return TypeIdx(ID >> 32, (ID & llvm::maskTrailingOnes<TypeID>(32)) >>
+ Qualifiers::FastWidth);
}
};
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 1da9123280f26..1c7ad2ea08314 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -487,14 +487,6 @@ class ASTReader
/// ID = (I + 1) << FastQual::Width has already been loaded
llvm::PagedVector<QualType> TypesLoaded;
- using GlobalTypeMapType =
- ContinuousRangeMap<serialization::TypeID, ModuleFile *, 4>;
-
- /// Mapping from global type IDs to the module in which the
- /// type resides along with the offset that should be added to the
- /// global type ID to produce a local ID.
- GlobalTypeMapType GlobalTypeMap;
-
/// Declarations that have already been loaded from the chain.
///
/// When the pointer at index I is non-NULL, the declaration with ID
@@ -1420,8 +1412,8 @@ class ASTReader
RecordLocation(ModuleFile *M, uint64_t O) : F(M), Offset(O) {}
};
- QualType readTypeRecord(unsigned Index);
- RecordLocation TypeCursorForIndex(unsigned Index);
+ QualType readTypeRecord(serialization::TypeID ID);
+ RecordLocation TypeCursorForIndex(serialization::TypeID ID);
void LoadedDecl(unsigned Index, Decl *D);
Decl *ReadDeclRecord(GlobalDeclID ID);
void markIncompleteDeclChain(Decl *D);
@@ -1533,6 +1525,11 @@ class ASTReader
std::pair<ModuleFile *, unsigned>
translateIdentifierIDToIndex(serialization::IdentifierID ID) const;
+ /// Translate an \param TypeID ID to the index of TypesLoaded
+ /// array and the corresponding module file.
+ std::pair<ModuleFile *, unsigned>
+ translateTypeIDToIndex(serialization::TypeID ID) const;
+
public:
/// Load the AST file and validate its contents against the given
/// Preprocessor.
@@ -1881,10 +1878,11 @@ class ASTReader
QualType GetType(serialization::TypeID ID);
/// Resolve a local type ID within a given AST file into a type.
- QualType getLocalType(ModuleFile &F, unsigned LocalID);
+ QualType getLocalType(ModuleFile &F, serialization::TypeID LocalID);
/// Map a local type ID within a given AST file into a global type ID.
- serialization::TypeID getGlobalTypeID(ModuleFile &F, unsigned LocalID) const;
+ serialization::TypeID getGlobalTypeID(ModuleFile &F,
+ serialization::TypeID LocalID) const;
/// Read a type from the current position in the given record, which
/// was read from the given AST file.
@@ -1906,6 +1904,7 @@ class ASTReader
/// if the declaration is not from a module file.
ModuleFile *getOwningModuleFile(const Decl *D) const;
ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;
+ ModuleFile *getOwningModuleFile(serialization::TypeID ID) const;
/// Returns the source location for the decl \p ID.
SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
diff --git a/clang/include/clang/Serialization/ASTRecordReader.h b/clang/include/clang/Serialization/ASTRecordReader.h
index d00fb182f05f4..2561418b78ca7 100644
--- a/clang/include/clang/Serialization/ASTRecordReader.h
+++ b/clang/include/clang/Serialization/ASTRecordReader.h
@@ -163,7 +163,7 @@ class ASTRecordReader
void readTypeLoc(TypeLoc TL, LocSeq *Seq = nullptr);
/// Map a local type ID within a given AST file to a global type ID.
- serialization::TypeID getGlobalTypeID(unsigned LocalID) const {
+ serialization::TypeID getGlobalTypeID(serialization::TypeID LocalID) const {
return Reader->getGlobalTypeID(*F, LocalID);
}
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 075a07a81d637..84d0585eaceba 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -482,9 +482,6 @@ class ModuleFile {
/// the global type ID space.
serialization::TypeID BaseTypeIndex = 0;
- /// Remapping table for type IDs in this module.
- ContinuousRangeMap<uint32_t, int, 2> TypeRemap;
-
// === Miscellaneous ===
/// Diagnostic IDs and their mappings that the user changed.
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index b4f2bd182e473..d4783f9e6156b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3357,20 +3357,11 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
"duplicate TYPE_OFFSET record in AST file");
F.TypeOffsets = reinterpret_cast<const UnalignedUInt64 *>(Blob.data());
F.LocalNumTypes = Record[0];
- unsigned LocalBaseTypeIndex = Record[1];
F.BaseTypeIndex = getTotalNumTypes();
- if (F.LocalNumTypes > 0) {
- // Introduce the global -> local mapping for types within this module.
- GlobalTypeMap.insert(std::make_pair(getTotalNumTypes(), &F));
-
- // Introduce the local -> global mapping for types within this module.
- F.TypeRemap.insertOrReplace(
- std::make_pair(LocalBaseTypeIndex,
- F.BaseTypeIndex - LocalBaseTypeIndex));
-
+ if (F.LocalNumTypes > 0)
TypesLoaded.resize(TypesLoaded.size() + F.LocalNumTypes);
- }
+
break;
}
@@ -4033,7 +4024,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
RemapBuilder SubmoduleRemap(F.SubmoduleRemap);
RemapBuilder SelectorRemap(F.SelectorRemap);
- RemapBuilder TypeRemap(F.TypeRemap);
auto &ImportedModuleVector = F.DependentModules;
assert(ImportedModuleVector.empty());
@@ -4069,8 +4059,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
endian::readNext<uint32_t, llvm::endianness::little>(Data);
uint32_t SelectorIDOffset =
endian::readNext<uint32_t, llvm::endianness::little>(Data);
- uint32_t TypeIndexOffset =
- endian::readNext<uint32_t, llvm::endianness::little>(Data);
auto mapOffset = [&](uint32_t Offset, uint32_t BaseOffset,
RemapBuilder &Remap) {
@@ -4085,7 +4073,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
PreprocessedEntityRemap);
mapOffset(SubmoduleIDOffset, OM->BaseSubmoduleID, SubmoduleRemap);
mapOffset(SelectorIDOffset, OM->BaseSelectorID, SelectorRemap);
- mapOffset(TypeIndexOffset, OM->BaseTypeIndex, TypeRemap);
}
}
@@ -5064,12 +5051,12 @@ void ASTReader::InitializeContext() {
// Load the special types.
if (SpecialTypes.size() >= NumSpecialTypeIDs) {
- if (unsigned String = SpecialTypes[SPECIAL_TYPE_CF_CONSTANT_STRING]) {
+ if (TypeID String = SpecialTypes[SPECIAL_TYPE_CF_CONSTANT_STRING]) {
if (!Context.CFConstantStringTypeDecl)
Context.setCFConstantStringType(GetType(String));
}
- if (unsigned File = SpecialTypes[SPECIAL_TYPE_FILE]) {
+ if (TypeID File = SpecialTypes[SPECIAL_TYPE_FILE]) {
QualType FileType = GetType(File);
if (FileType.isNull()) {
Error("FILE type is NULL");
@@ -5090,7 +5077,7 @@ void ASTReader::InitializeContext() {
}
}
- if (unsigned Jmp_buf = SpecialTypes[SPECIAL_TYPE_JMP_BUF]) {
+ if (TypeID Jmp_buf = SpecialTypes[SPECIAL_TYPE_JMP_BUF]) {
QualType Jmp_bufType = GetType(Jmp_buf);
if (Jmp_bufType.isNull()) {
Error("jmp_buf type is NULL");
@@ -5111,7 +5098,7 @@ void ASTReader::InitializeContext() {
}
}
- if (unsigned Sigjmp_buf = SpecialTypes[SPECIAL_TYPE_SIGJMP_BUF]) {
+ if (TypeID Sigjmp_buf = SpecialTypes[SPECIAL_TYPE_SIGJMP_BUF]) {
QualType Sigjmp_bufType = GetType(Sigjmp_buf);
if (Sigjmp_bufType.isNull()) {
Error("sigjmp_buf type is NULL");
@@ -5129,25 +5116,24 @@ void ASTReader::InitializeContext() {
}
}
- if (unsigned ObjCIdRedef
- = SpecialTypes[SPECIAL_TYPE_OBJC_ID_REDEFINITION]) {
+ if (TypeID ObjCIdRedef = SpecialTypes[SPECIAL_TYPE_OBJC_ID_REDEFINITION]) {
if (Context.ObjCIdRedefinitionType.isNull())
Context.ObjCIdRedefinitionType = GetType(ObjCIdRedef);
}
- if (unsigned ObjCClassRedef
- = SpecialTypes[SPECIAL_TYPE_OBJC_CLASS_REDEFINITION]) {
+ if (TypeID ObjCClassRedef =
+ SpecialTypes[SPECIAL_TYPE_OBJC_CLASS_REDEFINITION]) {
if (Context.ObjCClassRedefinitionType.isNull())
Context.ObjCClassRedefinitionType = GetType(ObjCClassRedef);
}
- if (unsigned ObjCSelRedef
- = SpecialTypes[SPECIAL_TYPE_OBJC_SEL_REDEFINITION]) {
+ if (TypeID ObjCSelRedef =
+ SpecialTypes[SPECIAL_TYPE_OBJC_SEL_REDEFINITION]) {
if (Context.ObjCSelRedefinitionType.isNull())
Context.ObjCSelRedefinitionType = GetType(ObjCSelRedef);
}
- if (unsigned Ucontext_t = SpecialTypes[SPECIAL_TYPE_UCONTEXT_T]) {
+ if (TypeID Ucontext_t = SpecialTypes[SPECIAL_TYPE_UCONTEXT_T]) {
QualType Ucontext_tType = GetType(Ucontext_t);
if (Ucontext_tType.isNull()) {
Error("ucontext_t type is NULL");
@@ -6632,10 +6618,8 @@ void ASTReader::ReadPragmaDiagnosticMappings(DiagnosticsEngine &Diag) {
}
/// Get the correct cursor and offset for loading a type.
-ASTReader::RecordLocation ASTReader::TypeCursorForIndex(unsigned Index) {
- GlobalTypeMapType::iterator I = GlobalTypeMap.find(Index);
- assert(I != GlobalTypeMap.end() && "Corrupted global type map");
- ModuleFile *M = I->second;
+ASTReader::RecordLocation ASTReader::TypeCursorForIndex(TypeID ID) {
+ auto [M, Index] = translateTypeIDToIndex(ID);
return RecordLocation(M, M->TypeOffsets[Index - M->BaseTypeIndex].get() +
M->DeclsBlockStartOffset);
}
@@ -6656,10 +6640,10 @@ static std::optional<Type::TypeClass> getTypeClassForCode(TypeCode code) {
/// routine actually reads the record corresponding to the type at the given
/// location. It is a helper routine for GetType, which deals with reading type
/// IDs.
-QualType ASTReader::readTypeRecord(unsigned Index) {
+QualType ASTReader::readTypeRecord(TypeID ID) {
assert(ContextObj && "reading type with no AST context");
ASTContext &Context = *ContextObj;
- RecordLocation Loc = TypeCursorForIndex(Index);
+ RecordLocation Loc = TypeCursorForIndex(ID);
BitstreamCursor &DeclsCursor = Loc.F->DeclsCursor;
// Keep track of where we are in the stream, then jump back there
@@ -7100,14 +7084,25 @@ TypeSourceInfo *ASTRecordReader::readTypeSourceInfo() {
return TInfo;
}
+std::pair<ModuleFile *, unsigned>
+ASTReader::translateTypeIDToIndex(serialization::TypeID ID) const {
+ unsigned Index =
+ (ID & llvm::maskTrailingOnes<TypeID>(32)) >> Qualifiers::FastWidth;
+
+ ModuleFile *OwningModuleFile = getOwningModuleFile(ID);
+ assert(OwningModuleFile &&
+ "untranslated type ID or local type ID shouldn't be in TypesLoaded");
+ return {OwningModuleFile, OwningModuleFile->BaseTypeIndex + Index};
+}
+
QualType ASTReader::GetType(TypeID ID) {
assert(ContextObj && "reading type with no AST context");
ASTContext &Context = *ContextObj;
unsigned FastQuals = ID & Qualifiers::FastMask;
- unsigned Index = ID >> Qualifiers::FastWidth;
- if (Index < NUM_PREDEF_TYPE_IDS) {
+ if (uint64_t Index = ID >> Qualifiers::FastWidth;
+ Index < NUM_PREDEF_TYPE_IDS) {
QualType T;
switch ((PredefinedTypeIDs)Index) {
case PREDEF_TYPE_LAST_ID:
@@ -7376,10 +7371,11 @@ QualType ASTReader::GetType(TypeID ID) {
return T.withFastQualifiers(FastQuals);
}
- Index -= NUM_PREDEF_TYPE_IDS;
+ unsigned Index = translateTypeIDToIndex(ID).second;
+
assert(Index < TypesLoaded.size() && "Type index out-of-range");
if (TypesLoaded[Index].isNull()) {
- TypesLoaded[Index] = readTypeRecord(Index);
+ TypesLoaded[Index] = readTypeRecord(ID);
if (TypesLoaded[Index].isNull())
return QualType();
@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) {
return TypesLoaded[Index].withFastQualifiers(FastQuals);
}
-QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) {
+QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) {
return GetType(getGlobalTypeID(F, LocalID));
}
-serialization::TypeID
-ASTReader::getGlobalTypeID(ModuleFile &F, unsigned LocalID) const {
- unsigned FastQuals = LocalID & Qualifiers::FastMask;
- unsigned LocalIndex = LocalID >> Qualifiers::FastWidth;
-
- if (LocalIndex < NUM_PREDEF_TYPE_IDS)
+serialization::TypeID ASTReader::getGlobalTypeID(ModuleFile &F,
+ TypeID LocalID) const {
+ if ((LocalID >> Qualifiers::FastWidth) < NUM_PREDEF_TYPE_IDS)
return LocalID;
if (!F.ModuleOffsetMap.empty())
ReadModuleOffsetMap(F);
- ContinuousRangeMap<uint32_t, int, 2>::iterator I
- = F.TypeRemap.find(LocalIndex - NUM_PREDEF_TYPE_IDS);
- assert(I != F.TypeRemap.end() && "Invalid index into type index remap");
+ unsigned ModuleFileIndex = LocalID >> 32;
+ LocalID &= llvm::maskTrailingOnes<TypeID>(32);
- unsigned GlobalIndex = LocalIndex + I->second;
- return (GlobalIndex << Qualifiers::FastWidth) | FastQuals;
+ if (ModuleFileIndex == 0)
+ LocalID -= NUM_PREDEF_TYPE_IDS << Qualifiers::FastWidth;
+
+ ModuleFile &MF =
+ ModuleFileIndex ? *F.DependentModules[ModuleFileIndex - 1] : F;
+ ModuleFileIndex = MF.Index + 1;
+ return ((uint64_t)ModuleFileIndex << 32) | LocalID;
}
TemplateArgumentLocInfo
@@ -7650,6 +7647,16 @@ ModuleFile *ASTReader::getOwningModuleFile(GlobalDeclID ID) const {
return &getModuleManager()[ModuleFileIndex - 1];
}
+ModuleFile *ASTReader::getOwningModuleFile(TypeID ID) const {
+ if (ID < NUM_PREDEF_TYPE_IDS)
+ return nullptr;
+
+ uint64_t ModuleFileIndex = ID >> 32;
+ assert(ModuleFileIndex && "Untranslated Local Decl?");
+
+ return &getModuleManager()[ModuleFileIndex - 1];
+}
+
ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) const {
if (!D->isFromASTFile())
return nullptr;
@@ -8167,7 +8174,6 @@ LLVM_DUMP_METHOD void ASTReader::dump() {
llvm::errs() << "*** PCH/ModuleFile Remappings:\n";
dumpModuleIDMap("Global bit offset map", GlobalBitOffsetsMap);
dumpModuleIDMap("Global source location entry map", GlobalSLocEntryMap);
- dumpModuleIDMap("Global type map", GlobalTypeMap);
dumpModuleIDMap("Global macro map", GlobalMacroMap);
dumpModuleIDMap("Global submodule map", GlobalSubmoduleMap);
dumpModuleIDMap("Global selector map", GlobalSelectorMap);
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 0008eaab42cdf..4a0c9425f6902 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3263,17 +3263,18 @@ void ASTWriter::WritePragmaDiagnosticMappings(const DiagnosticsEngine &Diag,
/// Write the representation of a type to the AST stream.
void ASTWriter::WriteType(QualType T) {
TypeIdx &IdxRef = TypeIdxs[T];
- if (IdxRef.getIndex() == 0) // we haven't seen this type before.
+ if (IdxRef.getValue() == 0) // we haven't seen this type before.
IdxRef = TypeIdx(NextTypeID++);
TypeIdx Idx = IdxRef;
- assert(Idx.getIndex() >= FirstTypeID && "Re-writing a type from a prior AST");
+ assert(Idx.getModuleFileIndex() == 0 && "Re-writing a type from a prior AST");
+ assert(Idx.getValue() >= FirstTypeID && "Writing predefined type");
// Emit the type's representation.
uint64_t Offset = ASTTypeWriter(*this).write(T) - DeclTypesBlockStartOffset;
// Record the offset for this type.
- unsigned Index = Idx.getIndex() - FirstTypeID;
+ uint64_t Index = Idx.getValue() - FirstTypeID;
if (TypeOffsets.size() == Index)
TypeOffsets.emplace_back(Offset);
else if (TypeOffsets.size() < Index) {
@@ -3346,12 +3347,10 @@ void ASTWriter::WriteTypeDeclOffsets() {
auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(TYPE_OFFSET));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // # of types
- Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // base type index
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // types block
unsigned TypeOffsetAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
{
- RecordData::value_type Record[] = {TYPE_OFFSET, TypeOffsets.size(),
- FirstTypeID - NUM_PREDEF_TYPE_IDS};
+ RecordData::value_type Record[] = {TYPE_OFFSET, TypeOffsets.size()};
Stream.EmitRecordWithBlob(TypeOffsetAbbrev, Record, bytes(TypeOffsets));
}
@@ -5428,7 +5427,6 @@ ASTFileSignature ASTWriter::WriteASTCore(Sema &SemaRef, StringRef isysroot,
M.NumPreprocessedEntities);
writeBaseIDOrNone(M.BaseSubmoduleID, M.LocalNumSubmodules);
writeBaseIDOrNone(M.BaseSelectorID, M.LocalNumSelectors);
- writeBaseIDOrNone(M.BaseTypeIndex, M.LocalNumTypes);
}
}
RecordData::value_type Record[] = {MODULE_OFFSET_MAP};
@@ -6093,7 +6091,7 @@ TypeID ASTWriter::GetOrCreateTypeID(QualType T) {
assert(!T.getLocalFastQualifiers());
TypeIdx &Idx = TypeIdxs[T];
- if (Idx.getIndex() == 0) {
+ if (Idx.getValue() == 0) {
if (DoneWritingDeclsAndTypes) {
assert(0 && "New type seen after serializing all the types to emit!");
return TypeIdx();
@@ -6608,11 +6606,9 @@ void ASTWriter::ReaderInitialized(ASTReader *Reader) {
// Note, this will get called multiple times, once one the reader starts up
// and again each time it's done reading a PCH or module.
- FirstTypeID = NUM_PREDEF_TYPE_IDS + Chain->getTotalNumTypes();
FirstMacroID = NUM_PREDEF_MACRO_IDS + Chain->getTotalNumMacros();
FirstSubmoduleID = NUM_PREDEF_SUBMODULE_IDS + Chain->getTotalNumSubmodules();
FirstSelect...
[truncated]
|
@jansvoboda11 @Bigcheese gentle ping |
c612b56
to
a2fb7f5
Compare
2265f12
to
1a83e2b
Compare
a2fb7f5
to
e8f756e
Compare
1a83e2b
to
57cfb2b
Compare
@jansvoboda11 @ilya-biryukov would you like to take look? |
I am happy to take a look, but will run out of time for it today. I will definitely do it tomorrow, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change! I have done a round of review and left a few suggestion and also have a bunch of questions. I am sorry if some of them may look too obvious, I did try to dig into the code where I could, but Clang's serialization is complicated and some things are easier researched through a conversation than through looking at code. Please bear with me, I promise to get better in the upcoming reviews.
Here are a few extra question that came to mind too.
Question 1: Do we have estimates on how much bigger the PCMs become? Did you observer any negative performance impact?
I would expect TypeID
s to be much more common than decls, and the corresponding size increases to be more significant. We've already lost ~6% from DeclID change, I am slightly worried the types are going to be a bigger hit.
Question 2: could you explain why we the PCM in your example was changing before? Was there some base offset inherited from the PCM files we depended on?
|
||
// Otherwise, keep the highest ID since the module file comes later has | ||
// higher module file indexes. | ||
if (Idx.getValue() >= StoredIdx.getValue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Idx.getValue() >= StoredIdx.getValue()) | |
if (Idx.getModuleFileIndex() >= StoredIdx.getModuleFileIndex()) |
This seems to align better with what the comment for the code are saying. If this does not work, I might be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
clang/test/Modules/pr59999.cppm
Outdated
// RUN: rm -rf %t | ||
// RUN: mkdir -p %t | ||
// RUN: split-file %s %t | ||
// RUNX: rm -rf %t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these leftovers from previous versions of the patch that need to be reverted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry. This is something left during debugging. I will remove it.
@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) { | |||
return TypesLoaded[Index].withFastQualifiers(FastQuals); | |||
} | |||
|
|||
QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) { | |||
QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we clarify what LocalID
means here in a comment somewhere?
IIUC, the ModuleFileIndex
there is an index into F.TransitiveImports
rather than into a "global" module manager.
Previously, we had distinction between unsigned
in parameters to this function and TypeID
as a return type.
Maybe we should keep this here and accept uint64_t
to avoid confusing the common global module file index and a local file index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we clarify what LocalID means here in a comment somewhere?
In ASTReader.h, there is a single line comment: Resolve a local type ID within a given AST file into a type.
. I add a new line to explain it as "A local type ID is only meaningful with the corresponding module file. See the implementation of getGlobalTypeID for details.".
IIUC, the ModuleFileIndex there is an index into F.TransitiveImports rather than into a "global" module manager.
Yes, this is correct for local ID. But I feel it might be odd to put this in the comment. I feel, it might be good to know, we can use ASTReader::getGlobalTypeID
to convert a <ModuleFile, LocalTypeID>
pair to a GlobalTypeID
. And the concrete process is the implementation details to GlobalTypeID`.
Previously, we had distinction between unsigned in parameters to this function and TypeID as a return type.
Maybe we should keep this here and accept uint64_t to avoid confusing the common global module file index and a local file index?
I don't feel it better. uint64_t
doesn't provide any type information. To be best, maybe we can do something like for LocalDeclID
and GlobalDeclID
. I only did that for DeclID since I feel the DeclID is the most complicated and the TypeID are much more simpler. I think we can this later if we feel it is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I feel TypeID
is the "wrong" type here is because after reading the comments it feels that the module file index is always global. Keeping TypeID
here is probably okay, but we should put this distinction into a comment somewhere (either a comment to getLocalType
that says the TypeID we pass here is special because the index is relative to F
or a comment to the TypeID
itself that explains that module file index can be either global or relative to a module file).
I would probably vouch for putting this into getLocalType
to avoid overloading the readers of the code that does not touch getLocalType
with this special case (it seems to be quite limited anyway, we quickly read the value from the AST and convert it right away, same for writing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I tried to address this by introducing a new type alias using LocalTypeID = TypeID;
with a FIXME that we'd better to make them classes like we did for DeclID. Then at least I believe the interface will have better readability.
You're welcome.
I did and my local results shows, I see less than 1% change with this patch. This fits my understanding somehow. Since the
Yes. In the higher level, it should be easy to understand. There is a new type in
And this is exactly what the series patch want to eliminate. |
41291b2
to
7dad94d
Compare
e8f756e
to
a2f8832
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! With all the explanation, the patch makes a lot of sense.
Generally LG, I just have a few more NITs and feel it's good to go.
I am surprised to hear we have less TypeIDs than DeclIDs. My mental model was that for every type Decl we introduce, we'd have at least a few types with it (the type itself, pointer to that type, reference to that type, functions returning that type, etc). And every non-type declaration will have it's own type (the functions often have unique signatures).
Do we end up reusing the types from the current compilation and other modules we imported often enough that the amount of newly introduced types ends up being small? Or is my logic flawed in some ways?
/// module file. | ||
/// - the upper 32 bits provide a module file index. | ||
/// | ||
/// The type index values are partitioned into two | ||
/// sets. The values below NUM_PREDEF_TYPE_IDs are predefined type | ||
/// IDs (based on the PREDEF_TYPE_*_ID constants), with 0 as a | ||
/// placeholder for "no type". Values from NUM_PREDEF_TYPE_IDs are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: maybe explain in a comment that predefined types always have a module file index of 0 (i.e. do not belong to any module file)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -7151,15 +7135,45 @@ TypeSourceInfo *ASTRecordReader::readTypeSourceInfo() { | |||
return TInfo; | |||
} | |||
|
|||
static unsigned getIndexForTypeID(serialization::TypeID ID) { | |||
return (ID & llvm::maskTrailingOnes<TypeID>(32)) >> Qualifiers::FastWidth; | |||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: remove redundant semicolon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return ID >> 32; | ||
} | ||
|
||
static bool isPredefinedTypes(serialization::TypeID ID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: maybe rename to isPredefined
or isPredefinedType
? the plural form of types
does not make sense here since we pass a single value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with isPredefinedType
@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) { | |||
return TypesLoaded[Index].withFastQualifiers(FastQuals); | |||
} | |||
|
|||
QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) { | |||
QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I feel TypeID
is the "wrong" type here is because after reading the comments it feels that the module file index is always global. Keeping TypeID
here is probably okay, but we should put this distinction into a comment somewhere (either a comment to getLocalType
that says the TypeID we pass here is special because the index is relative to F
or a comment to the TypeID
itself that explains that module file index can be either global or relative to a module file).
I would probably vouch for putting this into getLocalType
to avoid overloading the readers of the code that does not touch getLocalType
with this special case (it seems to be quite limited anyway, we quickly read the value from the AST and convert it right away, same for writing)
Thanks.
Since most of the time, the types are reused. For example, if we have,
In the above example, in module B, there will be (in my mind, I didn't experiment actually) 6 new decls (a new So I feel it pretty makes sense that the number of type ID is much less than Decl ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The thing with the types does make sense now, I am (expectedly) just bad at statistical reasoning. It does make sense that there are many declarations that share the types. I suspect we won't see any significant increases in our internal toolchain release, but we'll reach out if we do find some pathological cases (that's probably very unlikely).
LGTM!
Thanks for reviewing |
Following of llvm#92085. #### motivation The motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details. And this will be the last patch of the `no-transitive-*-change` series. If there are any following patches, they might be C++20 Named modules specific to handle special grammars like `ADL` (See the reply in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53 for example). So they won't affect the whole serialization part as the series patch did. #### example After this patch, finally we are able to cut of unnecessary change of types. For example, ``` //--- m-partA.cppm export module m:partA; //--- m-partA.v1.cppm export module m:partA; namespace NS { class A { public: int getValue() { return 43; } }; } //--- m-partB.cppm export module m:partB; export inline int getB() { return 430; } //--- m.cppm export module m; export import :partA; export import :partB; //--- useBOnly.cppm export module useBOnly; import m; export inline int get() { return getB(); } ``` The BMI of `useBOnly.cppm` is expected to not change if we only add a new class in `m:partA`. This will be pretty useful in practice. #### implementation details The key idea of this patch is similar with the previous patches: extend the 32bits type ID to 64bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent on the imported modules. But there are two differences from the previous patches: - TypeID is not completely an index of serialized types. We used the lower 3 bits to store the qualifiers. - TypeID won't take part in any lookup process. So the uses of TypeID is much less than the previous patches. The first difference make we have some more slightly complex bit operations. And the second difference makes the patch much simpler than the previous ones.
(Some backgrounds, not required to read: https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755) This is the document part for the no-transitive-change (#86912, #92083, #92085, #92511) to provide the ability for build system to skip some unnecessary recompilations. See the patch for examples.
Summary: (Some backgrounds, not required to read: https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755) This is the document part for the no-transitive-change (#86912, #92083, #92085, #92511) to provide the ability for build system to skip some unnecessary recompilations. See the patch for examples. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251509
Following of #92085.
motivation
The motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details.
And this will be the last patch of the
no-transitive-*-change
series. If there are any following patches, they might be C++20 Named modules specific to handle special grammars likeADL
(See the reply in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53 for example). So they won't affect the whole serialization part as the series patch did.example
After this patch, finally we are able to cut of unnecessary change of types. For example,
The BMI of
useBOnly.cppm
is expected to not change if we only add a new class inm:partA
. This will be pretty useful in practice.implementation details
The key idea of this patch is similar with the previous patches: extend the 32bits type ID to 64bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent on the imported modules.
But there are two differences from the previous patches:
The first difference make we have some more slightly complex bit operations. And the second difference makes the patch much simpler than the previous ones.