Skip to content

[serialization] No transitive type change #92511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 21, 2024

Conversation

ChuanqiXu9
Copy link
Member

Following of #92085.

motivation

The motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details.

And this will be the last patch of the no-transitive-*-change series. If there are any following patches, they might be C++20 Named modules specific to handle special grammars like ADL (See the reply in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53 for example). So they won't affect the whole serialization part as the series patch did.

example

After this patch, finally we are able to cut of unnecessary change of types. For example,


//--- m-partA.cppm
export module m:partA;

//--- m-partA.v1.cppm
export module m:partA;

namespace NS {
    class A {
        public:
            int getValue() {
                return 43;
            }
    };
}

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}

The BMI of useBOnly.cppm is expected to not change if we only add a new class in m:partA. This will be pretty useful in practice.

implementation details

The key idea of this patch is similar with the previous patches: extend the 32bits type ID to 64bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent on the imported modules.

But there are two differences from the previous patches:

  • TypeID is not completely an index of serialized types. We used the lower 3 bits to store the qualifiers.
  • TypeID won't take part in any lookup process. So the uses of TypeID is much less than the previous patches.

The first difference make we have some more slightly complex bit operations. And the second difference makes the patch much simpler than the previous ones.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label May 17, 2024
@ChuanqiXu9 ChuanqiXu9 self-assigned this May 17, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label May 17, 2024
@llvmbot
Copy link
Member

llvmbot commented May 17, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

Following of #92085.

motivation

The motivation is still cutting of the unnecessary change in the dependency chain. See the above link (recursively) for details.

And this will be the last patch of the no-transitive-*-change series. If there are any following patches, they might be C++20 Named modules specific to handle special grammars like ADL (See the reply in https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53 for example). So they won't affect the whole serialization part as the series patch did.

example

After this patch, finally we are able to cut of unnecessary change of types. For example,


//--- m-partA.cppm
export module m:partA;

//--- m-partA.v1.cppm
export module m:partA;

namespace NS {
    class A {
        public:
            int getValue() {
                return 43;
            }
    };
}

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}

The BMI of useBOnly.cppm is expected to not change if we only add a new class in m:partA. This will be pretty useful in practice.

implementation details

The key idea of this patch is similar with the previous patches: extend the 32bits type ID to 64bits so that we can store the module file index in the higher bits. Then the encoding of the type ID is independent on the imported modules.

But there are two differences from the previous patches:

  • TypeID is not completely an index of serialized types. We used the lower 3 bits to store the qualifiers.
  • TypeID won't take part in any lookup process. So the uses of TypeID is much less than the previous patches.

The first difference make we have some more slightly complex bit operations. And the second difference makes the patch much simpler than the previous ones.


Patch is 28.70 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92511.diff

11 Files Affected:

  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+24-8)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+11-12)
  • (modified) clang/include/clang/Serialization/ASTRecordReader.h (+1-1)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (-3)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+55-49)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+18-13)
  • (modified) clang/lib/Serialization/ModuleFile.cpp (-1)
  • (modified) clang/test/Modules/no-transitive-decls-change.cppm (+1-11)
  • (modified) clang/test/Modules/no-transitive-identifier-change.cppm (-3)
  • (added) clang/test/Modules/no-transitive-type-change.cppm (+68)
  • (modified) clang/test/Modules/pr59999.cppm (+18-18)
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 1fd482b5aff0e..486d5f4042c61 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -26,6 +26,7 @@
 #include "clang/Serialization/SourceLocationEncoding.h"
 #include "llvm/ADT/DenseMapInfo.h"
 #include "llvm/Bitstream/BitCodes.h"
+#include "llvm/Support/MathExtras.h"
 #include <cassert>
 #include <cstdint>
 
@@ -70,38 +71,53 @@ using DeclID = DeclIDBase::DeclID;
 
 /// An ID number that refers to a type in an AST file.
 ///
-/// The ID of a type is partitioned into two parts: the lower
+/// The ID of a type is partitioned into three parts:
+/// - the lower
 /// three bits are used to store the const/volatile/restrict
-/// qualifiers (as with QualType) and the upper bits provide a
-/// type index. The type index values are partitioned into two
+/// qualifiers (as with QualType).
+/// - the upper 29 bits provide a type index in the corresponding
+/// module file.
+/// - the upper 32 bits provide a module file index.
+///
+/// The type index values are partitioned into two
 /// sets. The values below NUM_PREDEF_TYPE_IDs are predefined type
 /// IDs (based on the PREDEF_TYPE_*_ID constants), with 0 as a
 /// placeholder for "no type". Values from NUM_PREDEF_TYPE_IDs are
 /// other types that have serialized representations.
-using TypeID = uint32_t;
+using TypeID = uint64_t;
 
 /// A type index; the type ID with the qualifier bits removed.
+/// Keep structure alignment 32-bit since the blob is assumed as 32-bit
+/// aligned.
 class TypeIdx {
+  uint32_t ModuleFileIndex = 0;
   uint32_t Idx = 0;
 
 public:
   TypeIdx() = default;
-  explicit TypeIdx(uint32_t index) : Idx(index) {}
+  explicit TypeIdx(uint32_t Idx) : ModuleFileIndex(0), Idx(Idx) {}
+
+  explicit TypeIdx(uint32_t ModuleFileIdx, uint32_t Idx)
+      : ModuleFileIndex(ModuleFileIdx), Idx(Idx) {}
+
+  uint32_t getModuleFileIndex() const { return ModuleFileIndex; }
 
-  uint32_t getIndex() const { return Idx; }
+  uint64_t getValue() const { return ((uint64_t)ModuleFileIndex << 32) | Idx; }
 
   TypeID asTypeID(unsigned FastQuals) const {
     if (Idx == uint32_t(-1))
       return TypeID(-1);
 
-    return (Idx << Qualifiers::FastWidth) | FastQuals;
+    unsigned Index = (Idx << Qualifiers::FastWidth) | FastQuals;
+    return ((uint64_t)ModuleFileIndex << 32) | Index;
   }
 
   static TypeIdx fromTypeID(TypeID ID) {
     if (ID == TypeID(-1))
       return TypeIdx(-1);
 
-    return TypeIdx(ID >> Qualifiers::FastWidth);
+    return TypeIdx(ID >> 32, (ID & llvm::maskTrailingOnes<TypeID>(32)) >>
+                                 Qualifiers::FastWidth);
   }
 };
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 1da9123280f26..1c7ad2ea08314 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -487,14 +487,6 @@ class ASTReader
   /// ID = (I + 1) << FastQual::Width has already been loaded
   llvm::PagedVector<QualType> TypesLoaded;
 
-  using GlobalTypeMapType =
-      ContinuousRangeMap<serialization::TypeID, ModuleFile *, 4>;
-
-  /// Mapping from global type IDs to the module in which the
-  /// type resides along with the offset that should be added to the
-  /// global type ID to produce a local ID.
-  GlobalTypeMapType GlobalTypeMap;
-
   /// Declarations that have already been loaded from the chain.
   ///
   /// When the pointer at index I is non-NULL, the declaration with ID
@@ -1420,8 +1412,8 @@ class ASTReader
     RecordLocation(ModuleFile *M, uint64_t O) : F(M), Offset(O) {}
   };
 
-  QualType readTypeRecord(unsigned Index);
-  RecordLocation TypeCursorForIndex(unsigned Index);
+  QualType readTypeRecord(serialization::TypeID ID);
+  RecordLocation TypeCursorForIndex(serialization::TypeID ID);
   void LoadedDecl(unsigned Index, Decl *D);
   Decl *ReadDeclRecord(GlobalDeclID ID);
   void markIncompleteDeclChain(Decl *D);
@@ -1533,6 +1525,11 @@ class ASTReader
   std::pair<ModuleFile *, unsigned>
   translateIdentifierIDToIndex(serialization::IdentifierID ID) const;
 
+  /// Translate an \param TypeID ID to the index of TypesLoaded
+  /// array and the corresponding module file.
+  std::pair<ModuleFile *, unsigned>
+  translateTypeIDToIndex(serialization::TypeID ID) const;
+
 public:
   /// Load the AST file and validate its contents against the given
   /// Preprocessor.
@@ -1881,10 +1878,11 @@ class ASTReader
   QualType GetType(serialization::TypeID ID);
 
   /// Resolve a local type ID within a given AST file into a type.
-  QualType getLocalType(ModuleFile &F, unsigned LocalID);
+  QualType getLocalType(ModuleFile &F, serialization::TypeID LocalID);
 
   /// Map a local type ID within a given AST file into a global type ID.
-  serialization::TypeID getGlobalTypeID(ModuleFile &F, unsigned LocalID) const;
+  serialization::TypeID getGlobalTypeID(ModuleFile &F,
+                                        serialization::TypeID LocalID) const;
 
   /// Read a type from the current position in the given record, which
   /// was read from the given AST file.
@@ -1906,6 +1904,7 @@ class ASTReader
   /// if the declaration is not from a module file.
   ModuleFile *getOwningModuleFile(const Decl *D) const;
   ModuleFile *getOwningModuleFile(GlobalDeclID ID) const;
+  ModuleFile *getOwningModuleFile(serialization::TypeID ID) const;
 
   /// Returns the source location for the decl \p ID.
   SourceLocation getSourceLocationForDeclID(GlobalDeclID ID);
diff --git a/clang/include/clang/Serialization/ASTRecordReader.h b/clang/include/clang/Serialization/ASTRecordReader.h
index d00fb182f05f4..2561418b78ca7 100644
--- a/clang/include/clang/Serialization/ASTRecordReader.h
+++ b/clang/include/clang/Serialization/ASTRecordReader.h
@@ -163,7 +163,7 @@ class ASTRecordReader
   void readTypeLoc(TypeLoc TL, LocSeq *Seq = nullptr);
 
   /// Map a local type ID within a given AST file to a global type ID.
-  serialization::TypeID getGlobalTypeID(unsigned LocalID) const {
+  serialization::TypeID getGlobalTypeID(serialization::TypeID LocalID) const {
     return Reader->getGlobalTypeID(*F, LocalID);
   }
 
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 075a07a81d637..84d0585eaceba 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -482,9 +482,6 @@ class ModuleFile {
   /// the global type ID space.
   serialization::TypeID BaseTypeIndex = 0;
 
-  /// Remapping table for type IDs in this module.
-  ContinuousRangeMap<uint32_t, int, 2> TypeRemap;
-
   // === Miscellaneous ===
 
   /// Diagnostic IDs and their mappings that the user changed.
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index b4f2bd182e473..d4783f9e6156b 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -3357,20 +3357,11 @@ llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,
             "duplicate TYPE_OFFSET record in AST file");
       F.TypeOffsets = reinterpret_cast<const UnalignedUInt64 *>(Blob.data());
       F.LocalNumTypes = Record[0];
-      unsigned LocalBaseTypeIndex = Record[1];
       F.BaseTypeIndex = getTotalNumTypes();
 
-      if (F.LocalNumTypes > 0) {
-        // Introduce the global -> local mapping for types within this module.
-        GlobalTypeMap.insert(std::make_pair(getTotalNumTypes(), &F));
-
-        // Introduce the local -> global mapping for types within this module.
-        F.TypeRemap.insertOrReplace(
-          std::make_pair(LocalBaseTypeIndex,
-                         F.BaseTypeIndex - LocalBaseTypeIndex));
-
+      if (F.LocalNumTypes > 0)
         TypesLoaded.resize(TypesLoaded.size() + F.LocalNumTypes);
-      }
+
       break;
     }
 
@@ -4033,7 +4024,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
   RemapBuilder PreprocessedEntityRemap(F.PreprocessedEntityRemap);
   RemapBuilder SubmoduleRemap(F.SubmoduleRemap);
   RemapBuilder SelectorRemap(F.SelectorRemap);
-  RemapBuilder TypeRemap(F.TypeRemap);
 
   auto &ImportedModuleVector = F.DependentModules;
   assert(ImportedModuleVector.empty());
@@ -4069,8 +4059,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
     uint32_t SelectorIDOffset =
         endian::readNext<uint32_t, llvm::endianness::little>(Data);
-    uint32_t TypeIndexOffset =
-        endian::readNext<uint32_t, llvm::endianness::little>(Data);
 
     auto mapOffset = [&](uint32_t Offset, uint32_t BaseOffset,
                          RemapBuilder &Remap) {
@@ -4085,7 +4073,6 @@ void ASTReader::ReadModuleOffsetMap(ModuleFile &F) const {
               PreprocessedEntityRemap);
     mapOffset(SubmoduleIDOffset, OM->BaseSubmoduleID, SubmoduleRemap);
     mapOffset(SelectorIDOffset, OM->BaseSelectorID, SelectorRemap);
-    mapOffset(TypeIndexOffset, OM->BaseTypeIndex, TypeRemap);
   }
 }
 
@@ -5064,12 +5051,12 @@ void ASTReader::InitializeContext() {
 
   // Load the special types.
   if (SpecialTypes.size() >= NumSpecialTypeIDs) {
-    if (unsigned String = SpecialTypes[SPECIAL_TYPE_CF_CONSTANT_STRING]) {
+    if (TypeID String = SpecialTypes[SPECIAL_TYPE_CF_CONSTANT_STRING]) {
       if (!Context.CFConstantStringTypeDecl)
         Context.setCFConstantStringType(GetType(String));
     }
 
-    if (unsigned File = SpecialTypes[SPECIAL_TYPE_FILE]) {
+    if (TypeID File = SpecialTypes[SPECIAL_TYPE_FILE]) {
       QualType FileType = GetType(File);
       if (FileType.isNull()) {
         Error("FILE type is NULL");
@@ -5090,7 +5077,7 @@ void ASTReader::InitializeContext() {
       }
     }
 
-    if (unsigned Jmp_buf = SpecialTypes[SPECIAL_TYPE_JMP_BUF]) {
+    if (TypeID Jmp_buf = SpecialTypes[SPECIAL_TYPE_JMP_BUF]) {
       QualType Jmp_bufType = GetType(Jmp_buf);
       if (Jmp_bufType.isNull()) {
         Error("jmp_buf type is NULL");
@@ -5111,7 +5098,7 @@ void ASTReader::InitializeContext() {
       }
     }
 
-    if (unsigned Sigjmp_buf = SpecialTypes[SPECIAL_TYPE_SIGJMP_BUF]) {
+    if (TypeID Sigjmp_buf = SpecialTypes[SPECIAL_TYPE_SIGJMP_BUF]) {
       QualType Sigjmp_bufType = GetType(Sigjmp_buf);
       if (Sigjmp_bufType.isNull()) {
         Error("sigjmp_buf type is NULL");
@@ -5129,25 +5116,24 @@ void ASTReader::InitializeContext() {
       }
     }
 
-    if (unsigned ObjCIdRedef
-          = SpecialTypes[SPECIAL_TYPE_OBJC_ID_REDEFINITION]) {
+    if (TypeID ObjCIdRedef = SpecialTypes[SPECIAL_TYPE_OBJC_ID_REDEFINITION]) {
       if (Context.ObjCIdRedefinitionType.isNull())
         Context.ObjCIdRedefinitionType = GetType(ObjCIdRedef);
     }
 
-    if (unsigned ObjCClassRedef
-          = SpecialTypes[SPECIAL_TYPE_OBJC_CLASS_REDEFINITION]) {
+    if (TypeID ObjCClassRedef =
+            SpecialTypes[SPECIAL_TYPE_OBJC_CLASS_REDEFINITION]) {
       if (Context.ObjCClassRedefinitionType.isNull())
         Context.ObjCClassRedefinitionType = GetType(ObjCClassRedef);
     }
 
-    if (unsigned ObjCSelRedef
-          = SpecialTypes[SPECIAL_TYPE_OBJC_SEL_REDEFINITION]) {
+    if (TypeID ObjCSelRedef =
+            SpecialTypes[SPECIAL_TYPE_OBJC_SEL_REDEFINITION]) {
       if (Context.ObjCSelRedefinitionType.isNull())
         Context.ObjCSelRedefinitionType = GetType(ObjCSelRedef);
     }
 
-    if (unsigned Ucontext_t = SpecialTypes[SPECIAL_TYPE_UCONTEXT_T]) {
+    if (TypeID Ucontext_t = SpecialTypes[SPECIAL_TYPE_UCONTEXT_T]) {
       QualType Ucontext_tType = GetType(Ucontext_t);
       if (Ucontext_tType.isNull()) {
         Error("ucontext_t type is NULL");
@@ -6632,10 +6618,8 @@ void ASTReader::ReadPragmaDiagnosticMappings(DiagnosticsEngine &Diag) {
 }
 
 /// Get the correct cursor and offset for loading a type.
-ASTReader::RecordLocation ASTReader::TypeCursorForIndex(unsigned Index) {
-  GlobalTypeMapType::iterator I = GlobalTypeMap.find(Index);
-  assert(I != GlobalTypeMap.end() && "Corrupted global type map");
-  ModuleFile *M = I->second;
+ASTReader::RecordLocation ASTReader::TypeCursorForIndex(TypeID ID) {
+  auto [M, Index] = translateTypeIDToIndex(ID);
   return RecordLocation(M, M->TypeOffsets[Index - M->BaseTypeIndex].get() +
                                M->DeclsBlockStartOffset);
 }
@@ -6656,10 +6640,10 @@ static std::optional<Type::TypeClass> getTypeClassForCode(TypeCode code) {
 /// routine actually reads the record corresponding to the type at the given
 /// location. It is a helper routine for GetType, which deals with reading type
 /// IDs.
-QualType ASTReader::readTypeRecord(unsigned Index) {
+QualType ASTReader::readTypeRecord(TypeID ID) {
   assert(ContextObj && "reading type with no AST context");
   ASTContext &Context = *ContextObj;
-  RecordLocation Loc = TypeCursorForIndex(Index);
+  RecordLocation Loc = TypeCursorForIndex(ID);
   BitstreamCursor &DeclsCursor = Loc.F->DeclsCursor;
 
   // Keep track of where we are in the stream, then jump back there
@@ -7100,14 +7084,25 @@ TypeSourceInfo *ASTRecordReader::readTypeSourceInfo() {
   return TInfo;
 }
 
+std::pair<ModuleFile *, unsigned>
+ASTReader::translateTypeIDToIndex(serialization::TypeID ID) const {
+  unsigned Index =
+      (ID & llvm::maskTrailingOnes<TypeID>(32)) >> Qualifiers::FastWidth;
+
+  ModuleFile *OwningModuleFile = getOwningModuleFile(ID);
+  assert(OwningModuleFile &&
+         "untranslated type ID or local type ID shouldn't be in TypesLoaded");
+  return {OwningModuleFile, OwningModuleFile->BaseTypeIndex + Index};
+}
+
 QualType ASTReader::GetType(TypeID ID) {
   assert(ContextObj && "reading type with no AST context");
   ASTContext &Context = *ContextObj;
 
   unsigned FastQuals = ID & Qualifiers::FastMask;
-  unsigned Index = ID >> Qualifiers::FastWidth;
 
-  if (Index < NUM_PREDEF_TYPE_IDS) {
+  if (uint64_t Index = ID >> Qualifiers::FastWidth;
+      Index < NUM_PREDEF_TYPE_IDS) {
     QualType T;
     switch ((PredefinedTypeIDs)Index) {
     case PREDEF_TYPE_LAST_ID:
@@ -7376,10 +7371,11 @@ QualType ASTReader::GetType(TypeID ID) {
     return T.withFastQualifiers(FastQuals);
   }
 
-  Index -= NUM_PREDEF_TYPE_IDS;
+  unsigned Index = translateTypeIDToIndex(ID).second;
+
   assert(Index < TypesLoaded.size() && "Type index out-of-range");
   if (TypesLoaded[Index].isNull()) {
-    TypesLoaded[Index] = readTypeRecord(Index);
+    TypesLoaded[Index] = readTypeRecord(ID);
     if (TypesLoaded[Index].isNull())
       return QualType();
 
@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) {
   return TypesLoaded[Index].withFastQualifiers(FastQuals);
 }
 
-QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) {
+QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) {
   return GetType(getGlobalTypeID(F, LocalID));
 }
 
-serialization::TypeID
-ASTReader::getGlobalTypeID(ModuleFile &F, unsigned LocalID) const {
-  unsigned FastQuals = LocalID & Qualifiers::FastMask;
-  unsigned LocalIndex = LocalID >> Qualifiers::FastWidth;
-
-  if (LocalIndex < NUM_PREDEF_TYPE_IDS)
+serialization::TypeID ASTReader::getGlobalTypeID(ModuleFile &F,
+                                                 TypeID LocalID) const {
+  if ((LocalID >> Qualifiers::FastWidth) < NUM_PREDEF_TYPE_IDS)
     return LocalID;
 
   if (!F.ModuleOffsetMap.empty())
     ReadModuleOffsetMap(F);
 
-  ContinuousRangeMap<uint32_t, int, 2>::iterator I
-    = F.TypeRemap.find(LocalIndex - NUM_PREDEF_TYPE_IDS);
-  assert(I != F.TypeRemap.end() && "Invalid index into type index remap");
+  unsigned ModuleFileIndex = LocalID >> 32;
+  LocalID &= llvm::maskTrailingOnes<TypeID>(32);
 
-  unsigned GlobalIndex = LocalIndex + I->second;
-  return (GlobalIndex << Qualifiers::FastWidth) | FastQuals;
+  if (ModuleFileIndex == 0)
+    LocalID -= NUM_PREDEF_TYPE_IDS << Qualifiers::FastWidth;
+
+  ModuleFile &MF =
+      ModuleFileIndex ? *F.DependentModules[ModuleFileIndex - 1] : F;
+  ModuleFileIndex = MF.Index + 1;
+  return ((uint64_t)ModuleFileIndex << 32) | LocalID;
 }
 
 TemplateArgumentLocInfo
@@ -7650,6 +7647,16 @@ ModuleFile *ASTReader::getOwningModuleFile(GlobalDeclID ID) const {
   return &getModuleManager()[ModuleFileIndex - 1];
 }
 
+ModuleFile *ASTReader::getOwningModuleFile(TypeID ID) const {
+  if (ID < NUM_PREDEF_TYPE_IDS)
+    return nullptr;
+
+  uint64_t ModuleFileIndex = ID >> 32;
+  assert(ModuleFileIndex && "Untranslated Local Decl?");
+
+  return &getModuleManager()[ModuleFileIndex - 1];
+}
+
 ModuleFile *ASTReader::getOwningModuleFile(const Decl *D) const {
   if (!D->isFromASTFile())
     return nullptr;
@@ -8167,7 +8174,6 @@ LLVM_DUMP_METHOD void ASTReader::dump() {
   llvm::errs() << "*** PCH/ModuleFile Remappings:\n";
   dumpModuleIDMap("Global bit offset map", GlobalBitOffsetsMap);
   dumpModuleIDMap("Global source location entry map", GlobalSLocEntryMap);
-  dumpModuleIDMap("Global type map", GlobalTypeMap);
   dumpModuleIDMap("Global macro map", GlobalMacroMap);
   dumpModuleIDMap("Global submodule map", GlobalSubmoduleMap);
   dumpModuleIDMap("Global selector map", GlobalSelectorMap);
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 0008eaab42cdf..4a0c9425f6902 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -3263,17 +3263,18 @@ void ASTWriter::WritePragmaDiagnosticMappings(const DiagnosticsEngine &Diag,
 /// Write the representation of a type to the AST stream.
 void ASTWriter::WriteType(QualType T) {
   TypeIdx &IdxRef = TypeIdxs[T];
-  if (IdxRef.getIndex() == 0) // we haven't seen this type before.
+  if (IdxRef.getValue() == 0) // we haven't seen this type before.
     IdxRef = TypeIdx(NextTypeID++);
   TypeIdx Idx = IdxRef;
 
-  assert(Idx.getIndex() >= FirstTypeID && "Re-writing a type from a prior AST");
+  assert(Idx.getModuleFileIndex() == 0 && "Re-writing a type from a prior AST");
+  assert(Idx.getValue() >= FirstTypeID && "Writing predefined type");
 
   // Emit the type's representation.
   uint64_t Offset = ASTTypeWriter(*this).write(T) - DeclTypesBlockStartOffset;
 
   // Record the offset for this type.
-  unsigned Index = Idx.getIndex() - FirstTypeID;
+  uint64_t Index = Idx.getValue() - FirstTypeID;
   if (TypeOffsets.size() == Index)
     TypeOffsets.emplace_back(Offset);
   else if (TypeOffsets.size() < Index) {
@@ -3346,12 +3347,10 @@ void ASTWriter::WriteTypeDeclOffsets() {
   auto Abbrev = std::make_shared<BitCodeAbbrev>();
   Abbrev->Add(BitCodeAbbrevOp(TYPE_OFFSET));
   Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // # of types
-  Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // base type index
   Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // types block
   unsigned TypeOffsetAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
   {
-    RecordData::value_type Record[] = {TYPE_OFFSET, TypeOffsets.size(),
-                                       FirstTypeID - NUM_PREDEF_TYPE_IDS};
+    RecordData::value_type Record[] = {TYPE_OFFSET, TypeOffsets.size()};
     Stream.EmitRecordWithBlob(TypeOffsetAbbrev, Record, bytes(TypeOffsets));
   }
 
@@ -5428,7 +5427,6 @@ ASTFileSignature ASTWriter::WriteASTCore(Sema &SemaRef, StringRef isysroot,
                           M.NumPreprocessedEntities);
         writeBaseIDOrNone(M.BaseSubmoduleID, M.LocalNumSubmodules);
         writeBaseIDOrNone(M.BaseSelectorID, M.LocalNumSelectors);
-        writeBaseIDOrNone(M.BaseTypeIndex, M.LocalNumTypes);
       }
     }
     RecordData::value_type Record[] = {MODULE_OFFSET_MAP};
@@ -6093,7 +6091,7 @@ TypeID ASTWriter::GetOrCreateTypeID(QualType T) {
     assert(!T.getLocalFastQualifiers());
 
     TypeIdx &Idx = TypeIdxs[T];
-    if (Idx.getIndex() == 0) {
+    if (Idx.getValue() == 0) {
       if (DoneWritingDeclsAndTypes) {
         assert(0 && "New type seen after serializing all the types to emit!");
         return TypeIdx();
@@ -6608,11 +6606,9 @@ void ASTWriter::ReaderInitialized(ASTReader *Reader) {
 
   // Note, this will get called multiple times, once one the reader starts up
   // and again each time it's done reading a PCH or module.
-  FirstTypeID = NUM_PREDEF_TYPE_IDS + Chain->getTotalNumTypes();
   FirstMacroID = NUM_PREDEF_MACRO_IDS + Chain->getTotalNumMacros();
   FirstSubmoduleID = NUM_PREDEF_SUBMODULE_IDS + Chain->getTotalNumSubmodules();
   FirstSelect...
[truncated]

@ChuanqiXu9
Copy link
Member Author

@jansvoboda11 @Bigcheese gentle ping

@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTtansitiveIdentifierChange branch from c612b56 to a2fb7f5 Compare May 27, 2024 07:31
@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTransitiveTypeChange branch from 2265f12 to 1a83e2b Compare May 27, 2024 10:52
@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTtansitiveIdentifierChange branch from a2fb7f5 to e8f756e Compare June 4, 2024 07:40
@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTransitiveTypeChange branch from 1a83e2b to 57cfb2b Compare June 4, 2024 07:51
@ChuanqiXu9 ChuanqiXu9 requested a review from ilya-biryukov June 18, 2024 01:44
@ChuanqiXu9
Copy link
Member Author

@jansvoboda11 @ilya-biryukov would you like to take look?

@ilya-biryukov
Copy link
Contributor

I am happy to take a look, but will run out of time for it today. I will definitely do it tomorrow, though.

Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change! I have done a round of review and left a few suggestion and also have a bunch of questions. I am sorry if some of them may look too obvious, I did try to dig into the code where I could, but Clang's serialization is complicated and some things are easier researched through a conversation than through looking at code. Please bear with me, I promise to get better in the upcoming reviews.

Here are a few extra question that came to mind too.

Question 1: Do we have estimates on how much bigger the PCMs become? Did you observer any negative performance impact?

I would expect TypeIDs to be much more common than decls, and the corresponding size increases to be more significant. We've already lost ~6% from DeclID change, I am slightly worried the types are going to be a bigger hit.

Question 2: could you explain why we the PCM in your example was changing before? Was there some base offset inherited from the PCM files we depended on?


// Otherwise, keep the highest ID since the module file comes later has
// higher module file indexes.
if (Idx.getValue() >= StoredIdx.getValue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Idx.getValue() >= StoredIdx.getValue())
if (Idx.getModuleFileIndex() >= StoredIdx.getModuleFileIndex())

This seems to align better with what the comment for the code are saying. If this does not work, I might be missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

// RUN: rm -rf %t
// RUN: mkdir -p %t
// RUN: split-file %s %t
// RUNX: rm -rf %t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these leftovers from previous versions of the patch that need to be reverted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. This is something left during debugging. I will remove it.

@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) {
return TypesLoaded[Index].withFastQualifiers(FastQuals);
}

QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) {
QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we clarify what LocalID means here in a comment somewhere?
IIUC, the ModuleFileIndex there is an index into F.TransitiveImports rather than into a "global" module manager.

Previously, we had distinction between unsigned in parameters to this function and TypeID as a return type.
Maybe we should keep this here and accept uint64_t to avoid confusing the common global module file index and a local file index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we clarify what LocalID means here in a comment somewhere?

In ASTReader.h, there is a single line comment: Resolve a local type ID within a given AST file into a type.. I add a new line to explain it as "A local type ID is only meaningful with the corresponding module file. See the implementation of getGlobalTypeID for details.".

IIUC, the ModuleFileIndex there is an index into F.TransitiveImports rather than into a "global" module manager.

Yes, this is correct for local ID. But I feel it might be odd to put this in the comment. I feel, it might be good to know, we can use ASTReader::getGlobalTypeID to convert a <ModuleFile, LocalTypeID> pair to a GlobalTypeID. And the concrete process is the implementation details to GlobalTypeID`.

Previously, we had distinction between unsigned in parameters to this function and TypeID as a return type.
Maybe we should keep this here and accept uint64_t to avoid confusing the common global module file index and a local file index?

I don't feel it better. uint64_t doesn't provide any type information. To be best, maybe we can do something like for LocalDeclID and GlobalDeclID. I only did that for DeclID since I feel the DeclID is the most complicated and the TypeID are much more simpler. I think we can this later if we feel it is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I feel TypeID is the "wrong" type here is because after reading the comments it feels that the module file index is always global. Keeping TypeID here is probably okay, but we should put this distinction into a comment somewhere (either a comment to getLocalType that says the TypeID we pass here is special because the index is relative to F or a comment to the TypeID itself that explains that module file index can be either global or relative to a module file).
I would probably vouch for putting this into getLocalType to avoid overloading the readers of the code that does not touch getLocalType with this special case (it seems to be quite limited anyway, we quickly read the value from the AST and convert it right away, same for writing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I tried to address this by introducing a new type alias using LocalTypeID = TypeID; with a FIXME that we'd better to make them classes like we did for DeclID. Then at least I believe the interface will have better readability.

@ChuanqiXu9
Copy link
Member Author

Thanks for the change! I have done a round of review and left a few suggestion and also have a bunch of questions. I am sorry if some of them may look too obvious, I did try to dig into the code where I could, but Clang's serialization is complicated and some things are easier researched through a conversation than through looking at code. Please bear with me, I promise to get better in the upcoming reviews.

You're welcome.

Here are a few extra question that came to mind too.

Question 1: Do we have estimates on how much bigger the PCMs become? Did you observer any negative performance impact?

I would expect TypeIDs to be much more common than decls, and the corresponding size increases to be more significant. We've already lost ~6% from DeclID change, I am slightly worried the types are going to be a bigger hit.

I did and my local results shows, I see less than 1% change with this patch. This fits my understanding somehow. Since the TypeID is much less common than DeclID. This matches my experience in coding. The DeclID patch is the hardest one and the TypeID patch is the easiest one.

Question 2: could you explain why we the PCM in your example was changing before? Was there some base offset inherited from the PCM files we depended on?

Yes. In the higher level, it should be easy to understand. There is a new type in m-partA.v1.cppm. And in the low level, when we write a module file, we would record the type offset for the module files we depend on:

writeBaseIDOrNone(M.BaseTypeIndex, M.LocalNumTypes);

And this is exactly what the series patch want to eliminate.

@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTransitiveTypeChange branch from 41291b2 to 7dad94d Compare June 20, 2024 03:20
@ChuanqiXu9 ChuanqiXu9 force-pushed the users/ChuanqiXu9/NoTtansitiveIdentifierChange branch from e8f756e to a2f8832 Compare June 20, 2024 05:28
Base automatically changed from users/ChuanqiXu9/NoTtansitiveIdentifierChange to main June 20, 2024 05:30
Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! With all the explanation, the patch makes a lot of sense.
Generally LG, I just have a few more NITs and feel it's good to go.

I am surprised to hear we have less TypeIDs than DeclIDs. My mental model was that for every type Decl we introduce, we'd have at least a few types with it (the type itself, pointer to that type, reference to that type, functions returning that type, etc). And every non-type declaration will have it's own type (the functions often have unique signatures).
Do we end up reusing the types from the current compilation and other modules we imported often enough that the amount of newly introduced types ends up being small? Or is my logic flawed in some ways?

/// module file.
/// - the upper 32 bits provide a module file index.
///
/// The type index values are partitioned into two
/// sets. The values below NUM_PREDEF_TYPE_IDs are predefined type
/// IDs (based on the PREDEF_TYPE_*_ID constants), with 0 as a
/// placeholder for "no type". Values from NUM_PREDEF_TYPE_IDs are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe explain in a comment that predefined types always have a module file index of 0 (i.e. do not belong to any module file)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -7151,15 +7135,45 @@ TypeSourceInfo *ASTRecordReader::readTypeSourceInfo() {
return TInfo;
}

static unsigned getIndexForTypeID(serialization::TypeID ID) {
return (ID & llvm::maskTrailingOnes<TypeID>(32)) >> Qualifiers::FastWidth;
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: remove redundant semicolon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return ID >> 32;
}

static bool isPredefinedTypes(serialization::TypeID ID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe rename to isPredefined or isPredefinedType? the plural form of types does not make sense here since we pass a single value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with isPredefinedType

@@ -7392,27 +7388,28 @@ QualType ASTReader::GetType(TypeID ID) {
return TypesLoaded[Index].withFastQualifiers(FastQuals);
}

QualType ASTReader::getLocalType(ModuleFile &F, unsigned LocalID) {
QualType ASTReader::getLocalType(ModuleFile &F, TypeID LocalID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I feel TypeID is the "wrong" type here is because after reading the comments it feels that the module file index is always global. Keeping TypeID here is probably okay, but we should put this distinction into a comment somewhere (either a comment to getLocalType that says the TypeID we pass here is special because the index is relative to F or a comment to the TypeID itself that explains that module file index can be either global or relative to a module file).
I would probably vouch for putting this into getLocalType to avoid overloading the readers of the code that does not touch getLocalType with this special case (it seems to be quite limited anyway, we quickly read the value from the AST and convert it right away, same for writing)

@ChuanqiXu9
Copy link
Member Author

Thanks! With all the explanation, the patch makes a lot of sense. Generally LG, I just have a few more NITs and feel it's good to go.

Thanks.

I am surprised to hear we have less TypeIDs than DeclIDs. My mental model was that for every type Decl we introduce, we'd have at least a few types with it (the type itself, pointer to that type, reference to that type, functions returning that type, etc). And every non-type declaration will have it's own type (the functions often have unique signatures). Do we end up reusing the types from the current compilation and other modules we imported often enough that the amount of newly introduced types ends up being small? Or is my logic flawed in some ways?

Since most of the time, the types are reused. For example, if we have,

export module A;
export struct A { int a; }
export A getA(A) { return {43}; }
export module B;
import A;
export struct B {
     A v1;
     A v2;
     A v3;
};

export A getAFromB(A p) { ... }

In the above example, in module B, there will be (in my mind, I didn't experiment actually) 6 new decls (a new struct decl for B, 3 new field decl for v1, v2 and v3. a new function decl for getAFromB and a new parm decl for p). However, there will only be only one new type, the struct type B. The function type A *(A) is reused.

So I feel it pretty makes sense that the number of type ID is much less than Decl ID.

Copy link
Contributor

@ilya-biryukov ilya-biryukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The thing with the types does make sense now, I am (expectedly) just bad at statistical reasoning. It does make sense that there are many declarations that share the types. I suspect we won't see any significant increases in our internal toolchain release, but we'll reach out if we do find some pathological cases (that's probably very unlikely).

LGTM!

@ChuanqiXu9
Copy link
Member Author

Thanks for reviewing

@ChuanqiXu9 ChuanqiXu9 merged commit 03921b9 into main Jun 21, 2024
7 checks passed
@ChuanqiXu9 ChuanqiXu9 deleted the users/ChuanqiXu9/NoTransitiveTypeChange branch June 21, 2024 01:21
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Following of llvm#92085. 

#### motivation

The motivation is still cutting of the unnecessary change in the
dependency chain. See the above link (recursively) for details.

And this will be the last patch of the `no-transitive-*-change` series.
If there are any following patches, they might be C++20 Named modules
specific to handle special grammars like `ADL` (See the reply in
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755/53
for example). So they won't affect the whole serialization part as the
series patch did.

#### example

After this patch, finally we are able to cut of unnecessary change of
types. For example,

```

//--- m-partA.cppm
export module m:partA;

//--- m-partA.v1.cppm
export module m:partA;

namespace NS {
    class A {
        public:
            int getValue() {
                return 43;
            }
    };
}

//--- m-partB.cppm
export module m:partB;

export inline int getB() {
    return 430;
}

//--- m.cppm
export module m;
export import :partA;
export import :partB;

//--- useBOnly.cppm
export module useBOnly;
import m;

export inline int get() {
    return getB();
}
```

The BMI of `useBOnly.cppm` is expected to not change if we only add a
new class in `m:partA`. This will be pretty useful in practice.

#### implementation details

The key idea of this patch is similar with the previous patches: extend
the 32bits type ID to 64bits so that we can store the module file index
in the higher bits. Then the encoding of the type ID is independent on
the imported modules.

But there are two differences from the previous patches:
- TypeID is not completely an index of serialized types. We used the
lower 3 bits to store the qualifiers.
- TypeID won't take part in any lookup process. So the uses of TypeID is
much less than the previous patches.

The first difference make we have some more slightly complex bit
operations. And the second difference makes the patch much simpler than
the previous ones.
ChuanqiXu9 added a commit that referenced this pull request Jul 19, 2024
(Some backgrounds, not required to read:
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755)

This is the document part for the no-transitive-change
(#86912,
#92083,
#92085,
#92511) to provide the ability
for build system to skip some unnecessary recompilations.

See the patch for examples.
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
(Some backgrounds, not required to read:
https://discourse.llvm.org/t/rfc-c-20-modules-introduce-thin-bmi-and-decls-hash/74755)

This is the document part for the no-transitive-change
(#86912,
#92083,
#92085,
#92511) to provide the ability
for build system to skip some unnecessary recompilations.

See the patch for examples.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251509
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants