Skip to content

[LLDB][NativePDB] Resolve declaration for tag types #152579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 13, 2025

Conversation

Nerixyz
Copy link
Contributor

@Nerixyz Nerixyz commented Aug 7, 2025

Tag types like stucts or enums didn't have a declaration attached to them. The source locations are present in the IPI stream in LF_UDT_MOD_SRC_LINE records:

   0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
            udt = 0x1058, mod = 3, file = 1, line = 0
   0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
            udt = 0x1253, mod = 35, file = 93, line = 17069

The file is an ID in the string table /names:

     ID | String
      1 | '\<unknown>'
     12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
     93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'

Here, we're not interested in mod. This would indicate which module contributed the UDT.

I was looking at Rustc's PDB and found that it uses <unknown> for some types, so I added a check for that.

This makes two DIA PDB shell tests to work with the native PDB plugin.

@Nerixyz Nerixyz requested a review from JDevlieghere as a code owner August 7, 2025 19:12
@llvmbot llvmbot added the lldb label Aug 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-lldb

Author: nerix (Nerixyz)

Changes

Tag types like stucts or enums didn't have a declaration attached to them. The source locations are present in the IPI stream in LF_UDT_MOD_SRC_LINE records:

   0x101F | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1C63]
            udt = 0x1058, mod = 3, file = 1, line = 0
   0x2789 | LF_UDT_MOD_SRC_LINE [size = 18, hash = 0x1E5A]
            udt = 0x1253, mod = 35, file = 93, line = 17069

The file is an ID in the string table /names:

     ID | String
      1 | '\&lt;unknown&gt;'
     12 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\wingdi.h'
     93 | 'D:\a\_work\1\s\src\ExternalAPIs\WindowsSDKInc\c\Include\10.0.22621.0\um\winnt.h'

Here, we're not interested in mod. This would indicate which module contributed the UDT.

I was looking at Rustc's PDB and found that it uses &lt;unknown&gt; for some types, so I added a check for that.

This makes two DIA PDB shell tests to work with the native PDB plugin.


Full diff: https://github.com/llvm/llvm-project/pull/152579.diff

4 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp (+54-3)
  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h (+10)
  • (modified) lldb/test/Shell/SymbolFile/PDB/class-layout.test (+11-1)
  • (modified) lldb/test/Shell/SymbolFile/PDB/enums-layout.test (+6)
diff --git a/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp b/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
index dcea33dd9f854..684d46d1da4e6 100644
--- a/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
+++ b/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
@@ -643,8 +643,7 @@ SymbolFileNativePDB::CreateClassStructUnion(PdbTypeSymId type_id,
 
   std::string uname = GetUnqualifiedTypeName(record);
 
-  // FIXME: Search IPI stream for LF_UDT_MOD_SRC_LINE.
-  Declaration decl;
+  Declaration decl = ResolveUdtDeclaration(type_id);
   return MakeType(toOpaqueUid(type_id), ConstString(uname), size, nullptr,
                   LLDB_INVALID_UID, Type::eEncodingIsUID, decl, ct,
                   Type::ResolveState::Forward);
@@ -667,7 +666,7 @@ lldb::TypeSP SymbolFileNativePDB::CreateTagType(PdbTypeSymId type_id,
                                                 CompilerType ct) {
   std::string uname = GetUnqualifiedTypeName(er);
 
-  Declaration decl;
+  Declaration decl = ResolveUdtDeclaration(type_id);
   TypeSP underlying_type = GetOrCreateType(er.UnderlyingType);
 
   return MakeType(
@@ -2441,3 +2440,55 @@ SymbolFileNativePDB::GetContextForType(TypeIndex ti) {
   }
   return ctx;
 }
+
+void SymbolFileNativePDB::CacheUdtDeclarations() {
+  if (m_has_cached_udt_declatations)
+    return;
+  m_has_cached_udt_declatations = true;
+
+  for (CVType cvt : m_index->ipi().typeArray()) {
+    if (cvt.kind() != LF_UDT_MOD_SRC_LINE)
+      continue;
+
+    UdtModSourceLineRecord udt_mod_src;
+    llvm::cantFail(TypeDeserializer::deserializeAs(cvt, udt_mod_src));
+    // Some types might be contributed by multiple modules. We assume that they
+    // all point to the same file and line because we can only provide one
+    // location.
+    m_udt_declarations.try_emplace(udt_mod_src.UDT,
+                                   udt_mod_src.SourceFile.getIndex(),
+                                   udt_mod_src.LineNumber);
+  }
+}
+
+Declaration SymbolFileNativePDB::ResolveUdtDeclaration(PdbTypeSymId type_id) {
+  CacheUdtDeclarations();
+  auto it = m_udt_declarations.find(type_id.index);
+  if (it == m_udt_declarations.end())
+    return Declaration();
+
+  auto [file_index, line] = it->second;
+  auto string_table = m_index->pdb().getStringTable();
+  if (!string_table) {
+    LLDB_LOG_ERROR(GetLog(LLDBLog::Symbols), string_table.takeError(),
+                   "Failed to get string table: {0}");
+    return Declaration();
+  }
+
+  llvm::Expected<llvm::StringRef> file_name =
+      string_table->getStringTable().getString(file_index);
+  if (!file_name) {
+    LLDB_LOG_ERROR(GetLog(LLDBLog::Symbols), file_name.takeError(),
+                   "Failed to get string with id {1}: {0}", file_index);
+    return Declaration();
+  }
+
+  // rustc sets the filename to "<unknown>" for some files
+  if (*file_name == "\\<unknown>")
+    return Declaration();
+
+  FileSpec::Style style = file_name->starts_with("/")
+                              ? FileSpec::Style::posix
+                              : FileSpec::Style::windows;
+  return Declaration(FileSpec(*file_name, style), line);
+}
diff --git a/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h b/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h
index eda375d4cebe7..2e8270794af1a 100644
--- a/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h
+++ b/lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.h
@@ -260,6 +260,9 @@ class SymbolFileNativePDB : public SymbolFileCommon {
 
   std::vector<CompilerContext> GetContextForType(llvm::codeview::TypeIndex ti);
 
+  void CacheUdtDeclarations();
+  Declaration ResolveUdtDeclaration(PdbTypeSymId type_id);
+
   llvm::BumpPtrAllocator m_allocator;
 
   lldb::addr_t m_obj_load_address = 0;
@@ -281,6 +284,13 @@ class SymbolFileNativePDB : public SymbolFileCommon {
   llvm::DenseMap<llvm::codeview::TypeIndex, llvm::codeview::TypeIndex>
       m_parent_types;
 
+  /// type index -> (filename index, line)
+  ///
+  /// The filename index is an index into the `/names` section (string table)
+  llvm::DenseMap<llvm::codeview::TypeIndex, std::pair<uint32_t, uint32_t>>
+      m_udt_declarations;
+  bool m_has_cached_udt_declatations = false;
+
   lldb_private::UniqueCStringMap<uint32_t> m_type_base_names;
 };
 
diff --git a/lldb/test/Shell/SymbolFile/PDB/class-layout.test b/lldb/test/Shell/SymbolFile/PDB/class-layout.test
index e9a7d1c0daa9e..eca910e997e40 100644
--- a/lldb/test/Shell/SymbolFile/PDB/class-layout.test
+++ b/lldb/test/Shell/SymbolFile/PDB/class-layout.test
@@ -12,9 +12,19 @@ RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix
 RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=BASE %s
 RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=FRIEND %s
 RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=CLASS %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=UNION %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=STRUCT %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=COMPLEX %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=LIST %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=UNNAMED-STRUCT %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=BASE %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=FRIEND %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=CLASS %s
 
 CHECK: Module [[MOD:.*]]
-CHECK: SymbolFile pdb ([[MOD]])
+CHECK: SymbolFile {{(native-)?}}pdb ([[MOD]])
 CHECK: {{^[0-9A-F]+}}:   CompileUnit{{[{]0x[0-9a-f]+[}]}}, language = "c++", file = '{{.*}}\ClassLayoutTest.cpp'
 
 ENUM:  name = "Enum", size = 4,  decl = ClassLayoutTest.cpp:5
diff --git a/lldb/test/Shell/SymbolFile/PDB/enums-layout.test b/lldb/test/Shell/SymbolFile/PDB/enums-layout.test
index 6f861c6d65adf..9766d6f8b0324 100644
--- a/lldb/test/Shell/SymbolFile/PDB/enums-layout.test
+++ b/lldb/test/Shell/SymbolFile/PDB/enums-layout.test
@@ -7,6 +7,12 @@ RUN: lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-
 RUN: lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=UCHAR-ENUM %s
 RUN: lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=CLASS-ENUM %s
 RUN: lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=STRUCT-ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=CONST-ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=EMPTY-ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=UCHAR-ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=CLASS-ENUM %s
+RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/SimpleTypesTest.cpp.enums.exe | FileCheck --check-prefix=STRUCT-ENUM %s
 
 ; FIXME: PDB does not have information about scoped enumeration (Enum class) so the  
 ; compiler type used is the same as the one for unscoped enumeration.

@@ -12,9 +12,19 @@ RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix
RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=BASE %s
RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=FRIEND %s
RUN: lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck --check-prefix=CLASS %s
RUN: env LLDB_USE_NATIVE_PDB_READER=1 lldb-test symbols %t.dir/ClassLayoutTest.cpp.exe | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit awkward that we have NativePDB tests running outside of the SymbolFile/NativePDB test directory. Can we just put this test into the NativePDB directory too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to reply to this...

My concern was that this would now duplicate the test input and the expected output. We do the mixed tests in Shell/SymbolFile/PDB a few times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm ok, lets just make sure we don't lose this coverage when we remove the DIA PDB plugin

Comment on lines 2486 to 2635
// rustc sets the filename to "<unknown>" for some files
if (*file_name == "\\<unknown>")
return Declaration();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to test this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I added a test, noticed that LLVM emits the information as LF_UDT_SRC_LINE and handled that as well. We tested with clang-cl + MSVC link before.
I think MSVC's linker rewrites LF_UDT_SRC_LINE to LF_UDT_MOD_SRC_LINE to save space, but I haven't verified that. Either way, we need to handle both.

Comment on lines 2490 to 2492
FileSpec::Style style = file_name->starts_with("/")
? FileSpec::Style::posix
: FileSpec::Style::windows;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the file_name in this context? When would it start with a / or \?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I compiled Rust code, filenames from the standard library in the string table showed up as

   5631 | '/rustc/430d6eddfc6a455ca4a0137c0822a982cccd3b2b/library\core\src\ptr\mod.rs'
   5707 | '/rustc/430d6eddfc6a455ca4a0137c0822a982cccd3b2b/library\core\src\mem\mod.rs'
   ...

I just noticed the backslashes which can't occur on posix - so these are probably paths relative to C:\.

Copy link
Contributor

@ZequanWu ZequanWu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Nerixyz Nerixyz force-pushed the feat/lldb-npdb-udt-decl branch from 0c2607d to bd4bb25 Compare August 13, 2025 09:09
@Michael137 Michael137 merged commit 3f61e4e into llvm:main Aug 13, 2025
9 checks passed
@Michael137
Copy link
Member

@Nerixyz I suggest you request commit access. Can be done here: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

That way you can merge your own PRs in the future

@Nerixyz Nerixyz deleted the feat/lldb-npdb-udt-decl branch August 13, 2025 13:59
@kazutakahirata
Copy link
Contributor

@Nerixyz I've landed bfc331e to fix warnings from this PR. Thanks!

@Nerixyz
Copy link
Contributor Author

Nerixyz commented Aug 13, 2025

@Nerixyz I've landed bfc331e to fix warnings from this PR. Thanks!

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants