Skip to content

[lldb][Expression] Printing union with self-referencing member field crashes lldb #68135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Michael137 opened this issue Oct 3, 2023 · 13 comments · Fixed by #68300
Closed

[lldb][Expression] Printing union with self-referencing member field crashes lldb #68135

Michael137 opened this issue Oct 3, 2023 · 13 comments · Fixed by #68300
Assignees
Labels

Comments

@Michael137
Copy link
Member

Michael137 commented Oct 3, 2023

//struct Attrs // << this works                 
union Attrs // << this crashes                  
{                                                                  
    static const Attrs regular;                 
};                                                                 
                                                                   
const Attrs Attrs::regular{};
                                                                   
int main() {                                                       
    Attrs d;                                    
    return 0;                                                      
}                                                                  
clang++ main.cpp -g -O0 -std=c++2a
./bin/lldb a.out -o "br se -p return -X main" -o run -o "frame var d"

(on llvm.org lldb build at commit 0bfaed8c612705cfa8c5382d26d8089a0a26386b)

This will infinitely recurse in ASTContext::getASTRecordLayout. Possibly the same crash as:

@Michael137 Michael137 added the lldb label Oct 3, 2023
@Michael137 Michael137 self-assigned this Oct 3, 2023
@llvmbot
Copy link
Member

llvmbot commented Oct 3, 2023

@llvm/issue-subscribers-lldb

``` //struct DependentDylibAttributes // << this works union DependentDylibAttributes // << this crashes { static const DependentDylibAttributes regular; };

const DependentDylibAttributes DependentDylibAttributes::regular{};

int main() {
DependentDylibAttributes d;
return 0;
}


clang++ dyld.cpp -g -O0 -std=c++2a
./bin/lldb a.out -o "br se -p return -X main" -o run -o "frame var d"

(on llvm.org lldb build at commit `0bfaed8c612705cfa8c5382d26d8089a0a26386b`)

This will infinitely recurse in `ASTContext::getASTRecordLayout`. Possibly the same crash as:
* https://github.com/llvm/llvm-project/issues/43604
* https://github.com/llvm/llvm-project/issues/63667
* https://github.com/llvm/llvm-project/issues/64628
* https://github.com/llvm/llvm-project/issues/66335
</details>

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

Oh looks like for unions we're mistakenly adding an unnamed padding FieldDecl in DWARFASTParserClang::ParseSingleMember. This confuses the RecordLayoutBuilder because the Attrs type now claims to have a FieldDecl when it actually doesn't; regular should be a VarDecl (at least that's what it is when dumping the actual clang AST of this file).

@Michael137
Copy link
Member Author

// Handle static members, which is any member that doesn't have a bit or a  
// byte member offset.                                                      
if (attrs.member_byte_offset == UINT32_MAX &&                               
    attrs.data_bit_offset == UINT64_MAX) {                                  
  Type *var_type = die.ResolveTypeUID(die: attrs.encoding_form.Reference());

This is what we do for structures that have static members. But this doesn't happen for unions.

Because of this:

MemberAttributes::MemberAttributes(const DWARFDIE &die,                         
                                   const DWARFDIE &parent_die,                  
                                   ModuleSP module_sp) {                        
  member_byte_offset = (parent_die.Tag() == DW_TAG_union_type) ? 0 : UINT32_MAX;

@Michael137
Copy link
Member Author

That logic was added in: aaae5f8

[DWARFASTParserClang] Start with member offset of 0 for members of union types.

Summary:
GCC does not emit DW_AT_data_member_location for members of a union.
Starting with a 0 value for member locations helps is reading union types
in such cases.

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

So it looks like the change in aaae5f8 directly conflicts with the change in #55040.

The latter makes it so we always check the presence of member_byte_offset to determine whether we have a static data member. Whereas the former unconditionally adds a non-sentinel default member_byte_offset to all union data members (regardless of whether they're static or not).

The effect of this is that LLDB never correctly detects that a union has static data members.

From both commit descriptions it sounds like both changes were done to support GCC (it neither emits a DW_AT_data_member_location for union members, nor does it emit DW_AT_external in some cases for static data members, e.g., in anonymous namespaces).

AFAICT, the change in aaae5f8 is only necessary so that we stop claiming non-static data members are static for GCC-built binaries.

I see that DW_AT_declaration gets attached to non-static data members in both gcc and clang (and regardless of anonymous namespace). Could we use that as a heuristic? Then the solution could be as simple as:

diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index aa45076bf2f7..8aaae7772df2 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -6,6 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include <climits>
 #include <cstdlib>
 
 #include "DWARFASTParser.h"
@@ -2494,8 +2495,9 @@ struct MemberAttributes {
   DWARFFormValue encoding_form;
   /// Indicates the byte offset of the word from the base address of the
   /// structure.
-  uint32_t member_byte_offset;
+  uint32_t member_byte_offset = UINT32_MAX;
   bool is_artificial = false;
+  bool is_declaration = false;
 };
 
 /// Parsed form of all attributes that are relevant for parsing Objective-C
@@ -2668,8 +2670,6 @@ DiscriminantValue &VariantPart::discriminant() { return this->_discriminant; }
 MemberAttributes::MemberAttributes(const DWARFDIE &die,
                                    const DWARFDIE &parent_die,
                                    ModuleSP module_sp) {
-  member_byte_offset = (parent_die.Tag() == DW_TAG_union_type) ? 0 : UINT32_MAX;
-
   DWARFAttributes attributes = die.GetAttributes();
   for (size_t i = 0; i < attributes.Size(); ++i) {
     const dw_attr_t attr = attributes.AttributeAtIndex(i);
@@ -2697,6 +2697,9 @@ MemberAttributes::MemberAttributes(const DWARFDIE &die,
       case DW_AT_data_bit_offset:
         data_bit_offset = form_value.Unsigned();
         break;
+      case DW_AT_declaration:
+        is_declaration = form_value.Boolean();
+        break;
       case DW_AT_data_member_location:
         if (form_value.BlockData()) {
           Value initialValue(0);
@@ -2935,10 +2938,11 @@ void DWARFASTParserClang::ParseSingleMember(
   if (class_is_objc_object_or_interface)
     attrs.accessibility = eAccessNone;
 
-  // Handle static members, which is any member that doesn't have a bit or a
-  // byte member offset.
-  if (attrs.member_byte_offset == UINT32_MAX &&
-      attrs.data_bit_offset == UINT64_MAX) {
+  // Handle static members, which is any member that's a non-defining declaration.
+  // GCC never emits a DW_AT_data_member_location for members of a union, so
+  // we can't use it as a heuristic to determine whether a data member is static
+  // or not.
+  if (attrs.is_declaration) {
     Type *var_type = die.ResolveTypeUID(attrs.encoding_form.Reference());

@clayborg @jingham Any preference on what to do here? Looks like GCC still isn't attaching DW_AT_data_member_location to non-static union members (whereas clang does).

CC @dwblaikie @adrian-prantl @labath because DWARF

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

As an aside, do we need to emit the DW_AT_data_member_location for union members in clang at all? If they are all going to be 0 anyway?

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

Hmmm looks like DWARFv5 says we should emit static data members as DW_TAG_variable instead of DW_TAG_member?

5.7.7 Class Variable Entries
A class variable (“static data member” in C++) is a variable shared by all
instances of a class. It is represented by a debugging information entry with the
tag DW_TAG_variable.

That would make it much easier to differentiate static vs non-static members. Don't think clang does this yet (gcc does)

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

So to summarise:

  1. Could we use DW_AT_declaration instead of DW_AT_external/DW_AT_member_location as a heuristic for whether a member is a static? This would avoid the anonymous namespace problem with GCC. If so, then we can also stop doing the default-initialisation of the member_byte_offset
  2. Should clang emit DW_TAG_variable for static data members in DWARFv5 and make LLDB understand this?
  3. Should clang emit the DW_AT_data_member_location for union members at all if they are all going to be 0 anyway? I guess the space savings of this wouldn't warrant the complexity for consumers, although that would align with what gcc is doing

@adrian-prantl
Copy link
Collaborator

Should clang emit DW_TAG_variable for static data members in DWARFv5 and make LLDB understand this?

If the spec says so, I think that's an easy yes. You might need to make the behavior adjustable for some debugger tunings (though I'd argue the default should emit standards-conformant DWARF).

@adrian-prantl
Copy link
Collaborator

adrian-prantl commented Oct 4, 2023

CC @dwblaikie

@dwblaikie
Copy link
Collaborator

Summary is, if I'm reading this right:

  • LLDB used to detect static members by the presence of DW_AT_external, but that's not correct/portable, you could have non-external static members if they were members of a non-linkage class, such as a class in an anonymous namespace
  • LLDB then started using the presence of DW_AT_data_member_offset to differentiate static and non-static member variables
  • LLDB also started assuming members of a union have a DW_AT_data_member_offset of zero, because GCC emits union non-static members without a DW_AT_data_member_offset (since they'd all be zero anyway) - unclear if that's to-spec or not? shrug
  • That broke static-member detection inside unions because all the members got an implicit DW_AT_data_member_offset

So we're discussing changing teh way clang emits static members to use DW_TAG_variable instead of DW_TAG_member to make it less ambiguous/solve the above issues "once and for all" fingers crossed

And you've tagged me as a heads up that we might be/are going to go in this direction/check I'm cool with it - yeah, sounds good to me. Doing things more clearly, and if that happens to be "more like GCC" I'm generally for it - they're still the bigger fish in the DWARF sea, and it's usually a safe direction to head unless there's extenuating circumstances.

At least our internal use ships pretty close to LLDB, so we don't have a particular need for some clang mode that's deeply backwards compatible (the inverse is somewhat true - be good if LLDB didn't just totally give up on doing some of the old heuristics - still able to debug old binaries as best it can, which seems like it wouldn't be too expensive/problematic to maintain for a while at least) & this shouldn't present a GDB compatibility issue, since it'll be closer to what GCC produces anyway.

@Michael137
Copy link
Member Author

Michael137 commented Oct 4, 2023

Summary is, if I'm reading this right:

  • LLDB used to detect static members by the presence of DW_AT_external, but that's not correct/portable, you could have non-external static members if they were members of a non-linkage class, such as a class in an anonymous namespace
  • LLDB then started using the presence of DW_AT_data_member_offset to differentiate static and non-static member variables
  • LLDB also started assuming members of a union have a DW_AT_data_member_offset of zero, because GCC emits union non-static members without a DW_AT_data_member_offset (since they'd all be zero anyway) - unclear if that's to-spec or not? shrug
  • That broke static-member detection inside unions because all the members got an implicit DW_AT_data_member_offset

So we're discussing changing teh way clang emits static members to use DW_TAG_variable instead of DW_TAG_member to make it less ambiguous/solve the above issues "once and for all" fingers crossed

And you've tagged me as a heads up that we might be/are going to go in this direction/check I'm cool with it - yeah, sounds good to me. Doing things more clearly, and if that happens to be "more like GCC" I'm generally for it - they're still the bigger fish in the DWARF sea, and it's usually a safe direction to head unless there's extenuating circumstances.

At least our internal use ships pretty close to LLDB, so we don't have a particular need for some clang mode that's deeply backwards compatible (the inverse is somewhat true - be good if LLDB didn't just totally give up on doing some of the old heuristics - still able to debug old binaries as best it can, which seems like it wouldn't be too expensive/problematic to maintain for a while at least) & this shouldn't present a GDB compatibility issue, since it'll be closer to what GCC produces anyway.

Yup! those are exactly the issues

Alright sounds like we'll resolve this by simply implementing the DWARFv5 "DW_TAG_variable for static data members" feature. LLDB should mostly work out of the box with that.

To avoid crashing on older DWARF i think we could add a best effort attempt to detect the cases where we might have a static data member in a union and do the right thing. E.g., by adding back the check for DW_AT_external. These two solutions won't conflict with each other.

Michael137 added a commit to Michael137/llvm-project that referenced this issue Oct 5, 2023
…ic data members

**Background**

Prior to DWARFv4, there was no clear normative text on how to handle
static data members. Non-normative text suggested we compilers should
use `DW_AT_external` to mark static data members of structrues/unions.
Clang does this consistently. However, GCC doesn't, e.g., when the
structure/union is in an anonymous namespace (which is C++ standard
conformant). Additionally, GCC never emits `DW_AT_data_member_location`s
for union members (regardless of storage linkage and storage duration).

Since DWARFv5 (issue 161118.1), static data members get emitted as
`DW_TAG_variable`.

LLDB used to differentiate between static and non-static members by
checking the `DW_AT_external` flag and the absence of
`DW_AT_data_member_location`. With D18008 LLDB started to pretend
that union members always have a `0` `DW_AT_data_member_location`
by default (because GCC never emits these locations).

In D124409 LLDB stopped checking the `DW_AT_external` flag to account
for the case where GCC doesn't emit the flag for types in anonymous
namespaces; instead we only check for presence of
`DW_AT_data_member_location`s.

The combination of these changes then meant that LLDB would never
correctly detect that a union has static data members.

**Solution**

Instead of unconditionally initializing the `member_byte_offset` to `0`
specifically for union members, this patch proposes to check for both
the absence of `DW_AT_data_member_location` and `DW_AT_declaration`,
which consistently gets emitted for static data members on GCC and
Clang.

We initialize the `member_byte_offset` to `0` anyway if we determine
it wasn't a static. So removing the special case for unions makes this
code simpler to reason about.

Long-term, we should just use DWARFv5's new representation for static
data members.

Fixes llvm#68135
@Michael137
Copy link
Member Author

Proposed solution to handle pre-DWARFv5 here: #68300

Michael137 added a commit that referenced this issue Oct 6, 2023
…ic data members (#68300)

**Background**

Prior to DWARFv4, there was no clear normative text on how to handle
static data members. Non-normative text suggested that compilers should
use `DW_AT_external` to mark static data members of structrues/unions.
Clang does this consistently. However, GCC doesn't, e.g., when the
structure/union is in an anonymous namespace (which is C++ standard
conformant). Additionally, GCC never emits `DW_AT_data_member_location`s
for union members (regardless of storage linkage and storage duration).

Since DWARFv5 (issue 161118.1), static data members get emitted as
`DW_TAG_variable`.

LLDB used to differentiate between static and non-static members by
checking the `DW_AT_external` flag and the absence of
`DW_AT_data_member_location`. With
[D18008](https://reviews.llvm.org/D18008) LLDB started to pretend that
union members always have a `0` `DW_AT_data_member_location` by default
(because GCC never emits these locations).

In [D124409](https://reviews.llvm.org/D124409) LLDB stopped checking the
`DW_AT_external` flag to account for the case where GCC doesn't emit the
flag for types in anonymous namespaces; instead we only check for
presence of `DW_AT_data_member_location`s.

The combination of these changes then meant that LLDB would never
correctly detect that a union has static data members.

**Solution**

Instead of unconditionally initializing the `member_byte_offset` to `0`
specifically for union members, this patch proposes to check for both
the absence of `DW_AT_data_member_location` and `DW_AT_declaration`,
which consistently gets emitted for static data members on GCC and
Clang.

We initialize the `member_byte_offset` to `0` anyway if we determine it
wasn't a static. So removing the special case for unions makes this code
simpler to reason about.

Long-term, we should just use DWARFv5's new representation for static
data members.

Fixes #68135
Michael137 added a commit to Michael137/llvm-project that referenced this issue Oct 6, 2023
…ic data members (llvm#68300)

**Background**

Prior to DWARFv4, there was no clear normative text on how to handle
static data members. Non-normative text suggested that compilers should
use `DW_AT_external` to mark static data members of structrues/unions.
Clang does this consistently. However, GCC doesn't, e.g., when the
structure/union is in an anonymous namespace (which is C++ standard
conformant). Additionally, GCC never emits `DW_AT_data_member_location`s
for union members (regardless of storage linkage and storage duration).

Since DWARFv5 (issue 161118.1), static data members get emitted as
`DW_TAG_variable`.

LLDB used to differentiate between static and non-static members by
checking the `DW_AT_external` flag and the absence of
`DW_AT_data_member_location`. With
[D18008](https://reviews.llvm.org/D18008) LLDB started to pretend that
union members always have a `0` `DW_AT_data_member_location` by default
(because GCC never emits these locations).

In [D124409](https://reviews.llvm.org/D124409) LLDB stopped checking the
`DW_AT_external` flag to account for the case where GCC doesn't emit the
flag for types in anonymous namespaces; instead we only check for
presence of `DW_AT_data_member_location`s.

The combination of these changes then meant that LLDB would never
correctly detect that a union has static data members.

**Solution**

Instead of unconditionally initializing the `member_byte_offset` to `0`
specifically for union members, this patch proposes to check for both
the absence of `DW_AT_data_member_location` and `DW_AT_declaration`,
which consistently gets emitted for static data members on GCC and
Clang.

We initialize the `member_byte_offset` to `0` anyway if we determine it
wasn't a static. So removing the special case for unions makes this code
simpler to reason about.

Long-term, we should just use DWARFv5's new representation for static
data members.

Fixes llvm#68135

(cherry picked from commit f74aaca)
Michael137 added a commit to Michael137/llvm-project that referenced this issue Oct 6, 2023
…ic data members (llvm#68300)

**Background**

Prior to DWARFv4, there was no clear normative text on how to handle
static data members. Non-normative text suggested that compilers should
use `DW_AT_external` to mark static data members of structrues/unions.
Clang does this consistently. However, GCC doesn't, e.g., when the
structure/union is in an anonymous namespace (which is C++ standard
conformant). Additionally, GCC never emits `DW_AT_data_member_location`s
for union members (regardless of storage linkage and storage duration).

Since DWARFv5 (issue 161118.1), static data members get emitted as
`DW_TAG_variable`.

LLDB used to differentiate between static and non-static members by
checking the `DW_AT_external` flag and the absence of
`DW_AT_data_member_location`. With
[D18008](https://reviews.llvm.org/D18008) LLDB started to pretend that
union members always have a `0` `DW_AT_data_member_location` by default
(because GCC never emits these locations).

In [D124409](https://reviews.llvm.org/D124409) LLDB stopped checking the
`DW_AT_external` flag to account for the case where GCC doesn't emit the
flag for types in anonymous namespaces; instead we only check for
presence of `DW_AT_data_member_location`s.

The combination of these changes then meant that LLDB would never
correctly detect that a union has static data members.

**Solution**

Instead of unconditionally initializing the `member_byte_offset` to `0`
specifically for union members, this patch proposes to check for both
the absence of `DW_AT_data_member_location` and `DW_AT_declaration`,
which consistently gets emitted for static data members on GCC and
Clang.

We initialize the `member_byte_offset` to `0` anyway if we determine it
wasn't a static. So removing the special case for unions makes this code
simpler to reason about.

Long-term, we should just use DWARFv5's new representation for static
data members.

Fixes llvm#68135

(cherry picked from commit f74aaca)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants