MPIR_proctable external variable not accessible in gdb in v4.0.6rc #8563
Comments
@louisespellacy-arm Can you try this change to see if it resolves the issue? It did for me when testing locally.
@awlauria Works for me! Thanks for the quick response. Updated dwarf looks like:
@louisespellacy-arm No problem. Is this also needed on the v3.* branches?
IIRC the original fix was merged into the v3.1.x branch, so this one will need to go wherever the original fix went.
Thanks, opened on v3.1.x as well.
@louisespellacy-arm Austen and I have been discussing this and trying to understand both the issue and his proposed fix. I'm concerned that we don't understand why his change fixes the issue for you. I am admittedly not a DWARF expert, so there's probably a subtlety being lost on me. I see the dwarf change, but I don't see how it affects anything in gdb. I constructed a small case which should be similar to Open MPI: ompi-mpir-proctable-issue.tar.gz It has:
If I build this, I can still If I move the definition of I do see some changes from So I guess I'm wondering: exactly what is going on here? What is the change that you need, and why? This is apparently a very subtle issue, and I'd like to understand and document it properly.
I also see the same behavior that @jsquyres described, on Power9 + RHEL 8.2 with gcc version 8.3.1. The DWARF output is different, but the gdb behavior remains intact.
Hi both - I am unclear from @jsquyres's comment above: were you not able to reproduce the issue in gdb relating to MPIR_proctable?
@louisespellacy-arm we are able to reproduce in OMPI. But we are trying to understand why Also, we are unable to reproduce similar behavior in a stand-alone toy program. We're just looking for an explanation of why this works, so we can be sure the problem is actually fixed and not just masked somehow. Any input you could provide would be appreciated - for our own understanding and so we can be sure it is correctly fixed.
@louisespellacy-arm From the toy program I attached in the above comment:
The reproducer is a statically built application, while in the case of OMPI we are looking at symbols pulled from a shared library. Two different cases, I would think.
Ah, how easily we get spoiled by Libtool making shared libraries for us... I had errors in Makefile. Fixed, and now I have a proper shared ompi-mpir-proctable-issue.tar.gz
gdb behavior is still good:
I can confirm that just moving the struct declaration to the header makes it accessible in gdb. However,
Hi All - I think the issue may be that I compared the I tried to replicate what I think the scenario is in:
I've tested this theory by manually adding
and re-creating I believe by adding the debug information for
Thanks for investigating this further, @louisespellacy-arm. Probably the right thing to do would be to compile orted_submit.c with -g. The easy way to do that is to append it to CFLAGS, but that will cause all of orte to be compiled with -g. I don't think that's a bad thing, but I will defer to @jsquyres.
@awlauria @jsquyres I think that your fix (moving struct MPIR_PROCDESC) is still the solution - keeping all the MPIR declarations in one library - I was just providing an explanation as to the behaviour we were seeing.
We had PRs like #8422 to move things into a subdirectory to ensure that they get compiled with I guess I'm confused as to why compiling
@jsquyres I could find
But as I said - I'm not a DWARF expert. I think the main issue is that we need debug information for all parts of
That makes sense. I'm fine with merging the PRs as is. Thanks @louisespellacy-arm
Please give me a little time to review.
I did some more testing, this time with OMPI itself instead of a small example. I'm pretty sure that @louisespellacy-arm hit the nail on the head, above:
Meaning:
Meaning: if we move just the @louisespellacy-arm Can you verify if that's the case for you? If so, I would propose amending all the PRs to:
I'm sorry to be so pedantic. But this MPIR linker stuff is complicated and it really takes a deep dive to understand what and why. Putting comments and commit messages in there will definitely help whoever comes after us and has to maintain this MPIR stuff. Thanks!
Sounds good, I'll amend the patch. @jsquyres
Make sure the definition of the MPIR_Proctable is in a header file that is included in the file orted_mpir_breakpoint.c, which is compiled with -g and compiled without optimizations. Otherwise, the debugger (such as gdb) won't know the complete definition of the proctable, preventing it from being able to read it. Since the MPIR_proctable should be accessed from orted_submit.c and orted_mpir_breakpoint.c, move it to the mpir_orted.h header file. See issue: open-mpi#8563 Signed-off-by: Austen Lauria <[email protected]>
Make sure the definition of the MPIR_Proctable is in a header file that is included in the file orted_mpir_breakpoint.c, which is compiled with -g and compiled without optimizations. Otherwise, the debugger (such as gdb) won't know the complete definition of the proctable, preventing it from being able to read it. Since the MPIR_proctable should be accessed from orted_submit.c and orted_mpir_breakpoint.c, move it to the mpir_orted.h header file. See issue: open-mpi#8563 Signed-off-by: Austen Lauria <[email protected]> (cherry picked from commit a71fbaf)
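For readers following along, the layout described in the commit message can be pictured with a minimal single-file sketch. It is an illustration under assumptions, not the actual Open MPI source: the field names follow the MPIR process acquisition interface, while the `sketch_*` helpers and the sample values are hypothetical. The point is only that the file defining `MPIR_proctable` sees the complete `struct MPIR_PROCDESC`, so when that file is built with `-g` its debug information can describe every field.

```c
#include <stdlib.h>

/* --- Part that would live in a shared header (the commit names mpir_orted.h) --- */
/* Field names follow the MPIR process acquisition interface. */
struct MPIR_PROCDESC {
    char *host_name;        /* host on which the process runs */
    char *executable_name;  /* path of the executable */
    int pid;                /* process ID on that host */
};

extern struct MPIR_PROCDESC *MPIR_proctable;
extern int MPIR_proctable_size;

/* --- Part that would live in the file built with -g and without optimization --- */
struct MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;

/* Hypothetical sample strings, used only to populate the sketch. */
static char sketch_host[] = "localhost";
static char sketch_exe[] = "./a.out";

/* Hypothetical helper: allocate and fill the table with placeholder entries.
 * Returns 0 on success, -1 on bad input or allocation failure. */
int sketch_fill_proctable(int nprocs)
{
    if (nprocs <= 0) {
        return -1;
    }
    struct MPIR_PROCDESC *table = calloc((size_t)nprocs, sizeof *table);
    if (table == NULL) {
        return -1;
    }
    for (int i = 0; i < nprocs; ++i) {
        table[i].host_name = sketch_host;
        table[i].executable_name = sketch_exe;
        table[i].pid = 1000 + i;   /* placeholder PIDs, not real processes */
    }
    MPIR_proctable = table;
    MPIR_proctable_size = nprocs;
    return 0;
}

/* Hypothetical helper: release the table and reset the globals. */
void sketch_free_proctable(void)
{
    free(MPIR_proctable);
    MPIR_proctable = NULL;
    MPIR_proctable_size = 0;
}
```

To try it, call `sketch_fill_proctable()` from a small `main`, build with `gcc -g -O0`, and run `print MPIR_proctable[0]` in gdb after the call; with the full struct definition visible in the defining file, gdb should be able to show the individual fields.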
All PRs have been merged, so I am closing this issue. Thanks @louisespellacy-arm for reporting this and for your help.
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.0.6rc2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From openmpi-v4.0.6rc2 tar gz with GCC 8.3.0 or 10.2.0 or PGI 20.1.
Please describe the system on which you are running
Details of the problem
Following changes made in #7757, MPIR_proctable is not accessible via gdb.
When comparing the DWARF output, the symbols for MPIR_proctable were previously found in libopen-rte.so with the following entries:
However, the symbols are now in libopen-orted-mpir.so, as expected from change #7757, but they are all marked as external:
The resulting behaviour is that the values in the MPIR_proctable cannot be queried in gdb.
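One way to picture the symptom is the C notion of an incomplete type. The sketch below is only an analogy under assumptions (the `sketch_*` names are hypothetical, and it does not reproduce the DWARF output from the report): code that sees just a forward declaration of `struct MPIR_PROCDESC` can pass pointers around but cannot read any field, which is roughly the position a debugger is in when the debug information it loads does not contain the full definition.

```c
#include <stddef.h>

/* Only a forward declaration is visible at this point: the type is incomplete. */
struct MPIR_PROCDESC;

/* Hypothetical helper: with an incomplete type, a pointer can be stored and
 * returned, but an expression such as entry->pid would not compile here. */
const struct MPIR_PROCDESC *sketch_pass_through(const struct MPIR_PROCDESC *entry)
{
    return entry;
}

/* From here on the type is complete. Field names follow the MPIR
 * process acquisition interface. */
struct MPIR_PROCDESC {
    char *host_name;
    char *executable_name;
    int pid;
};

/* Hypothetical helper: field access works once the full definition is known. */
int sketch_read_pid(const struct MPIR_PROCDESC *entry)
{
    return entry != NULL ? entry->pid : -1;
}
```

When gdb has only an incomplete description of a struct, printing a dereferenced pointer to it typically reports an incomplete type rather than the field values.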