Support instruction-level debugging in pdb #103049

Open
5 tasks done
gaogaotiantian opened this issue Mar 27, 2023 · 11 comments
Labels
stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@gaogaotiantian
Member

gaogaotiantian commented Mar 27, 2023

Feature or enhancement

Support instruction-level debugging in pdb

  • li and lli to display instructions with source code
  • current instruction display before interaction
  • si and ni command
  • test
  • documentation

Pitch

pdb could provide a better debugging experience by supporting instruction level debugging. We already have most of the utilities but we need to put them together.

Four new commands would be introduced: li (listinst), lli (longlistinst), si (stepinst) and ni (nextinst). Another candidate would be dis, short for import dis; dis.dis().

li and lli list the source file with the instructions for each line shown beneath it.

(Pdb) lli
  4     def f():
               0 RESUME                   0
  5         a = [1, 2, 3]
               2 BUILD_LIST               0
               4 LOAD_CONST               1 ((1, 2, 3))
               6 LIST_EXTEND              1
               8 STORE_FAST               0 (a)
  6         breakpoint()
              10 LOAD_GLOBAL              1 (NULL + breakpoint)
              20 CALL                     0
              30 POP_TOP
  7  ->     g(a)
              32 LOAD_GLOBAL              3 (NULL + g)
              42 LOAD_FAST                0 (a)
              44 CALL                     1
     -->      54 POP_TOP
              56 RETURN_CONST             0 (None)
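
The interleaved listing above can be approximated with the public dis API. This is only a rough sketch, not the code from the linked prototype: list_instructions is a hypothetical helper, it compiles a source string instead of inspecting a live frame, and the column widths are picked by eye.

```python
import dis


def list_instructions(source, filename="<example>"):
    """Return display lines: each source line followed by its instructions.

    Hypothetical helper for illustration; pdb has no such function.
    """
    code = compile(source, filename, "exec")
    src_lines = source.splitlines()
    # Map the offset of the first instruction of each line to its line number.
    line_starts = dict(dis.findlinestarts(code))
    out = []
    for inst in dis.get_instructions(code):
        lineno = line_starts.get(inst.offset)
        # Some instructions (e.g. RESUME) map to line 0 or to no line at all.
        if lineno is not None and 1 <= lineno <= len(src_lines):
            out.append(f"{lineno:3}     {src_lines[lineno - 1]}")
        arg = "" if inst.arg is None else inst.arg
        argrepr = f" ({inst.argrepr})" if inst.argrepr else ""
        out.append(f"{inst.offset:>16} {inst.opname:<24} {arg}{argrepr}".rstrip())
    return out


if __name__ == "__main__":
    print("\n".join(list_instructions("a = [1, 2, 3]\nprint(a)\n")))
```

The exact opcodes and offsets printed depend on the Python version, so the output will not match the listing above byte for byte.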

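si steps to the next instruction, entering a called function if there is one, while ni steps to the next instruction but stays in the current frame. The existing opcode trace event is enough to support this.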

> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
(Pdb) ni
> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
-->      42 LOAD_FAST                0 (a)
(Pdb) ni
> /home/gaogaotiantian/programs/mycpython/example.py(7)f()
-> g(a)
-->      44 CALL                     1
(Pdb) si
--Call--
> /home/gaogaotiantian/programs/mycpython/example.py(1)g()
-> def g(x):
-->       0 RESUME                   0
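
The opcode event mentioned above is what sys.settrace delivers once a frame has f_trace_opcodes set. The following is a minimal sketch of the si/ni distinction, not the proposed pdb implementation: trace_opcodes, f and g are hypothetical names, and it records every instruction instead of stopping at a prompt. With step_into=False only the target frame is traced (ni-like); with step_into=True called frames are traced too (si-like).

```python
import dis
import sys


def trace_opcodes(func, *args, step_into=True):
    """Run func and return (function name, offset, opname) per opcode event."""
    events = []
    target = func.__code__

    def tracer(frame, event, arg):
        if not step_into and frame.f_code is not target:
            return None  # ni-like: do not trace frames other than the target
        frame.f_trace_opcodes = True  # request 'opcode' events for this frame
        if event == "opcode":
            # f_lasti is the byte offset of the instruction about to run.
            opcode = frame.f_code.co_code[frame.f_lasti]
            events.append((frame.f_code.co_name, frame.f_lasti, dis.opname[opcode]))
        return tracer

    # Defensive assumption: some interpreter versions may only turn opcode
    # events on if the flag was already set once before sys.settrace() runs.
    sys._getframe().f_trace_opcodes = True
    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(old)
    return events


def g(x):
    return x + 1


def f():
    a = [1, 2, 3]
    return g(len(a))


if __name__ == "__main__":
    for name, offset, opname in trace_opcodes(f, step_into=True):
        print(f"{name:>4} {offset:>4} {opname}")
```

Which instructions appear, and whether the very first instructions of a frame are reported, varies between Python versions.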

Previous discussion

Did not find any.

Linked PRs

@gaogaotiantian
Member Author

I linked a prototype for the instruction display. Please let me know if we want to proceed with this feature. I can either do two PRs (I assume si is going to be rather complicated) or one big one.

@ambv
Contributor

ambv commented Mar 27, 2023

We don't want a dis command as it's easy for users to add an alias if they so choose. Using .pdbrc they can define aliases that are always available.

As for working with opcodes, that's an interesting idea! But do we need a separate assembly mode? It feels to me like it would be more flexible to add new commands for this, like lo | listopcodes and llo | longlistopcodes. What do you think?

@gaogaotiantian
Member Author

I was actually torn between the two ideas - whether to overload the existing command with a state flipper, or to add two extra commands. There are two reasons I'm leaning slightly toward the state solution:

  1. As I imagine it, users either need to debug at the instruction level or they don't. Keeping two very similar functions in the same command seems a bit clearer. It can inherit all the features/options from l and ll.
  2. With the state, we can do some other tricks, like displaying the instruction after a breakpoint (or step) in assembly mode.

I can go along with the extra commands as well; the debugger I used for C at my previous company worked more like a state switch. In that case we can display the current instruction only when the user uses si (or, in the future, other instruction-level commands).

Actually, there's another possibility: we overload everything. In assembly mode, step means stepinstruction. I believe this is how windbgx works. If the user switches to "assembly mode", clicking "step" actually steps a single instruction. That's also one possible solution.

The reason I brought up a new command for dis is that there are equivalents in gdb (and probably other debuggers). True, the user can achieve that with an alias in .pdbrc, but that's true for a lot of other commands (whatis, interact, retval ...). I think the ultimate decision comes down to this: will it introduce more trouble for the users who don't use it, or more benefit for the users who do?

Also, I guess there are a couple of names for the instruction: disassembly is one, and opcode, which you used, is another. I always think of opcode as referring to the actual instruction type, like LOAD_GLOBAL or CACHE, whereas instruction is the full package with arguments, line number, positions and so on. That's also how it's used in the dis docs, if I understand correctly.

On a side note, now that I think about it, si and ni are two different commands and we need to separate them.

@artemmukhin
Contributor

I am excited about this proposal. Indeed, there are plenty of options for how this could be implemented. I would like to know your thoughts on how LLDB handles disassembly.

LLDB has separate commands for source and instruction level stepping:

  • n for source level single step
  • ni for instruction level single step

Additionally, LLDB provides settings to control whether to display disassembly when stopped. I took a couple of screenshots for demonstration.

By default, LLDB only shows source code when stopped, even with instruction level stepping:

[Screenshot: no-disassembly]

Although it does not display disassembly, it marks the corresponding piece of code that is being executed.

By running settings set stop-disassembly-display always, you can examine the source code and the assembly code at the same time:

[Screenshot: show-disassembly]

Do you think pdb could provide a similar user experience? And how beneficial would such behaviour be for Python programmers?

@gaogaotiantian
Member Author

pdb currently does not have a settings command to handle all potential settings (maybe for the next pdb we should consider that). assem would be the first state command if introduced.

I guess having the assem state determine whether to step by instruction or by line is not that great an idea for command-line tools. It's nice in a GUI, where overloading a button has more benefits. In command-line tools, I guess users would prefer more distinct commands.

Personally, I'd like my debugger to show some difference when I do ni. Just a note here that we can now point at the specific piece of code using position information. I still think displaying the current instruction after ni would be a better user experience. pdb only lists a single source line, whereas lldb shows more context, so adding another instruction line would be cheaper (screen-space wise) than in lldb. Also, if we do not have state, we won't have a choice: it's either with or without. I'd lean toward with.

As for the benefits, I'm not sure. I would guess most Python users debug their programs with print(). Even among the people who use pdb, most are probably not familiar with bytecode. However, the number of Python users is so large that even a small portion of a small portion is a significant number. At least for me, I often want to see the actual compiled bytecode of a function to see what's really going on there. (Thus the thought of dis, which would be super convenient.)

I'm totally fine with li and lli instead of assem. Just want to hear more from the actual users.

@gaogaotiantian
Member Author

Dear nosy:

I've finished a draft for the implementation. I decided to go no-state. Separate commands for all instruction-related stuff. So, li, lli, si and ni. Also when you do si and ni, the current instruction will be displayed in the prompt.

Does anyone have suggestions or questions on this? Once the implementation is reviewed and confirmed, I can work on the tests and docs.

@artemmukhin
Contributor

Overall, this approach looks good to me! I agree that it better suits CLI than having a separate assembly mode. I also like that ni and si show me both the current source line and the current instruction.

One more thing to consider is the terminal width. Disassembly output can be wide, at least because of the full file paths:

(Pdb) li
  1  ->	def foo(a, b):
     -->       2 LOAD_CONST               0 (<code object foo at 0x101651210, file "/very/long/full/path/to/foo.py", line 1>)

However, I do not have a particular idea of how to approach that properly.

@gaogaotiantian
Member Author

> One more thing to consider is the terminal width. Disassembly output can be wide, at least because of the full file paths:

Unfortunately, this piece is using the internal function of dis directly, which will provide a familiar experience to dis users (assuming most of the people interested in instructions are using dis). dis has similar issues and I guess there's no perfect way to deal with it.
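
One possible mitigation, which this thread did not settle on, would be to clip each rendered line to the terminal width before printing it. A minimal sketch, where fit_to_width is a hypothetical helper and not part of pdb or dis:

```python
import shutil


def fit_to_width(line, width=None):
    """Clip a rendered line to `width` columns, marking the cut with '...'.

    Hypothetical helper for illustration; neither pdb nor dis provides it.
    """
    if width is None:
        # Falls back to 80 columns when the size cannot be determined.
        width = shutil.get_terminal_size().columns
    if len(line) <= width:
        return line
    if width <= 3:
        return line[:width]
    return line[: width - 3] + "..."


if __name__ == "__main__":
    wide = ('     -->       2 LOAD_CONST               0 '
            '(<code object foo at 0x101651210, file "/very/long/full/path/to/foo.py", line 1>)')
    print(fit_to_width(wide, 60))
```

The trade-off is that clipping hides exactly the part of the argument representation that made the line long, so a real implementation would probably want it to be optional.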

@gaogaotiantian
Member Author

Hi all, the feature, test and docs are all finished and ready for review now.

@SonOfLilit

SonOfLilit commented Jul 5, 2023

This is very technically cool, but I can't think of any use cases for end users and it's a lot of code. (Am I just not creative enough?)

I can, however, imagine that core devs working on the Python bytecode compiler will get benefit from it. Maybe we should get buy in from some potential users before complicating Bdb?

@gaogaotiantian
Member Author

> This is very technically cool, but I can't think of any use cases for end users and it's a lot of code. (Am I just not creative enough?)
>
> I can, however, imagine that core devs working on the Python bytecode compiler will get benefit from it. Maybe we should get buy in from some potential users before complicating Bdb?

Python has f_trace_opcodes, so the ability to trace and debug opcodes (instructions) is natural. Python's end-user base is large; I, for example, often need the capability to debug at the instruction level. It's also very helpful for debugging CPython itself.

All I'm saying is that different developers have different needs; a lot of developers do not even use debuggers. Having a working solution for debugging instructions does not make pdb worse. And yes, there is more code in bdb, but most of it is on isolated paths that are very specific to instruction tracing. It does not impact bdb's current responsibilities.

For use cases, there is a very common pattern in Python: one-liners. Often they consist of multiple expressions. As of now, pdb can only execute such a line as a whole, and the debuggability within the line is horrible. Having instruction-level debugging in pdb would solve that.

However, with the new PEP 669, this work is blocked by the implementation of #103615, which I'm also responsible for. So this feature will only be revisited later.

@iritkatriel added the stdlib (Python modules in the Lib dir) label on Nov 29, 2023
5 participants