gh-114863: What's new in Python 3.13: JIT compiler #114862

FreeBSD and Solaris. See the ``subprocess`` section above for details.
(Contributed by Jakub Kulik in :gh:`113117`.)

.. _whatsnew313-jit-compiler:

Experimental JIT Compiler
=========================

:Editor: Guido van Rossum, Ken Jin

When CPython is configured using the ``--enable-experimental-jit`` build-time
option, a just-in-time compiler is added which can speed up some Python
programs. The internal architecture is roughly as follows.
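
For example, a from-source build might enable the JIT like this (a sketch
only; exact steps vary by platform and build configuration)::

    ./configure --enable-experimental-jit
    make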

Intermediate Representation
---------------------------

We start with specialized *Tier 1 bytecode*.
See :ref:`What's new in 3.11 <whatsnew311-pep659>` for details.
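
For instance, specialization can be observed with the :mod:`dis` module
(a small demonstration; the exact instruction names shown are
version-dependent)::

    import dis

    def add(a, b):
        return a + b

    # Warm the function up so the adaptive interpreter specializes it.
    for _ in range(1_000):
        add(1, 2)

    # With adaptive=True, the generic BINARY_OP may be shown in a
    # specialized form such as BINARY_OP_ADD_INT.
    dis.dis(add, adaptive=True)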

When the Tier 1 bytecode gets hot enough, the interpreter creates
straight-line sequences of bytecode known as "traces", and translates them
to a new, purely internal *Tier 2 IR*, a.k.a. micro-ops ("uops").
These straight-line sequences can cross function call boundaries,
allowing more effective optimizations, listed in the next section.
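
As a hypothetical illustration, a hot loop like the one below could become
a single trace that continues straight into the body of ``scale``, instead
of stopping at the call boundary::

    def scale(x):
        return 3 * x

    def run():
        total = 0
        for i in range(10_000):   # hot enough to be traced
            total += scale(i)     # the trace can cross this call
        return total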

The Tier 2 IR uses the same stack-based VM as Tier 1, but the
instruction format is better suited to translation to machine code.
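
As a rough illustration of the relationship between the two tiers (an
informal sketch, not actual tooling output), a single specialized Tier 1
instruction can expand into several micro-ops, separating the guard from
the action::

    # Tier 1 (specialized bytecode)        Tier 2 (micro-ops)
    BINARY_OP_ADD_INT                -->   _GUARD_BOTH_INT
                                           _BINARY_OP_ADD_INT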

(Tier 2 IR contributed by Mark Shannon and Guido van Rossum.)

Optimizations
-------------

We have several optimization and analysis passes for Tier 2 IR, which
are applied before Tier 2 IR is interpreted or translated to machine code.
These optimizations take unoptimized Tier 2 IR and produce optimized Tier 2
IR:

* This section is non-exhaustive and will be updated with further
  optimizations until CPython 3.13's beta release.

* Type propagation -- through forward
  `data-flow analysis <https://clang.llvm.org/docs/DataFlowAnalysisIntro.html>`_,
  we infer and deduce information about types.

* Constant propagation -- through forward data-flow analysis, we can
  evaluate in advance bytecode which we know operates on constants.

* Guard elimination -- through a combination of constant and type
  information, we can eliminate type checks and other guards associated
  with operations. These guards validate specialized operations, but add
  a slight bit of overhead. For example, specialized integer addition
  needs a guard that checks that both operands are integers. If we know
  that a guard's operands are guaranteed to be integers, we can safely
  eliminate it. (A toy sketch of this pass, combined with type and
  constant propagation, follows this list.)

* Loop splitting -- after the first iteration, we gain a lot more type
  information. Thus, we peel the first iteration of loops to produce
  an optimized body that exploits this additional type information.
  This also achieves an effect similar to an optimization called
  loop-invariant code motion, but only for guards. (See the before/after
  sketch following this list.)

* Globals to constant promotion -- global value loads become constant
  loads, speeding them up and also allowing for more constant propagation.
  This work relies on dictionary watchers, implemented in 3.12.
  (Contributed by Mark Shannon in :gh:`113710`.
  A small illustration follows this list.)
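
The following toy Python sketch (purely illustrative; the real optimizer
is written in C inside CPython, and these uop names are simplified) shows
how a single forward data-flow pass can combine type propagation,
constant propagation, and guard elimination::

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class Value:
        typ: type | None = None   # inferred type (None means unknown)
        const: Any = None         # constant value, meaningful if is_const
        is_const: bool = False

    def optimize(trace):
        stack, out = [], []
        for op, arg in trace:
            if op == "_LOAD_CONST":
                stack.append(Value(type(arg), arg, True))
                out.append((op, arg))
            elif op == "_GUARD_BOTH_INT":
                a, b = stack[-2], stack[-1]
                if a.typ is int and b.typ is int:
                    continue            # guard elimination: provably true
                a.typ = b.typ = int     # type propagation: ints past here
                out.append((op, arg))
            elif op == "_BINARY_OP_ADD_INT":
                b, a = stack.pop(), stack.pop()
                if a.is_const and b.is_const:
                    # Constant propagation: fold the addition and replace
                    # the two constant loads emitted earlier (simplistic
                    # bookkeeping that works for this tiny example).
                    out = out[:-2] + [("_LOAD_CONST", a.const + b.const)]
                    stack.append(Value(int, a.const + b.const, True))
                else:
                    stack.append(Value(int))
                    out.append((op, arg))
        return out

    trace = [("_LOAD_CONST", 2), ("_LOAD_CONST", 3),
             ("_GUARD_BOTH_INT", None), ("_BINARY_OP_ADD_INT", None)]
    print(optimize(trace))   # -> [('_LOAD_CONST', 5)]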
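
Loop splitting can be pictured with a before/after sketch (conceptual
Python; the real transformation happens on Tier 2 IR, not on source code)::

    def total(nums):
        t = 0
        for x in nums:
            t += x        # unoptimized: guards re-checked every iteration
        return t

    # Conceptually, the optimizer peels the first iteration:
    #
    #   iteration 1:    run with all guards, observing that t and x are ints
    #   iterations 2+:  run a specialized body whose int guards have been
    #                   hoisted out, i.e. checked once rather than per pass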
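
Globals-to-constant promotion can be illustrated with a small example
(conceptual; the promotion happens inside a trace, not in the source)::

    THRESHOLD = 10          # a module-level global

    def count_over(items):
        n = 0
        for x in items:
            if x > THRESHOLD:   # a LOAD_GLOBAL; while the watched module
                n += 1          # dict is unchanged, it can be treated as
                                # a constant load inside the trace
        return n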

(Tier 2 optimizer contributed by Ken Jin and Mark Shannon,
with implementation help by Guido van Rossum. Special thanks
to Manuel Rigger.)

Execution Engine
----------------

There are two execution engines for Tier 2 IR:
the Tier 2 interpreter and the Just-in-Time (JIT) compiler.

The Tier 2 interpreter is mostly intended for debugging
the earlier stages of the optimization pipeline. If the JIT is not
enabled, the Tier 2 interpreter can be invoked by passing Python the
``-X uops`` option or by setting the ``PYTHON_UOPS`` environment
variable to ``1``.
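
For example, either of the following runs a script under the Tier 2
interpreter (``my_script.py`` is a placeholder)::

    python -X uops my_script.py
    PYTHON_UOPS=1 python my_script.py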

The second is the JIT compiler. When the ``--enable-experimental-jit``
build-time option is used, the optimized Tier 2 IR is translated to machine
code, which is then executed. This does not require additional
runtime options.

The machine code translation process uses a technique called
*copy-and-patch*. It has no runtime dependencies, but there is a new
build-time dependency on `LLVM <https://llvm.org>`_.
The main benefit of this technique is fast compilation, reported in the
paper linked below to be orders of magnitude faster than traditional
compilation techniques. The code produced is slightly less optimized,
but suitable for a baseline JIT compiler. Fast compilation is critical
to reduce the runtime overhead of the JIT compiler.
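
The core idea can be sketched in a few lines of Python (a toy model only;
the real stencils are native code generated at build time with LLVM, and
the holes are patched with machine addresses)::

    HOLE = b"\xAA\xBB\xCC\xDD"   # placeholder bytes to be patched

    # Pretend per-uop templates; real stencils are machine code.
    STENCILS = {
        "_LOAD_CONST":        b"\x01" + HOLE,  # hole: address of constant
        "_BINARY_OP_ADD_INT": b"\x02",
        "_EXIT_TRACE":        b"\x03" + HOLE,  # hole: resume target
    }

    def compile_trace(trace):
        code = bytearray()
        for op, operand in trace:
            template = STENCILS[op]
            start = len(code)
            code += template                   # the "copy" step
            hole = template.find(HOLE)
            if hole != -1:                     # the "patch" step
                code[start + hole:start + hole + 4] = \
                    operand.to_bytes(4, "little")
        return bytes(code)

    # "Compiling" is just copying and patching, which is why it is fast.
    print(compile_trace([("_LOAD_CONST", 0x1000),
                         ("_BINARY_OP_ADD_INT", 0),
                         ("_EXIT_TRACE", 0x2000)]).hex())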

(Copy-and-patch JIT compiler contributed by Brandt Bucher,
directly inspired by the paper
`Copy-and-Patch Compilation <https://fredrikbk.com/publications/copy-and-patch.pdf>`_
by Haoran Xu and Fredrik Kjolstad. For more information,
`a talk <https://youtu.be/HxSHIpEQRjs?si=RwC78FcXrThIgFmY>`_ by Brandt Bucher
is available.)

Results and Future Work
-----------------------

The final performance results will be published here before
CPython 3.13's beta release.

The JIT compiler is rather unoptimized, and serves as the foundation
for significant optimizations in future releases.


Deprecated