JIT: Optimize common C calls via replication #133020

Open
Fidget-Spinner opened this issue Apr 26, 2025 · 4 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT type-feature A feature request or enhancement

Comments

@Fidget-Spinner
Member

Fidget-Spinner commented Apr 26, 2025

Feature or enhancement

Proposal:

Currently, a C call is executed roughly as follows in the specializing interpreter:

(py_func->m_ml->ml_meth)(...args)

This has two sources of overhead:

  1. A double pointer lookup, which, while cheap, is still overhead we can remove.
  2. The JIT cannot inline the function without PGO (which the JIT currently does not have, and probably never will).

We can optimize it to the following in the JIT:

PyCFunction cfunc = LOOKUP_TABLE[1..n]; // Via replicate(n)
DEOPT_IF(cfunc != py_func->m_ml->ml_meth);
cfunc(...args);

LOOKUP_TABLE will be populated with common C functions that we know Python code uses.
This removes overhead 2, allowing the JIT to inline and optimize these calls.

If we want, there's an even more extreme optimization: we could burn the C function directly into the code and call it, saving the overhead of 1 as well. However, I don't think this could be done without breaking strange usages where ml_meth is dynamically set, so I would be more cautious with that.

Has this already been discussed elsewhere?

No response

Links to previous discussion of this feature:

No response

@Fidget-Spinner Fidget-Spinner added type-feature A feature request or enhancement topic-JIT labels Apr 26, 2025
@Fidget-Spinner
Member Author

This doesn't work with the current copy-and-patch scheme because calls to external functions are treated as holes. @brandtbucher, are there any plans to make inlining work with functions that are external relocations? Would LTO work? (I applied your LTO patch, but it doesn't seem to generate anything better.)

@picnixz picnixz added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 26, 2025
@brandtbucher
Member

I'm certainly interested in being able to inline stuff into the stencils, but I don't have any obvious way to do it yet.

@markshannon
Member

This looks similar to faster-cpython/ideas#660.

@Fidget-Spinner
Member Author

Oh yeah, it is.
