JIT: Optimize common C calls via replication #133020

Open
Fidget-Spinner opened this issue Apr 26, 2025 · 4 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT type-feature A feature request or enhancement

Comments

@Fidget-Spinner
Member

Fidget-Spinner commented Apr 26, 2025

Feature or enhancement

Proposal:

Currently, a C call is executed roughly as follows in the specializing interpreter:

(py_func->m_ml->ml_meth)(...args)

This has two sources of overhead:

  1. A double pointer lookup, which, while cheap, is still overhead we can remove.
  2. The JIT cannot inline the function without PGO (which the JIT currently does not have, and probably never will).

We can optimize it to the following in the JIT:

PyCFunction cfunc = LOOKUP_TABLE[1..n]; // Via replicate(n)
DEOPT_IF(cfunc != py_func->m_ml->ml_meth);
cfunc(...args);

LOOKUP_TABLE will be populated with common C functions that we know Python code uses.
This removes overhead 2, allowing the JIT to inline and optimize these calls.

If we want, there's an even more extreme optimization: we could burn the C function directly into the code and call it, saving the overhead of 1 as well. However, I don't think this could be done without breaking strange usages where ml_meth is dynamically set, so I would be more cautious with that.

Has this already been discussed elsewhere?

No response

Links to previous discussion of this feature:

No response

@Fidget-Spinner Fidget-Spinner added type-feature A feature request or enhancement topic-JIT labels Apr 26, 2025
@Fidget-Spinner
Member Author

This doesn't work with the current copy-and-patch scheme because calls to external functions are treated as holes. @brandtbucher, are there any plans to make inlining work with functions that are external relocations? Would LTO work? (I applied your LTO patch, but it doesn't seem to generate anything better.)

@picnixz picnixz added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 26, 2025
@brandtbucher
Member

I'm certainly interested in being able to inline stuff into the stencils, but I don't have any obvious way to do it yet.

@markshannon
Member

This looks similar to faster-cpython/ideas#660.

@Fidget-Spinner
Member Author

Oh yeah, it is.
