-
-
Notifications
You must be signed in to change notification settings - Fork 32k
Quick test to see if removing the tier2 interpreter speeds up tier1 #117908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… new configure flag
The failure on WASI of |
Benchmark results are in:
Talked it over with @brandtbucher. We found:
|
I've fixed the two issues we discovered and am firing off a new benchmark run, only for Win64, without pystats. Since I didn't move the point where the branch forks off main, I'm not sure if this will produce the right results, but it's the thing I'm most curious about. Honestly, the |
Here are the Windows benchmark results. It is now only 1% slower than base -- but still, surprisingly, slower. According to the time plot, both richards and richards_super are 9% slower. What's up with that??? This occurs only on Windows; it was present on the last benchmark too (before I disabled the no-optimization pragma). |
Those Windows results are puzzling, but we know there is just generally more noise in the Windows results. Maybe this is a fool's errand, but we could try running them again to get a sense of the range of noise in the results? |
Hm, both runs showed similar results (in particular, both Richards being much slower). Mark's hunch is that by making the function larger, we disabled some compiler optimization that was actually a pessimization. While I'm still curious if there's a smoking gun in Richards, I am leaning towards not merging this PR. |
Yes, a bit frustratingly, it's not a clear win. |
I personally still think that it makes sense to make the tier two interpreter a build-time option (like the JIT) instead of a run-time option. Even if the perf impact is negligible, realistically the only people who are using tier two are building their own interpreter anyways, and it simplifies a lot of tricky edge-cases when testing tier two (for example, with subprocesses not picking up the environment variable or command-line flag). I'm not sure that we have a good argument for shipping all of tier two when pretty much nobody will be using it. |
Agreed. Then maybe I can put more stuff inside |
If we were to go ahead with that idea, the I'm not sure if I care to do all that work, honestly. |
Latest Windows benchmark: 0% faster. Richards is no longer special. (What changed: this no longer updates the bitmask/counter for |
I think I'd like to rethink this after gh-116970 (moving the Tier 2 interpreter out into its own function). I'd like to almost completely skip the |
Because this is just a quick test, I used a hack to define the crucial env var,
_Py_NOTIER2
, by hard-coding it in Makefile.pre.in. If this shows enough of an improvement for Tier 1 that we'd like to have this option, that hack should be replaced with a new configure option. (Adding configure options looks like black magic to me, and IIRC requires using a Docker image to build, so I'm putting that off until after we've seen benchmark results.)Because of that hack, this fails all the JIT tests. I'll just be ignoring that.
There's also a weird test failure in the WASI build, which I'll also ignore unless we decide to move ahead with this.