GH-116017: Put JIT code and data on the same page #116845
Conversation
This also fixes an issue where we weren't using the correct alignment during parts of the build process and things just accidentally worked (due to the data starting on a page boundary).
I realise it probably makes sense to benchmark this one on macOS (since that's where we saw the greatest memory increase in #116017). I'll fire off a run right now.
Oh, I already did. :) Results were less good: 1% faster, no memory impact (which is a bit odd, but I haven't really dug into it yet).
@brandtbucher I may be missing something here: is this a reduction in security? Has this been discussed with the security team?
I should clarify: this change doesn't mean that we're now executing arbitrary user data. In this case, "data" is stuff like C string literals for error messages, pointers to cached objects, static C function addresses, version numbers, opargs, etc.

Since this data is now executable (as it lives on the same pages as the machine code), in theory an attacker capable of forcing the compilation of an instruction operand that happens to have the same encoding as a machine instruction could take control of the program. However, that would require that we jump into the data at some point, something which never happens during normal operation. As such, this would require a separate, highly specific bug in machine code generation in order to actually exploit. The most likely way this would happen is not by jumping from the machine code into the data, but rather accidentally running off the end of the machine code into the data portion of the buffer (again, not something that happens with the sort of well-formed traces that tier two creates). This situation currently crashes on

I appreciate your comment, though. Please let me know if I've missed anything, or if you still have any concerns.
@brandtbucher: Just FYI: a 20% reduction in memory usage on macOS on the benchmarking suite: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20240315-3.13.0a5%2B-e6d8e6d-JIT/bm-20240315-darwin-arm64-brandtbucher-justin_mprotect-3.13.0a5%2B-e6d8e6d-vs-base-mem.png |
Awesome! Wonder why it didn't come through in my first run... |
The compiler on that machine was still broken when you kicked it off yesterday -- I didn't truly fix it until this morning. |
It's also a solid 1-3% faster on macOS. |
Oh, I see you mean the run from 5 days ago. Yeah, it's not clear why it shows "no change". Something to keep an eye on for the future in case there is a bug somewhere in the benchmarking infra (but not clear what it would be). |
Instead of marking code as read-exec and data as read-only, put both on the same read-exec pages. This halves the amount of memory used for short traces, and also results in half as many expensive `mprotect` calls per trace. 3% faster (10% faster startup), 9% less memory (12% less at startup).
The next step after this will be sharing pages between traces, to further reduce the amount of wasted memory. At the same time, we can expose a way to compile multiple traces at once with a single `mprotect` call (useful for the cold exit array).