Grab bag of runtime optimizations #8599
Conversation
Here's the old runtime on the same benchmark (the variation is pretty high between runs though - this is around the median)
After further consideration I think I'll keep
Are we going to merge this soon?
Naturally, and sadly, turning off sanity checks in the runtime is a noticeable performance win. The particular test I'm running goes from ~1.5 s to ~1.3 s. Sanity checks are turned *on* when not optimizing, or when cfg includes `rtdebug` or `rtassert`.
This makes the lock much less contended. In the test I'm running the number of times it's contended goes from ~100000 down to ~1000.
It's not a huge win, but it does reduce the amount of time spent contending for the message queue when the schedulers are under load.
These aren't used for anything at the moment and cause some TLS hits on some perf-critical code paths. Will need to put better thought into it in the future.
`vec::unshift` uses this to add elements, and the scheduler queues use `unshift`, so this was causing a lot of reallocation.
I'm not comfortable turning off rtassert! yet
Attempting to give bors a kick by temporarily closing this. (Needs to wait a bors cycle to reopen.)
Alright, I don't understand what bors is doing here.
Reopened as #8734 with a no-op amend of the last commit to change the hash (commit date). Hopefully, bors won't loop on it anymore.
Here are a bunch of small optimizations that add up to a 36% improvement on one particular message passing benchmark.
After this and @toddaaro's optimizations from #8566, the next biggest wins are probably avoiding the event loop, which is worth another 25%, and using a less-allocating channel implementation (not sure how much this wins, but it should be a lot). Beyond that there are still important optimizations: using the stack pointer for TLS instead of the TLS API, reducing lock contention, identifying and reducing other syscalls, page faults, context switches and allocations, and recycling tasks. Codegen improvements may help as well, as there appears to be some nonsense in the assembly that one wouldn't write by hand.
There are a couple of notable changes here:
`rtassert!` is turned off for optimized builds, adding a new constant that can be queried before running expensive sanity checks: `pub static ENFORCE_SANITY: bool = !cfg!(rtopt) || cfg!(rtdebug) || cfg!(rtassert)`. `cfg(rtopt)` is turned on by the makefiles.

Before (just with #8566):
After (these opts + #8566):
And here is how Go does on the same benchmark:
Here's what the profile looks like after these optimizations, those in #8566, and a hacked up optimization to not hit epoll (not included in this PR):