-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[WIP] impl FromIterator for Option/Result via scan #59605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
As I noted in the description, I included micro-benchmarks for the old implementation. This makes it really easy to see the performance regression that this re-implementation currently introduces: just run the benchmark (
As you can see, when collecting into a When you collect into That's why I've marked this PR a WIP: I don't want to blindly commit this change in the name of "code simplification" without seeing evidence that the microbenchmarks proposed in this PR are irrelevant. Nonetheless, I still posted the PR itself (rather than abandoning these bits of code entirely). I did this for three main reasons:
Lines 1341 to 1342 in 6315221
Lines 1235 to 1236 in 6315221
|
Some interesting numbers! @pnkfelix have you run a profiler to see if there's any particular hot spots in the new implementation that weren't in the old one? If it requires LTO to be turned on to be fast that probably means that something performance critical isn't getting inlined across crates and requires |
@shepmaster ran a profiler early on in the investigation and found some "interesting" codegen in the hotspot: #11084 (comment) I myself haven't run a profiler, not yet. I'll give it a quick whirl. |
Running the profiler on the benchmark
|
I think you mean @dotdash, but I'm happy you thought of me ❤️ |
Ok thanks! That looks like a pretty reasonable trace, without much to illuminate. I wonder thought if you could gist a version that's a profile of what's there today? That code looks relatively optimal (no extraneous function calls at least) but it may be the case that the old version vectorized better or something like that |
What are you asking me to gist here; the analogous |
☔ The latest upstream changes (presumably #59632) made this pull request unmergeable. Please resolve the merge conflicts. |
Oh sure yeah, if I'm basically just curious at the assembly level what the differences are to help understand why the new version is slower than the old |
Okay here are some more complete transcriptions of the https://gist.github.com/pnkfelix/1b54b3272201d9f096a2289fd5712b52 Since I've taken the effort to transcribe the full machine code provided by |
Ok thanks! Unfortunately nothing obviously jumps out at me, so it seems like it's just inherently more branchy in the version in this PR for whatever reason, but without digging into the LLVM IR and such I wouldn't know why |
A possible thought: The general Line 1929 in 9ebf478
You might try flipping that to a rust/src/libcore/iter/adapters/mod.rs Lines 1668 to 1673 in 9ebf478
And if that doesn't work, scan is only overriding |
I went ahead and made changes based on this advice, and it does help
That's enough to convince me that we might be able land this change to the Of course, it also requires that someone review my revisions to |
(This is an attempt to ensure we do not regress performance here, since the performance of this operation varied pretty wildly of the course of rust-lang#11084.)
…end_desugared`. This makes use of specialized Iterator methods (when available).
…old`. (It is easier to subsequently optimize this body, rather than starting from `Scan::try_fold`.)
c3fdf53
to
b210413
Compare
(hmm, after a rebase, I am now seeing stack overflows with this PR applied. Marking WIP again.) |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
ping from triage @pnkfelix any updates? |
I have higher priority items to attack in the near term, and there isn't much clear value provided here anyway. Closing PR. |
…scottmcm Refactoring use common code between option, result and accum `Option` and `Result` have almost exactly the same code that in `accum.rs` that implement `Sum` and `Product`. This PR just move some code to use the same code for all of them. I believe is better to not implement this `Iterator` feature twice. I'm not very familiar with pub visibility hope I didn't make then public. However, maybe these adapters could be useful and we could think to make then pub. rust-lang#59605 rust-lang#11084 r? @pnkfelix
This PR consists of three main things:
FromIterator
forOption
andResult
that uses thescan
method to do the bulk of the work rather than the specialized adapter struct that the old implementation used.FromIterator for Result
in order to measure the performance of this operation, in order to ensure that this change (or other future changes) do not cause this operation to slow down significantly.Vec::extend
andIterator::scan
in order to address performance issues uncovered by above micro-benchmark.cc #11084
Some (lightly edited) notes from the original PR post follow, but I have removed three of the four original benchmarks (you can find more about them in #11084).
The PR was initially marked WIP, because in my experiments on my Linux desktop machine, even when I compile with
optimize=true
,debug=false
,codegen-units=1
andincremental=false
, I still see performance regression on this particular micro-benchmark.-C lto=thin
in order to get competitive performance out of the "with_scan" implementation. The default with codegen-units=1 is suboptimal;-C lto=thin
gives you "whole crate graph" LTO.Anyway, the micro-benchmarks added here include an explicit encoding of the adapter-based implementation of
FromIterator for Result
, so that one can see how the new implementation compares out of the box (that is, without enabling ThinLTO for codegen-units=1 on the bootstrapped benchmark build).