Updated JMH, Scalaz, Clojure, and ECollections benchmark versions #1658
Current coverage is 96.43% (diff: 100%)

@@           master   #1658   diff  @@
==========================================
  Files            89      89
  Lines         11123   11123
  Methods           0       0
  Messages          0       0
  Branches       1893    1893
==========================================
  Hits          10726   10726
  Misses          243     243
  Partials        154     154
|
Hmm, strangely it seems that the new

Displaying speed ratios

2.11.8:
Operation    10       100      1026
Create       7.13×    7.92×    7.70×
Head         0.79×    0.59×    0.49×
Tail         2.64×    3.49×    4.55×
Get          1.09×    1.03×    0.71×
Update       1.21×    1.07×    0.76×
Map          1.21×    1.18×    1.43×
Filter       1.47×    1.55×    1.38×
Prepend      0.09×    0.08×    0.08×
Append       0.15×    0.10×    0.08×
AppendAll    2.79×    1.52×    1.67×
GroupBy      1.25×    1.17×    0.92×
Slice        2.45×    3.13×    4.87×
Iterate      2.03×    0.88×    1.43×

2.12.0:
Operation    10       100      1026
Create       6.68×    5.64×    5.77×
Head         0.79×    0.64×    0.48×
Tail         5.68×    7.67×    8.71×
Get          1.03×    1.05×    0.78×
Update       2.64×    2.72×    1.78×
Map          1.37×    1.08×    1.31×
Filter       1.22×    1.45×    1.16×
Prepend      0.18×    0.18×    0.17×
Append       0.31×    0.20×    0.20×
AppendAll    3.57×    1.87×    1.71×
GroupBy      1.52×    1.31×    1.13×
Slice        4.96×    8.52×    10.71×
Iterate      1.84×    1.42×    1.30× |
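@viktorklang, I tried it separately with the final Scala slice benchmark for 1026 elements:

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

// Method of a JMH benchmark class; scalaPersistent is a 1026-element
// scala.collection.immutable.Vector<Integer> prepared in the benchmark state.
@Benchmark
public void scala_persistent(Blackhole bh) {
    scala.collection.immutable.Vector<Integer> values = scalaPersistent;
    while (!values.isEmpty()) {
        values = values.slice(1, values.size());     // drop the first element
        values = values.slice(0, values.size() - 1); // drop the last element
        bh.consume(values);                          // keep the JIT from eliminating the work
    }
}
```

which results in:
8,267.31 ops/s - 2.11.8
4,132.78 ops/s - 2.12.0
while Javaslang's speed stays at 44,629.03 ops/s. |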
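A quick cross-check of these figures, assuming (the thread does not state it explicitly) that the "speed ratios" in the tables are Javaslang ops/s divided by Scala ops/s. The class and helper names are made up for illustration.

```java
// Recomputes ratios from the slice-benchmark figures quoted in this thread.
// Assumption: a "speed ratio" means Javaslang ops/s divided by Scala ops/s.
public class SliceRatios {
    static double ratio(double numerator, double denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double javaslang = 44_629.03; // ops/s, Javaslang
        double scala2118 = 8_267.31;  // ops/s, Scala 2.11.8
        double scala2120 = 4_132.78;  // ops/s, Scala 2.12.0

        System.out.printf("Javaslang vs 2.11.8: %.2fx%n", ratio(javaslang, scala2118)); // 5.40x
        System.out.printf("Javaslang vs 2.12.0: %.2fx%n", ratio(javaslang, scala2120)); // 10.80x
        System.out.printf("2.12.0 / 2.11.8:     %.2f%n", ratio(scala2120, scala2118));  // 0.50
    }
}
```

Under that assumption the separate run lands in the same ballpark as the Slice / 1026 column of the tables (4.87× and 10.71×), and it shows 2.12.0 running this slice loop at roughly half the speed of 2.11.8.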
Ping @SethTisue |
Could this have anything to do with the new default methods encoding of traits? |
@Ichoran is this expected? |
This is not what I would have expected, @SethTisue. I am not sure what is going on, but basically everything that has to create stuff is considerably slower. The JVM has a really tough job to do with keeping track of what's valid with all the pointers into different depths of the vector; maybe that optimization is affected by the trait encoding? |
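To make the remark about "pointers into different depths" concrete, here is a hypothetical and heavily simplified Java sketch of a 32-way radix trie that keeps a cached pointer (display0) to the leaf currently in focus, loosely modelled on the display0/display1 fields of Scala 2.11/2.12's Vector. It handles only two levels (up to 1024 elements), is not the real implementation, and the class and method names are invented.

```java
// Hypothetical sketch: a 2-level, 32-way radix trie with a cached "focus" leaf,
// loosely modelled on the display0/display1 pointers of Scala 2.11/2.12 Vector.
public final class FocusedTrie {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS; // 32 slots per node
    static final int MASK = WIDTH - 1;

    final Object[] display1; // root of a depth-2 trie (at most 1024 elements)
    Object[] display0;       // cached pointer to the leaf currently in focus
    int focusBlock = -1;     // which leaf display0 points at (index >>> 5)

    FocusedTrie(int size) {
        display1 = new Object[WIDTH];
        for (int i = 0; i < size; i++) {
            int block = i >>> BITS;
            if (display1[block] == null) display1[block] = new Object[WIDTH];
            ((Object[]) display1[block])[i & MASK] = i; // element i stores the value i
        }
    }

    Object get(int index) {
        int block = (index >>> BITS) & MASK; // upper 5 bits select the leaf
        if (block != focusBlock) {
            // Focus moved to another leaf: re-point display0 from the root.
            display0 = (Object[]) display1[block];
            focusBlock = block;
        }
        return display0[index & MASK]; // lower 5 bits select the slot in the leaf
    }

    public static void main(String[] args) {
        FocusedTrie trie = new FocusedTrie(1000);
        System.out.println(trie.get(0));   // 0
        System.out.println(trie.get(999)); // 999
    }
}
```

Every time the focus moves to a different leaf, display0 has to be re-pointed from the root; the real Vector does this across up to six levels (display0 to display5), which is the kind of pointer bookkeeping the JIT has to see through.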
Thank you, Lorinc, interesting benchmarks! I hope we can also use the insights of the Scala language architects for the design of Javaslang. |
@viktorklang, @SethTisue, @Ichoran, @djspiewak, @danieldietrich, the slowdown seems to have appeared in 2.12.0-M2 (changelog: http://www.scala-lang.org/news/2.12.0-M2)
|
Oh, that was the GenBCode / inliner milestone! I wonder if just looking at the bytecode would show what the difference is? |
Could this be the problem: scala/scala@bb4b79c#diff-59f3462485b74027de4fd5e9febcc81bR136 ? |
Could be. I definitely benchmarked that, but I benchmarked it before the GenBCode change IIRC, and it's possible that I forgot to benchmark both with and without optimization. That's a huge penalty for two equalities that ought to be false! |
But it is 2 extra branches and method calls? Perhaps put the method size
|
@viktorklang - It's marked You might be right that the method calls to fetch the builder instances are at fault. |
@viktorklang, @SethTisue, @Ichoran, @djspiewak, @danieldietrich, I've recompiled |
@paplorinc I think this is worth investigating. @Ichoran / @SethTisue, is there an issue for this in the scala/scala tracker? |
I don't know if there's an issue, but I'm looking into it. So far I have not found any difference in bytecode or assembly that can explain the whole difference. (Also, in my hands the difference is more like 50%.) I did find that either switching VectorPointer to an abstract class (from a trait) or manually inlining all the code in it (so there is no default method) partially rescued the performance of :+. But that was still only partial. |
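As a rough illustration of the two shapes being compared (not the actual VectorPointer code), the sketch below contrasts approximately how Scala 2.12 compiles a trait, where shared code becomes a Java default method that reaches the state through an accessor such as display0(), with an abstract class whose method reads the field directly. All names are hypothetical, and the real 2.12 encoding also generates static forwarder methods that this omits.

```java
// Hypothetical sketch of the two shapes: a trait-like interface with a default
// method versus an abstract class that owns the field.
public class TraitVsAbstractClass {

    // Trait-like shape: the state lives in the implementing class, so the
    // shared code in the default method must call display0() to reach it.
    interface PointerTrait {
        Object[] display0();

        default Object elemAt(int index) {
            return display0()[index & 31]; // accessor call, then array load
        }
    }

    static final class TraitBacked implements PointerTrait {
        private final Object[] display0;
        TraitBacked(Object[] leaf) { this.display0 = leaf; }
        @Override public Object[] display0() { return display0; }
    }

    // Abstract-class shape: the shared code can read the field directly.
    abstract static class PointerClass {
        protected Object[] display0;

        Object elemAt(int index) {
            return display0[index & 31]; // plain field read, then array load
        }
    }

    static final class ClassBacked extends PointerClass {
        ClassBacked(Object[] leaf) { this.display0 = leaf; }
    }

    public static void main(String[] args) {
        Object[] leaf = new Object[32];
        for (int i = 0; i < leaf.length; i++) leaf[i] = i;
        System.out.println(new TraitBacked(leaf).elemAt(7)); // 7
        System.out.println(new ClassBacked(leaf).elemAt(7)); // 7
    }
}
```

Both versions compute the same result; the sketch says nothing about speed by itself, it only shows the extra accessor call that the JIT has to turn back into a field access in the interface shape.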
This is a tough one! The bytecode is nearly identical (aside from the obvious differences); there is a big difference in profiling gotoPosWritable1, but the bytecode of that, despite being in different places, is essentially identical. The assembly is, for some reason, a bit less efficient in 2.12, but it seems hard to believe that this is causing the entire slowdown. I'm going to try some speculative fixes, but I'm not terribly hopeful that any of them will work given that I still don't understand precisely what is causing the slowness. |
@Ichoran, @SethTisue, @viktorklang, @odersky, @djspiewak, @danieldietrich, @zsolt-donca, to sum it up, comparing e.g. the GC type gives these 2.12 / 2.11 ops/s ratios:
GC type               2.12 / 2.11 (ops/s)
UseParallelOldGC      78% (8228 / 10563)
UseConcMarkSweepGC    63% (7167 / 11320)
UseG1GC               49% (4695 / 9482)
(See l0rinc/ScalaVectorBenchmark@918f75b.) Note: There still appears to be room for optimizations, seeing that the |
Some random thoughts that may have no bearing on anything (and may have already occurred to everyone):
|
@djspiewak - I have mostly been looking at assembly; the only glaring thing I've found is a failure to convert from a method call to a field access on some instances of fetching display0. Still, it's hard to imagine that the slowdown could be so big just because of that. It doesn't seem to be an inlining thing in general. About half the performance difference disappears when one converts the VectorPointer trait to an abstract class (plus various other changes that need to happen for that to work, and I'm not certain that the trait/abstract class change is the key one). Also, about half the performance difference disappears if VectorPointer has no default methods and the implementations are moved to the children. I assume it's the "same" half, though obviously I can't easily check. |
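The second partial fix mentioned here, removing the default methods and moving the implementations into the children, can be sketched in the same style. In Scala 2.11/2.12, VectorPointer is mixed into Vector, VectorIterator and VectorBuilder; the Java classes below are hypothetical stand-ins, not the actual library code.

```java
// Hypothetical sketch: an interface with no default methods, where each
// concrete class carries its own copy of the implementation.
public class NoDefaultMethods {

    interface PointerOps {
        Object elemAt(int index); // abstract only: no shared default body
    }

    static final class VectorLike implements PointerOps {
        private final Object[] display0;
        VectorLike(Object[] leaf) { this.display0 = leaf; }

        // Implementation lives in the child, so it reads its own field directly.
        @Override public Object elemAt(int index) {
            return display0[index & 31];
        }
    }

    static final class IteratorLike implements PointerOps {
        private final Object[] display0;
        IteratorLike(Object[] leaf) { this.display0 = leaf; }

        // Same body duplicated here: the maintenance cost of this approach.
        @Override public Object elemAt(int index) {
            return display0[index & 31];
        }
    }

    public static void main(String[] args) {
        Object[] leaf = new Object[32];
        for (int i = 0; i < leaf.length; i++) leaf[i] = i;
        PointerOps[] all = { new VectorLike(leaf), new IteratorLike(leaf) };
        for (PointerOps p : all) System.out.println(p.elemAt(3)); // 3, then 3
    }
}
```

The trade-off is duplicated code in every child in exchange for bodies that never go through an interface accessor.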
|
@paplorinc - Good question regarding the GC. I don't have a good answer at this time. It does suggest that I should be looking for an extra allocation in the bytecode, doesn't it? This seems relatively Vector-specific, and there isn't a broad benchmark suite. Maybe this is an indication that we should create one. What you've been doing has already been very helpful! If you have time to bisect the commit history and figure out where the slowdown first happened, that would be awesome. Looking for generalization (e.g. is HashMap affected?) might help too. Or you may have other ideas that are better than either. |
May I suggest the discussion move to scala/scala-dev#260? |
@paplorinc - Could you try the PR at scala/scala#5516 to see if it fixes the issue in your benchmarks? It does in my hands. |
Aligned Java 8 compatibility to Scala 2.12 benchmarks