Updated JMH, Scalaz, Clojure, and ECollections benchmark versions #1658

Merged
merged 2 commits into vavr-io:master on Nov 3, 2016

Conversation

l0rinc
Contributor

@l0rinc l0rinc commented Nov 3, 2016

Aligned Java 8 compatibility to Scala 2.12 benchmarks

@codecov-io

Current coverage is 96.43% (diff: 100%)

Merging #1658 into master will not change coverage

@@             master      #1658   diff @@
==========================================
  Files            89         89          
  Lines         11123      11123          
  Methods           0          0          
  Messages          0          0          
  Branches       1893       1893          
==========================================
  Hits          10726      10726          
  Misses          243        243          
  Partials        154        154          

Powered by Codecov. Last update 0bbc789...0864032

@l0rinc
Contributor Author

l0rinc commented Nov 3, 2016

Hmm, strangely it seems that the new Scala 2.12.0 Vector is significantly slower in a few cases (e.g. tail, slice, update, append, prepend). @djspiewak, what could be the reason for that?

Displaying speed ratios (Javaslang Vector ops/s / Scala Vector ops/s; a greater value means Javaslang is faster):

2.11.8:

Operation     10    100   1026 
Create     7.13×  7.92×  7.70×
Head       0.79×  0.59×  0.49×
Tail       2.64×  3.49×  4.55×
Get        1.09×  1.03×  0.71×
Update     1.21×  1.07×  0.76×
Map        1.21×  1.18×  1.43×
Filter     1.47×  1.55×  1.38×
Prepend    0.09×  0.08×  0.08×
Append     0.15×  0.10×  0.08×
AppendAll  2.79×  1.52×  1.67×
GroupBy    1.25×  1.17×  0.92×
Slice      2.45×  3.13×  4.87×
Iterate    2.03×  0.88×  1.43×

2.12.0:

Operation     10    100   1026
Create     6.68×  5.64×  5.77×
Head       0.79×  0.64×  0.48×
Tail       5.68×  7.67×  8.71×
Get        1.03×  1.05×  0.78×
Update     2.64×  2.72×  1.78×
Map        1.37×  1.08×  1.31×
Filter     1.22×  1.45×  1.16×
Prepend    0.18×  0.18×  0.17×
Append     0.31×  0.20×  0.20×
AppendAll  3.57×  1.87×  1.71×
GroupBy    1.52×  1.31×  1.13×
Slice      4.96×  8.52× 10.71×
Iterate    1.84×  1.42×  1.30×

@l0rinc
Contributor Author

l0rinc commented Nov 3, 2016

@viktorklang, tried it separately with the final 2.12.0 and 2.11.8:

Scala slice for 1026 elements:

@Benchmark
public void scala_persistent(Blackhole bh) {
    scala.collection.immutable.Vector<Integer> values = scalaPersistent;
    while (!values.isEmpty()) {
        values = values.slice(1, values.size());      // drop the first element
        values = values.slice(0, values.size() - 1);  // drop the last element
        bh.consume(values);
    }
}

results in

8,267.31 ops/s - 2.11.8
4,132.78 ops/s - 2.12.0

while Javaslang's speed stays:

44,629.03 ops/s
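For readers without a Scala toolchain, the access pattern of the slice benchmark above can be sketched in plain Java. This is a minimal illustration only, assuming nothing beyond the JDK: `List.subList` stands in for `Vector.slice`, and the class/method names are made up; it deliberately ignores the persistence aspect that the real benchmark exercises.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the benchmark loop: each pass drops the first element and
// then the last, until the sequence is empty.
public class SliceLoopSketch {

    static int passes(List<Integer> values) {
        int count = 0;
        while (!values.isEmpty()) {
            values = values.subList(1, values.size());      // drop head
            values = values.subList(0, values.size() - 1);  // drop last
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>();
        for (int i = 0; i < 1026; i++) xs.add(i);
        System.out.println(passes(xs)); // 513: two elements removed per pass
    }
}
```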

@viktorklang

Ping @SethTisue

@djspiewak

Could this have anything to do with the new default methods encoding of traits?
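(Background for the question above: Scala 2.12 emits trait method bodies as Java 8 interface default methods, whereas 2.11 compiled them to a static method in a `Foo$class` impl class plus a small per-class forwarder. A rough Java analogy of the two shapes, with all names purely illustrative and not the actual scalac output:)

```java
// Rough Java analogy of the two trait encodings; illustrative names only.

// 2.12-style: the trait body becomes an interface default method,
// so call sites use invokeinterface.
interface VectorPointerLike {
    default int depth() { return 1; }
}

class NewStyleVector implements VectorPointerLike {}

// 2.11-style: the body lives in a static "impl" method and every
// implementing class gets a small generated forwarder.
final class VectorPointerImpl {
    static int depth(OldStyleVector self) { return 1; }
}

class OldStyleVector {
    int depth() { return VectorPointerImpl.depth(this); } // generated forwarder
}

public class TraitEncodingSketch {
    public static void main(String[] args) {
        // Both encodings compute the same result; only dispatch differs.
        System.out.println(new NewStyleVector().depth() == new OldStyleVector().depth()); // true
    }
}
```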

@SethTisue

@Ichoran is this expected?

@Ichoran

Ichoran commented Nov 3, 2016

This is not what I would have expected, @SethTisue. I am not sure what is going on, but basically everything that has to create stuff is considerably slower. The JVM has a really tough job to do with keeping track of what's valid with all the pointers into different depths of the vector; maybe that optimization is affected by the trait encoding?
I could look into it starting this weekend if it's a high priority. Otherwise, if someone wants to look at the generated assembly, that might give a clue. I'd probably go for append as the easiest to understand. They're all pretty hairy.

@danieldietrich
Contributor

Thank you Lorinc, interesting benchmarks! I hope we can also use the insights of the Scala language architects in the design of Javaslang.

@danieldietrich danieldietrich merged commit d6d2d19 into vavr-io:master Nov 3, 2016
@danieldietrich danieldietrich added this to the 2.1.0 milestone Nov 3, 2016
@l0rinc
Contributor Author

l0rinc commented Nov 3, 2016

@viktorklang, @SethTisue, @Ichoran, @djspiewak, @danieldietrich, the slowdown seems to have happened in 2.12.0-M2 (4,528.84 ops/s), as in 2.12.0-M1 it was still 9,470.62 ops/s.

M2 changelog: http://www.scala-lang.org/news/2.12.0-M2

  • Beginning with 2.12.0-M2, the Scala 2.12 series targets Java 8.

@Ichoran

Ichoran commented Nov 3, 2016

Oh, that was the GenBCode / inliner milestone! I wonder if just looking at the bytecode would show what the difference is?

@l0rinc
Contributor Author

l0rinc commented Nov 3, 2016

Could this be the problem: scala/scala@bb4b79c#diff-59f3462485b74027de4fd5e9febcc81bR136 ?

@Ichoran

Ichoran commented Nov 3, 2016

Could be. I definitely benchmarked that, but I benchmarked it before the GenBCode change IIRC, and it's possible that I forgot to benchmark both with and without optimization. That's a huge penalty for two equalities that ought to be false!

@viktorklang

But it is 2 extra branches and method calls? Perhaps it put the method size above the default for inlining? (31 bytes IIRC)


@Ichoran

Ichoran commented Nov 4, 2016

@viktorklang - It's marked @inline so I wasn't anticipating that the JVM would be responsible for inlining it! I'm sure it's over the default (35 according to http://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html), but that shouldn't have mattered unless the Scala compiler isn't inlining it.

You might be right that the method calls to fetch the builder instances are at fault.

@l0rinc
Contributor Author

l0rinc commented Nov 6, 2016

@viktorklang, @SethTisue, @Ichoran, @djspiewak, @danieldietrich, I've recompiled Scala 2.12 with the Vector from 2.11, and the above methods are either slower or the same.
I'm afraid the slowdown has a more serious cause, which I'm not yet qualified to investigate :/

@l0rinc l0rinc deleted the benchmark-versions branch November 6, 2016 12:57
@viktorklang

@paplorinc I think this is worth investigating. @Ichoran / @SethTisue, is there an issue for this in the scala/scala tracker?

@Ichoran

Ichoran commented Nov 7, 2016

I don't know if there's an issue, but I'm looking into it. So far I have not found any difference in bytecode or assembly that can explain the whole difference. (Also, in my hands the difference is more like 50%.) I did find that either switching VectorPointer to an abstract class (from a trait) or manually inlining all the code in it (so there is no default method) partially rescued the performance of :+. But that was still only partial.
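The rescue described above can be pictured with a minimal Java sketch (hypothetical names; the real VectorPointer is far hairier): hoisting the shared body from an interface default method into an abstract class replaces interface dispatch with plain class-hierarchy virtual dispatch, while behavior stays the same.

```java
// Illustrative names only; not the actual scala.collection code.

// Trait-with-default-method shape (as in 2.12).
interface PointerTrait {
    default int stableDepth() { return 1; }
}

// Abstract-class shape: the same body as an ordinary inherited method,
// dispatched via invokevirtual on a single class hierarchy.
abstract class PointerBase {
    int stableDepth() { return 1; }
}

class TraitBackedVector implements PointerTrait {}
class ClassBackedVector extends PointerBase {}

public class AbstractClassSketch {
    public static void main(String[] args) {
        // Semantics are unchanged; only the dispatch mechanics differ.
        System.out.println(new TraitBackedVector().stableDepth()
                + new ClassBackedVector().stableDepth()); // 2
    }
}
```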

@Ichoran

Ichoran commented Nov 8, 2016

This is a tough one! The bytecode is nearly identical (aside from the obvious differences); there is a big difference in profiling gotoPosWritable1, but the bytecode of that, despite being in different places, is essentially identical. The assembly is, for some reason, a bit less efficient in 2.12, but it seems hard to believe that this is causing the entire slowdown.

I'm going to try some speculative fixes, but I'm not terribly hopeful that any of them will work given that I still don't understand precisely what is causing the slowness.

@l0rinc
Contributor Author

l0rinc commented Nov 8, 2016

@Ichoran, @SethTisue, @viktorklang, @odersky, @djspiewak, @danieldietrich, @zsolt-donca

To sum it up, comparing the JMH benchmarks for 2.11.8 with 2.12.0, there seems to be a measurable slowdown in some Vector operations for all GCs.

For example, ops/s for 1000 slice operations (2.12.0 vs. 2.11.8):

GC type             2.12 / 2.11  ops/s ratios
UseParallelOldGC    78%          (8228 / 10563)
UseConcMarkSweepGC  63%          (7167 / 11320)
UseG1GC             49%          (4695 /  9482)

(See l0rinc/ScalaVectorBenchmark@918f75b)

Note: there still appears to be room for optimization, seeing that the Javaslang persistent Vector's slice is ~5-10× faster.

@djspiewak

Some random thoughts that may have no bearing on anything (and may have already occurred to everyone):

  • The new bytecode emitted may have pushed some functions just over HotSpot's inline limit, preventing certain optimizations which used to apply
  • The assembly produced by HotSpot in response to the new bytecode may be less amenable to whatever dark magic the CPU is doing. I freely admit to being highly superstitious on this point…
  • What do the heap profiles look like? Presumably identical?
  • Have you tried enabling GenBCode on 2.11 to see if the same performance degradation is observed?
  • Exactly how much method defaulting is left in the code paths after converting the trait to an abstract class? I'm not entirely sure how well the JVM's PIC interacts with method defaulting. I would assume reasonably well, but I don't know the details. Insert more superstitious mumbling here…

@Ichoran

Ichoran commented Nov 8, 2016

@djspiewak - I mostly have been looking at assembly; the only glaring thing I've found is a failure to convert from a method call to a field access on some instances of fetching display0. Still, it's hard to imagine that the slowdown could be so big just because of that. It doesn't seem to be an inlining thing in general.

About half the performance difference disappears when one converts the VectorPointer trait to an abstract class (plus various other changes that need to happen for that to work--and I'm not certain that the trait/abstract class change is the key one). Also, about half the performance difference disappears if the VectorPointer has no default methods and the implementations are moved to the children. I assume it's the "same" half, though obviously I can't easily check.

@l0rinc
Contributor Author

l0rinc commented Nov 8, 2016

@Ichoran

  • how can the selection of GC affect the results so much in the new, but less in the old case?
  • is this a general slowdown? Why wasn't it detected by other benchmarks?
  • can I help in any way?

@Ichoran

Ichoran commented Nov 8, 2016

@paplorinc - Good question regarding the GC. I don't have a good answer at this time. It does suggest that I should be looking for an extra allocation in the bytecode, doesn't it?

This seems relatively Vector-specific, and there isn't a broad benchmark suite. Maybe this is an indication that we should create one.

What you've been doing has already been very helpful! If you have time to bisect the commit history and figure out where the slowdown first happened, that would be awesome. Looking for generalization (e.g. is HashMap affected?) might help too. Or you may have other ideas that are better than either.

@SethTisue

SethTisue commented Nov 8, 2016

may I suggest the discussion move to scala/scala-dev#260?

@Ichoran

Ichoran commented Nov 9, 2016

@paplorinc - Could you try the PR at scala/scala#5516 to see if it fixes the issue in your benchmarks? It does in my hands.
