Optimize slice operation in ImmutableArray #354

ackratos · 2018-01-20T15:47:57Z

baseline performance (0a00062):

[info] Benchmark                              (size)  Mode  Cnt        Score         Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       38.393 ±       0.416  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       39.797 ±       0.606  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       38.203 ±       0.261  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       37.194 ±       0.369  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       41.499 ±       0.251  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       40.420 ±       0.485  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       37.346 ±       0.207  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       41.066 ±       0.387  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       38.310 ±       0.326  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       40.797 ±       0.313  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8       42.464 ±       0.477  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      107.206 ±       2.077  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     1483.075 ±      36.826  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    55750.166 ±    1819.028  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  6388168.153 ± 7012236.507  ns/op

With this PR:

[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       7.399 ±     0.096  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       6.912 ±     0.084  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       6.996 ±     0.021  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       7.126 ±     0.078  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       7.434 ±     0.028  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       7.713 ±     0.063  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       7.684 ±     0.110  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       8.599 ±     0.100  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       7.867 ±     0.090  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       9.248 ±     0.097  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8      11.298 ±     0.548  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      27.505 ±     1.063  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     310.161 ±    36.247  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    9640.269 ±   403.856  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  676886.167 ± 22375.895  ns/op
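To make the two tables above easier to compare, here is a minimal sketch that divides the baseline score by the PR score for a few sizes. The object and method names are hypothetical; the numbers are copied from the tables. The 7312102 row is left out because its baseline error (± 7012236.507) is larger than its score, so a ratio there would not mean much.

```scala
object SliceSpeedup {
  // (size, baseline ns/op, PR ns/op), copied from the two tables above.
  val rows: Seq[(Int, Double, Double)] = Seq(
    (0, 38.393, 7.399),
    (4096, 1483.075, 310.161),
    (131070, 55750.166, 9640.269)
  )

  // How many times faster the PR is than the baseline for one row.
  def speedup(baseline: Double, improved: Double): Double = baseline / improved

  def main(args: Array[String]): Unit =
    rows.foreach { case (size, base, pr) =>
      println(f"size=$size%7d speedup=${speedup(base, pr)}%.2fx")
    }
}
```

These ratios only compare mean scores; they ignore the error columns.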

Ichoran · 2018-01-20T19:22:14Z

benchmarks/time/src/main/scala/strawman/collection/immutable/ImmutableArrayBenchmark.scala

@@ -10,7 +10,7 @@ import scala.Predef.intWrapper

 @BenchmarkMode(scala.Array(Mode.AverageTime))
 @OutputTimeUnit(TimeUnit.NANOSECONDS)
-@Fork(1)
+@Fork(2)


Why are you changing this?

@Ichoran I just thought forking 2 JVMs would make the results more stable (with less error). Reverted.

Ichoran · 2018-01-20T19:24:58Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

@@ -197,6 +199,10 @@ object ImmutableArray extends StrictOptimizedSeqFactory[ImmutableArray] {
      case that: ofByte => Arrays.equals(unsafeArray, that.unsafeArray)
      case _ => super.equals(that)
    }
+    override def slice(from: Int, until: Int): ImmutableArray[Byte] = {
+      val lo = scala.math.max(from, 0)
+      ImmutableArray.unsafeWrapArray(Arrays.copyOfRange(unsafeArray, lo, until))


This isn't correct. copyOfRange pads with zeros rather than truncating when you run off the end.

@Ichoran fixed. Thank you.
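The difference Ichoran points out can be reproduced in isolation. The following is a small sketch (the object name is made up for illustration) that contrasts java.util.Arrays.copyOfRange with the standard slice on a plain Array[Byte] when the upper bound runs past the end:

```scala
import java.util.Arrays

object CopyOfRangePadding {
  val bytes: Array[Byte] = Array[Byte](1, 2, 3)

  // copyOfRange accepts an upper bound past the end and zero-fills the extra slots.
  val padded: Array[Byte] = Arrays.copyOfRange(bytes, 1, 5)

  // The standard slice truncates at the end of the array instead.
  val truncated: Array[Byte] = bytes.slice(1, 5)

  def main(args: Array[String]): Unit = {
    println(padded.mkString(", "))    // 2, 3, 0, 0
    println(truncated.mkString(", ")) // 2, 3
  }
}
```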

Ichoran · 2018-01-20T19:26:00Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

+      // new ofUnit(util.Arrays.copyOfRange[Unit](array, from, until)) - Unit is special and doesnt compile
+      // cant use util.Arrays.copyOfRange[Unit](repr, from, until) - Unit is special and doesnt compile
+      val lo = scala.math.max(from, 0)
+      val res = new Array[Unit](until-lo)


This is also incorrect (it copies the copyOfRange behavior instead of the normal slice behavior). Note that I didn't mark all the places the copyOfRange call needs to be fixed.

Hi @Ichoran
I fixed this and all the other copyOfRange invocations. For the ofUnit case, though, I am not sure we need it at all: Unit is auto-boxed to BoxedUnit at runtime and, per my testing, the slice implementation that actually runs is the one defined in ofRef.

Do you know in which situations the ofUnit representation is generated and used?

Ichoran · 2018-01-20T19:28:09Z

Thanks for looking at this! The behavior needs to be tweaked to work the way slice is supposed to, but otherwise this should be a better implementation. The benchmarking doesn't seem to be a good indication of actual time taken for small array sizes, but the benefit is obvious on the large sizes.

Ichoran · 2018-01-21T22:50:47Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

+    override def slice(from: Int, until: Int): ImmutableArray[T] = {
+      val lo = scala.math.max(from, 0)
+      val hi = scala.math.min(until, length)
+      new ofRef(Arrays.copyOfRange[T](unsafeArray, lo, hi))


Better! But the current behavior (in 2.12) is to give an empty array, not throw an exception, when the range is of negative size. So you either need to check lo < hi or take from lo to math.max(lo, hi).

thanks, fixed.
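For reference, a minimal sketch of the clamping described above, written against a plain Array[Int] rather than the PR's classes. The helper name sliceInts is hypothetical. It uses the lo < hi check, which also covers a `from` beyond the end of the array (copyOfRange itself throws ArrayIndexOutOfBoundsException when its start index exceeds the array length).

```scala
import java.util.Arrays

object ClampedSlice {
  // Hypothetical helper, not the PR's code: clamp both bounds, then
  // return an empty array whenever the clamped range is empty or negative.
  def sliceInts(xs: Array[Int], from: Int, until: Int): Array[Int] = {
    val lo = math.max(from, 0)
    val hi = math.min(until, xs.length)
    if (hi > lo) Arrays.copyOfRange(xs, lo, hi)
    else new Array[Int](0)
  }

  def main(args: Array[String]): Unit = {
    val xs = Array(1, 2, 3)
    println(sliceInts(xs, -5, 2).mkString(", "))  // 1, 2
    println(sliceInts(xs, 1, 100).mkString(", ")) // 2, 3
    println(sliceInts(xs, 2, 1).length)           // 0
    println(sliceInts(xs, 10, 20).length)         // 0
  }
}
```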

Ichoran · 2018-01-21T22:51:14Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

+      // cant use util.Arrays.copyOfRange[Unit](repr, from, until) - Unit is special and doesnt compile
+      val lo = scala.math.max(from, 0)
+      val hi = scala.math.min(until, length)
+      val slicedLenght = hi - lo


Typo: spelling should be slicedLength

Ah, good spot. Fixed

Ichoran · 2018-01-21T22:51:37Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

+      val lo = scala.math.max(from, 0)
+      val hi = scala.math.min(until, length)
+      val slicedLenght = hi - lo
+      val res = new Array[Unit](slicedLenght)


Need to make sure the length is non-negative.

thanks, fixed

szeiger · 2018-01-22T17:20:45Z

collections/src/main/scala/strawman/collection/immutable/ImmutableArray.scala

@@ -186,6 +184,14 @@ object ImmutableArray extends StrictOptimizedSeqFactory[ImmutableArray] {
      case that: ofRef[_] => Arrays.equals(unsafeArray.asInstanceOf[Array[AnyRef]], that.unsafeArray.asInstanceOf[Array[AnyRef]])
      case _ => super.equals(that)
    }
+    override def slice(from: Int, until: Int): ImmutableArray[T] = {


Is there a performance advantage in duplicating this functionality in all subclasses rather than calling new Array with an available implicit ClassTag and then using System.arraycopy?

I should have thought about this for a few more seconds. There is a definite advantage but it's currently not realized. All these slice methods should declare the correct subtype as the return type, for example in this case ofRef[T]. This ensures that the slices can still be accessed without boxing.
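The point about return types can be illustrated with a stripped-down sketch. ImmArr and OfInt below are hypothetical stand-ins for ImmutableArray and one of its primitive subclasses, not the strawman code. Because slice is declared to return OfInt, a caller that starts with an OfInt keeps the Int-returning apply after slicing instead of falling back to the generic, boxing one.

```scala
import java.util.Arrays

object SpecializedReturn {
  // Hypothetical, simplified stand-in for ImmutableArray.
  abstract class ImmArr[+A] {
    def apply(i: Int): A
    def length: Int
    def slice(from: Int, until: Int): ImmArr[A]
  }

  // Hypothetical stand-in for a primitive subclass such as ofInt.
  final class OfInt(val unsafeArray: Array[Int]) extends ImmArr[Int] {
    def length: Int = unsafeArray.length

    // Called through the static type OfInt, this returns a primitive Int.
    def apply(i: Int): Int = unsafeArray(i)

    // Covariant return type: slicing an OfInt yields an OfInt, not ImmArr[Int].
    override def slice(from: Int, until: Int): OfInt = {
      val lo = math.max(from, 0)
      val hi = math.min(until, length)
      if (hi > lo) new OfInt(Arrays.copyOfRange(unsafeArray, lo, hi))
      else new OfInt(new Array[Int](0))
    }
  }

  // Accepts only the specialized type, so the loop reads unboxed Ints.
  def sum(xs: OfInt): Int = {
    var i = 0
    var acc = 0
    while (i < xs.length) { acc += xs(i); i += 1 }
    acc
  }
}
```

With the return type declared as ImmArr[Int] instead, the expression sum(arr.slice(1, 3)) would not compile without a cast.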

ackratos · 2018-02-25T04:06:00Z

Reran the benchmark against 2.12.4 with the following sbt settings:

val collectionsScalaVersionSettings = Seq(
  scalaVersion := "2.12.4",
  crossScalaVersions := scalaVersion.value :: "2.12.4" :: dotty.value :: Nil
)

val commonSettings = Seq(
  organization := "ch.epfl.scala",
  version := "0.10.0-SNAPSHOT",
  scalaVersion := "2.12.4",

With this PR:

[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       7.478 ±     0.145  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       7.356 ±     0.662  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       7.189 ±     0.265  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       7.199 ±     0.057  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       7.357 ±     0.108  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       7.928 ±     0.145  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       7.893 ±     0.285  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       8.623 ±     0.071  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       8.013 ±     0.078  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       8.149 ±     0.080  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8      11.391 ±     0.086  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      28.116 ±     0.771  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     312.608 ±    11.314  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8   10262.133 ±  1544.902  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  688483.807 ± 27744.618  ns/op

collection strawman baseline (0a00062)

[info] Benchmark                              (size)  Mode  Cnt        Score        Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       38.762 ±      0.630  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       40.028 ±      0.548  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       56.638 ±     22.198  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       41.656 ±      0.351  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       41.610 ±      0.448  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       37.348 ±      0.340  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       40.937 ±      0.583  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       40.982 ±      0.867  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       41.430 ±      0.896  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       38.804 ±      0.314  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8       47.292 ±      0.715  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      112.369 ±     14.016  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     1475.208 ±     44.008  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    54926.844 ±   1527.051  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  4875963.582 ± 487106.138  ns/op

scala 2.12.x official baseline (scala/scala@9fc1356)

[info] Benchmark                     (size)  Mode  Cnt        Score       Error  Units
[info] ArrayBenchmark.access_slice        0  avgt    8       40.889 ±     4.991  ns/op
[info] ArrayBenchmark.access_slice        1  avgt    8       39.278 ±     3.365  ns/op
[info] ArrayBenchmark.access_slice        2  avgt    8       37.768 ±     0.117  ns/op
[info] ArrayBenchmark.access_slice        3  avgt    8       37.608 ±     0.116  ns/op
[info] ArrayBenchmark.access_slice        4  avgt    8       37.708 ±     0.290  ns/op
[info] ArrayBenchmark.access_slice        7  avgt    8       37.578 ±     0.350  ns/op
[info] ArrayBenchmark.access_slice        8  avgt    8       37.337 ±     0.146  ns/op
[info] ArrayBenchmark.access_slice       15  avgt    8       37.009 ±     0.438  ns/op
[info] ArrayBenchmark.access_slice       16  avgt    8       36.992 ±     0.303  ns/op
[info] ArrayBenchmark.access_slice       17  avgt    8       37.094 ±     0.225  ns/op
[info] ArrayBenchmark.access_slice       39  avgt    8       36.549 ±     0.342  ns/op
[info] ArrayBenchmark.access_slice      282  avgt    8       44.495 ±     0.501  ns/op
[info] ArrayBenchmark.access_slice     4096  avgt    8      313.988 ±     9.483  ns/op
[info] ArrayBenchmark.access_slice   131070  avgt    8     9970.620 ±   327.554  ns/op
[info] ArrayBenchmark.access_slice  7312102  avgt    8  1148737.239 ± 58234.753  ns/op

szeiger · 2018-02-27T16:03:42Z

My previous comment still stands: Since slice is duplicated in all specialized implementations, it should return a specialized type. The way it is implemented now you lose specialization. With the new implementation that special-cases empty arrays this means that we need to add empty instances of all specialized types.

With the current non-specialized return types you could use a single implementation of slice that calls ArrayOps.copyOf (just merged as part of #479) but I think it's worth having the specialized return types.
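As a rough illustration of the single-implementation alternative, here is a sketch based on the approach mentioned earlier in the review (new Array with an implicit ClassTag, then System.arraycopy). It does not use ArrayOps.copyOf, whose signature is not shown in this thread, and the object and method names are hypothetical.

```scala
import scala.reflect.ClassTag

object GenericSlice {
  // Hypothetical helper: one slice for every element type, driven by a ClassTag.
  def slice[A: ClassTag](xs: Array[A], from: Int, until: Int): Array[A] = {
    val lo = math.max(from, 0)
    val hi = math.min(until, xs.length)
    // Compare before subtracting so an extreme `until` cannot overflow.
    val len = if (hi > lo) hi - lo else 0
    val res = new Array[A](len)
    if (len > 0) System.arraycopy(xs, lo, res, 0, len)
    res
  }

  def main(args: Array[String]): Unit = {
    println(slice(Array(1, 2, 3, 4), 1, 3).mkString(", ")) // 2, 3
    println(slice(Array("a", "b"), -1, 10).mkString(", ")) // a, b
  }
}
```

The trade-off raised in the review is that a generic helper like this returns a generic type, whereas per-subclass overrides can return the specialized subtype.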

ackratos · 2018-03-05T15:17:11Z

Latest benchmark for this PR (returning specialized ImmutableArray types to avoid auto-boxing), slightly improved:

[info] Benchmark                                    (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice              0  avgt   16       8.655 ±     0.126  ns/op
[info] ImmutableArrayBenchmark.access_slice              1  avgt   16       8.658 ±     0.069  ns/op
[info] ImmutableArrayBenchmark.access_slice              2  avgt   16       8.704 ±     0.024  ns/op
[info] ImmutableArrayBenchmark.access_slice              3  avgt   16       8.806 ±     0.093  ns/op
[info] ImmutableArrayBenchmark.access_slice              4  avgt   16       8.895 ±     0.097  ns/op
[info] ImmutableArrayBenchmark.access_slice              7  avgt   16       9.148 ±     0.033  ns/op
[info] ImmutableArrayBenchmark.access_slice              8  avgt   16       9.273 ±     0.125  ns/op
[info] ImmutableArrayBenchmark.access_slice             15  avgt   16       9.941 ±     0.108  ns/op
[info] ImmutableArrayBenchmark.access_slice             16  avgt   16       9.786 ±     0.480  ns/op
[info] ImmutableArrayBenchmark.access_slice             17  avgt   16       9.685 ±     0.042  ns/op
[info] ImmutableArrayBenchmark.access_slice             39  avgt   16      12.235 ±     0.125  ns/op
[info] ImmutableArrayBenchmark.access_slice            282  avgt   16      27.228 ±     0.252  ns/op
[info] ImmutableArrayBenchmark.access_slice           4096  avgt   16     302.324 ±     3.077  ns/op
[info] ImmutableArrayBenchmark.access_slice         131070  avgt   16    9618.849 ±   346.984  ns/op
[info] ImmutableArrayBenchmark.access_slice        7312102  avgt   16  674628.854 ± 11980.765  ns/op

szeiger · 2018-03-07T14:41:37Z

Could you run a benchmark against the update in #492? Does this version here still offer an advantage? As far as I can tell the difference comes down to 1 vs 2 or 3 operations on polymorphic arrays and any remaining advantage would be small.

ackratos · 2018-03-10T11:40:24Z

Hi @szeiger

You are right, the remaining advantage is too small to be noticed:

[info] Benchmark                                    (size)  Mode  Cnt       Score        Error  Units
[info] ImmutableArrayBenchmark.access_slice              0  avgt    8      11.478 ±      0.302  ns/op
[info] ImmutableArrayBenchmark.access_slice              1  avgt    8      26.553 ±     37.922  ns/op
[info] ImmutableArrayBenchmark.access_slice              2  avgt    8      29.739 ±     28.177  ns/op
[info] ImmutableArrayBenchmark.access_slice              3  avgt    8      12.337 ±      1.725  ns/op
[info] ImmutableArrayBenchmark.access_slice              4  avgt    8      15.059 ±      5.291  ns/op
[info] ImmutableArrayBenchmark.access_slice              7  avgt    8      12.421 ±      0.663  ns/op
[info] ImmutableArrayBenchmark.access_slice              8  avgt    8      14.292 ±      3.028  ns/op
[info] ImmutableArrayBenchmark.access_slice             15  avgt    8      13.761 ±      0.744  ns/op
[info] ImmutableArrayBenchmark.access_slice             16  avgt    8      14.044 ±      1.105  ns/op
[info] ImmutableArrayBenchmark.access_slice             17  avgt    8      15.913 ±      3.216  ns/op
[info] ImmutableArrayBenchmark.access_slice             39  avgt    8      17.158 ±      2.362  ns/op
[info] ImmutableArrayBenchmark.access_slice            282  avgt    8      32.892 ±      6.235  ns/op
[info] ImmutableArrayBenchmark.access_slice           4096  avgt    8     266.251 ±     93.297  ns/op
[info] ImmutableArrayBenchmark.access_slice         131070  avgt    8    8796.769 ±   1911.218  ns/op
[info] ImmutableArrayBenchmark.access_slice        7312102  avgt    8  599496.145 ± 102423.115  ns/op


baseline c0129af841eb698768a3fa7af902d7b869f95302:
[info] # Run complete. Total time: 00:08:53
[info] 
[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt   16      27.982 ±     3.955  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt   16      26.364 ±     1.861  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt   16      29.313 ±     6.833  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt   16      58.318 ±    35.633  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt   16      24.806 ±     0.851  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt   16      26.439 ±     4.658  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt   16      33.560 ±    12.170  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt   16      28.844 ±     3.464  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt   16      28.977 ±     8.784  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt   16      27.173 ±     2.322  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt   16      26.675 ±     1.704  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt   16      33.786 ±     2.137  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt   16     224.553 ±    14.430  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt   16    8575.520 ±  1504.710  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt   16  575944.051 ± 74915.828  ns/op

Thank you for your help reviewing this.

ackratos mentioned this pull request Jan 20, 2018

optimise arrayOps in strawman collection #317

Closed

Ichoran reviewed Jan 20, 2018

ackratos force-pushed the wip/arrayslice branch from c241908 to 29e0506 Compare January 21, 2018 03:55

Ichoran reviewed Jan 21, 2018

ackratos force-pushed the wip/arrayslice branch from c2d6fd1 to b994be2 Compare January 22, 2018 13:37

szeiger reviewed Jan 22, 2018

ackratos force-pushed the wip/arrayslice branch 2 times, most recently from 010055f to f86e69d Compare February 20, 2018 02:32

ackratos force-pushed the wip/arrayslice branch from 12ee956 to 1bf1090 Compare March 5, 2018 14:46

julienrf mentioned this pull request Mar 6, 2018

Fixes from scala/scala repository #493

Merged

ackratos added 4 commits March 10, 2018 15:28

Optimize slice operation in ImmutableArray

0f03b31

Fix semantic issue according to Stefan's review comments and add test…

86b0126

… cases

Fix semantic issue according to Stefan's review comments and add test…

57c1ef3

… cases again

Update according to review comment

72f16ac

ackratos force-pushed the wip/arrayslice branch from 1bf1090 to 72f16ac Compare March 10, 2018 11:35

ackratos closed this Mar 10, 2018
