This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Optimize slice operation in ImmutableArray #354

Closed
ackratos wants to merge 4 commits from wip/arrayslice


@ackratos (Contributor) commented Jan 20, 2018

Baseline performance (0a00062):

[info] Benchmark                              (size)  Mode  Cnt        Score         Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       38.393 ±       0.416  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       39.797 ±       0.606  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       38.203 ±       0.261  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       37.194 ±       0.369  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       41.499 ±       0.251  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       40.420 ±       0.485  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       37.346 ±       0.207  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       41.066 ±       0.387  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       38.310 ±       0.326  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       40.797 ±       0.313  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8       42.464 ±       0.477  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      107.206 ±       2.077  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     1483.075 ±      36.826  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    55750.166 ±    1819.028  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  6388168.153 ± 7012236.507  ns/op

With this PR:

[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       7.399 ±     0.096  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       6.912 ±     0.084  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       6.996 ±     0.021  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       7.126 ±     0.078  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       7.434 ±     0.028  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       7.713 ±     0.063  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       7.684 ±     0.110  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       8.599 ±     0.100  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       7.867 ±     0.090  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       9.248 ±     0.097  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8      11.298 ±     0.548  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      27.505 ±     1.063  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     310.161 ±    36.247  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    9640.269 ±   403.856  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  676886.167 ± 22375.895  ns/op

@@ -10,7 +10,7 @@ import scala.Predef.intWrapper

@BenchmarkMode(scala.Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
-@Fork(1)
+@Fork(2)
Contributor

Why are you changing this?

Contributor Author

@Ichoran I just thought that forking 2 JVMs would make the results more stable (with less error). Reverted.

@@ -197,6 +199,10 @@ object ImmutableArray extends StrictOptimizedSeqFactory[ImmutableArray] {
case that: ofByte => Arrays.equals(unsafeArray, that.unsafeArray)
case _ => super.equals(that)
}
override def slice(from: Int, until: Int): ImmutableArray[Byte] = {
val lo = scala.math.max(from, 0)
ImmutableArray.unsafeWrapArray(Arrays.copyOfRange(unsafeArray, lo, until))
Contributor

This isn't correct. copyOfRange pads with zeros rather than truncating when you run off the end.
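To make the difference concrete: `java.util.Arrays.copyOfRange` accepts an end index past the source length and zero-fills the extra slots, whereas `slice` is expected to clamp its bounds. A minimal sketch over a plain `Array[Byte]` (the `CopyOfRangeVsSlice` object and its method names are hypothetical, not part of the PR):

```scala
import java.util.Arrays

// Hypothetical helper object contrasting the two behaviours on a plain Array[Byte].
object CopyOfRangeVsSlice {
  // Arrays.copyOfRange lets `until` exceed xs.length and pads the result with zeros.
  def padded(xs: Array[Byte], from: Int, until: Int): Array[Byte] =
    Arrays.copyOfRange(xs, from, until)

  // slice-style semantics: clamp both bounds into [0, xs.length] and
  // return an empty array when the clamped range is empty or inverted.
  def sliced(xs: Array[Byte], from: Int, until: Int): Array[Byte] = {
    val lo = math.min(math.max(from, 0), xs.length)
    val hi = math.max(lo, math.min(until, xs.length))
    Arrays.copyOfRange(xs, lo, hi)
  }
}
```

For `Array[Byte](1, 2, 3)` and the range 1 until 5, `padded` yields four elements (2, 3, 0, 0) while `sliced` yields two (2, 3).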

Contributor Author

@Ichoran fixed. Thank you.

// new ofUnit(util.Arrays.copyOfRange[Unit](array, from, until)) - Unit is special and doesn't compile
// can't use util.Arrays.copyOfRange[Unit](repr, from, until) - Unit is special and doesn't compile
val lo = scala.math.max(from, 0)
val res = new Array[Unit](until-lo)
@Ichoran (Contributor) commented Jan 20, 2018

This is also incorrect (it copies the copyOfRange behavior instead of the normal slice behavior). Note that I didn't mark all the places the copyOfRange call needs to be fixed.

Contributor Author

Hi @Ichoran,
I fixed this and all the other copyOfRange calls. For the ofUnit case, though, I am not sure we need it at all: Unit is boxed to BoxedUnit at runtime, and in my testing, slice at runtime relies on the ofRef implementation.

Do you know in which situations the ofUnit representation is created and used?
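The boxing point can be checked directly: on the JVM, Scala represents an `Array[Unit]` as an array of `scala.runtime.BoxedUnit`, i.e. a reference array. That is consistent with the observation above, although this sketch does not show which `ImmutableArray` subclass ends up being constructed. `UnitArrayRepresentation` is a hypothetical name:

```scala
import scala.runtime.BoxedUnit

object UnitArrayRepresentation {
  // A Unit array allocated through Scala's Array constructor.
  val units: Array[Unit] = new Array[Unit](3)

  // The JVM element class of that array (expected: BoxedUnit, a reference type).
  def elementClass: Class[_] = units.getClass.getComponentType

  // True when the element class is the boxed representation rather than a primitive.
  def isBoxed: Boolean = elementClass == classOf[BoxedUnit]
}
```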

@Ichoran (Contributor) commented Jan 20, 2018

Thanks for looking at this! The behavior needs to be tweaked to work the way slice is supposed to, but otherwise this should be a better implementation. The benchmarking doesn't seem to be a good indication of actual time taken for small array sizes, but the benefit is obvious on the large sizes.

override def slice(from: Int, until: Int): ImmutableArray[T] = {
val lo = scala.math.max(from, 0)
val hi = scala.math.min(until, length)
new ofRef(Arrays.copyOfRange[T](unsafeArray, lo, hi))
Contributor

Better! But the current behavior (in 2.12) is to give an empty array, not throw an exception, when the range is of negative size. So you either need to check lo < hi or take from lo to math.max(lo, hi).
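One edge case separates the two options: `Arrays.copyOfRange` throws `ArrayIndexOutOfBoundsException` when `from` exceeds the source length, so taking `lo` to `math.max(lo, hi)` still needs `lo` clamped to `length` when `from` is past the end. Checking `lo < hi` first avoids the call altogether. A sketch of the check-first variant on a plain `Array[AnyRef]` (the `RefSlice` object and its members are hypothetical):

```scala
import java.util.Arrays

object RefSlice {
  // Shared empty result; safe to reuse because a zero-length array has nothing to mutate.
  private val emptyRefs: Array[AnyRef] = new Array[AnyRef](0)

  // 2.12-style slice semantics: clamp the bounds and return an empty
  // array (never throw) when the window is empty, inverted, or past the end.
  def slice(xs: Array[AnyRef], from: Int, until: Int): Array[AnyRef] = {
    val lo = math.max(from, 0)
    val hi = math.min(until, xs.length)
    // lo < hi implies 0 <= lo < hi <= xs.length, so copyOfRange cannot throw here.
    if (lo < hi) Arrays.copyOfRange(xs, lo, hi) else emptyRefs
  }
}
```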

Contributor Author

thanks, fixed.

// can't use util.Arrays.copyOfRange[Unit](repr, from, until) - Unit is special and doesn't compile
val lo = scala.math.max(from, 0)
val hi = scala.math.min(until, length)
val slicedLenght = hi - lo
Contributor

Typo: spelling should be slicedLength

Contributor Author

Ah, good spot. Fixed

val lo = scala.math.max(from, 0)
val hi = scala.math.min(until, length)
val slicedLenght = hi - lo
val res = new Array[Unit](slicedLenght)
Contributor

Need to make sure the length is non-negative.
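Because a Unit array carries no payload, an `ofUnit` slice mostly needs the right length, and that length must be clamped at zero: `new Array[Unit](n)` throws `NegativeArraySizeException` for negative `n`. A sketch of the arithmetic (the helper name is hypothetical); comparing before subtracting also avoids `Int` overflow for extreme arguments:

```scala
object UnitSliceLength {
  // Number of elements slice(from, until) should keep from a sequence of length n.
  def slicedLength(n: Int, from: Int, until: Int): Int = {
    val lo = math.max(from, 0)
    val hi = math.min(until, n)
    // An inverted or out-of-range window yields an empty slice. Subtracting only
    // when hi > lo keeps the result in [0, n] and avoids wrap-around.
    if (hi > lo) hi - lo else 0
  }
}
```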

Contributor Author

thanks, fixed

@@ -186,6 +184,14 @@ object ImmutableArray extends StrictOptimizedSeqFactory[ImmutableArray] {
case that: ofRef[_] => Arrays.equals(unsafeArray.asInstanceOf[Array[AnyRef]], that.unsafeArray.asInstanceOf[Array[AnyRef]])
case _ => super.equals(that)
}
override def slice(from: Int, until: Int): ImmutableArray[T] = {
Contributor

Is there a performance advantage in duplicating this functionality in all subclasses rather than calling new Array with an available implicit ClassTag and then using System.arraycopy?
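For comparison, the single shared implementation suggested here might look roughly like the following sketch, which allocates the result through an implicit `ClassTag` and fills it with `System.arraycopy`. It works on plain arrays rather than `ImmutableArray`, and `GenericSlice` is a hypothetical name:

```scala
import scala.reflect.ClassTag

object GenericSlice {
  // One implementation for every element type: the ClassTag lets `new Array[T]`
  // allocate a primitive array (e.g. int[]) when T is Int, so elements stay unboxed.
  def slice[T: ClassTag](xs: Array[T], from: Int, until: Int): Array[T] = {
    val lo = math.max(from, 0)
    val hi = math.min(until, xs.length)
    val len = if (lo < hi) hi - lo else 0
    val res = new Array[T](len)
    if (len > 0) System.arraycopy(xs, lo, res, 0, len)
    res
  }
}
```

Applied to `ImmutableArray`, a single method of this shape would naturally return the general `ImmutableArray[T]` type, which is the trade-off discussed in the next comment.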

Contributor

I should have thought about this for a few more seconds. There is a definite advantage, but it is not currently realized. All these slice methods should declare the correct subtype as the return type, for example ofRef[T] in this case. This ensures that the slices can still be accessed without boxing.
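The point about return types can be illustrated with a toy hierarchy. `ImmArr` and `OfInt` below are hypothetical stand-ins for `ImmutableArray` and one of its specialized subclasses, not the strawman's code. Overriding `slice` with the narrower return type means a caller holding an `OfInt` gets an `OfInt` back and calls its `apply(i): Int` directly, instead of going through the generic `apply`, which erases to `Object` and boxes:

```scala
import java.util.Arrays

object SpecializedReturn {
  // Hypothetical stand-in for ImmutableArray.
  sealed abstract class ImmArr[+A] {
    def length: Int
    def apply(i: Int): A
    def slice(from: Int, until: Int): ImmArr[A]
  }

  // Hypothetical stand-in for a primitive-specialized subclass such as ofInt.
  final class OfInt(val unsafeArray: Array[Int]) extends ImmArr[Int] {
    def length: Int = unsafeArray.length
    // Statically typed as Int: no boxing when called through an OfInt reference.
    def apply(i: Int): Int = unsafeArray(i)
    // Narrowed (covariant) return type: the static type of the result stays OfInt.
    override def slice(from: Int, until: Int): OfInt = {
      val lo = math.max(from, 0)
      val hi = math.min(until, length)
      if (lo < hi) new OfInt(Arrays.copyOfRange(unsafeArray, lo, hi))
      else new OfInt(new Array[Int](0))
    }
  }
}
```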

@ackratos force-pushed the wip/arrayslice branch 2 times, most recently from 010055f to f86e69d on February 20, 2018 02:32
@ackratos (Contributor Author) commented Feb 25, 2018

Reran the benchmarks against Scala 2.12.4 with the following sbt settings:

val collectionsScalaVersionSettings = Seq(
  scalaVersion := "2.12.4",
  crossScalaVersions := scalaVersion.value :: "2.12.4" :: dotty.value :: Nil
)

val commonSettings = Seq(
  organization := "ch.epfl.scala",
  version := "0.10.0-SNAPSHOT",
  scalaVersion := "2.12.4",
  // ... (remaining settings not shown)
)

With this PR:

[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       7.478 ±     0.145  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       7.356 ±     0.662  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       7.189 ±     0.265  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       7.199 ±     0.057  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       7.357 ±     0.108  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       7.928 ±     0.145  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       7.893 ±     0.285  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       8.623 ±     0.071  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       8.013 ±     0.078  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       8.149 ±     0.080  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8      11.391 ±     0.086  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      28.116 ±     0.771  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     312.608 ±    11.314  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8   10262.133 ±  1544.902  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  688483.807 ± 27744.618  ns/op

Collection-strawman baseline (0a00062):

[info] Benchmark                              (size)  Mode  Cnt        Score        Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt    8       38.762 ±      0.630  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt    8       40.028 ±      0.548  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt    8       56.638 ±     22.198  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt    8       41.656 ±      0.351  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt    8       41.610 ±      0.448  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt    8       37.348 ±      0.340  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt    8       40.937 ±      0.583  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt    8       40.982 ±      0.867  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt    8       41.430 ±      0.896  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt    8       38.804 ±      0.314  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt    8       47.292 ±      0.715  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt    8      112.369 ±     14.016  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt    8     1475.208 ±     44.008  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt    8    54926.844 ±   1527.051  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt    8  4875963.582 ± 487106.138  ns/op

Official Scala 2.12.x baseline (scala/scala@9fc1356):

[info] Benchmark                     (size)  Mode  Cnt        Score       Error  Units
[info] ArrayBenchmark.access_slice        0  avgt    8       40.889 ±     4.991  ns/op
[info] ArrayBenchmark.access_slice        1  avgt    8       39.278 ±     3.365  ns/op
[info] ArrayBenchmark.access_slice        2  avgt    8       37.768 ±     0.117  ns/op
[info] ArrayBenchmark.access_slice        3  avgt    8       37.608 ±     0.116  ns/op
[info] ArrayBenchmark.access_slice        4  avgt    8       37.708 ±     0.290  ns/op
[info] ArrayBenchmark.access_slice        7  avgt    8       37.578 ±     0.350  ns/op
[info] ArrayBenchmark.access_slice        8  avgt    8       37.337 ±     0.146  ns/op
[info] ArrayBenchmark.access_slice       15  avgt    8       37.009 ±     0.438  ns/op
[info] ArrayBenchmark.access_slice       16  avgt    8       36.992 ±     0.303  ns/op
[info] ArrayBenchmark.access_slice       17  avgt    8       37.094 ±     0.225  ns/op
[info] ArrayBenchmark.access_slice       39  avgt    8       36.549 ±     0.342  ns/op
[info] ArrayBenchmark.access_slice      282  avgt    8       44.495 ±     0.501  ns/op
[info] ArrayBenchmark.access_slice     4096  avgt    8      313.988 ±     9.483  ns/op
[info] ArrayBenchmark.access_slice   131070  avgt    8     9970.620 ±   327.554  ns/op
[info] ArrayBenchmark.access_slice  7312102  avgt    8  1148737.239 ± 58234.753  ns/op

@szeiger (Contributor) commented Feb 27, 2018

My previous comment still stands: since slice is duplicated in all specialized implementations, it should return a specialized type. The way it is implemented now, you lose specialization. With the new implementation that special-cases empty arrays, this means that we need to add empty instances of all specialized types.

With the current non-specialized return types, you could use a single implementation of slice that calls ArrayOps.copyOf (just merged as part of #479), but I think it's worth having the specialized return types.

@ackratos (Contributor Author) commented Mar 5, 2018

Latest benchmark for this PR (slice now returns the specialized ImmutableArray subclasses to avoid auto-boxing), slightly improved:

[info] Benchmark                                    (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice              0  avgt   16       8.655 ±     0.126  ns/op
[info] ImmutableArrayBenchmark.access_slice              1  avgt   16       8.658 ±     0.069  ns/op
[info] ImmutableArrayBenchmark.access_slice              2  avgt   16       8.704 ±     0.024  ns/op
[info] ImmutableArrayBenchmark.access_slice              3  avgt   16       8.806 ±     0.093  ns/op
[info] ImmutableArrayBenchmark.access_slice              4  avgt   16       8.895 ±     0.097  ns/op
[info] ImmutableArrayBenchmark.access_slice              7  avgt   16       9.148 ±     0.033  ns/op
[info] ImmutableArrayBenchmark.access_slice              8  avgt   16       9.273 ±     0.125  ns/op
[info] ImmutableArrayBenchmark.access_slice             15  avgt   16       9.941 ±     0.108  ns/op
[info] ImmutableArrayBenchmark.access_slice             16  avgt   16       9.786 ±     0.480  ns/op
[info] ImmutableArrayBenchmark.access_slice             17  avgt   16       9.685 ±     0.042  ns/op
[info] ImmutableArrayBenchmark.access_slice             39  avgt   16      12.235 ±     0.125  ns/op
[info] ImmutableArrayBenchmark.access_slice            282  avgt   16      27.228 ±     0.252  ns/op
[info] ImmutableArrayBenchmark.access_slice           4096  avgt   16     302.324 ±     3.077  ns/op
[info] ImmutableArrayBenchmark.access_slice         131070  avgt   16    9618.849 ±   346.984  ns/op
[info] ImmutableArrayBenchmark.access_slice        7312102  avgt   16  674628.854 ± 11980.765  ns/op

@szeiger (Contributor) commented Mar 7, 2018

Could you run a benchmark against the update in #492? Does this version still offer an advantage? As far as I can tell, the difference comes down to 1 vs. 2 or 3 operations on polymorphic arrays, and any remaining advantage would be small.

@ackratos (Contributor Author)

Hi @szeiger

You are right; the remaining advantage is too small to notice. This PR:

[info] Benchmark                                    (size)  Mode  Cnt       Score        Error  Units
[info] ImmutableArrayBenchmark.access_slice              0  avgt    8      11.478 ±      0.302  ns/op
[info] ImmutableArrayBenchmark.access_slice              1  avgt    8      26.553 ±     37.922  ns/op
[info] ImmutableArrayBenchmark.access_slice              2  avgt    8      29.739 ±     28.177  ns/op
[info] ImmutableArrayBenchmark.access_slice              3  avgt    8      12.337 ±      1.725  ns/op
[info] ImmutableArrayBenchmark.access_slice              4  avgt    8      15.059 ±      5.291  ns/op
[info] ImmutableArrayBenchmark.access_slice              7  avgt    8      12.421 ±      0.663  ns/op
[info] ImmutableArrayBenchmark.access_slice              8  avgt    8      14.292 ±      3.028  ns/op
[info] ImmutableArrayBenchmark.access_slice             15  avgt    8      13.761 ±      0.744  ns/op
[info] ImmutableArrayBenchmark.access_slice             16  avgt    8      14.044 ±      1.105  ns/op
[info] ImmutableArrayBenchmark.access_slice             17  avgt    8      15.913 ±      3.216  ns/op
[info] ImmutableArrayBenchmark.access_slice             39  avgt    8      17.158 ±      2.362  ns/op
[info] ImmutableArrayBenchmark.access_slice            282  avgt    8      32.892 ±      6.235  ns/op
[info] ImmutableArrayBenchmark.access_slice           4096  avgt    8     266.251 ±     93.297  ns/op
[info] ImmutableArrayBenchmark.access_slice         131070  avgt    8    8796.769 ±   1911.218  ns/op
[info] ImmutableArrayBenchmark.access_slice        7312102  avgt    8  599496.145 ± 102423.115  ns/op


Baseline (c0129af841eb698768a3fa7af902d7b869f95302):
[info] # Run complete. Total time: 00:08:53
[info] 
[info] Benchmark                              (size)  Mode  Cnt       Score       Error  Units
[info] ImmutableArrayBenchmark.access_slice        0  avgt   16      27.982 ±     3.955  ns/op
[info] ImmutableArrayBenchmark.access_slice        1  avgt   16      26.364 ±     1.861  ns/op
[info] ImmutableArrayBenchmark.access_slice        2  avgt   16      29.313 ±     6.833  ns/op
[info] ImmutableArrayBenchmark.access_slice        3  avgt   16      58.318 ±    35.633  ns/op
[info] ImmutableArrayBenchmark.access_slice        4  avgt   16      24.806 ±     0.851  ns/op
[info] ImmutableArrayBenchmark.access_slice        7  avgt   16      26.439 ±     4.658  ns/op
[info] ImmutableArrayBenchmark.access_slice        8  avgt   16      33.560 ±    12.170  ns/op
[info] ImmutableArrayBenchmark.access_slice       15  avgt   16      28.844 ±     3.464  ns/op
[info] ImmutableArrayBenchmark.access_slice       16  avgt   16      28.977 ±     8.784  ns/op
[info] ImmutableArrayBenchmark.access_slice       17  avgt   16      27.173 ±     2.322  ns/op
[info] ImmutableArrayBenchmark.access_slice       39  avgt   16      26.675 ±     1.704  ns/op
[info] ImmutableArrayBenchmark.access_slice      282  avgt   16      33.786 ±     2.137  ns/op
[info] ImmutableArrayBenchmark.access_slice     4096  avgt   16     224.553 ±    14.430  ns/op
[info] ImmutableArrayBenchmark.access_slice   131070  avgt   16    8575.520 ±  1504.710  ns/op
[info] ImmutableArrayBenchmark.access_slice  7312102  avgt   16  575944.051 ± 74915.828  ns/op

Thank you for your help reviewing this.

@ackratos closed this on Mar 10, 2018