KT-55178 Performance improvement of KFunction.callBy #4840

k163377 · 2022-06-01T20:46:21Z

Performance improvements have been made to KCallableImpl.callDefaultMethod (KFunction.callBy).

Significance of this improvement

KFunction.callBy is almost always used when function calls are made using kotlin-reflect.
Improving this performance will improve the performance of the entire library that depends on kotlin-reflect.

Improvement Details

The contents of the array used for reflection calls are definitive for the target function.
On the other hand, the current implementation manages arguments dynamically.

In this PR, I have improved performance by caching as much of the processing as possible on the first run, and also by dropping the use of ArrayList.

Benchmark Results

I have created a simple benchmark project to compare 1.7.0-RC with this change.
The content compares the calling process for functions with 0, 1, 5, and 20 arguments, using all default arguments or not using default arguments.
https://github.com/k163377/kfunction-call-by-benchmark

The results are as follows (for ease of viewing, only the order of the results has been rearranged).
The higher the score, the better.

before

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  17642071.229 ± 258708.101  ops/s
Measurement.oneWithDefault        thrpt    4   1134130.294 ±  25929.190  ops/s
Measurement.oneWithoutDefault     thrpt    4   6126880.972 ± 419737.002  ops/s
Measurement.fiveWithDefault       thrpt    4    424029.672 ±  42593.194  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2084516.569 ± 204075.004  ops/s
Measurement.twentyWithDefault     thrpt    4    134928.255 ±  41719.699  ops/s
Measurement.twentyWithoutDefault  thrpt    4    608741.935 ± 106547.416  ops/s

after

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  13168129.895 ± 492380.039  ops/s
Measurement.oneWithDefault        thrpt    4   4903788.122 ± 383012.856  ops/s
Measurement.oneWithoutDefault     thrpt    4   6293334.841 ± 211793.899  ops/s
Measurement.fiveWithDefault       thrpt    4   1924051.074 ±  34722.071  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2255907.246 ± 616366.197  ops/s
Measurement.twentyWithDefault     thrpt    4    530805.418 ±   7278.245  ops/s
Measurement.twentyWithoutDefault  thrpt    4    664515.652 ±  18892.456  ops/s

For the case where all default arguments are used, there is a significant performance improvement.
For patterns that did not use default arguments, performance was marginally improved, or at least not degraded.

For the no-argument function calls, performance tended to be worse.
I considered optimizing for argumentless functions, but decided against it for the following reasons

Calls to no-argument functions are very performant to begin with.
I did not feel that the pattern was important enough to require optimization.

k163377 · 2022-06-01T20:47:33Z

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

+        arguments
+    }
+
+    private fun getAbsentArguments(): Array<Any?> = _absentArguments().clone()


Since clone processing is required, it is defined as a function rather than a property.

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

k163377 · 2022-06-01T20:56:44Z

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

+        // set absent values
+        parameters.forEach { parameter ->
+            if (parameter.isOptional && !parameter.type.isInlineClassType) {
+                // For inline class types, the javaType refers to the underlying type of the inline class,
+                // but we have to pass null in order to mark the argument as absent for InlineClassAwareCaller.
+                arguments[parameter.index] = defaultPrimitiveValue(parameter.type.javaType)
+            } else if (parameter.isVararg) {
+                arguments[parameter.index] = defaultEmptyArray(parameter.type)
+            }
+        }


By setting the abscent value in advance, processing can be omitted at the time of the call.
Even if the argument is set, it will be overwritten, so there is no effect.

k163377 · 2022-10-22T17:30:21Z

@udalov
Would it be difficult for you to review this PR?
I think the KFunction.callBy speedup is an important change for kotlin-reflect users.

If you are concerned about the amount of changes or increased memory consumption due to the introduction of cache, I can change the content as follows
Kotlin/kotlinx.reflect.lite#36

udalov · 2022-11-01T23:32:08Z

Sorry for the delay! I agree, this optimization can be very useful.

I think we can go further and make the getAbsentArguments array of size parameters.size + (if (isSuspend) 1 else 0) + masks.size + 1. This will get rid of another array creation at the end of callDefaultMethod. This will, however, require to copy the array in case no default arguments were omitted (to take its prefix of size parameters.size + (if (isSuspend) 1 else 0)), which is the !anyOptional case. But maybe that's fine since the user can also use the more performant KCallable.call in that case anyway. What do you think?

k163377 · 2022-11-02T00:22:36Z

Thanks for the review.

I agree with your suggestions.
I think it is more reasonable to optimize callBy to use default values, since the optimization of using call when all arguments are given can be easily done by the user.

I will revise and push this when I have time.

k163377 · 2022-11-02T15:39:09Z

@udalov
Fixed and rebased.

Benchmark results for 1.7.20, before and after fix, are as follows

1.7.20

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  17404199.235 ± 1055469.861  ops/s
Measurement.oneWithDefault        thrpt    4   1088190.688 ±  252820.971  ops/s
Measurement.oneWithoutDefault     thrpt    4   6106006.345 ±  313929.311  ops/s
Measurement.fiveWithDefault       thrpt    4    427439.340 ±   18559.512  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2025561.126 ±   78028.024  ops/s
Measurement.twentyWithDefault     thrpt    4    121692.511 ±   45871.752  ops/s
Measurement.twentyWithoutDefault  thrpt    4    612366.962 ±  110886.024  ops/s

before fix

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  13048884.146 ± 1348295.385  ops/s
Measurement.oneWithDefault        thrpt    4   4554089.123 ± 2762308.518  ops/s
Measurement.oneWithoutDefault     thrpt    4   6559863.859 ±  423675.272  ops/s
Measurement.fiveWithDefault       thrpt    4   1925996.082 ±   69195.213  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2482991.930 ±  117665.385  ops/s
Measurement.twentyWithDefault     thrpt    4    547109.098 ±   23378.718  ops/s
Measurement.twentyWithoutDefault  thrpt    4    734820.642 ±  141763.019  ops/s

after fix

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  13134715.929 ± 152888.381  ops/s
Measurement.oneWithDefault        thrpt    4   5384349.821 ± 454531.756  ops/s
Measurement.oneWithoutDefault     thrpt    4   6452819.026 ± 376521.431  ops/s
Measurement.fiveWithDefault       thrpt    4   1948375.781 ± 199051.047  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2461809.238 ± 123859.545  ops/s
Measurement.twentyWithDefault     thrpt    4    520222.742 ±  76772.643  ops/s
Measurement.twentyWithoutDefault  thrpt    4    737876.722 ±  94725.636  ops/s

There was not much difference between before and after the fix.

In addition, I also benchmarked the results of limited changes (made to kotlin-reflect-lite Kotlin/kotlinx.reflect.lite#36).
https://github.com/k163377/kotlin/tree/only-array-opt
The results are as follows

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  15824694.741 ±  416278.522  ops/s
Measurement.oneWithDefault        thrpt    4   1989567.103 ±  668681.760  ops/s
Measurement.oneWithoutDefault     thrpt    4   6748946.272 ±  522893.716  ops/s
Measurement.fiveWithDefault       thrpt    4    459237.317 ±   81528.824  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2354501.801 ± 1219202.447  ops/s
Measurement.twentyWithDefault     thrpt    4    128773.826 ±   21388.445  ops/s
Measurement.twentyWithoutDefault  thrpt    4    723547.425 ±   71860.672  ops/s

In this case, the improvement was relatively small.

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

udalov · 2022-11-08T21:46:30Z

Thanks! I've also noticed that now we perform integer boxing when computing the mask for every default argument and I'd like to avoid it. What do you think about the following (hopefully last) optimization 388fdc4?

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

k163377 · 2022-11-11T16:53:55Z

What do you think about the following (hopefully last) optimization 388fdc4?

It does not seem to be a good fix, at least as far as the benchmark results are concerned.

Even with this optimization, the speedup appears to be only when many default arguments are used.
In other cases, it appears to be offset by the cost of branching.

638d89c

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  12938281.870 ± 488259.178  ops/s
Measurement.fiveWithDefault       thrpt    4   1940327.443 ± 198156.544  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2442313.516 ±  39665.853  ops/s
Measurement.oneWithDefault        thrpt    4   5451979.125 ± 715952.574  ops/s
Measurement.oneWithoutDefault     thrpt    4   6475176.599 ± 349258.846  ops/s
Measurement.twentyWithDefault     thrpt    4    505560.683 ±  42324.766  ops/s
Measurement.twentyWithoutDefault  thrpt    4    726913.081 ± 136844.449  ops/s

When 388fdc4 is imported

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  13029498.008 ±  255137.936  ops/s
Measurement.fiveWithDefault       thrpt    4   1832503.215 ± 1117903.663  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2399084.679 ±  117597.661  ops/s
Measurement.oneWithDefault        thrpt    4   5460558.554 ±  279303.773  ops/s
Measurement.oneWithoutDefault     thrpt    4   6325791.468 ±  187765.117  ops/s
Measurement.twentyWithDefault     thrpt    4    598262.914 ±   26892.919  ops/s
Measurement.twentyWithoutDefault  thrpt    4    729748.383 ±  100592.024  ops/s

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

k163377 · 2022-11-12T01:19:53Z

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt

-        val masks = ArrayList<Int>(1)
-        var index = 0
-        var anyOptional = false
+        val parameterSize = parameters.size + (if (isSuspend) 1 else 0)


The following optimization to no-argument functions was found to be nearly twice as fast as 638d89c.
Does this optimization look like it should be incorporated?

Benchmark Mode Cnt Score Error Units Measurement.zero thrpt 4 22299885.316 ± 295995.533 ops/s Measurement.fiveWithDefault thrpt 4 1927106.356 ± 225032.224 ops/s Measurement.fiveWithoutDefault thrpt 4 2460623.502 ± 96649.714 ops/s Measurement.oneWithDefault thrpt 4 5384826.666 ± 340254.100 ops/s Measurement.oneWithoutDefault thrpt 4 5654229.984 ± 5914104.417 ops/s Measurement.twentyWithDefault thrpt 4 510341.805 ± 2940.113 ops/s Measurement.twentyWithoutDefault thrpt 4 739231.653 ± 21298.002 ops/s

Suggested change

val parameterSize = parameters.size + (if (isSuspend) 1 else 0)

val parameterSize = parameters.size + (if (isSuspend) 1 else 0)

// Optimization for general no-argument functions.

if (parameters.isEmpty()) {

@Suppress("UNCHECKED_CAST")

return reflectionCall {

caller.call(if (isSuspend) arrayOf(continuationArgument) else emptyArray()) as R

}

} else if (parameters.size == 1) {

parameters.first().takeIf { it.kind != KParameter.Kind.VALUE }?.let { parameter ->

@Suppress("UNCHECKED_CAST")

return reflectionCall {

caller.call(if (isSuspend) arrayOf(args[parameter], continuationArgument) else arrayOf(args[parameter])) as R

}

}

}

For the 0-parameters case, it looks good! But for the 1 parameter, I'm not really sure, the code is getting quite complicated already. Besides, Measurement.oneWithoutDefault regressed in your results after this change, if I understand them correctly? I propose to keep the 0-parameter case only.

Measurement.oneWithoutDefault regressed in your results after this change, if I understand them correctly?

I ran the benchmark again with almost the same content, but there did not seem to be any regression in performance.
Since the benchmark was run on a local machine, there must have been something going on in the background during this measurement.

1e1cd3a

Benchmark Mode Cnt Score Error Units Measurement.fiveWithDefault thrpt 4 1713974.649 ± 807978.785 ops/s Measurement.fiveWithoutDefault thrpt 4 2402911.282 ± 116905.232 ops/s Measurement.oneWithDefault thrpt 4 5018352.177 ± 3050333.309 ops/s Measurement.oneWithoutDefault thrpt 4 6338081.662 ± 384633.083 ops/s Measurement.twentyWithDefault thrpt 4 506022.842 ± 28381.148 ops/s Measurement.twentyWithoutDefault thrpt 4 721987.591 ± 78852.517 ops/s Measurement.zero thrpt 4 12727105.089 ± 188029.864 ops/s

after-zero-arg-opt2

Benchmark Mode Cnt Score Error Units Measurement.fiveWithDefault thrpt 4 1880153.629 ± 232291.891 ops/s Measurement.fiveWithoutDefault thrpt 4 2388820.000 ± 97785.718 ops/s Measurement.oneWithDefault thrpt 4 5276216.078 ± 327025.056 ops/s Measurement.oneWithoutDefault thrpt 4 6251850.821 ± 497046.067 ops/s Measurement.twentyWithDefault thrpt 4 516927.278 ± 13448.841 ops/s Measurement.twentyWithoutDefault thrpt 4 728619.156 ± 36904.744 ops/s Measurement.zero thrpt 4 21774799.243 ± 1534253.147 ops/s

I agree that it is complicated.
I have pushed for a fix.
68a9741

I did not measure benchmark results with the one-parameter branch removed, but at least there is no reason for the score to drop since the target of the measurement is a top-level function.

As a side note, the reason I proposed the original form is that getter is a typical example of a function that takes only instances as arguments.

udalov · 2022-11-29T13:25:49Z

I've merged all commits and slightly changed comments in 3393bb4. I'll push it to master after the build succeeds.

k163377 · 2022-11-29T13:37:43Z

Thanks a lot!

#KT-55178 Fixed Co-authored-by: Alexander Udalov <[email protected]>

udalov · 2022-11-29T21:16:06Z

Merged to master. This change will be released as a part of Kotlin 1.8.20.

Thank you for the contribution! And in particular, thanks for extensive measurement of effects of all the discussed changes. 👍

JetBrains#4840 (comment)

#4840 (comment)

JetBrains#4840 (comment)

#4840 (comment)

udalov added the Reflection label Jun 1, 2022

udalov self-assigned this Jun 1, 2022

k163377 commented Jun 1, 2022

View reviewed changes

k163377 mentioned this pull request Oct 22, 2022

Performance improvement of KCallableImpl.callBy Kotlin/kotlinx.reflect.lite#36

Draft

udalov reviewed Nov 8, 2022

View reviewed changes

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt Outdated Show resolved Hide resolved

k163377 commented Nov 11, 2022

View reviewed changes

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt Show resolved Hide resolved

k163377 commented Nov 11, 2022

View reviewed changes

core/reflection.jvm/src/kotlin/reflect/jvm/internal/KCallableImpl.kt Outdated Show resolved Hide resolved

k163377 commented Nov 12, 2022

View reviewed changes

Improve performance of KCallableImpl.callBy

398b704

#KT-55178 Fixed Co-authored-by: Alexander Udalov <[email protected]>

udalov merged commit 2439c22 into JetBrains:master Nov 29, 2022

k163377 deleted the improve-callby-perf branch November 30, 2022 13:32

k163377 changed the title ~~Performance improvement of KFunction.callBy~~ KT-55178 Performance improvement of KFunction.callBy Nov 30, 2022

k163377 added a commit to k163377/kotlin that referenced this pull request Dec 3, 2022

Modified to refer to the isSuspend property

058397e

JetBrains#4840 (comment)

k163377 mentioned this pull request Dec 3, 2022

Modified to refer to the isSuspend property. #5038

Merged

udalov pushed a commit that referenced this pull request Dec 5, 2022

Modified to refer to the isSuspend property

469ae27

#4840 (comment)

udalov pushed a commit to k163377/kotlin that referenced this pull request Dec 6, 2022

Minor, use isSuspend in KCallableImpl.extractContinuationArgument

3711630

JetBrains#4840 (comment)

udalov pushed a commit that referenced this pull request Dec 6, 2022

Minor, use isSuspend in KCallableImpl.extractContinuationArgument

ccac4ac

#4840 (comment)

-        val parameterSize = parameters.size + (if (isSuspend) 1 else 0)
+        val parameterSize = parameters.size + (if (isSuspend) 1 else 0)
+        // Optimization for general no-argument functions.
+        if (parameters.isEmpty()) {
+            @Suppress("UNCHECKED_CAST")
+            return reflectionCall {
+                caller.call(if (isSuspend) arrayOf(continuationArgument) else emptyArray()) as R
+            }
+        } else if (parameters.size == 1) {
+            parameters.first().takeIf { it.kind != KParameter.Kind.VALUE }?.let { parameter ->
+                @Suppress("UNCHECKED_CAST")
+                return reflectionCall {
+                    caller.call(if (isSuspend) arrayOf(args[parameter], continuationArgument) else arrayOf(args[parameter])) as R
+                }
+            }
+        }

KT-55178 Performance improvement of KFunction.callBy #4840

KT-55178 Performance improvement of KFunction.callBy #4840

Conversation

k163377 commented Jun 1, 2022

Significance of this improvement

Improvement Details

Benchmark Results

Uh oh!

k163377 Jun 1, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

k163377 Jun 1, 2022

Choose a reason for hiding this comment

Uh oh!

k163377 commented Oct 22, 2022

Uh oh!

udalov commented Nov 1, 2022

Uh oh!

k163377 commented Nov 2, 2022

Uh oh!

k163377 commented Nov 2, 2022

Uh oh!

Uh oh!

udalov commented Nov 8, 2022

Uh oh!

Uh oh!

k163377 commented Nov 11, 2022

Uh oh!

Uh oh!

k163377 Nov 12, 2022

Choose a reason for hiding this comment

Uh oh!

udalov Nov 24, 2022

Choose a reason for hiding this comment

Uh oh!

k163377 Nov 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k163377 Nov 25, 2022

Choose a reason for hiding this comment

Uh oh!

udalov commented Nov 29, 2022

Uh oh!

k163377 commented Nov 29, 2022

Uh oh!

udalov commented Nov 29, 2022

Uh oh!

Uh oh!

k163377 Nov 25, 2022 •

edited

Loading