Skip to content

KT-55178 Performance improvement of KFunction.callBy #4840

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 29, 2022
Merged

KT-55178 Performance improvement of KFunction.callBy #4840

merged 1 commit into from
Nov 29, 2022

Conversation

k163377
Copy link
Contributor

@k163377 k163377 commented Jun 1, 2022

Performance improvements have been made to KCallableImpl.callDefaultMethod (KFunction.callBy).

Significance of this improvement

KFunction.callBy is almost always used when function calls are made using kotlin-reflect.
Improving this performance will improve the performance of the entire library that depends on kotlin-reflect.

Improvement Details

The contents of the array used for reflection calls are definitive for the target function.
On the other hand, the current implementation manages arguments dynamically.

In this PR, I have improved performance by caching as much of the processing as possible on the first run, and also by dropping the use of ArrayList.

Benchmark Results

I have created a simple benchmark project to compare 1.7.0-RC with this change.
The content compares the calling process for functions with 0, 1, 5, and 20 arguments, using all default arguments or not using default arguments.
https://github.com/k163377/kfunction-call-by-benchmark

The results are as follows (for ease of viewing, only the order of the results has been rearranged).
The higher the score, the better.

before

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  17642071.229 ± 258708.101  ops/s
Measurement.oneWithDefault        thrpt    4   1134130.294 ±  25929.190  ops/s
Measurement.oneWithoutDefault     thrpt    4   6126880.972 ± 419737.002  ops/s
Measurement.fiveWithDefault       thrpt    4    424029.672 ±  42593.194  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2084516.569 ± 204075.004  ops/s
Measurement.twentyWithDefault     thrpt    4    134928.255 ±  41719.699  ops/s
Measurement.twentyWithoutDefault  thrpt    4    608741.935 ± 106547.416  ops/s

after

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  13168129.895 ± 492380.039  ops/s
Measurement.oneWithDefault        thrpt    4   4903788.122 ± 383012.856  ops/s
Measurement.oneWithoutDefault     thrpt    4   6293334.841 ± 211793.899  ops/s
Measurement.fiveWithDefault       thrpt    4   1924051.074 ±  34722.071  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2255907.246 ± 616366.197  ops/s
Measurement.twentyWithDefault     thrpt    4    530805.418 ±   7278.245  ops/s
Measurement.twentyWithoutDefault  thrpt    4    664515.652 ±  18892.456  ops/s

For the case where all default arguments are used, there is a significant performance improvement.
For patterns that did not use default arguments, performance was marginally improved, or at least not degraded.

For the no-argument function calls, performance tended to be worse.
I considered optimizing for argumentless functions, but decided against it for the following reasons

  • Calls to no-argument functions are very performant to begin with.
  • I did not feel that the pattern was important enough to require optimization.

@udalov udalov self-assigned this Jun 1, 2022
arguments
}

private fun getAbsentArguments(): Array<Any?> = _absentArguments().clone()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since clone processing is required, it is defined as a function rather than a property.

Comment on lines 120 to 131
// set absent values
parameters.forEach { parameter ->
if (parameter.isOptional && !parameter.type.isInlineClassType) {
// For inline class types, the javaType refers to the underlying type of the inline class,
// but we have to pass null in order to mark the argument as absent for InlineClassAwareCaller.
arguments[parameter.index] = defaultPrimitiveValue(parameter.type.javaType)
} else if (parameter.isVararg) {
arguments[parameter.index] = defaultEmptyArray(parameter.type)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By setting the abscent value in advance, processing can be omitted at the time of the call.
Even if the argument is set, it will be overwritten, so there is no effect.

@k163377
Copy link
Contributor Author

k163377 commented Oct 22, 2022

@udalov
Would it be difficult for you to review this PR?
I think the KFunction.callBy speedup is an important change for kotlin-reflect users.

If you are concerned about the amount of changes or increased memory consumption due to the introduction of cache, I can change the content as follows
Kotlin/kotlinx.reflect.lite#36

@udalov
Copy link
Member

udalov commented Nov 1, 2022

Sorry for the delay! I agree, this optimization can be very useful.

I think we can go further and make the getAbsentArguments array of size parameters.size + (if (isSuspend) 1 else 0) + masks.size + 1. This will get rid of another array creation at the end of callDefaultMethod. This will, however, require to copy the array in case no default arguments were omitted (to take its prefix of size parameters.size + (if (isSuspend) 1 else 0)), which is the !anyOptional case. But maybe that's fine since the user can also use the more performant KCallable.call in that case anyway. What do you think?

@k163377
Copy link
Contributor Author

k163377 commented Nov 2, 2022

Thanks for the review.

I agree with your suggestions.
I think it is more reasonable to optimize callBy to use default values, since the optimization of using call when all arguments are given can be easily done by the user.

I will revise and push this when I have time.

@k163377
Copy link
Contributor Author

k163377 commented Nov 2, 2022

@udalov
Fixed and rebased.

Benchmark results for 1.7.20, before and after fix, are as follows

1.7.20

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  17404199.235 ± 1055469.861  ops/s
Measurement.oneWithDefault        thrpt    4   1088190.688 ±  252820.971  ops/s
Measurement.oneWithoutDefault     thrpt    4   6106006.345 ±  313929.311  ops/s
Measurement.fiveWithDefault       thrpt    4    427439.340 ±   18559.512  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2025561.126 ±   78028.024  ops/s
Measurement.twentyWithDefault     thrpt    4    121692.511 ±   45871.752  ops/s
Measurement.twentyWithoutDefault  thrpt    4    612366.962 ±  110886.024  ops/s

before fix

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  13048884.146 ± 1348295.385  ops/s
Measurement.oneWithDefault        thrpt    4   4554089.123 ± 2762308.518  ops/s
Measurement.oneWithoutDefault     thrpt    4   6559863.859 ±  423675.272  ops/s
Measurement.fiveWithDefault       thrpt    4   1925996.082 ±   69195.213  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2482991.930 ±  117665.385  ops/s
Measurement.twentyWithDefault     thrpt    4    547109.098 ±   23378.718  ops/s
Measurement.twentyWithoutDefault  thrpt    4    734820.642 ±  141763.019  ops/s

after fix

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  13134715.929 ± 152888.381  ops/s
Measurement.oneWithDefault        thrpt    4   5384349.821 ± 454531.756  ops/s
Measurement.oneWithoutDefault     thrpt    4   6452819.026 ± 376521.431  ops/s
Measurement.fiveWithDefault       thrpt    4   1948375.781 ± 199051.047  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2461809.238 ± 123859.545  ops/s
Measurement.twentyWithDefault     thrpt    4    520222.742 ±  76772.643  ops/s
Measurement.twentyWithoutDefault  thrpt    4    737876.722 ±  94725.636  ops/s

There was not much difference between before and after the fix.

In addition, I also benchmarked the results of limited changes (made to kotlin-reflect-lite Kotlin/kotlinx.reflect.lite#36).
https://github.com/k163377/kotlin/tree/only-array-opt
The results are as follows

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  15824694.741 ±  416278.522  ops/s
Measurement.oneWithDefault        thrpt    4   1989567.103 ±  668681.760  ops/s
Measurement.oneWithoutDefault     thrpt    4   6748946.272 ±  522893.716  ops/s
Measurement.fiveWithDefault       thrpt    4    459237.317 ±   81528.824  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2354501.801 ± 1219202.447  ops/s
Measurement.twentyWithDefault     thrpt    4    128773.826 ±   21388.445  ops/s
Measurement.twentyWithoutDefault  thrpt    4    723547.425 ±   71860.672  ops/s

In this case, the improvement was relatively small.

@udalov
Copy link
Member

udalov commented Nov 8, 2022

Thanks! I've also noticed that now we perform integer boxing when computing the mask for every default argument and I'd like to avoid it. What do you think about the following (hopefully last) optimization 388fdc4?

@k163377
Copy link
Contributor Author

k163377 commented Nov 11, 2022

What do you think about the following (hopefully last) optimization 388fdc4?

It does not seem to be a good fix, at least as far as the benchmark results are concerned.

Even with this optimization, the speedup appears to be only when many default arguments are used.
In other cases, it appears to be offset by the cost of branching.

638d89c

Benchmark                          Mode  Cnt         Score        Error  Units
Measurement.zero                  thrpt    4  12938281.870 ± 488259.178  ops/s
Measurement.fiveWithDefault       thrpt    4   1940327.443 ± 198156.544  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2442313.516 ±  39665.853  ops/s
Measurement.oneWithDefault        thrpt    4   5451979.125 ± 715952.574  ops/s
Measurement.oneWithoutDefault     thrpt    4   6475176.599 ± 349258.846  ops/s
Measurement.twentyWithDefault     thrpt    4    505560.683 ±  42324.766  ops/s
Measurement.twentyWithoutDefault  thrpt    4    726913.081 ± 136844.449  ops/s

When 388fdc4 is imported

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  13029498.008 ±  255137.936  ops/s
Measurement.fiveWithDefault       thrpt    4   1832503.215 ± 1117903.663  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2399084.679 ±  117597.661  ops/s
Measurement.oneWithDefault        thrpt    4   5460558.554 ±  279303.773  ops/s
Measurement.oneWithoutDefault     thrpt    4   6325791.468 ±  187765.117  ops/s
Measurement.twentyWithDefault     thrpt    4    598262.914 ±   26892.919  ops/s
Measurement.twentyWithoutDefault  thrpt    4    729748.383 ±  100592.024  ops/s

val masks = ArrayList<Int>(1)
var index = 0
var anyOptional = false
val parameterSize = parameters.size + (if (isSuspend) 1 else 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following optimization to no-argument functions was found to be nearly twice as fast as 638d89c.
Does this optimization look like it should be incorporated?

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.zero                  thrpt    4  22299885.316 ±  295995.533  ops/s
Measurement.fiveWithDefault       thrpt    4   1927106.356 ±  225032.224  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2460623.502 ±   96649.714  ops/s
Measurement.oneWithDefault        thrpt    4   5384826.666 ±  340254.100  ops/s
Measurement.oneWithoutDefault     thrpt    4   5654229.984 ± 5914104.417  ops/s
Measurement.twentyWithDefault     thrpt    4    510341.805 ±    2940.113  ops/s
Measurement.twentyWithoutDefault  thrpt    4    739231.653 ±   21298.002  ops/s
Suggested change
val parameterSize = parameters.size + (if (isSuspend) 1 else 0)
val parameterSize = parameters.size + (if (isSuspend) 1 else 0)
// Optimization for general no-argument functions.
if (parameters.isEmpty()) {
@Suppress("UNCHECKED_CAST")
return reflectionCall {
caller.call(if (isSuspend) arrayOf(continuationArgument) else emptyArray()) as R
}
} else if (parameters.size == 1) {
parameters.first().takeIf { it.kind != KParameter.Kind.VALUE }?.let { parameter ->
@Suppress("UNCHECKED_CAST")
return reflectionCall {
caller.call(if (isSuspend) arrayOf(args[parameter], continuationArgument) else arrayOf(args[parameter])) as R
}
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the 0-parameters case, it looks good! But for the 1 parameter, I'm not really sure, the code is getting quite complicated already. Besides, Measurement.oneWithoutDefault regressed in your results after this change, if I understand them correctly? I propose to keep the 0-parameter case only.

Copy link
Contributor Author

@k163377 k163377 Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Measurement.oneWithoutDefault regressed in your results after this change, if I understand them correctly?

I ran the benchmark again with almost the same content, but there did not seem to be any regression in performance.
Since the benchmark was run on a local machine, there must have been something going on in the background during this measurement.

1e1cd3a

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.fiveWithDefault       thrpt    4   1713974.649 ±  807978.785  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2402911.282 ±  116905.232  ops/s
Measurement.oneWithDefault        thrpt    4   5018352.177 ± 3050333.309  ops/s
Measurement.oneWithoutDefault     thrpt    4   6338081.662 ±  384633.083  ops/s
Measurement.twentyWithDefault     thrpt    4    506022.842 ±   28381.148  ops/s
Measurement.twentyWithoutDefault  thrpt    4    721987.591 ±   78852.517  ops/s
Measurement.zero                  thrpt    4  12727105.089 ±  188029.864  ops/s

after-zero-arg-opt2

Benchmark                          Mode  Cnt         Score         Error  Units
Measurement.fiveWithDefault       thrpt    4   1880153.629 ±  232291.891  ops/s
Measurement.fiveWithoutDefault    thrpt    4   2388820.000 ±   97785.718  ops/s
Measurement.oneWithDefault        thrpt    4   5276216.078 ±  327025.056  ops/s
Measurement.oneWithoutDefault     thrpt    4   6251850.821 ±  497046.067  ops/s
Measurement.twentyWithDefault     thrpt    4    516927.278 ±   13448.841  ops/s
Measurement.twentyWithoutDefault  thrpt    4    728619.156 ±   36904.744  ops/s
Measurement.zero                  thrpt    4  21774799.243 ± 1534253.147  ops/s

I agree that it is complicated.
I have pushed for a fix.
68a9741

I did not measure benchmark results with the one-parameter branch removed, but at least there is no reason for the score to drop since the target of the measurement is a top-level function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, the reason I proposed the original form is that getter is a typical example of a function that takes only instances as arguments.

@udalov
Copy link
Member

udalov commented Nov 29, 2022

I've merged all commits and slightly changed comments in 3393bb4. I'll push it to master after the build succeeds.

@k163377
Copy link
Contributor Author

k163377 commented Nov 29, 2022

Thanks a lot!

 #KT-55178 Fixed

Co-authored-by: Alexander Udalov <[email protected]>
@udalov udalov merged commit 2439c22 into JetBrains:master Nov 29, 2022
@udalov
Copy link
Member

udalov commented Nov 29, 2022

Merged to master. This change will be released as a part of Kotlin 1.8.20.

Thank you for the contribution! And in particular, thanks for extensive measurement of effects of all the discussed changes. 👍

@k163377 k163377 deleted the improve-callby-perf branch November 30, 2022 13:32
@k163377 k163377 changed the title Performance improvement of KFunction.callBy KT-55178 Performance improvement of KFunction.callBy Nov 30, 2022
k163377 added a commit to k163377/kotlin that referenced this pull request Dec 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants