[ffi] function call overhead? #52692


Open
modulovalue opened this issue Jun 13, 2023 · 5 comments
Labels

  • area-vm: Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
  • library-ffi
  • P3: A lower priority bug or feature request
  • triaged: Issue has been triaged by sub team
  • type-performance: Issue relates to performance or code size

Comments

@modulovalue
Contributor

modulovalue commented Jun 13, 2023

I wanted to see if it would be possible to implement the features proposed in #52673 via FFI.

I've created a demo that compiles pure assembly to a dylib. (The demo can be found here; the file in that repo should run on any ARM/macOS machine and reproduce my observations.)

I was able to make the following measurements on my arm macbook:

 - total: 9884992
   => via lookup table took: 3.816ms
 - total: 499999500000
   => via assembly took: 9.608ms
 - total: 499999500000
   => id control took: 0.473ms
 - total: 499999500000
   => id control via closure call took: 4.765ms
 - total: 499999500000
   => id control via function call took: 3.405ms
  • The "lookup table" part calculates popcounts via a lookup table approach in Dart.
  • The "assembly" part passes a single integer back to Dart from assembly via a dylib.
  • "id control" does what "assembly" does and nothing else.
  • "id control via closure" does what "assembly" does via a closure.
  • "id control via function" does what "assembly" does via a do-not-inline annotated function.

My conclusion is that calling assembly via a dynamically linked library takes ~2x as long as invoking a closure and ~3x as long as invoking a function. Furthermore, it seems like it wouldn't make sense to use this approach for exposing ARM instructions to Dart, as the overhead of calling them is too big.

I did expect the dylib calls to have some overhead, but I don't know how much overhead to expect.

I wanted to ask the following:

  • Is this observation about the overhead of dylib function calls expected?
  • Is the overhead of calling dylib functions expected to change in the near future?
  • Are native assets expected to improve the overhead of native function calls?
  • Would statically linked libraries (It appears like there's some work planned on that) improve the performance of native function calls?
  • Are there any plans to support some form of optimization (LTO?) that could inline functions found in statically linked libraries to remove the overhead of calling functions in native code?

All in all, I'd like to be able to ship custom assembly with Dart, and the above questions are meant to find out if that could become practical (in a way that does not incur any performance penalties) in some form in the near future.

@mit-mit
Member

mit-mit commented Jun 13, 2023

cc @dcharkes

@mit-mit added the area-vm and library-ffi labels on Jun 13, 2023
@mit-mit
Member

mit-mit commented Jun 13, 2023

What Dart execution mode are you running the measurements on? JIT (dart run ...)? Or AOT (dart compile exe ...)?

@modulovalue
Contributor Author

My measurements were taken on JIT (dart run ...). I haven't tested AOT yet, but I should; the demo doesn't currently support it because it uses dart:mirrors.

@modulovalue
Contributor Author

Here are the results for AOT:

 - total: 9884992
   => via lookup table took: 3.013ms
 - total: 499999500000
   => via assembly took: 6.905ms
 - total: 499999500000
   => id control took: 0.483ms
 - total: 499999500000
   => id control via closure call took: 3.105ms
 - total: 499999500000
   => id control via function call took: 0.952ms

Pure function calls have improved significantly; "assembly" has improved only slightly, which does not affect my conclusion.

@dcharkes
Contributor

dcharkes commented Jun 13, 2023

@modulovalue

lookupFunction should be passed isLeaf: true; this will make it faster.

Using the @Native external functions instead of DynamicLibrary.open + lookupFunction will make it faster (also use isLeaf: true). You need to use the --enable-experiment=native-assets flag for this. For more info see #50565. (An alternative to using the experimental flag is to dlopen with global flags first, and then @Native externals will be resolved in the process. See an example in https://github.com/dart-lang/sdk/tree/main/benchmarks/FfiCall/dart.)
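A sketch of both binding styles, assuming a dylib that exports `int64_t identity(int64_t)` (the symbol name and library path are hypothetical):

```dart
import 'dart:ffi';

// Style 1: look the symbol up at runtime. isLeaf: true tells the VM
// the native function is a quick leaf call, skipping the more
// expensive generic call transition.
final dylib = DynamicLibrary.open('libidentity.dylib');
final identity = dylib
    .lookupFunction<Int64 Function(Int64), int Function(int)>(
  'identity',
  isLeaf: true,
);

// Style 2: @Native external function, resolved eagerly instead of
// going through a closure. Requires --enable-experiment=native-assets,
// or that the symbol is already resolvable in the process (see the
// FfiCall benchmark linked above).
@Native<Int64 Function(Int64)>(symbol: 'identity', isLeaf: true)
external int identityNative(int x);
```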

@Native externals will become even faster after landing: https://dart-review.googlesource.com/c/sdk/+/284300

Background info:

  • lookupFunction creates a closure, so it is always slower than a closure call
  • a Dart function call might not be a function call at all; it might be inlined 🚀 If you want to measure the function call itself you can add @pragma('vm:never-inline'). You can also check what inlining is happening with dart --trace-inlining ...
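A minimal sketch of such a never-inlined control function (the name is hypothetical):

```dart
// A trivial identity function. Without the pragma, the VM would
// likely inline it, and the call being measured would disappear
// from the benchmark entirely.
@pragma('vm:never-inline')
int identityDart(int x) => x;
```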

The remaining questions:

Would statically linked libraries (It appears like there's some work planned on that) improve the performance of native function calls?

I have done some exploration #49418 https://dart-review.googlesource.com/c/sdk/+/251263.

I have not done any performance measurements on the exploration. I expect it to be slightly faster than @Native external calls, but not by a whole lot: the only difference would be removing a single load, and the call instruction taking a relative address rather than a register as its argument (which might possibly make the branch predictor happy).
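In (hypothetical AArch64) instruction terms, the difference being described would be roughly:

```asm
// @Native external call today (sketch):
    ldr  x16, [x15, #8]      // load the target address (the extra load)
    blr  x16                 // indirect call through a register

// with static linking (hypothetical):
    bl   _identity           // pc-relative direct call, no load
```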

Yes, I'd like to land that work =)

Are there any plans to support some form of optimization (LTO?) that could inline functions found in statically linked libraries to remove the overhead of calling functions in native code?

@mraleph mentioned that LTO doesn't work unfortunately: #49418 (comment)

@a-siva added the type-performance, triaged, and P3 labels on Nov 30, 2023