VM compiler support for cachable function calls in IL #51618
Context https://dart-review.googlesource.com/c/sdk/+/284300/7/runtime/vm/compiler/frontend/kernel_to_il.cc The prototype does:
This approach would be:
This would also be cleaner from a layering perspective. The prototype exposes the 'pool' in IL, but the pool is not accessible in IL; it only exists during machine code (MC) generation.
Some observations from trying to implement this in a general way, using this in #47625, and chat discussions. (Posting them here so they don't get lost.)
Since we only have a single use case right now, we can avoid over-generalizing the implementation.
We don't visit the object pool entries. If we were to visit object pool entries we would have to never store any raw values with the least significant bit set to 1 or split it into two groups. Until then, cacheable static calls will only work for static functions returning an |
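To make the least-significant-bit constraint concrete, here is a minimal sketch. It assumes the usual Dart VM tagging scheme, where Smis have the low bit clear and heap object pointers have it set; the constant and function names below are illustrative and not taken from the VM sources.

```cpp
#include <cstdint>

// Assumption: Smis carry a low bit of 0 and heap object pointers a low bit
// of 1. The names here are hypothetical, chosen for this sketch only.
constexpr std::uintptr_t kTagMask = 1;
constexpr std::uintptr_t kHeapObjectTag = 1;

// What a GC visitor would conclude when it inspects a word in a visited slot.
constexpr bool LooksLikeHeapPointer(std::uintptr_t word) {
  return (word & kTagMask) == kHeapObjectTag;
}

// A raw (untagged) value may only live in a visited slot if the visitor
// cannot mistake it for a heap pointer.
constexpr bool SafeInVisitedSlot(std::uintptr_t raw) {
  return !LooksLikeHeapPointer(raw);
}
```

Under this assumption an even raw value such as an aligned address would be skipped by a visitor, while an odd one would be treated as a pointer and followed, which is why unvisited pool entries sidestep the problem.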
Modeling these cachable static calls after static calls does not give us entirely the desired behavior. Because a slow path is an assembler concept, using the normal IL argument pushing leads to the arguments being loaded on the fast path.

Kernel building:

body += Constant(asset_id);
body += Constant(symbol);
body += Constant(Smi::ZoneHandle(Smi::New(arg_n)));
body +=
    MemoizableIdempotentCall(TokenPosition::kNoSource, FfiResolverFunction(),
                             /*argument_count=*/3,
                             /*argument_names=*/Array::null_array(),
                             /*type_args_count=*/0);

Resulting machine code:
I'll investigate what would be required to achieve that. Edit: We could possibly refer to the constant args in the instruction itself, rather than pushing them on the stack when building the IL graph. The arguments would then not be visible in the IL graph, the instruction would have no inputs, and on the slow path in machine code we'd move the constants from the pool to the right place. "The right place" should then correspond to what a static call expects.
Exactly. As long as the cacheable mechanism doesn't support dynamic args but only constants, that's the obvious way to do it.
Adds a `CachableIdempotentCallInstr` that can be invoked via `@pragma('vm:cachable-idempotent')` if the call site is force optimized.

The object pool is not visited by the scavenger, so we store the results as unboxed integers. Consequently, only Dart functions that return integers can be cached.

Cachable idempotent calls should never be inlined. After the first call, the function will not be called again.

The call itself is on a slow path to avoid register spilling on the fast path.

TEST=vm/cc/IRTest_CachableIdempotentCall
TEST=runtime/tests/vm/dart/cachable_idempotent_test.dart

Bug: #51618
Change-Id: I612e896f27add76f57796c060157e14cc687a0fd
Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-asan-linux-release-x64-try,vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-msan-linux-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-tsan-linux-release-x64-try,vm-aot-ubsan-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-simriscv64-try,vm-linux-debug-x64-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-tsan-linux-release-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/301601
Reviewed-by: Ryan Macnak <[email protected]>
Reviewed-by: Martin Kustermann <[email protected]>
This CL removes static fields for storing the `@Native`'s function addresses. Instead, the function addresses are stored in the object pool for all archs except for ia32. ia32 has no AOT and no AppJit snapshots, so the addresses are directly embedded in the code.

This CL removes the closure wrapping for `@Native`s. Instead of `pointer.asFunctionInternal()()` where `asFunction` returns a closure which contains the trampoline, the function is compiled to a body which contains the trampoline `Native()`. This is possible for `@Native`s because the dylib and symbol names are known statically.

Doing the compilation in kernel_to_il instead of a CFE transform enables supporting static linking later. (The alternative would have been to transform in the CFE to a `@pragma('vm:cachable-idempotent')` instead of constructing the IL in kernel_to_il.)

To enable running resolution in ia32 in kernel_to_il.cc, the resolution function has been made available via `runtime/lib/ffi_dynamic_library.h`.

Because the new calls are simply static calls, the TFA can figure out const arguments flowing to these calls. This leads to constant locations in the parameters to FfiCalls. So, this CL also introduces logic to move constants into `NativeLocation`s.
TEST=runtime/vm/compiler/backend/il_test.cc TEST=tests/ffi/function_*_native_(leaf_)test.dart TEST=pkg/vm/testcases/transformations/ffi/ffinative_compound_return.dart Closes: #47625 Closes: #51618 Change-Id: Ic5d017005dedcedea40c455c4d24dbe774f91603 CoreLibraryReviewExempt: Internal FFI implementation changes Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/284300 Commit-Queue: Daco Harkes <[email protected]> Reviewed-by: Alexander Markov <[email protected]> Reviewed-by: Martin Kustermann <[email protected]>
This reverts commit e16bb21.

Reason for revert: Indication this caused engine test failures of the kind:
```
[ RUN ] EmbedderA11yTest.A11yTreeIsConsistentUsingV1Callbacks
../../flutter/shell/platform/embedder/tests/embedder_a11y_unittests.cc:639: Failure
Expected equality of these values:
  std::strncmp(kTooltip, node->tooltip, sizeof(kTooltip) - 1)
    Which is: 116
  0
```
Original change's description:
> [vm/ffi] Optimize `@Native` calls
>
> This CL removes static fields for storing the `@Native`'s function
> addresses. Instead, the function addresses are stored in the object
> pool for all archs except for ia32. ia32 has no AOT and no AppJit
> snapshots, so the addresses are directly embedded in the code.
>
> This CL removes the closure wrapping for `@Native`s. Instead of
> `pointer.asFunctionInternal()()` where `asFunction` returns a closure
> which contains the trampoline, the function is compiled to a body
> which contains the trampoline `Native()`. This is possible for
> `@Native`s because the dylib and symbol names are known statically.
>
> Doing the compilation in kernel_to_il instead of a CFE transform
> enables supporting static linking later. (The alternative would have
> been to transform in the cfe to a `@pragma('vm:cachable-idempotent')`
> instead of constructing the IL in kernel_to_il.
>
> To enable running resolution in ia32 in kernel_to_il.cc, the
> resolution function has been made available via
> `runtime/lib/ffi_dynamic_library.h`.
>
> Because the new calls are simply static calls, the TFA can figure
> out const arguments flowing to these calls. This leads to constant
> locations in the parameters to FfiCalls. So, this CL also introduces
> logic to move constants into `NativeLocation`s.
> > TEST=runtime/vm/compiler/backend/il_test.cc > TEST=tests/ffi/function_*_native_(leaf_)test.dart > TEST=pkg/vm/testcases/transformations/ffi/ffinative_compound_return.dart > > Closes: #47625 > Closes: #51618 > Change-Id: Ic5d017005dedcedea40c455c4d24dbe774f91603 > CoreLibraryReviewExempt: Internal FFI implementation changes > Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try > Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/284300 > Commit-Queue: Daco Harkes <[email protected]> > Reviewed-by: Alexander Markov <[email protected]> > Reviewed-by: Martin Kustermann <[email protected]> Change-Id: Icc87a6ca33bffecabb15c6b168a06ccc38c2fe5b Cq-Include-Trybots: 
luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/333840 Bot-Commit: Rubber Stamper <[email protected]> Reviewed-by: Daco Harkes <[email protected]> Commit-Queue: Martin Kustermann <[email protected]>
In the implementation of FFI natives like this
and a call site like this
we have to resolve the native symbol `"foo"` and then call the C function. We obviously want to avoid looking up the symbol every time, so we need to either do the lookup eagerly (e.g. at loading time of the kernel/AOT snapshot) or cache the result.

Old-school natives solve this problem by having a resolver function pointer in the native which, after resolving, gets patched to the actual native. This is easier to do for old natives, since the calling convention of the resolver function and the target native function is the same. For FFI this is not the case. There are several complications (e.g. old natives are unoptimized and cannot be inlined, so the stack walker can obtain the function object and native name; FFI calls could be inlined, ...; a resolver function would need to preserve all args, ...).
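The patching scheme used by old-school natives can be sketched roughly as follows. This is only an illustration of the idea, assuming a resolver and a target that share one signature; `g_entry`, `LookupSymbol`, and the other names are hypothetical and do not correspond to VM code.

```cpp
#include <cstdint>

// Resolver and target share this signature, which is what makes patching easy.
using NativeFn = std::int64_t (*)(std::int64_t);

std::int64_t ResolveThenCall(std::int64_t arg);

// Hypothetical native entry: initially points at the resolver.
NativeFn g_entry = &ResolveThenCall;
int g_lookup_count = 0;

// Stand-in for the real native function behind the symbol.
std::int64_t ActualNative(std::int64_t arg) { return arg + 1; }

// Stand-in for the (expensive) symbol lookup.
NativeFn LookupSymbol() {
  ++g_lookup_count;
  return &ActualNative;
}

// First call lands here: resolve, patch the entry, then forward the call.
std::int64_t ResolveThenCall(std::int64_t arg) {
  g_entry = LookupSymbol();
  return g_entry(arg);
}

// Call sites always go through the entry; after patching they skip the lookup.
std::int64_t CallNative(std::int64_t arg) { return g_entry(arg); }
```

The forwarding step in `ResolveThenCall` only works because the resolver receives exactly the arguments the target expects, which is the property FFI calls lack.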
There may be an easy way to solve this problem without too much overhead. We compile invocations like `foo(1)` to something like this:

The `CachableStaticCall()` can be implemented as

That would add a general capability for function calls with caching, which could be used by any idempotent function call (i.e. arguments are constant and result will be constant).
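One way to picture such a cached call is a pool slot that is checked on the fast path and filled on the slow path. The sketch below is an assumption-laden model, not the VM's implementation: `Pool`, `kUncached`, and `CachableCall` are hypothetical names, and using 0 as the "not yet cached" sentinel is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using IdempotentFn = std::int64_t (*)(std::int64_t);

// Hypothetical stand-in for the object pool: raw, unboxed integer slots.
struct Pool {
  std::vector<std::int64_t> slots;
};

// Assumption: 0 marks a slot whose result has not been computed yet.
constexpr std::int64_t kUncached = 0;

int g_calls = 0;

// Stand-in for an idempotent function such as a symbol resolver.
std::int64_t ExpensiveLookup(std::int64_t id) {
  ++g_calls;
  return 0x1000 + id;
}

std::int64_t CachableCall(Pool& pool, std::size_t index, IdempotentFn fn,
                          std::int64_t const_arg) {
  // Fast path: a single load and compare, no argument setup.
  const std::int64_t cached = pool.slots[index];
  if (cached != kUncached) return cached;
  // Slow path: materialize the constant argument, call, and store the result.
  const std::int64_t result = fn(const_arg);
  pool.slots[index] = result;
  return result;
}
```

With this sentinel choice, a function that legitimately returns 0 would be called again on every invocation, which is harmless for an idempotent function but loses the caching benefit.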