[vm/ffi] Investigate copy-free Strings #39787
This is particularly painful for a Win32 function like `SysAllocString()`, since creating a BSTR this way requires two copies of the string data.
Generally speaking, we cannot hand out pointers to memory inside the VM's managed, garbage-collected heap, because the GC can move objects while C code runs. In addition, the VM uses different string representations internally, which might be incompatible with what the C side wants.

@timsneath Why does it require two copies? Can you not examine the Dart string to determine what the encoded length would be, then allocate a correctly sized buffer and write length + encoded string?
```dart
final rawString = Utf16.fromString('Aarhus is a beautiful city.'); // copy from Dart to unmanaged
final bstrString = SysAllocString(rawString); // Win32 makes a second copy here
// ... do stuff
SysFreeString(bstrString);
free(rawString);
```
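A pure-Dart sketch of the single-copy idea suggested above (the buffer layout and helper name are illustrative, not an existing API): determine the encoded length first, then write the length prefix and the encoded bytes into one correctly sized buffer, so the character data is copied exactly once.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Encodes [s] as UTF-8 into a single buffer laid out as:
/// 4-byte little-endian byte length, payload bytes, trailing NUL.
/// Only one copy of the character data is made.
Uint8List lengthPrefixedUtf8(String s) {
  final bytes = utf8.encode(s);                // determine encoded length
  final buf = Uint8List(4 + bytes.length + 1); // prefix + payload + NUL
  ByteData.sublistView(buf).setUint32(0, bytes.length, Endian.little);
  buf.setRange(4, 4 + bytes.length, bytes);    // the single copy
  return buf;
}
```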
@timsneath If the only goal is to avoid the second memory allocation and copy, would something like this do the trick?

```dart
foo(String string) {
  // Allocate the BSTR without initializing it (i.e. no copy of bytes).
  final bstr = SysAllocStringByteLen(nullptr, 2 * string.length).cast<Uint16>();
  // Initialize the BSTR ("bstr" points at the actual 16-bit character
  // buffer, not at the length prefix).
  for (int i = 0; i < string.length; ++i) {
    bstr[i] = string.codeUnitAt(i);
  }
  // <do something with "bstr">
  // Free the BSTR.
  SysFreeString(bstr);
}
```
Yes, this works. But I think I'll wind up needing to wrap BSTR as a whole, so that I can embed this kind of logic rather than expecting the package user to be aware of these subtleties. Thanks, Martin.
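A minimal sketch of what such a wrapper could look like, assuming hypothetical Dart bindings for the Win32 `SysAllocStringByteLen` and `SysFreeString` functions (the bindings and the `BStr` class name are illustrative, not from this thread):

```dart
import 'dart:ffi';

// Assumed Win32 bindings (in practice these would come from a bindings
// package such as package:win32); shown here only for illustration.
@Native<Pointer<Uint16> Function(Pointer<Uint8>, Uint32)>()
external Pointer<Uint16> SysAllocStringByteLen(Pointer<Uint8> psz, int len);

@Native<Void Function(Pointer<Uint16>)>()
external void SysFreeString(Pointer<Uint16> bstr);

/// Illustrative wrapper that hides the BSTR allocation subtleties
/// from package users.
class BStr {
  final Pointer<Uint16> pointer;
  BStr._(this.pointer);

  /// Allocates an uninitialized BSTR of the right byte length, then
  /// fills it directly from the Dart string: one copy instead of two.
  factory BStr.fromString(String s) {
    final bstr = SysAllocStringByteLen(nullptr, 2 * s.length);
    for (var i = 0; i < s.length; i++) {
      bstr[i] = s.codeUnitAt(i);
    }
    return BStr._(bstr);
  }

  void free() => SysFreeString(pointer);
}
```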
In leaf calls we could consider allowing conversion between String and Pointer, in a fashion similar to unwrapping TypedData (#44589). The type argument of the pointer in the signature would then specify which encoding to use.
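For contrast, a sketch of what a call requires today (using package:ffi's `toNativeUtf8`, which allocates native memory and copies; `strlen` here is just a placeholder native function):

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

typedef StrlenNative = Size Function(Pointer<Utf8>);
typedef StrlenDart = int Function(Pointer<Utf8>);

int nativeLength(DynamicLibrary libc, String s) {
  final strlen =
      libc.lookupFunction<StrlenNative, StrlenDart>('strlen', isLeaf: true);
  final p = s.toNativeUtf8(); // today: allocate + copy
  try {
    return strlen(p);
  } finally {
    malloc.free(p);
  }
}
// Under the proposal, the Dart String could be passed directly in the
// leaf call, with Pointer<Utf8> in the native signature selecting UTF-8.
```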
We should consider if we want to also add equivalent operations on the …

Originally posted by @mraleph in #50494 (comment)

We could add extension methods to … The only issue is that Utf8 and Utf16 are defined in package:ffi, not in dart:ffi. Putting it in … The only way to put it in … Transplanting the …
Some notes from a discussion with @robertbastian and follow-up investigation. Strings internally can have multiple representations (OneByteString, TwoByteString). Completely copy-free strings are only possible if …

Because strings can have multiple encodings in the runtime, we could try to make the FFI unwrap strings if all of the above hold, and otherwise allocate a temporary re-encoded string (condition 4 must still be true).
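A pure-Dart illustration of the encoding condition (the helper is illustrative, not VM code): a one-byte (Latin-1) string's bytes coincide with its UTF-8 encoding only when the string is pure ASCII, so only then could an interior pointer be handed out as UTF-8 without re-encoding.

```dart
import 'dart:convert';

/// True if [s]'s Latin-1 bytes are byte-for-byte identical to its
/// UTF-8 encoding, i.e. the string is pure ASCII.
bool latin1MatchesUtf8(String s) {
  final l = latin1.encode(s); // throws if s is not Latin-1 encodable
  final u = utf8.encode(s);
  if (l.length != u.length) return false;
  for (var i = 0; i < l.length; i++) {
    if (l[i] != u[i]) return false;
  }
  return true;
}
```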
My current thinking is something like:

```dart
// Sketch: additions to the core String class.
class String {
  Utf8View get utf8View => Utf8View(this);
  Utf16View get utf16View => Utf16View(this);
}

// VM-provided; same for Utf16View.
abstract class Utf8View {
  int get length;
}

foo(String v) {
  final vView = v.utf8View;
  fooFfi(vView, vView.length);
}

static final fooFfi =
    _capi<ffi.NativeFunction<Void Function(Pointer<Uint8>, ffi.Size)>>('foo')
        .asFunction<void Function(Pointer<Uint8>, int)>(isLeaf: true);
```

Here, …
In other cases, we have to allocate. It would be nice if the VM could take care of the allocation and release it after the call (I'm currently using an …). We might want to do some special casing of … Zero-termination could also be part of this design, with a flag on … Currently …
Following the view idea, the Dart type should be the view in this case, because the borrowing can only happen in the FFI call itself. (If it were to happen earlier and be passed around as a …)

```dart
static final fooFfi =
    _capi<ffi.NativeFunction<Void Function(Pointer<Uint8>, ffi.Size)>>('foo')
        .asFunction<void Function(Utf8View, int)>(isLeaf: true);
```

This would require some trickery in argument evaluation of FFI calls. If an argument pair … If the length were to be implemented as a …
However, if this is not performant enough, we might need something that avoids copying entirely.
Originally posted by @dcharkes in #35762 (comment)
Our null-terminated Utf8 and Utf16 string helpers in package:ffi require copying bytes from Dart to C.
We should investigate whether we can pass strings from C to Dart without copying, and whether we can pass UTF-16 strings from Dart to C without copying. The latter is unlikely, though, as the Dart garbage collector might relocate the String.
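For reference, the C-to-Dart direction in package:ffi today also copies: `toDartString` decodes the native bytes into a fresh Dart String.

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

String readFromC(Pointer<Utf8> p) {
  // Decodes the NUL-terminated UTF-8 bytes at [p] into a new,
  // garbage-collected Dart String (a copy).
  return p.toDartString();
}
```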