-
Notifications
You must be signed in to change notification settings - Fork 1.7k
UTF8.decode is slow (performance) #31954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
All of the decoding is implemented in Dart so performance suffers in AOT mode. JIT mode (debug) does around 15-20 MB per second vs 1.7-2MB in AOT mode on Nexus 5X.
We expect things to become better in AOT mode post Dart 2 transition as AOT code quality should improve considerably. However it is not trivial to reach the parity with JIT on this benchmark. LICENSE is a Latin1 file so it essentially tests how fast we can determine that and create a string out of it. Most important loops for the benchmark are in
Here is how all those functions look: int scanOneByteCharacters(List<int> units, int from) {
final to = endIndex;
final mask = _ONE_BYTE_LIMIT;
for (var i = from; i < to; i++) {
final unit = units[i];
if ((unit & mask) != unit) return i - from;
}
return to - from;
}
In Dart 2 mode we might be able to establish the same thing using type flow analysis, but I am cautiously optimistic about that because it is likely that people invoke It might be that we need to implement something heavier as a real solution - e.g. some mechanism to duplicate / "version" performance sensitive code that manipulates typed lists. |
We have already solved this problem for dart2js/browser. There is a hook for host-specific acceleration. If you hook to a second implementation, you could use C++, or a clone of the current implementation and modify it to use the byte array implementation type. If you do this, I would advise bailing out to the main algorithm to handle error cases since the Dart runtime is incompatible with the browser implementation in error cases. We want to fix the discrepancy by changing the Dart library to match the browser as that would make the in-browser conversion more efficient by avoiding a validation scan. If you fall-back to the main implementation on edge cases, that will ensure that we need to fix edge cases in only one implementation. @lrhn fyi. |
Thanks for the tip about The real problem however is much larger - if the user writes similar code they will see similar slowness (especially comparing JIT vs AOT), so I would like us to think about it in that context too. |
It seems extremely likely to me that any large inputs are going to be from ByteData instances. FWIW, I would be perfectly fine with specialized |
This reduces the jank of bringing up the license screen further. The remaining lost frame or two are caused by Dart itself, see: dart-lang/sdk#31954 dart-lang/sdk#31959 dart-lang/sdk#31960 Fixes flutter#5187
This reduces the jank of bringing up the license screen further. The remaining lost frame or two are caused by Dart itself, see: dart-lang/sdk#31954 dart-lang/sdk#31959 dart-lang/sdk#31960 Fixes #5187
This reduces the jank of bringing up the license screen further. The remaining lost frame or two are caused by Dart itself, see: dart-lang/sdk#31954 dart-lang/sdk#31959 dart-lang/sdk#31960 Fixes flutter#5187
Focusing on just the Dart part with a benchmark that does just String unpack(Uint8List data) {
return utf8.decode(data);
} using a synthetic Uint8List. This yields still a big JIT vs AOT difference:
Here is a quick profile:
|
Just to make sure we are measuring the same thing, here is a stand-alone toy benchmark that operates on long lists of ASCII 'X' bytes, with JIT/AOT timings below.
benchmark: import 'dart:convert';
import 'dart:typed_data';
String unpack(Uint8List data) {
return utf8.decode(data);
}
main(List<String> args) {
var data = new Uint8List.fromList(
new List<int>.generate(1000000, (int i) => 88)
);
String x = null;
Stopwatch stopwatch = new Stopwatch()..start();
for (int i = 0; i < 1000; i++) {
x = unpack(data);
}
print('executed in ${stopwatch.elapsed} (verify="${x.substring(1,10)}")');
}
|
@aartbik It might be also good to have a benchmark that measures |
Sure, I made it "your" format as follows: main(List<String> args) {
var data = new Uint8List.fromList(
new List<int>.generate(1000000, (int i) => 88)
);
if (args.contains('--use-view')) {
data = new Uint8List.view(data.buffer);
}
// AS BEFORE
} which gives me JIT-direct/view: AOT-direct/view |
In the long run, something like cloning for type may help here. In the short run, we could get the utf performance up by specializing a bit more in the library code. The _convertIntercepted() idea is very interesting, but perhaps done a bit too high in the calling tree. Specialization seems only necessary for the leaf methods where most work is done. Still, let me play around with this a bit.... |
Ah, I thought that using generics would make it extremely easy to specialize library code in a non-intrusive way, but Dart does not follow the C++ way of generating different versions for each instance. Without this sort of generics, the code duplication of such a solution seem to become a bit excessive (any best practices on this?). |
One of the ideas I entertained is that we can do a Kernel to Kernel transformation based on TFA analysis results and an explicit pragma hint, e.g. in core libraries we do @pragma('vm.type-specialize')
void convert(List<int> codeUnits, int startIndex, int endIndex) {
/* body */
} which would be a hint for a subsequent transformation to look at TFA analysis results and see which types can reach @pragma('vm.type-specialize')
void convert(List<int> codeUnits, int startIndex, int endIndex) {
if (codeUnits is Uint8List) {
_convert$Uint8List(codeUnits, ...);
} else if (codeUnits is _List) {
_convert$_List(codeUnits, ...);
}
}
void _convert$Uint8List(Uint8List codeUnits, ...) {
// copy of body specialized for Uint8List
} This is one possible approach that works for library code that people are willing to annotate with hints. Another approach would be to try and change how we do
maybe the cost of invoking
Sidenote: it might be good to check that the rest of the code in hot functions actually looks good in AOT mode (i.e. that |
@aartbik I was asked to help with some protobuf decoding speed and it seems like we happen to spend significant time in string decoding. Hope it's ok with you if I look into this? One aspect might be to unify the object layouts of typed data views and non-views. Currently if code is specialized via |
…generally Currently the @pramga('vm:exact-result-type') annotation only works if the function is a recognized method. This change changes that to make the VM just look if a function has the annotation (no matter if it's also in the list of recognized methods or not). Furthermore this CL adds a "has_pragma" bit to [Field] objects, similar to how we have it on [Function]/[Class]es. This allows annotating fields with types, as we do with function return types. Furthermore this CL lets the type propgagator use @pragma('vm:exact-result-type') annotations to narrow the [CompileType] set on [LoadFieldInstr]s. Since the @pragma is a general feature, this CL moves the `Function::FindPragma()` to `Library::FindPragma` (which is where any other metadata lookup happens). We also let the `FindPragma` accept any of Class/Function/Field objects. Furthermore this CL adds a bailout if we try to evaluate metadata in the background compiler, since the background compiler is not allowed to execute generated code. The bailout should trigger a re-compilation on the mutator thread. Furthermore the `FindPragma()` function is fixed to handle the case when the evaluation of the metadata results in e.g. a language error. In this case we simply claim to not have found a pragma annotation. Issue #31954 Change-Id: I0900a80d5ae0f3e8d09baf13cba1b20dd974df31 Reviewed-on: https://dart-review.googlesource.com/c/84037 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Alexander Markov <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]>
…ns work generally" This reverts commit 2de63ac. Reason for revert: This triggers shutdown deadlocks for some reason: #0 0xf7782cd9 __kernel_vsyscall #1 0xf7753d0b pthread_cond_wait@@GLIBC_2.3.2 #2 0x0108a515 dart::Monitor::WaitMicros(long long) #3 0x0108a467 dart::Monitor::Wait(long long) #4 0x00fe9048 dart::KernelIsolate::Shutdown() #5 0x00f99ae4 dart::Dart::Cleanup() #6 0x01278fce Dart_Cleanup #7 0x00dd347e dart::bin::main(int, char**) #8 0x00dd3d64 main #9 0xf745caf3 __libc_start_main #10 0x00dd2021 _start TID 30443: #0 0xf7782cd9 __kernel_vsyscall #1 0xf7531ee6 epoll_wait #2 0x00ddab61 dart::bin::EventHandlerImplementation::Poll(unsigned int) #3 0x00dfb2d4 dart::bin::ThreadStart(void*) #4 0xf774ff72 start_thread #5 0xf753143e __clone TID 30444: #0 0xf7782cd9 __kernel_vsyscall #1 0xf7753d0b pthread_cond_wait@@GLIBC_2.3.2 #2 0x0108a515 dart::Monitor::WaitMicros(long long) #3 0x0108a467 dart::Monitor::Wait(long long) #4 0x012484d8 dart::BackgroundCompiler::Run() #5 0x01248a37 dart::BackgroundCompilerTask::Run() #6 0x0112203a dart::ThreadPool::Worker::Loop() #7 0x01121efe dart::ThreadPool::Worker::Main(unsigned int) #8 0x01089e73 dart::ThreadStart(void*) #9 0xf774ff72 start_thread #10 0xf753143e __clone Original change's description: > [VM] Add @pragma annotations on Field, make @pragma annotations work generally > > Currently the @pramga('vm:exact-result-type') annotation only works if > the function is a recognized method. This change changes that to make > the VM just look if a function has the annotation (no matter if it's > also in the list of recognized methods or not). > > Furthermore this CL adds a "has_pragma" bit to [Field] objects, similar > to how we have it on [Function]/[Class]es. This allows annotating > fields with types, as we do with function return types. > > Furthermore this CL lets the type propgagator use > @pragma('vm:exact-result-type') annotations to narrow the [CompileType] > set on [LoadFieldInstr]s. > > Since the @pragma is a general feature, this CL moves the > `Function::FindPragma()` to `Library::FindPragma` (which is where any > other metadata lookup happens). We also let the `FindPragma` accept any > of Class/Function/Field objects. > > Furthermore this CL adds a bailout if we try to evaluate metadata in > the background compiler, since the background compiler is not allowed > to execute generated code. The bailout should trigger a re-compilation > on the mutator thread. > > Furthermore the `FindPragma()` function is fixed to handle the case > when the evaluation of the metadata results in e.g. a language error. > In this case we simply claim to not have found a pragma annotation. > > Issue #31954 > > Change-Id: I0900a80d5ae0f3e8d09baf13cba1b20dd974df31 > Reviewed-on: https://dart-review.googlesource.com/c/84037 > Commit-Queue: Martin Kustermann <[email protected]> > Reviewed-by: Alexander Markov <[email protected]> > Reviewed-by: Vyacheslav Egorov <[email protected]> [email protected],[email protected],[email protected],[email protected] Change-Id: Ic0d22d32b0eea3a76ec245cabab0006f97ca1b05 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/84622 Reviewed-by: Martin Kustermann <[email protected]>
Currently the @pramga('vm:exact-result-type') annotation only works if the function is a recognized method. This change changes that to make the VM just look if a function has the annotation (no matter if it's also in the list of recognized methods or not). Furthermore this CL lets the type propgagator use @pragma('vm:exact-result-type') annotations to narrow the [CompileType] set on [LoadFieldInstr]s. Since the @pragma is a general feature, this CL moves the `Function::FindPragma()` to `Library::FindPragma` (which is where any other metadata lookup happens). We also let the `FindPragma` accept any of Class/Function/Field objects. Furthermore the `FindPragma()` function is fixed to handle the case when the evaluation of the metadata results in an error. In this case we simply claim to not have found a pragma annotation. Issue #31954 Change-Id: If03f566e334cd53549985823ee3dd6b5e9672969 Reviewed-on: https://dart-review.googlesource.com/c/85163 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]>
…ys initialized with _Smi's Until now a typed data view was first constructed and then afterwards the constructor body validated that the offsetInBytes/length are non-null and in range. This CL moves the range check to the call sites of the constructor, thereby guarenteeing that we will never initialize the _TypedListView.{offsetInBytes,length} fields with non-_Smi values. (The views are constructed only within the "dart:typed_data" library. Neither the embedder nor the runtime construct them.) Further more this CL removes the redundant _defaultIfNull() call, since the call sites already handled the `length == null` case. This CL also starts annotating _TypedListView.{offsetInBytes,length} with @pragma("vm:exact-result-type", "dart:core#_Smi") to ensure any [LoadFieldInstr] will have the right type attached, thereby avoiding any null-checks and mint-handling. This improves dart-aot On arm7hf: MD5: +38%, SHA256: +87%, SHA1: +89% (plus various typed data microbenchmarks) JsonParseDefaultReviver/StringBuffer/StringIdiomatic/JsonParseCustomReviver: -5-8% (probably due to not inlining static calls, will find out) On arm8: MD5: +6.5%, SHA256: +12%, SHA1: 3.6%, JsonUtf8RoundTrip: 8% (plus various typed data microbenchmarks) DeltaBlue: -6.7% (probably due to not inlining static calls, will find out) Issue #35154 Issue #31954 Change-Id: I37c822e6879f5a2d17fd9650a68cf2eee4326b01 Reviewed-on: https://dart-review.googlesource.com/c/84241 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Alexander Markov <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]>
A field/function annotated with this pragma must be guaranteed to not return `null` at runtime. Make use of this non-nullable annotation in the VM's type propagator. Annotates the "_TypedListView._typedData" field to ensure the VM knows it returns a non-nullable _TypedListView. Furthermore annotates methods on the integer implementation. Those particular methods are recognized methods with a "dynamic" return type. This caused the type propagator to use CompileType::Dynamic() as result type. Since a previous CL started to only utilize the annotated type if it is better than "dynamic" more integer operations got handled in-line, though with null-checks. Annotating those methods to return non-null improves the in-line handling of integer operations. This improves dart-aot On arm7hf: SHA256: +5%, SHA: +6%, JsonObjectRoundTrip: +7%, ... On arm8: SHA1: +28%, MD5: +25%, SHA256: +15%, TypedData.Int16ListViewBench: +18.5%, StringInterpolation: +18%, ... Issue #31954 Issue #35154 Change-Id: Ia4263a37241a36c9dc35e8a48893297effa6f4b2 Reviewed-on: https://dart-review.googlesource.com/c/84421 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]> Reviewed-by: Alexander Markov <[email protected]>
…the end If we inline something inside a loop which might throw, the throwing blocks are currently spliced in the middle of the loop. This CL moves those blocks to the very end. This improves a number of typed data benchmarks in dart-aot mode by 5-10%. Issue #31954 Change-Id: I5dc86291240d8dac61798ff873ffa7205edc0007 Reviewed-on: https://dart-review.googlesource.com/c/85263 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]>
… recognize it, use it to special case ascii decoding This change extends/fixes the exiting "pattern recognition" which tries to recognize the pattern v2 <- LoadClassIdInstr(v1) BranchIf v2 == IntegerConstant(cid) Furthermore we start inlining the recognized `ClassID.getID` method very early in the pipeline. This allows the VM to recognize the above pattern and insert redefinitions before the actual inlining pass. Furthermore we special-case two very hot methods in utf8 decoding by manually having two loops, one of which is guarded by a class-id check against the _Uint8ArrayView class, which is most common. (In the future we would like to unify the typed data layouts so we no longer need to use `ClassId.getID`, thereby also allowing non core library code to use this). This improves dart-aot by * 31%+ for a protobuf decoding benchmark we care about Issue #31954 Change-Id: Ia567b92b7e76ff28eda1726deaafda32732ed8f5 Reviewed-on: https://dart-review.googlesource.com/c/85281 Reviewed-by: Johnni Winther <[email protected]> Reviewed-by: Vyacheslav Egorov <[email protected]> Reviewed-by: Jenny Messerly <[email protected]> Commit-Queue: Martin Kustermann <[email protected]>
…logic to recognize it, use it to special case ascii decoding" This reverts commit d7cf695. Reason for revert: This causes crashes on the vm-obfuscate builder at https://ci.chromium.org/p/dart/builders/luci.dart.ci.sandbox/vm-kernel-precomp-obfuscate-linux-release-x64 like FAILED: dartkp-dart_precompiled release_x64 corelib_2/linked_hash_map_test Expected: Pass Actual: Crash Unexpected compile error. --- Command "vm_compile_to_kernel" (took 11.000189s): DART_CONFIGURATION=ReleaseX64 /b/s/w/ir/cache/builder/sdk/pkg/vm/tool/gen_kernel --aot --platform=out/ReleaseX64/vm_platform_strong.dill -o /b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.dill /b/s/w/ir/cache/builder/sdk/tests/corelib_2/linked_hash_map_test.dart --packages=/b/s/w/ir/cache/builder/sdk/.packages -Ddart.developer.causal_async_stacks=true exit code: 0 --- Command "precompiler" (took 583ms): DART_CONFIGURATION=ReleaseX64 out/ReleaseX64/gen_snapshot --snapshot-kind=app-aot-assembly --assembly=/b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.S --obfuscate --sync-async --ignore-unrecognized-flags --packages=/b/s/w/ir/cache/builder/sdk/.packages /b/s/w/ir/cache/builder/sdk/out/ReleaseX64/generated_compilations/dartkp/tests_corelib_2_linked_hash_map_test/out.dill exit code: -6 stderr: Warning: This VM has been configured to obfuscate symbol information which violates the Dart standard. See dartbug.com/30524 for more information. ===== CRASH ===== version=2.2.0-edge.83712405657ff736033380cb24b0bd5a62fc3692 (Tue Nov 27 16:56:07 2018 +0000) on "linux_x64" si_signo=Segmentation fault(11), si_code=1, si_addr=0xd1400050e5f Dumping native stack trace for thread 2b71 [0x000056001fc426c7] Unknown symbol [0x000056001fc426c7] Unknown symbol [0x000056001fe5763d] Unknown symbol [0x000056001fe4c99c] Unknown symbol [0x000056001fe6166a] Unknown symbol [0x000056001fe59c0c] Unknown symbol Original change's description: > [VM] Inline ClassID.getID() eagerly, extend pattern matching logic to recognize it, use it to special case ascii decoding > > This change extends/fixes the exiting "pattern recognition" which tries > to recognize the pattern > > v2 <- LoadClassIdInstr(v1) > BranchIf v2 == IntegerConstant(cid) > > Furthermore we start inlining the recognized `ClassID.getID` method very > early in the pipeline. This allows the VM to recognize the above > pattern and insert redefinitions before the actual inlining pass. > > Furthermore we special-case two very hot methods in utf8 decoding by > manually having two loops, one of which is guarded by a class-id check > against the _Uint8ArrayView class, which is most common. (In the future > we would like to unify the typed data layouts so we no longer need to > use `ClassId.getID`, thereby also allowing non core library code to use > this). > > This improves dart-aot by > * 31%+ for a protobuf decoding benchmark we care about > > > Issue #31954 > > Change-Id: Ia567b92b7e76ff28eda1726deaafda32732ed8f5 > Reviewed-on: https://dart-review.googlesource.com/c/85281 > Reviewed-by: Johnni Winther <[email protected]> > Reviewed-by: Vyacheslav Egorov <[email protected]> > Reviewed-by: Jenny Messerly <[email protected]> > Commit-Queue: Martin Kustermann <[email protected]> [email protected],[email protected],[email protected],[email protected],[email protected] Change-Id: I912b0768c32cbb00297ce48db29dbdbea44c14fa No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/85441 Reviewed-by: William Hesse <[email protected]> Commit-Queue: William Hesse <[email protected]>
…logic to recognize it, use it to special case ascii decoding" This change extends/fixes the exiting "pattern recognition" which tries to recognize the pattern v2 <- LoadClassIdInstr(v1) BranchIf v2 == IntegerConstant(cid) Furthermore we start inlining the recognized `ClassID.getID` method very early in the pipeline. This allows the VM to recognize the above pattern and insert redefinitions before the actual inlining pass. Furthermore we special-case two very hot methods in utf8 decoding by manually having two loops, one of which is guarded by a class-id check against the _Uint8ArrayView class, which is most common. (In the future we would like to unify the typed data layouts so we no longer need to use `ClassId.getID`, thereby also allowing non core library code to use this). This improves dart-aot by * 31%+ for a protobuf decoding benchmark we care about Issue #31954 Change-Id: I7181bbf096aabe303634fd3b2bff9cc96d69719c Reviewed-on: https://dart-review.googlesource.com/c/85443 Reviewed-by: Vyacheslav Egorov <[email protected]> Commit-Queue: Martin Kustermann <[email protected]>
…coding speed Issue #31954 Change-Id: I525157d1c91276d35f6678a9066ac72900cc11ed Reviewed-on: https://dart-review.googlesource.com/c/93434 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Sigurd Meldgaard <[email protected]> Reviewed-by: Lasse R.H. Nielsen <[email protected]>
This removes the 3 fields from the classes in Dart and instead describes the layout in C++ via a RawTypedDataView class (as already do with normal RawTypedData). The existing "semi" TypedDataView handle class is changed to be a real handle class. This decreases performance of some microbenchmarks due to field guards not being used anymore (before the JIT could add a field guard to view classes guarding that only normal typed data is used as backing store (and not e.g. external typed data) Issue #35154 Issue #31954 Change-Id: I7a0022b843a4c0fa69f53dedcc4c7bd2117cdc37 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/96806 Commit-Queue: Martin Kustermann <[email protected]> Reviewed-by: Ryan Macnak <[email protected]>
Recently I've worked on unifying the heap object layout of our typed data classes so that our AOT compiler can directly access the bytes of typed data members (e.g. of All of the work combined has lead to the following numbers (measured on the benchmark mentioned above in #31954 (comment) on my workstation):
So the AOT is almost on par with the JIT (for this particular benchmark) There is still room for improvement and we continue working on it! (**) Notice of Warning: Our AOT compiler will only directly access the bytes (e.g. in |
The AOT typed data optimizations had a huge performance gain. Though the Dart code is only taking advantage of it for the ascii case atm. @askeksa-google is now going to extend this fast case to arbitrary unicode. We'll try to keep optimizing the Dart implementation - though still consider intensifying it afterwards. |
|
how to convert a _uint8arrayview object to image ? |
I recommend asking such questions on Stack Overflow. |
UTF8.decode seems surprisingly slow for large buffers.
Using this Flutter program:
...I find that on a Pixel 2 XL, 4KB is parsed in about 1ms, 40KB in about 12ms, and 400KB in about 135ms. This is only about 3MB per second. Then again, I'm not sure what would be considered good throughput on this device. Maybe this is as good as we can get?
Presumably this could be fixed in the most radical fashion by supporting UTF-8 as a native data type.
The text was updated successfully, but these errors were encountered: