ByteData should support memcpy from other type data objects #32080
Comments
Unfortunately it isn't as well documented as it should be, but

    Uint8List original;
    ...
    Uint8List copy = new Uint8List.fromList(original);
    // copy.buffer.asByteData() to get a ByteData.

and

    Uint8List l1;
    Uint8List l2;
    ...
    l2.setRange(start, end, l1);

are implemented in the VM. See https://github.com/dart-lang/sdk/blob/master/runtime/lib/typed_data_patch.dart#L104 and https://github.com/dart-lang/sdk/blob/master/runtime/lib/typed_data.cc#L108 |
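A minimal sketch of the two operations mentioned above, using only `dart:typed_data`. The helper names `copyAsByteData` and `copyInto` are hypothetical and exist only for illustration. Note that `setRange` takes a half-open index range `[start, end)`, not an offset and a length.

```dart
import 'dart:typed_data';

/// Hypothetical helper: copies [original] into a fresh buffer and
/// returns a ByteData view over that copy.
ByteData copyAsByteData(Uint8List original) {
  // Uint8List.fromList allocates a new list and copies the elements.
  final copy = Uint8List.fromList(original);
  return copy.buffer.asByteData();
}

/// Hypothetical helper: copies all of [source] into [target],
/// starting at index [start] of [target].
void copyInto(Uint8List target, int start, Uint8List source) {
  // The second argument is the exclusive end index, not a length.
  target.setRange(start, start + source.length, source);
}

void main() {
  final original = Uint8List.fromList([1, 2, 3, 4]);

  final data = copyAsByteData(original);
  print(data.getUint8(3)); // 4

  final target = Uint8List(8);
  copyInto(target, 2, original);
  print(target); // [0, 0, 1, 2, 3, 4, 0, 0]
}
```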
Does ByteData support a setRange? I guess I need to create a view of the underlying ByteBuffer for each type that I want to set as a range? |
Right. |
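As a rough illustration of the "view per type" approach confirmed above: `ByteData` has no `setRange` of its own, so one option is to take a typed view of the underlying `ByteBuffer` and call `setRange` on that view. The function name `setUint32Range` is an assumption made for this sketch. Two caveats apply: typed views use the host byte order (hence `Endian.host` when reading back through `ByteData`), and the view's byte offset must be a multiple of the element size.

```dart
import 'dart:typed_data';

/// Hypothetical helper: copies [source] into [data] at [byteOffset]
/// by going through a Uint32List view of the same buffer.
/// data.offsetInBytes + byteOffset must be a multiple of 4.
void setUint32Range(ByteData data, int byteOffset, Uint32List source) {
  final view = data.buffer.asUint32List(
    data.offsetInBytes + byteOffset,
    source.length,
  );
  view.setRange(0, source.length, source);
}

void main() {
  final data = ByteData(16);
  setUint32Range(data, 4, Uint32List.fromList([0xDEADBEEF, 42]));

  // The view wrote in host byte order, so read back with Endian.host.
  print(data.getUint32(4, Endian.host).toRadixString(16)); // deadbeef
  print(data.getUint32(8, Endian.host)); // 42
}
```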
Do you have a sense for where the crossover point is in performance? For example, is there some length below which |
I'm not sure. The answer might also be different between AOT and JIT. Maybe @mraleph can help. |
The context of this question is the Dart FIDL2 encoder for Fuchsia: I don't have any evidence that this code is a measurable part of the profile. I just felt silly writing these memcpy equivalents. |
I ran some microbenchmarks on my x64 desktop: https://gist.github.com/zanderso/51fcfd0a797f5d486200047a7ece92b7 50 elements seems like a reasonable cross-over point, except for Int64List where it looks like there is a bug:
/cc @alexmarkov |
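One way to act on a cross-over point like the one reported above is a small helper that uses a plain loop for short copies and `setRange` otherwise. This is only a sketch: the name `copyBytes` and the constant `smallCopyThreshold` are hypothetical, the value 50 is taken from the microbenchmark remark and will likely differ by machine, element type, and JIT versus AOT, and the helper assumes the source and destination ranges do not overlap.

```dart
import 'dart:typed_data';

/// Hypothetical threshold, taken from the "50 elements" remark; measure
/// on your own target before relying on it.
const int smallCopyThreshold = 50;

/// Hypothetical helper: copies [count] bytes from [src] (starting at
/// [srcStart]) into [dst] (starting at [dstStart]).
/// Assumes the two ranges do not overlap.
void copyBytes(
    Uint8List dst, int dstStart, Uint8List src, int srcStart, int count) {
  if (count < smallCopyThreshold) {
    // Short copy: an inline loop avoids the call overhead of setRange.
    for (var i = 0; i < count; i++) {
      dst[dstStart + i] = src[srcStart + i];
    }
  } else {
    // Long copy: setRange with a skipCount into the source.
    dst.setRange(dstStart, dstStart + count, src, srcStart);
  }
}

void main() {
  final src = Uint8List.fromList(List<int>.generate(100, (i) => i));
  final dst = Uint8List(100);

  copyBytes(dst, 0, src, 10, 5); // takes the loop path
  copyBytes(dst, 20, src, 0, 60); // takes the setRange path

  print(dst.sublist(0, 5)); // [10, 11, 12, 13, 14]
  print(dst[20 + 59]); // 59
}
```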
We don't have an inline version of In general I would recommend using |
Issue: #32080 Change-Id: I692d1873acc4c9816ccc6929f51e0ed699f585da Reviewed-on: https://dart-review.googlesource.com/40103 Reviewed-by: Vyacheslav Egorov <[email protected]> Commit-Queue: Alexander Markov <[email protected]>
After working on optimizing a compute-bound problem today (a small network protocol benchmark over a loopback socket, which hits 200 mb/s if reading is stubbed), I found that setRange is very slow in many cases. Replacing setRange calls with a for loop greatly increases performance, since in my benchmark small amounts of data are copied into staging buffers (at one point I gained around 30 mb/s by removing a single setRange call). I've also found that array views are quite slow, but that's being handled by #35154. |
I wish Dart had some kind of memcpy operation. Here, using raw loops instead of getRange and iterating gives about a 3-5% performance boost. Related: dart-lang/sdk#32080
This seems to still be the case. I need to copy a lot of small chunks (1-15 bytes) from one place in a Uint8List to another place in the same Uint8List and using a for loop is 3-4x faster than setRange/getRange. |
@Mr-Pepe small chunks might indeed be faster to copy with an inline loop, but we did optimize Can you post the benchmark you are using? |
@mraleph Sure. Here is a stripped-down version of my benchmark. It's an implementation of LZ77 decompression. The benchmark creates a Uint8List of known size in advance. It then iterates over a list of blocks. A block can be compressed or uncompressed. An uncompressed block holds a byte value that can be inserted directly at the current position of the Uint8List. A compressed block tells the algorithm to go back a certain offset and then copy a certain number of bytes to the current position. I have tried both a for-loop and setRange/getRange to copy the range for compressed blocks, and the for-loop seems to be faster on my machine. Here is the code. Sorry, I have hardcoded the data (parts of CEDICT) because that was the fastest I could come up with :\
|
One issue with your code is in the lines:

    decompressed.setRange(
      index,
      index + block.length,
      decompressed.getRange(
        index - block.offset,
        index - block.offset + block.length,
      ),
    );

If I change those to:

    decompressed.setRange(
      index,
      index + block.length,
      decompressed,
      index - block.offset,
    );

and avoid creating an extra Still not as fast as the copy loop, but at least in the same ballpark (well, at least when JIT'ed, still only half the speed of the loop when compiled to exe). There is an overhead of using |
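A side note on the `skipCount` form of `setRange` when source and destination are the same list, sketched here with hypothetical helper names `copyWithLoop` and `copyWithSetRange`. According to the `List.setRange` documentation, passing the list itself copies the elements originally in the source range even if the ranges overlap. A forward loop instead re-reads bytes it has just written. The two therefore give different results whenever the copy length exceeds the back-reference offset, which matters for LZ77-style decoders that rely on the loop's repeating behavior.

```dart
import 'dart:typed_data';

/// Hypothetical helper: forward byte-by-byte copy within one list.
/// Overlapping ranges re-read freshly written bytes (pattern repeats).
void copyWithLoop(Uint8List buf, int index, int offset, int length) {
  for (var i = 0; i < length; i++) {
    buf[index + i] = buf[index - offset + i];
  }
}

/// Hypothetical helper: the same copy expressed with setRange and a
/// skipCount. Overlapping ranges copy the original source bytes.
void copyWithSetRange(Uint8List buf, int index, int offset, int length) {
  buf.setRange(index, index + length, buf, index - offset);
}

void main() {
  final a = Uint8List.fromList([1, 2, 3, 0, 0, 0, 0]);
  final b = Uint8List.fromList([1, 2, 3, 0, 0, 0, 0]);

  // Back-reference with offset 3 and length 4: the ranges overlap by one byte.
  copyWithLoop(a, 3, 3, 4);
  copyWithSetRange(b, 3, 3, 4);

  print(a); // [1, 2, 3, 1, 2, 3, 1]
  print(b); // [1, 2, 3, 1, 2, 3, 0]
}
```

When the length never exceeds the offset, the two helpers produce the same bytes.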
Right, I wasn't making the best use of The performance of the for-loop is currently fine for me. I was just surprised that it was so much faster than |
Writing this code makes me sad. The VM can do this much faster and easier than I can from inside Dart.
/cc @Hixie @zanderso