optimized code for `Image.toByteArray()` #315
Conversation
Especially efficient in the case of interleaved U and V buffers. This kind of resolves #277.
Thank you for this PR! I can't comment on the performance claims since I have not tested it. This code is meant for illustration purposes: if we can make it faster, then great, but we can't do that at the expense of readability. What you propose is a significantly larger amount of code, and I'm having a hard time following what it does. Would you be interested in cleaning up the code and adding more comments throughout? It seems that your PR removes all our previous comments explaining pixel strides, interleaving planes, etc.
The question is what you want to illustrate. Please don't underestimate the importance of this code: it's being copied into thousands of projects 'as is' because it is the official way to run the camera and, in particular, to convert a camera image to RGB. This PR is actually only the first step for another change I have prepared, to produce a rotated (portrait) RGB bitmap. This also resolves the 'reverse' output for the Nexus 5X (#105). For best performance, I do this rotation in RenderScript. Do you think such a feature could be useful in the context of camera-samples?
We are trying to illustrate how pixel data is encoded in image planes, and how the data is interleaved for YUV pixel formats used by the Android framework. We are also demonstrating how developers can manipulate the pixel data in an efficient manner.
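For context, that metadata can be inspected directly; here is a minimal sketch using the plain `android.media.Image` API (the helper name is made up for illustration):

```kotlin
import android.graphics.ImageFormat
import android.media.Image

/**
 * Logs the layout metadata of a YUV_420_888 image. The format guarantees
 * three planes (Y, U, V); pixelStride and rowStride describe how each
 * plane's samples are spaced within its ByteBuffer.
 */
fun describePlanes(image: Image) {
    require(image.format == ImageFormat.YUV_420_888) { "expected YUV_420_888" }
    image.planes.forEachIndexed { i, plane ->
        println(
            "plane $i: pixelStride=${plane.pixelStride}, " +
            "rowStride=${plane.rowStride}, " +
            "bufferSize=${plane.buffer.remaining()}"
        )
    }
}
```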
This is true, which is why I'm not opposed to making it faster. But it's more important for us to make sure developers can understand (and improve) the code. This code is particularly tricky, which is why we added a note about it not being production-ready, so we value readability even more than usual. If the goal is performance, this work should probably be done in native code anyway. I was hoping that there would be some reusable module written in native code under CameraX or MLKit that can perform this transformation, but apparently that has not happened yet.

As it is, I find your proposed changes very hard to follow. I like that you are breaking the code down into functions, but you removed all the documentation we added to explain how the pixel data is stored in memory, and you don't have much documentation for what each of the functions does. I'm also a bit confused about why you would need to check the actual pixel data to see if the values are interleaved; you should be able to derive that from other available metadata in the image object, but it's been a while since I looked at this so I could be wrong.
I'm not sure that actually rotating images is the right approach. We generally prefer to do a transformation on the Surface (for preview, which would address #105) or to use metadata tags (for videos and images), which is more efficient since no frame copy is needed at all. That said, showcasing how that can be achieved would certainly be valuable, so I'm not opposed to adding a function to do that in this project.
The truth is, for my limited pool of devices, libyuv with its manually optimized NEON assembly does this job faster than RenderScript, and it handles rotation gracefully. I wonder if you have more information on that.
Absolutely. The incentive is to have a bitmap for analysis (e.g. TF) in the correct orientation. People often use matrix rotation of the bitmap, because it's a one-liner, but then they find that preparing an image takes longer than the inference itself.
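For reference, the matrix-rotation one-liner usually looks roughly like this (standard `android.graphics` calls; the cost comes from allocating and copying a whole second bitmap per frame):

```kotlin
import android.graphics.Bitmap
import android.graphics.Matrix

// Convenient but costly: createBitmap() allocates a second full-size
// bitmap and copies every pixel through the rotation Matrix.
fun Bitmap.rotated(degrees: Float): Bitmap =
    Bitmap.createBitmap(
        this, 0, 0, width, height,
        Matrix().apply { postRotate(degrees) }, /* filter = */ true
    )
```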
Because in Kotlin (or Java) I don't have access to pointer arithmetic for the U and V plane byte buffers. Is there some metadata that can reveal the NV21 nature of the U plane when the pixel stride is 2?
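As far as I can tell, no such flag exists; the closest a metadata-only check can get is a sketch like the following (a heuristic, not an API guarantee):

```kotlin
import android.media.Image

/**
 * For YUV_420_888, a chroma pixelStride of 1 indicates a fully planar
 * (I420-like) layout, while 2 indicates that U and V samples sit two
 * bytes apart, consistent with a semi-planar (NV12/NV21-like) layout.
 * Whether the two ByteBuffers actually alias the same memory, and in
 * which order (uv vs vu), is NOT exposed as metadata anywhere.
 */
fun looksSemiPlanar(image: Image): Boolean =
    image.planes[1].pixelStride == 2 && image.planes[2].pixelStride == 2
```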
This is unfortunately not surprising. Many device manufacturers don't have a very performant implementation of the RenderScript intrinsics. In some cases the implementation is outright wrong, although increased testing in the last few releases has helped a lot in getting rid of most bugs.
Good point. Is it easier or faster to do the transformation in TF? I highly suspect that an ML library won't care as much about performance for image rotation, and even if some do, it is a battle that has to be fought one library at a time. Having a reference implementation for how to do this as part of the image capture pipeline could be very valuable.
I'll have to look into this more, but I really don't think we should be looking at pixel data at all.
On the specific modern low-end device (Nokia 4.2) that I used as my yardstick,
Absolutely. In a sense, all these transformations are needed only because the people who build the model on a powerful PC don't really care about optimization for mobile. There is no reason why an ML model would work less efficiently with YUV data in its natural orientation. But the reality is, too often the model requires RGB, and in a very specific orientation. What do we do then? Convert the image before passing it to TF! And people do this in all kinds of super-inefficient ways, like converting the semi-planar image to JPEG and then loading the JPEG into a Bitmap, or coding the conversion in Dart. The result is that this conversion takes longer than the inference itself, and they are forced to use low resolutions.

As for relying on unspecified behavior, I respect your position, and I actually have nothing against keeping the example code inefficient. Let this esoteric knowledge of how the official examples can be accelerated by 50% or more remain in the heads of the few initiates.
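For the record, the JPEG round-trip mentioned above typically looks like this (standard `android.graphics.YuvImage` API, assuming the NV21 byte array has already been assembled):

```kotlin
import android.graphics.BitmapFactory
import android.graphics.ImageFormat
import android.graphics.Rect
import android.graphics.YuvImage
import java.io.ByteArrayOutputStream

// The anti-pattern: YUV -> JPEG (lossy, slow encode) -> Bitmap (slow
// decode). Two full passes over the frame plus a compression round-trip.
fun nv21ToBitmapViaJpeg(nv21: ByteArray, width: Int, height: Int) =
    ByteArrayOutputStream().use { out ->
        YuvImage(nv21, ImageFormat.NV21, width, height, null)
            .compressToJpeg(Rect(0, 0, width, height), 90, out)
        val bytes = out.toByteArray()
        BitmapFactory.decodeByteArray(bytes, 0, bytes.size)
    }
```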
If you think there's opportunity for compromise and there are ways to improve performance without relying on unspecified behavior and manually comparing buffers, then that would be much less controversial to merge. Otherwise we could consider a native module, which can be focused on efficiency, so we don't have to worry about the code being sample-worthy.
As I wrote before, the best performance overall is achieved with libyuv (native). I have this code (almost) ready for PR, and it may be a nice expansion of the CameraXTfLite example, but it's probably not prudent to add such a dependency to the other examples, which don't actually use the YuvToRgbConverter at all.

Alternatively, I can push a custom RenderScript that takes the three planes as three independent inputs and does both sampling and conversion on the GPU. This would have been a nice way, but it is crippled by the sad facts that a RenderScript Allocation does not accept a ByteBuffer, and that the Image planes are not backed by byte arrays. Thus, we are forced to perform three huge buffer copies.

Another alternative is to connect the input allocation to the ImageReader Surface, as discussed in https://groups.google.com/a/android.com/g/camerax-developers/c/qMbhcf3T7_c?pli=1. Actually, the code is there; you may reach out to @LiewJunTung and ask him to contribute this to camera-samples.
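Those forced copies would look roughly like this (standard ByteBuffer and RenderScript Allocation calls; `rsAllocation` here is a hypothetical pre-created U8 Allocation):

```kotlin
import android.media.Image
import android.renderscript.Allocation

/**
 * Allocation.copyFrom() only accepts arrays, and Image planes expose
 * direct ByteBuffers, so every plane has to be drained into a ByteArray
 * first. This is one of the three bulk copies complained about above.
 */
fun copyPlaneIntoAllocation(image: Image, planeIndex: Int, rsAllocation: Allocation) {
    val buffer = image.planes[planeIndex].buffer
    val bytes = ByteArray(buffer.remaining()) // extra heap allocation per plane
    buffer.get(bytes)                         // copy #1: ByteBuffer -> array
    rsAllocation.copyFrom(bytes)              // copy #2: array -> Allocation
}
```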
Got it. I would be open to adding a native module. Even though I don't want to discourage you, since you are talking about what sounds like a large pull request I must warn you: we can be pretty picky about code quality, style and even comments. It may seem trivial, but please keep in mind that these samples are used first and foremost for educational purposes.
If you are interested in contributing more RenderScript-specific code, we can also try to add it to the RenderScript samples repo. I've been wanting to move the HDR camera sample there for some time, since we just don't have enough resources to maintain it in this repo alongside all the other samples and the CameraX work.
This is definitely a simplified and elegant way of doing it. It would be interesting to evaluate the performance compared to other methods. I'm worried about the additional frame copies and memory usage that approach would require.
```kotlin
if (planes[2].pixelStride == 2 && planes[2].buffer.isInterleavedWith(planes[1].buffer))
    planes[2].extractChromaInterleaved(planes[1], cropRect, outputBuffer)
else if (planes[2].pixelStride == 2 && planes[1].buffer.isInterleavedWith(planes[2].buffer))
```
Can you help me figure it out: if `pixelStride == 2`, doesn't that already mean the U and V planes are interleaved, so why do you need this extra condition? Also, when do you need the extra byte from the U buffer? For me, `extractChromaInterleaved` works perfectly fine with only the V buffer as input. Am I missing something?
- If `pixelStride == 2`, the two planes may be interleaved. Still, I must prove it; that's done by flipping one byte (see the sketch below). There is no guarantee.
- Even when we know that the planes are interleaved, we must find out whether it is a `uv` or a `vu` kind of interleave.
- On some devices, the length of the V ByteBuffer is `w*h/2`, but on other devices it is `w*h/2-1`. The missing pixel must be extracted from the second plane.
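A minimal sketch of what such a byte-flip probe could look like (a reconstruction, not necessarily the helper in this PR; it assumes the plane buffers are writable, which the trick relies on):

```kotlin
import java.nio.ByteBuffer

/**
 * Probes whether [other] aliases the same memory as this buffer, offset
 * by one byte (the semi-planar case, where V[i+1] and U[i] are the same
 * physical byte). A byte is temporarily flipped in this buffer and the
 * change is checked through [other]; absolute get/put leave buffer
 * positions untouched, and the original value is restored on exit.
 */
fun ByteBuffer.isInterleavedWith(other: ByteBuffer): Boolean {
    if (limit() < 2 || other.limit() < 1) return false
    val original = get(1)                        // this[1] should alias other[0]
    val flipped = (original.toInt() xor 0xFF).toByte()
    put(1, flipped)                              // requires a writable buffer
    val interleaved = other.get(0) == flipped
    put(1, original)                             // restore the original byte
    return interleaved
}
```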
I'm not quite sure, but this code should be well tested, and it seems the author is confident about `pixelStride == 2`:
https://github.com/opencv/opencv/blob/master/modules/java/generator/android-21/java/org/opencv/android/JavaCamera2View.java
Thank you for the explanation.
P.S. I tested your code on some devices from our farm, on every available Android version: 5.1, 6, 7, 8, 9, 10, 11; everything works fine.
OpenCV uses C++ to access the U and V planes. Here I only have Java at my disposal.