
optimized code for Image.toByteArray() #315


Closed (1 commit)

Conversation

@alexcohn (Contributor) commented Oct 1, 2020:

Especially efficient in the case of interleaved U and V buffers. This kind of resolves #277.

Commit: "especially in case of interleaved U and V buffers"
@owahltinez (Contributor) commented:

Thank you for this PR! I can't comment on the performance claims since I have not tested it.

The goal of this code is for illustration purposes. If we can make it faster, then great, but we can't do that at the expense of readability. What you propose is a significantly larger amount of code, and I'm having a hard time following what it does.

Would you be interested in cleaning up the code and adding more comments throughout? It seems that your PR removes all our previous comments explaining pixel strides, interleaving planes, etc.

@alexcohn (Contributor, Author) commented Oct 1, 2020:

The question is what you want to illustrate. Please don't underestimate the importance of this code; it's being copied into thousands of projects 'as is' because it is the official way to run the camera and, particularly, to convert a camera image to RGB.

This PR is actually only the first step toward another change I have prepared, to produce a rotated (portrait) RGB bitmap. This also resolves the 'reverse' output for the Nexus 5X (#105). For best performance, I do this rotation in RenderScript. Do you think such a feature could be useful in the context of camera-samples?

@owahltinez (Contributor) commented:

> The question is what you want to illustrate

We are trying to illustrate how pixel data is encoded in image planes, and how the data is interleaved for YUV pixel formats used by the Android framework. We are also demonstrating how developers can manipulate the pixel data in an efficient manner.
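
For reference, a minimal sketch of the access pattern the sample illustrates, assuming a YUV_420_888 Image (the function name packPlane and its parameters are mine, for illustration only): each plane is addressed through its rowStride and pixelStride, and only the meaningful bytes are copied into a tightly packed output.

```kotlin
import android.media.Image

// A sketch, not the sample's actual code: copy one plane of a YUV_420_888
// Image into a tightly packed array. For the Y plane, planeWidth/planeHeight
// are the full image dimensions; for the U and V planes they are halved.
fun packPlane(plane: Image.Plane, planeWidth: Int, planeHeight: Int, out: ByteArray) {
    val buffer = plane.buffer
    var outIndex = 0
    for (row in 0 until planeHeight) {
        var inIndex = row * plane.rowStride       // rows may be padded
        for (col in 0 until planeWidth) {
            out[outIndex++] = buffer.get(inIndex) // absolute get: position untouched
            inIndex += plane.pixelStride          // 1 = planar, 2 = interleaved chroma
        }
    }
}
```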

> Please don't underestimate the importance of this code, it's being copied into thousands of projects 'as is' because it is the official way to run the camera and, particularly, to convert a camera image to RGB.

This is true, which is why I'm not opposed to making it faster. But it's more important for us to make sure developers can understand (and improve) the code. This code is particularly tricky, which is why we added a note about it not being production-ready, so we value readability even more than usual. If the goal is performance, this work should probably be done in native code anyway. I was hoping that there would be some reusable module written in native code under CameraX or MLKit that can perform this transformation, but apparently that has not happened yet.

As it is, I find your proposed changes very hard to follow. I like that you are breaking the code down into functions, but you removed all the documentation we added to explain how the pixel data is stored in memory, and you don't have a lot of documentation for what each of the functions does. I'm also a bit confused about why you would need to check the actual pixel data to see if the values are interleaved; you should be able to derive that from other available metadata in the image object -- but it's been a while since I looked at this, so I could be wrong.

> This PR is actually only the first step toward another change I have prepared, to produce a rotated (portrait) RGB bitmap

I'm not sure that actually rotating images is the right approach; we generally prefer to do a transformation on the Surface (for preview, which would address #105) or to use metadata tags (for videos and images), which is more efficient since no frame copy is needed at all. That said, showcasing how that can be achieved would certainly be valuable, so I'm not opposed to adding a function for it to this project.

@alexcohn (Contributor, Author) commented Oct 1, 2020:

> I was hoping that there would be some reusable module written in native code under CameraX or MLKit that can perform this transformation, but apparently that has not happened yet.

The truth is, for my limited pool of devices, libyuv with its manually optimized NEON assembly does this job faster than RenderScript, and it handles rotation gracefully. I wonder if you have more information on that.

> I'm not sure that actually rotating images is the right approach; we generally prefer to do a transformation on the Surface

Absolutely. The incentive is to have a bitmap for analysis (e.g. TF) in the correct orientation. People often use matrix rotation of the bitmap, because it's a one-liner, but then they find that preparing an image takes longer than inference itself.
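
For context, the one-liner in question is presumably something like this (a sketch; it allocates and copies a whole new Bitmap on every frame, which is exactly what makes it expensive):

```kotlin
import android.graphics.Bitmap
import android.graphics.Matrix

// The convenient but costly per-frame rotation: a full Bitmap copy each time.
fun rotate90(bitmap: Bitmap): Bitmap =
    Bitmap.createBitmap(bitmap, 0, 0, bitmap.width, bitmap.height,
        Matrix().apply { postRotate(90f) }, true)
```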

> why you would need to check the actual pixel data to see if the values are interleaved

Because in Kotlin (or Java) I don't have access to pointer arithmetic for the U and V plane byte buffers. Is there some metadata that can reveal the NV21 nature of the U plane when pixel stride is 2?

@owahltinez (Contributor) commented:

> The truth is, for my limited pool of devices, libyuv with its manually optimized NEON assembly does this job faster than RenderScript, and it handles rotation gracefully. I wonder if you have more information on that.

This is unfortunately not surprising. Many device manufacturers don't have a very performant implementation of RenderScript intrinsics. In some cases the implementation is outright wrong, although increased testing in the last few releases has helped a lot in getting rid of most bugs.

> Absolutely. The incentive is to have a bitmap for analysis (e.g. TF) in the correct orientation. People often use matrix rotation of the bitmap, because it's a one-liner, but then they find that preparing an image takes longer than inference itself.

Good point. Is it easier / faster to do the transformation in TF? I highly suspect that an ML library won't care as much about performance for image rotation, and even if some do, it is a battle that has to be fought one library at a time. Having a reference implementation for how to do this as part of the image capture pipeline could be very valuable.

> Because in Kotlin (or Java) I don't have access to pointer arithmetic for the U and V plane byte buffers. Is there some metadata that can reveal the NV21 nature of the U plane when pixel stride is 2?

I'll have to look into this more, but I really don't think we should be looking at pixel data at all. The Image object provides all the abstractions needed, including access to the separate image planes. If there's something that cannot be derived from that object, we should not rely on it, since it will be implementation-dependent and subject to change without warning. It would be akin to relying on unspecified behavior: you are probably fine, but it's definitely not something we want to showcase as an example.

@alexcohn (Contributor, Author) commented Oct 2, 2020:

> Many device manufacturers don't have a very performant implementation of RenderScript intrinsics.

On the specific modern low-end device (a Nokia 4.2) that I used as my yardstick, ScriptIntrinsicYuvToRGB was quite efficient, but switching to a custom RenderScript kernel (to perform the correct conversion) was painfully slow: not only the call to forEach(), but also the inevitable outputAllocation.copyTo() after it.
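
For reference, a minimal sketch of the ScriptIntrinsicYuvToRGB path being compared here, assuming the frame has already been flattened into an NV21 byte array; the forEach() and copyTo() calls are the ones whose cost is discussed above:

```kotlin
import android.content.Context
import android.graphics.Bitmap
import android.renderscript.*

// A sketch of the intrinsic path: NV21 bytes in, ARGB_8888 Bitmap out.
// Note the two unavoidable copies around the actual conversion.
fun nv21ToBitmap(context: Context, nv21: ByteArray, width: Int, height: Int): Bitmap {
    val rs = RenderScript.create(context)
    val script = ScriptIntrinsicYuvToRGB.create(rs, Element.U8_4(rs))

    val input = Allocation.createSized(rs, Element.U8(rs), nv21.size)
    val outType = Type.Builder(rs, Element.RGBA_8888(rs)).setX(width).setY(height).create()
    val output = Allocation.createTyped(rs, outType)

    input.copyFrom(nv21)      // copy #1: byte[] -> Allocation
    script.setInput(input)
    script.forEach(output)    // the conversion itself
    val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
    output.copyTo(bitmap)     // copy #2: Allocation -> Bitmap
    return bitmap
}
```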

> Is it easier / faster to do the transformation in TF?

Absolutely. In a sense, all these transformations are needed only because the people who build the model on a powerful PC don't really care about optimization for mobile. There is no reason why an ML model would work less efficiently with YUV data in its natural orientation. But the reality is that too often the model requires RGB, in a very specific orientation. What do we do then? Convert the image before passing it to TF! And people do this in all kinds of super-inefficient ways, like converting the semi-planar YUV to JPEG and then loading the JPEG into a Bitmap, or coding the conversion in Dart. The result is that this conversion takes longer than inference itself, and they are forced to use low resolutions.
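
The JPEG round trip mentioned above looks roughly like this (a sketch of the anti-pattern, for illustration only; it is both lossy and slow):

```kotlin
import android.graphics.*
import java.io.ByteArrayOutputStream

// The inefficient pattern: round-trip the NV21 frame through a JPEG
// encode/decode just to obtain an RGB Bitmap.
fun nv21ToBitmapTheSlowWay(nv21: ByteArray, width: Int, height: Int): Bitmap {
    val yuvImage = YuvImage(nv21, ImageFormat.NV21, width, height, null)
    val stream = ByteArrayOutputStream()
    yuvImage.compressToJpeg(Rect(0, 0, width, height), 90, stream) // lossy + slow
    val jpeg = stream.toByteArray()
    return BitmapFactory.decodeByteArray(jpeg, 0, jpeg.size)       // slow again
}
```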

As for relying on unspecified behavior, I respect your position, and actually have nothing against keeping the example code inefficient. Let this esoteric knowledge of how the official examples can be accelerated by 50% or more remain in the heads of the few initiates.

@owahltinez (Contributor) commented:

> As for relying on unspecified behavior, I respect your position, and actually have nothing against keeping the example code inefficient. Let this esoteric knowledge of how the official examples can be accelerated by 50% or more remain in the heads of the few initiates.

If you think there's opportunity for compromise, and there are ways to improve performance without relying on unspecified behavior and manually comparing buffers, then that would be much less controversial to merge. Otherwise, we could consider a native module, which can be focused on efficiency without us having to worry about the code being sample-worthy.

@alexcohn (Contributor, Author) commented Oct 3, 2020:

As I wrote before, the best performance overall is achieved with libyuv (native). I have this code (almost) ready for a PR, and this may be a nice expansion of the CameraXTfLite example, but it's probably not prudent to add such a dependency to the other examples, which don't actually use the YuvToRgbConverter at all.

Alternatively, I can push a custom RenderScript kernel that takes the three planes as three independent inputs and does both sampling and conversion on the GPU. This would've been a nice way, but it is crippled by the sad facts that a RenderScript Allocation does not accept a ByteBuffer, and that the Image planes are not backed by byte arrays. Thus, we are forced to perform three huge memcpys.
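
The three copies look roughly like this (a sketch; Allocation.copyFrom() needs a byte[], while each Image plane exposes only a direct ByteBuffer):

```kotlin
import android.media.Image

// Drain each plane's ByteBuffer into a fresh array so it can be fed to
// Allocation.copyFrom(); one full-plane copy per plane, three in total.
fun copyPlanes(image: Image): Array<ByteArray> =
    Array(image.planes.size) { i ->
        val buffer = image.planes[i].buffer
        ByteArray(buffer.remaining()).also { buffer.get(it) }
    }
```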

Another alternative is to connect the input allocation to the ImageReader Surface, as discussed in https://groups.google.com/a/android.com/g/camerax-developers/c/qMbhcf3T7_c?pli=1. Actually, the code is there; you may reach out to @LiewJunTung and ask him to contribute it to camera-samples.
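
The approach from that thread looks roughly like this (a sketch, assuming a YUV Allocation created with IO_INPUT usage whose Surface is handed to the camera as an output target):

```kotlin
import android.graphics.ImageFormat
import android.renderscript.*

// A sketch: frames sent to this Allocation's Surface land directly in
// RenderScript memory; call ioReceive() per frame, then run the kernel.
fun createYuvInputAllocation(rs: RenderScript, width: Int, height: Int): Allocation {
    val yuvType = Type.Builder(rs, Element.YUV(rs))
        .setX(width)
        .setY(height)
        .setYuvFormat(ImageFormat.YUV_420_888)
        .create()
    return Allocation.createTyped(
        rs, yuvType,
        Allocation.USAGE_IO_INPUT or Allocation.USAGE_SCRIPT
    )
}

// Hypothetical wiring outline:
//   val input = createYuvInputAllocation(rs, w, h)
//   hand input.surface to the camera as an output target
//   on each frame: input.ioReceive(); yuvToRgbScript.setInput(input); ...
```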

@owahltinez (Contributor) commented:

> As I wrote before, the best performance overall is achieved with libyuv (native). I have this code (almost) ready for a PR, and this may be a nice expansion of the CameraXTfLite example, but it's probably not prudent to add such a dependency to the other examples, which don't actually use the YuvToRgbConverter at all.

Got it. I would be open to adding a CameraNativeUtils subfolder and showcasing the native YuvToRgbConverter equivalent there. For the time being, I would prefer not to add it as a dependency to any of the official samples, because building native libs can get complicated for newer Android developers.

Even though I don't want to discourage you, since you are talking about what sounds like a large pull request, I must warn you: we can be pretty picky about code quality, style, and even comments. It may seem trivial, but please keep in mind that these samples are used first and foremost for educational purposes.

> Alternatively, I can push a custom RenderScript kernel that takes the three planes as three independent inputs and does both sampling and conversion on the GPU. This would've been a nice way, but it is crippled by the sad facts that a RenderScript Allocation does not accept a ByteBuffer, and that the Image planes are not backed by byte arrays. Thus, we are forced to perform three huge memcpys.

If you are interested in contributing more RenderScript-specific code, we can also try to add it to the RenderScript samples repo. I've been wanting to move the HDR camera sample there for some time, since we just don't have enough resources to maintain it in this repo alongside all the other samples and the CameraX work.

> Another alternative is to connect the input allocation to the ImageReader Surface, as discussed in https://groups.google.com/a/android.com/g/camerax-developers/c/qMbhcf3T7_c?pli=1. Actually, the code is there; you may reach out to @LiewJunTung and ask him to contribute it to camera-samples.

This is definitely a simplified and elegant way of doing it. It would be interesting to evaluate its performance compared to the other methods. I'm worried about the additional frame copies and memory usage required by using ImageWriter + ImageReader as a middleman during the color conversion, but maybe it's no different from using your own ByteBuffer anyhow.


A reviewer commented on this snippet:

```kotlin
if (planes[2].pixelStride == 2 && planes[2].buffer.isInterleavedWith(planes[1].buffer))
    planes[2].extractChromaInterleaved(planes[1], cropRect, outputBuffer)
else if (planes[2].pixelStride == 2 && planes[1].buffer.isInterleavedWith(planes[2].buffer))
```

Can you help me figure this out: if pixelStride == 2, doesn't that already mean the chroma is interleaved, so why do you need this extra condition? Also, when do you need the extra byte from the U buffer? For me, extractChromaInterleaved works perfectly fine with only the V buffer as input. Am I missing something?

@alexcohn (Contributor, Author) replied:

  1. If pixelStride == 2, the two planes may be interleaved; still, I must prove it, and that's done by flipping one byte (see the sketch after this list). There is no guarantee.
  2. Even when we know that the planes are interleaved, we must find out whether the interleaving is of the UV or the VU kind.
  3. On some devices the length of the V ByteBuffer may be w*h/2, but on other devices it is w*h/2-1. The missing byte must be extracted from the second plane.
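
A minimal sketch of the byte-flip check from point 1, assuming writable plane buffers; the helper name isInterleavedWith comes from the diff above, but this body is my reconstruction of the described trick, not the PR's exact code:

```kotlin
import java.nio.ByteBuffer

// Probe whether `this` and `other` alias the same memory one byte apart,
// as the V and U buffers do in NV21-style semi-planar layouts. Writing a
// sentinel through one buffer and reading it back through the other proves
// the aliasing; the original byte is restored afterwards.
fun ByteBuffer.isInterleavedWith(other: ByteBuffer): Boolean {
    if (capacity() < 2 || other.capacity() < 1) return false
    val saved = get(1)
    val probe = (saved + 1).toByte()    // any value different from saved
    put(1, probe)                       // flip one byte through this buffer...
    val aliased = other.get(0) == probe // ...and look for it through the other
    put(1, saved)                       // restore the frame
    return aliased
}
```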

The reviewer replied:

I'm not quite sure, but this code should be well tested, and it seems the author is confident about pixelStride == 2:
https://github.com/opencv/opencv/blob/master/modules/java/generator/android-21/java/org/opencv/android/JavaCamera2View.java
Thank you for the explanation.
P.S. I tested your code on some devices from our farm, with all available Android versions: 5.1, 6, 7, 8, 9, 10, 11. Everything works fine.

@alexcohn (Contributor, Author) replied:

OpenCV uses C++ to access the U and V planes. Here I only have Java at my disposal.

@ggfan (Contributor) commented Jul 28, 2021:

@alexcohn could you access this discussion: #399?

@ggfan (Contributor) commented Aug 3, 2021:

I verified the performance gain (~100%) with this PR, and also mentioned the PR to the team (extending the framework code would be more suitable). For the purposes of this sample, we will go with #405; refer to #399 for details.


Successfully merging this pull request may close these issues: "Q: Yuv converter"