
Add Video Capture Support for macOS through AVFoundation/Swift #821


Merged

merged 5 commits into tensorflow:master from the video branch on Mar 3, 2020

Conversation

@yongtang (Member) commented Mar 1, 2020

This PR is part of the effort in resolving #814.

In #814, the feature request is to add video capture support for Linux, likely through Video4Linux. This PR fixes #814.

Due to some limitations, Video4Linux will need a compatible USB camera first.

This PR instead tries to resolve the feature request on macOS first.

On macOS, the built-in camera can be accessed through AVFoundation's Swift API.

This PR uses Swift to access AVCaptureSession and related classes, and exports the functionality as C functions (`cdecl`) so that it can be used by the C++ kernel in tensorflow-io.
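
For illustration only: a `cdecl` export can be consumed from any language with a C FFI. In the hypothetical sketch below, the library name and symbol are made up and do not reflect the actual exports in this PR; the real consumer is the tensorflow-io C++ kernel.

```
import ctypes

# Hypothetical sketch: consuming a cdecl-exported Swift function through a
# C FFI (Python's ctypes here, purely for illustration). The library name
# ("libvideo_capture.dylib") and symbol ("VideoCaptureInit") are made up.
lib = ctypes.CDLL("libvideo_capture.dylib")
lib.VideoCaptureInit.argtypes = [ctypes.c_char_p]
lib.VideoCaptureInit.restype = ctypes.c_void_p

handle = lib.VideoCaptureInit(b"device")
```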

Since macOS's raw video capture format is NV12 (kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange), additional work is needed to convert NV12 into RGB format, so that a whole pipeline can be built up to allow using video capture for tf.keras inference.

This PR does not resolve the NV12 => RGB conversion yet; that will be addressed in separate PRs. A sketch of what the conversion involves is shown below.
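
For reference, NV12 stores a full-resolution Y plane followed by an interleaved, half-resolution UV plane. Below is a minimal NumPy sketch of the conversion (an illustration, not part of this PR), using BT.601 video-range coefficients to match the BiPlanarVideoRange format:

```
import numpy as np

def nv12_to_rgb(raw, height, width):
  """Convert one raw NV12 frame (Y plane + interleaved UV plane) to RGB."""
  data = np.frombuffer(raw, dtype=np.uint8)
  # Full-resolution luma plane comes first.
  y = data[:height * width].reshape(height, width).astype(np.float32)
  # Half-resolution interleaved chroma plane follows; upsample 2x (nearest).
  uv = data[height * width:].reshape(height // 2, width // 2, 2).astype(np.float32)
  u = np.repeat(np.repeat(uv[..., 0], 2, axis=0), 2, axis=1) - 128.0
  v = np.repeat(np.repeat(uv[..., 1], 2, axis=0), 2, axis=1) - 128.0
  # BT.601 video-range YCbCr -> RGB.
  y = 1.164 * (y - 16.0)
  r = y + 1.596 * v
  g = y - 0.392 * u - 0.813 * v
  b = y + 2.017 * u
  return np.clip(np.stack([r, g, b], axis=-1), 0.0, 255.0).astype(np.uint8)
```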

Also, since video capture is technically a continuous stream and is not repeatable, it is not possible to train on video capture across multiple epochs (though see the workaround sketched below).
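
One possible workaround, assuming the usual tf.data caching semantics (not something this PR provides), is to materialize a fixed number of frames first; a cached dataset can then be iterated repeatedly:

```
import tensorflow_io as tfio

# Hypothetical workaround: capture a fixed number of frames once and cache
# them to disk. The first pass over `cached` fills the cache; later passes
# replay it, which makes multi-epoch loops possible.
frames = tfio.experimental.IODataset.stream().from_video_capture(
    "device").take(100)
cached = frames.cache("frames.cache")
for epoch in range(3):
  for frame in cached:
    pass  # training/inference step goes here
```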

Finally, the following is a sample usage which captures video and saves it as an NV12 raw file.

The NV12 raw file can be validated by using ffmpeg to convert it to JPEG. For example, a YUV frame can be converted with:

```
ffmpeg -s 1280x720 -pix_fmt nv12 -i frame_{i}.yuv frame_{i}.jpg
```

Usage:

```
import tensorflow as tf
import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.stream().from_video_capture(
    "device").take(5)

for i, frame in enumerate(dataset):
  print("Frame {}: shape({}) dtype({}) length({})".format(
      i, frame.shape, frame.dtype, tf.strings.length(frame)))
  tf.io.write_file("frame_{}.yuv".format(i), frame)
```

/cc @bhack @ivelin

Signed-off-by: Yong Tang <[email protected]>

@yongtang mentioned this pull request Mar 1, 2020
@yongtang force-pushed the video branch 2 times, most recently from 2a35d14 to 0775c86, March 2, 2020 03:37
yongtang added 4 commits March 2, 2020 16:22
@yongtang (Member, Author) commented Mar 3, 2020

Now video capture on Linux has been added. It is possible to use

```
import tensorflow as tf
import tensorflow_io as tfio

dataset = tfio.experimental.IODataset.stream().from_video_capture(
    "/dev/video0").take(5)

for i, frame in enumerate(dataset):
  print("Frame {}: shape({}) dtype({}) length({})".format(
      i, frame.shape, frame.dtype, tf.strings.length(frame)))
  tf.io.write_file("frame_{}.yuv".format(i), frame)
```

on Linux platforms where video is available.

I tested on a Debian VM with Video4Linux Loopback and it works as expected.

One thing to note is that, by default, I only tested and enabled the yuyv422 format. (Note also that macOS is NV12 and Android is NV21.)

Decoding NV12 and YUYV to RGB will be done in follow-up PRs; a sketch of the YUYV unpacking is below.
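
For reference (an illustration, not part of this PR), YUYV 4:2:2 packs two pixels into four bytes (Y0 U Y1 V); once unpacked, the same BT.601 math as in the NV12 sketch above converts the planes to RGB:

```
import numpy as np

def yuyv422_unpack(raw, height, width):
  """Unpack packed YUYV 4:2:2 (Y0 U Y1 V per pixel pair) into full planes."""
  data = np.frombuffer(raw, dtype=np.uint8).reshape(height, width // 2, 4)
  # Two luma samples per 4-byte group.
  y = data[:, :, [0, 2]].reshape(height, width).astype(np.float32)
  # Chroma is shared by each horizontal pixel pair; duplicate it.
  u = np.repeat(data[:, :, 1], 2, axis=1).astype(np.float32) - 128.0
  v = np.repeat(data[:, :, 3], 2, axis=1).astype(np.float32) - 128.0
  return y, u, v
```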

/cc @bhack

@terrytangyuan (Member) left a comment:

LGTM

@terrytangyuan merged commit 4477e34 into tensorflow:master Mar 3, 2020
@yongtang deleted the video branch March 3, 2020 17:18
i-ony pushed a commit to i-ony/io that referenced this pull request Feb 8, 2021: Add Video Capture Support for macOS through AVFoundation/Swift (tensorflow#821). The squashed commit contains:

* Add Video Capture Support for macOS through AVFoundation/Swift
* Add Video4Linux V2 support on Linux
* Update to use device name in API calls
* Fix typo in Windows
* Fix test typo

Signed-off-by: Yong Tang <[email protected]>
Successfully merging this pull request may close these issues: Video4Linux and Genicam