[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981

bjuncek · 2020-11-10T10:20:18Z

🚀 Feature

Design and implement a custom class to hold the return of a video frame.

Motivation

At the moment the new VideoReader API returns a dictionary with a byte tensor "data" and a "pts" value. This works well for numerical data (such as audio and video), however, it may cause issues with CC and SUB streams that return strings. Additionally, if we ever want to expose additional fields and functionalities, this approach would make it easy to do from either C++ or python.

Pitch

I propose to have a custom registered VideoFrame class that would be the sole return value of the video reader API.
For example, we could get a new frame:

frame = next(VideoReaderObject)

where the frame object would have a set type, and be able to return a value based on that return type.

VideoReaderObject.set_current_stream("video")
frame = next(VideoReaderObject)
frame.type()     # returns "video"
frame.data()     # retruns uint8 byte tensor with video frame
frame.pts().      # returns float with the current presentation timestamp

VideoReaderObject.set_current_stream("subtitle")
frame = next(VideoReaderObject)
frame.type()     # returns "subtitle"
frame.data()     # returns a string

Alternatives

We could keep the current API, and cast the pointer to the string to the byte tensor, and then cast it back in python.
Not sure if and how well this could work, but it would be less disruptive. We can always alter the behaviour of python API in the wrapper we have at the moment, but I'm not sure what sort of time overhead that would be adding.

cc @bjuncek

The text was updated successfully, but these errors were encountered:

fmassa · 2021-02-15T14:16:28Z

A few questions:

will we have different Frame classes for each type, or a single one for all types?
the current proposed API doesn't play well with torchscript on the Python side, as .data() will return different types depending on what has been set up. Is the implementation in [WIP] new videoreader API now returns a scriptable Frame object #3077 up to data wrt this?
From C++, will we have different functions for getting the data as subtitles or audio?

PS: I've just realized that the current python wrapper is not torchscript-friendly either, because IIRC torchscript doesn't support dictionaries with heterogeneous types

bjuncek · 2021-02-16T10:46:48Z

There are a few potential solutions:

will we have different Frame classes for each type, or a single one for all types?

Ideally, we'd have a single frame class in C++. This class would have different (and extendable) ways of retrieving the data based on the type. In the WIP PR I have implemented .data_t() which returns tensor (video, audio). The idea is that you can follow that with .data_s() which would return a string (subtitles, CC). In principle, one can imagine extending this with any (scriptable) datatype.

the current proposed API doesn't play well with torchscript on the Python side, as .data() will return different types depending on what has been set up. Is the implementation in #3077 up to data wrt this?

I didn't know this was an issue (we have the python wrapper set up like this at the moment), and I was planning to keep it that way. An alternative would be to have the api modified slightly like this:

VideoReaderObject.set_current_stream("video")
frame = next(VideoReaderObject)
frame.type()     # returns "video"
frame.video()   # retruns uint8 byte tensor with video frame
frame.audio()   # returns empty tensor
frame.subtitle() # returns empty string
frame.pts().      # returns float with the current presentation timestamp



VideoReaderObject.set_current_stream("subtitle")
frame = next(VideoReaderObject)
frame.type()       # returns "subtitle"
frame.subtitle()  # returns a string
frame.video().    # returns empty tensor

From C++, will we have different functions for getting the data as subtitles or audio?

Yes; we have this now already (filling uint8 vs copying float tensor). We'd need to add filling the char buffer for string and cc.

bjuncek · 2021-02-16T10:47:48Z

PS: I've just realized that the current python wrapper is not torchscript-friendly either, because IIRC torchscript doesn't support dictionaries with heterogeneous types

would the move to the API like above help in mitigating this?

bjuncek added needs discussion module: video labels Nov 10, 2020

bjuncek mentioned this issue Nov 17, 2020

New video API Proposal #2660

Open

bjuncek mentioned this issue Dec 1, 2020

[WIP] new videoreader API now returns a scriptable Frame object #3077

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981

[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981

bjuncek commented Nov 10, 2020 •

edited by pytorch-probot bot

Loading

fmassa commented Feb 15, 2021

Uh oh!

bjuncek commented Feb 16, 2021 •

edited

Loading

Uh oh!

bjuncek commented Feb 16, 2021

Uh oh!

[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981

[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981

Comments

bjuncek commented Nov 10, 2020 • edited by pytorch-probot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Feature

Motivation

Pitch

Alternatives

fmassa commented Feb 15, 2021

Uh oh!

bjuncek commented Feb 16, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjuncek commented Feb 16, 2021

Uh oh!

bjuncek commented Nov 10, 2020 •

edited by pytorch-probot bot

Loading

bjuncek commented Feb 16, 2021 •

edited

Loading