[discussion needed] [videoAPI][bc-breaking] custom class for video frames #2981
Comments
A few questions:
PS: I've just realized that the current Python wrapper is not TorchScript-friendly either, because IIRC TorchScript doesn't support dictionaries with heterogeneous types.
There are a few potential solutions:
Ideally, we'd have a single frame class in C++. This class would have different (and extendable) ways of retrieving the data based on the type. In the WIP PR I have implemented
I didn't know this was an issue (we have the python wrapper set up like this at the moment), and I was planning to keep it that way. An alternative would be to have the api modified slightly like this:
Yes; we have this now already (filling uint8 vs copying float tensor). We'd need to add filling the char buffer for string and cc.
Would the move to the API like above help in mitigating this?
🚀 Feature
Design and implement a custom class to hold the return of a video frame.
Motivation
At the moment the new VideoReader API returns a dictionary with a byte tensor "data" and a "pts" value. This works well for numerical data (such as audio and video); however, it may cause issues with CC and SUB streams, which return strings. Additionally, if we ever want to expose additional fields and functionality, a custom class would make that easy to do from either C++ or Python.
Pitch
I propose to have a custom registered VideoFrame class that would be the sole return value of the video reader API. For example, we could get a new frame:
where the frame object would have a set type, and be able to return a value based on that return type.
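As a rough illustration of the idea of a typed frame object, here is a minimal pure-Python sketch. It is not the torchvision API: the field names (stream_type, pts, payload), the data() method, and the stream labels are all assumptions made for the example, and a plain bytes object stands in for the byte tensor.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class VideoFrame:
    """Hypothetical frame container; names and fields are illustrative only."""

    stream_type: str  # assumed labels: "video", "audio", "cc" or "sub"
    pts: float        # presentation timestamp of the frame
    payload: bytes    # raw bytes, standing in for the decoder's byte buffer

    def data(self) -> Union[List[int], str]:
        """Return the payload in a form that matches the stream type."""
        if self.stream_type in ("cc", "sub"):
            # Text streams: interpret the bytes as a UTF-8 string (assumed encoding).
            return self.payload.decode("utf-8")
        # Numerical streams: expose the raw byte values.
        return list(self.payload)


video = VideoFrame("video", 0.04, bytes([0, 127, 255]))
caption = VideoFrame("cc", 0.04, "hello".encode("utf-8"))
```

With this shape, callers always receive one object type and ask it for its value, so new fields can be added later without changing the return type.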
Alternatives
We could keep the current API, and cast the pointer to the string to the byte tensor, and then cast it back in python.
Not sure if and how well this could work, but it would be less disruptive. We can always alter the behaviour of python API in the wrapper we have at the moment, but I'm not sure what sort of time overhead that would be adding.
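To make this alternative concrete, here is a small sketch of the round trip, assuming the string is copied byte-for-byte into the "data" buffer and decoded again in the Python wrapper. A list of integers stands in for the byte tensor, UTF-8 is an assumed encoding, and both helper names are hypothetical.

```python
from typing import List


def string_to_byte_buffer(text: str) -> List[int]:
    """Encode a caption as byte values (stand-in for filling a uint8 tensor)."""
    return list(text.encode("utf-8"))


def byte_buffer_to_string(buffer: List[int]) -> str:
    """Decode the byte values back into a string on the Python side."""
    return bytes(buffer).decode("utf-8")


# The frame keeps the current dictionary layout: "data" plus "pts".
frame = {"data": string_to_byte_buffer("Hello, world"), "pts": 1.5}
caption = byte_buffer_to_string(frame["data"])
```

The extra decode step runs once per text frame in the wrapper, which is where any added time overhead would come from.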
cc @bjuncek