FFmpeg-based rescaling and frame rate #3016
Comments
Hi, thanks for bringing up this issue!

Our current thinking is that most (if not all?) filters in ffmpeg can be implemented with basic Python / PyTorch / torchvision operators without too much loss of speed, and as such there would be limited benefit in packaging the filter logic from ffmpeg in PyTorch (as we would not get GPU / gradient support out of the box).

Your points about the speed of resizing are valid though, and I believe this relates to a current limitation of torchvision.

For the change of frame rate, the results you present are interesting, and I wasn't expecting such a large difference. Thoughts?
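As a concrete illustration of that point, an fps-style filter reduces to plain tensor indexing on the decoded clip. A minimal sketch (the helper below is made up for illustration, not an existing torchvision function):

```python
import torch

def change_fps(frames: torch.Tensor, src_fps: float, dst_fps: float) -> torch.Tensor:
    # frames: (T, C, H, W). Drop or duplicate frames to approximate dst_fps,
    # in the spirit of ffmpeg's fps filter, using only tensor indexing
    # (so it runs on GPU and keeps autograd support).
    t = frames.shape[0]
    idx = torch.arange(0, t, src_fps / dst_fps, device=frames.device).long()
    return frames[idx.clamp(max=t - 1)]
```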
Hi @fmassa, thanks for your reply! I put the relevant code into a Colab MWE. The numbers are different compared to the plots above, but they tell the same story.

I agree that the frame rate results are a bit strange. The behaviour I would expect is more like the torchvision curve: a constant time to decode a video clip and a negligible overhead to duplicate/drop frames, so the fps scales linearly with the output frame rate. I'm not entirely sure how ffmpeg scales the way it does; I'll take a look and see if I find anything.

Edit: I suppose that in the ffmpeg case, dropping/duplicating frames does have some non-negligible overhead, as each frame has to be read over a pipe from the ffmpeg process, even if it is a duplicate. In light of that, the ffmpeg curve makes sense.
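For reference, the ffmpeg side of that pipeline is essentially a filtered stream read back over a pipe. A rough sketch with `ffmpeg-python`, assuming the 854x480 input used in the benchmarks; every output frame, including duplicates, arrives as a full raw frame over the pipe, which is where the extra overhead comes from:

```python
import numpy as np
import ffmpeg  # ffmpeg-python

def ffmpeg_pipeline(path, out_fps=15, width=854, height=480):
    # Run ffmpeg's fps filter and stream raw RGB24 frames back over stdout.
    out, _ = (
        ffmpeg
        .input(path)
        .filter("fps", fps=out_fps)
        .output("pipe:", format="rawvideo", pix_fmt="rgb24")
        .run(capture_stdout=True, quiet=True)
    )
    # Each frame, including duplicates, is a full width * height * 3 byte payload.
    return np.frombuffer(out, np.uint8).reshape(-1, height, width, 3)
```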
Thanks for the notebooks @slimm! I'm still a bit surprised that the 5s resampling example shows FFmpegVideoReader being much faster. Maybe what's going on is that for that video length and Hz, ffmpeg can jump to keyframes and then decode only a few frames directly, for faster reading? If that's the case, this is not something we currently support in the torchvision video reader, but it's in the plans.
Hi @slimm - thanks a lot for the notebooks, and sorry for the late reply - I've been OOF for the last few days. My initial thoughts are:
I'll test this out a bit further to see if they are doing something different to us, and if maybe resampling would be beneficial to implement in our low level API. Thanks again, |
🚀 Feature
Add support for (basic) FFmpeg filters for faster video pre-processing. In particular, rescaling and changing the frame rate would be useful when feeding in-the-wild videos through a trained model.
Motivation
I am working on a video loader to feed video frames to a model trained on the Kinetics 400 dataset and obtain predictions. The model is trained at a fixed resolution, on videos with a frame rate of 15fps. To support making predictions on videos from various sources, I at least need to resample them at the correct resolution and frame rate.
The current public API only supports decoding of video frames and trimming, but no other pre-processing, so I need to do any such pre-processing in Python/PyTorch. Such an approach is visibly slower than an implementation based on `ffmpeg-python`, a wrapper around the command-line `ffmpeg`. For some stats, see Additional context.
Pitch
I would like to start a conversation on how best to bring such functionality to Torchvision. I imagine changing the resolution/fps is a common requirement for making predictions on videos, so I can see it being a useful feature of the video I/O. Looking at the C++ code, there is already some support for requesting video frames at a certain resolution [1][2], but this functionality is only exposed in `torch.ops.video_reader.read_video_from_file`, not in the public API. I can't find anything similar for requesting a certain frame rate.

Is this something that you would want to add to `torchvision.io.read_video`? What about to `torchvision.io.VideoReader`? More generally, is there a plan to add support for all FFmpeg filters in the future? What would that interface look like?
Additional context
I've done some initial comparisons between `torchvision.io.VideoReader` + changing the frame rate in Python + `torch` rescaling on batches of 16 frames, versus an `ffmpeg-python` pipeline with `scale` and `fps` filters, on an 854x480@30fps MP4 input video of ~261s. The results are included below (plots omitted from this text; the torchvision-side pipeline is sketched after the list):

- Decoding the first seconds of a clip (output fps=15, output size=input size)
- Decoding 1s of video for a given start time (output fps=15, output size=input size)
- Changing the frame rate for the first 1s of video (output size=input size)
- Changing the frame rate for the first 5s of video (output size=input size)
- Rescaling the first 1s of video (output fps=15)
- Rescaling the first 1s of video with the `bilinear-fast` FFmpeg algorithm (output fps=15)
- Rescaling the first 5s of video (output fps=15)
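As referenced above, a minimal sketch of the torchvision-side pipeline: seek with `VideoReader`, change the frame rate by index selection in Python, then resize in batches of 16 frames with torchvision ops. This assumes the metadata layout of recent torchvision releases and is not the exact code from the notebooks:

```python
import itertools
import torch
import torchvision.transforms.functional as F
from torchvision.io import VideoReader

def torchvision_pipeline(path, start=0.0, duration=1.0, out_fps=15, out_size=None):
    # Decode frames for [start, start + duration) using the public VideoReader API.
    reader = VideoReader(path, "video")
    src_fps = reader.get_metadata()["video"]["fps"][0]
    frames = [f["data"] for f in itertools.takewhile(
        lambda f: f["pts"] < start + duration, reader.seek(start))]

    # Change the frame rate in Python by dropping/duplicating frames.
    idx = torch.arange(0, len(frames), src_fps / out_fps).long()
    clip = torch.stack(frames)[idx.clamp(max=len(frames) - 1)]

    # Rescale with torchvision in batches of 16 frames.
    if out_size is not None:
        clip = torch.cat([F.resize(b, list(out_size)) for b in clip.split(16)])
    return clip
```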
cc @bjuncek