FFmpeg-based rescaling and frame rate

## 🚀 Feature
Add support for (basic) FFmpeg filters for faster video pre-processing. In particular, rescaling and changing the frame rate would be useful when feeding in-the-wild videos through a trained model.

## Motivation

I am working on a video loader to feed video frames to a model trained on the Kinetics 400 dataset and obtain predictions. The model is trained at a fixed resolution, on videos with a frame rate of 15fps. To support making predictions on videos from various sources, I at least need to resample them at the correct resolution and frame rate.

The current public API only supports decoding and trimming video frames, with no other pre-processing, so I have to do any such pre-processing in Python/PyTorch. This approach is noticeably slower than an implementation based on `ffmpeg-python`, a wrapper around the command-line `ffmpeg` tool. For some stats, see Additional context.
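To make the FFmpeg side concrete, here is a minimal sketch of the kind of command such a pipeline boils down to: trim, then apply the `scale` and `fps` filters, and write raw RGB frames to stdout. The helper name `build_ffmpeg_args` and its parameters are hypothetical, and it builds a plain `ffmpeg` argument list rather than going through `ffmpeg-python`, so treat it as an illustration and not the exact benchmark code.

```python
from typing import List, Optional


def build_ffmpeg_args(
    path: str,
    width: int,
    height: int,
    fps: int,
    start: Optional[float] = None,
    duration: Optional[float] = None,
) -> List[str]:
    """Build an ffmpeg command that trims, rescales, and resamples a video.

    Hypothetical helper: decoded frames go to stdout as raw RGB24 bytes.
    """
    args = ["ffmpeg", "-loglevel", "error"]
    if start is not None:
        args += ["-ss", str(start)]  # input option: seek before decoding
    if duration is not None:
        args += ["-t", str(duration)]  # input option: limit how much is read
    args += ["-i", path]
    # Filters run in order: rescale first, then change the frame rate.
    args += ["-vf", f"scale={width}:{height},fps={fps}"]
    args += ["-f", "rawvideo", "-pix_fmt", "rgb24", "pipe:1"]
    return args


def frame_size_bytes(width: int, height: int) -> int:
    """Bytes per decoded RGB24 frame (3 bytes per pixel)."""
    return width * height * 3
```

The argument list can be passed to `subprocess.run(args, capture_output=True, check=True)`, and the captured stdout split into frames of `frame_size_bytes(width, height)` bytes each.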

## Pitch

I would like to start a conversation on how best to bring such functionality to Torchvision. I imagine changing the resolution/fps is a common requirement for making predictions on videos, so I can see it as a useful feature of video I/O. Looking at the C++ code, there is already some support for requesting video frames of a certain resolution [[1]](https://github.com/pytorch/vision/blob/74de51d6d478e289135d9274e6af550a9bfba137/torchvision/csrc/cpu/decoder/defs.h#L47)[[2]](https://github.com/pytorch/vision/blob/74de51d6d478e289135d9274e6af550a9bfba137/torchvision/csrc/cpu/video_reader/VideoReader.cpp#L464), but this functionality is only exposed in `torch.ops.video_reader.read_video_from_file`, not the public API. I can’t find anything similar for requesting a certain frame rate.

Is this something that you would want to add to `torchvision.io.read_video`? What about `torchvision.io.VideoReader`? More generally, is there a plan to add support for all FFmpeg filters in the future? What would that interface look like?

## Additional context

I’ve done some initial comparisons between two pipelines: (1) `torchvision.io.VideoReader`, with the frame rate changed in Python and rescaling done in `torch` on batches of 16 frames, and (2) an `ffmpeg-python` pipeline with the `scale` and `fps` filters. The input is an 854x480 MP4 video at 30fps, ~261s long. I’ve included the results below.
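For reference, the Python-side frame rate change can be sketched roughly as follows: pick which decoded frames to keep so that the output approximates the target fps, then group the kept frames into batches for rescaling. This is a simplified illustration with hypothetical helper names, not the exact benchmark code, and its index selection is only an approximation of what FFmpeg's `fps` filter does with timestamps.

```python
from typing import Iterator, List, Sequence


def resample_indices(num_frames: int, src_fps: float, dst_fps: float) -> List[int]:
    """Pick source frame indices that approximate a constant dst_fps.

    Frames are dropped when dst_fps < src_fps and repeated when dst_fps > src_fps.
    """
    if num_frames <= 0:
        return []
    duration = num_frames / src_fps          # clip length in seconds
    num_out = max(1, round(duration * dst_fps))
    step = src_fps / dst_fps                 # source frames per output frame
    return [min(num_frames - 1, int(i * step)) for i in range(num_out)]


def batched(items: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For a 1s clip at 30fps resampled to 15fps, `resample_indices(30, 30, 15)` keeps every second frame. The kept frames can then be stacked in groups via `batched(indices, 16)` before rescaling each batch.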

#### Decoding the first `N` seconds of a clip (output fps=15, output size=input size):
![clip-length](https://user-images.githubusercontent.com/971313/98943077-dcc66680-24e6-11eb-8738-d47a9bf69dd4.png)


#### Decoding 1s of video for given start time (output fps=15, output size=input size):
![start-time](https://user-images.githubusercontent.com/971313/98943414-58c0ae80-24e7-11eb-92a6-3458f18b5ad5.png)


#### Changing the framerate for the first 1s of video (output size=input size):
![framerate-1s](https://user-images.githubusercontent.com/971313/98943495-81e13f00-24e7-11eb-87c7-1e3cfaf44709.png)


#### Changing the framerate for the first 5s of video (output size=input size):
![framerate-5s](https://user-images.githubusercontent.com/971313/98943507-886fb680-24e7-11eb-84f7-e9ecfb537d36.png)


#### Rescaling the first 1s of video (output fps=15):
![scale-1s](https://user-images.githubusercontent.com/971313/98943605-accb9300-24e7-11eb-8522-abae167f9045.png)


#### Rescaling the first 1s of video with `bilinear-fast` FFmpeg algorithm (output fps=15):
![scale-1s-fast](https://user-images.githubusercontent.com/971313/98944582-29ab3c80-24e9-11eb-8395-a1cd670c4238.png)


#### Rescaling the first 5s of video (output fps=15):
![scale-5s](https://user-images.githubusercontent.com/971313/98943649-b7862800-24e7-11eb-9e3c-fd6dcad7f5f1.png)




cc @bjuncek
