2022: state of video IO in torchvision

There have been many developments over the last couple of months, with a big push in 2022H1 to get things closed up (mainly by @prabhat00155 and @datumbox). Here I'll try to summarize the current state of things.

## Features (current, in-dev)

At the moment, `torchvision` has **two APIs** one can use for video reading.
1. `read_video` video API (stable) -- this is a legacy video-reading solution that we're looking to move away from. However, due to external use, we continue to support and patch it. It supports the `pyav` and `video_reader` backends.
2. `VideoReader` fine-grained API (prototype, #2660) -- we're moving towards this as a goal for 2022. The API itself is finished; however, due to issues with various backends, it remains unused (see the installation issue below). It supports the `video_reader` and `GPU` backends.
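
To make the difference between the two APIs concrete, here is a rough sketch of how each is typically called. It assumes a torchvision build in which both `torchvision.io.read_video` and `torchvision.io.VideoReader` are usable; the helper names (`read_clip_legacy`, `take_frames`, `read_clip_fine_grained`) and the default arguments are made up for illustration and are not part of torchvision.

```python
import itertools


def read_clip_legacy(path, start_s=0.0, end_s=2.0):
    """Legacy API: decode a whole time window in a single call."""
    # Imported lazily so this module loads even without torchvision installed.
    from torchvision.io import read_video

    # read_video returns (video frames, audio frames, info dict).
    frames, _audio, info = read_video(
        path, start_pts=start_s, end_pts=end_s, pts_unit="sec"
    )
    return frames, info


def take_frames(reader, start_s, num_frames):
    """Fine-grained API: seek, then pull frames one at a time.

    `reader` is assumed to behave like torchvision.io.VideoReader:
    it has seek(seconds) and yields dicts with "data" and "pts" keys.
    """
    reader.seek(start_s)
    return [
        (frame["pts"], frame["data"])
        for frame in itertools.islice(reader, num_frames)
    ]


def read_clip_fine_grained(path, start_s=0.0, num_frames=16):
    """Open a VideoReader on the video stream and take a few frames."""
    # Typically needs a torchvision build with the video_reader backend.
    from torchvision.io import VideoReader

    return take_frames(VideoReader(path, "video"), start_s, num_frames)
```

The practical difference is that `read_video` materializes the whole window at once, while the `VideoReader` loop lets the caller stop early or skip around with `seek`.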

Furthermore, we also have three **backends** for video reading.
1. `pyav` -- naive extension of PyAV capabilities.
2. `video_reader` -- our own C++ implementation that allows video IO to be torchscriptable. If the JIT requirement is dropped, it might be deprecated despite minor speed improvements over `pyav`.
3. `GPU` -- highly experimental and not yet properly tested. Maintenance and further development will depend on the demand from customers and community.
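
Since `video_reader` is only available in source builds, code that prefers it usually needs a fallback to `pyav`. The sketch below shows one way to do that with `torchvision.set_video_backend` and `torchvision.get_video_backend`. The helper `choose_backend` is hypothetical, and the assumption that an unavailable backend either raises or is silently left unset (depending on the torchvision version) should be checked against the version in use.

```python
def choose_backend(set_backend, get_backend, preferred=("video_reader", "pyav")):
    """Return the first backend in `preferred` that can actually be activated."""
    for name in preferred:
        try:
            set_backend(name)
        except (RuntimeError, ValueError):
            # The backend was rejected outright (e.g. not compiled in).
            continue
        # Some versions may only warn and keep the old backend, so verify.
        if get_backend() == name:
            return name
    raise RuntimeError(f"none of {preferred} could be activated")


def choose_torchvision_backend():
    """Apply choose_backend to the real torchvision setters."""
    # Imported lazily so this module loads even without torchvision installed.
    import torchvision

    return choose_backend(
        torchvision.set_video_backend, torchvision.get_video_backend
    )
```

Passing the setter and getter in as arguments keeps the fallback logic testable without a video-enabled build.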

The overall goal in 2022 is to migrate all APIs (and prototype datasets) to the `VideoReader` API, and hopefully deprecate `read_video` as much as possible.

Related tasks include (will be updated):

- [ ] Datasets to use new API #5250 
- [ ] Reference scripts to use new API


## Currently known issues and enhancements needed

Probably the biggest issue plaguing video is **installation** (see #4260 for some reference). If a user wants to install the ffmpeg or GPU backends and get support for the `VideoReader` API, they need to install torchvision from source and, in the case of GPU, also download proprietary drivers from NVIDIA. This process should be properly documented until a better/alternative solution is found.

- [ ] Add proper build documentation #3460

Due to the lack of users, real-world bug reports have been scarce. Here is a (non-exhaustive) list of known issues and their progress, sorted by topic, with additional comments in italics where applicable.

#### General

- [ ] Change CPU decoder output frames to use ITU709 colour space #5245 -- _done, but not merged_
- [x] Assertion error during dataset creation #4839 #4112 #4357 #2184 #1884
- [ ] Mismatch in audio frames returned by pyav and video reader #3986 -- _needs revisiting based on latest improvements and bugfixes_


#### `video_reader` backend and `VideoReader` API

- [ ] new video reading API crash #5419 (can't reproduce -- help welcome)
- [ ] read_video_from_file() causes seg fault with Python 3.9 #4430 -- _flaky, can't reproduce on all machines_
- [ ] video_reader test crashes on Windows #4429
- [ ] Black band at certain videos #3534 -- _suspected issue in FFmpeg, needs revisiting_


#### GPU decoding issues and enhancements (note, these are low-pri due to lack of developers and road-map changes so we'll be relatively slow in fixing these):

- [ ] GPU VideoReader not working #5702
- [ ] video classification experiments using GPU decoder #5252
- [ ] video classification reference script with GPU decoder support #5251
- [ ] GPU decoder refactoring #5148
- [ ] Run GPU decoding tests in CI #5147
- [ ] Support reading video from memory #5142
- [ ] Return pts per frame after video decoding on GPU #5140


## Archived feature requests

- [ ] FFmpeg-based rescaling and frame rate #3016 -- _enhancement we've put on pause due to low adoption_
- [Feat] Camera Stream API proposal #2920
- Contribution: select classes in UCF101 dataset #1791


cc @datumbox for visibility

