2022: state of video IO in torchvision #5720

Open
@bjuncek

There have been many developments over the last couple of months, with a big push in 2022H1 to get things closed up (mainly by @prabhat00155 and @datumbox). Here I'll try to summarize the current state of things.

Features (current, in-dev)

At the moment, torchvision has two APIs one can use for video reading.

  1. read_video video API (stable) -- this is a legacy video-reading solution that we're looking to move away from. However, due to external use, we continue to support and patch it. It supports pyav and video_reader backends.
  2. VideoReader fine-grained API (prototype, see New video API Proposal #2660) -- we're moving towards this as a goal for 2022. The API itself is finished; however, due to issues with various backends it still remains largely unused (see the installation issue below). It supports video_reader and GPU backends.
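
To make the difference between the two APIs concrete, here is a minimal sketch rather than a reference implementation. The helper names `load_clip_legacy`, `open_reader` and `frames_between` are hypothetical; only `read_video` and `VideoReader` come from `torchvision.io`. It assumes that iterating a `VideoReader` yields dicts with `"data"` and `"pts"` keys and that `seek` takes a time in seconds. The torchvision imports are deferred into the functions so the sketch can be loaded on a machine without a video-enabled build.

```python
def load_clip_legacy(path, start_s, end_s):
    """Legacy API: decode a whole time range into one tensor in memory."""
    from torchvision.io import read_video  # deferred import, see lead-in

    # Returns (video frames, audio frames, info dict); we keep only the video.
    frames, _audio, _info = read_video(
        path, start_pts=start_s, end_pts=end_s, pts_unit="sec"
    )
    return frames


def open_reader(path):
    """Fine-grained API: open a reader on the video stream of `path`."""
    from torchvision.io import VideoReader  # needs a build with video support

    return VideoReader(path, "video")


def frames_between(reader, start_s, end_s):
    """Lazily yield frames with start_s <= pts <= end_s from a reader-like object.

    `reader` only needs `seek(seconds)` and iteration over dicts that carry
    a "pts" key, which is the shape VideoReader is assumed to have here.
    """
    reader.seek(start_s)
    for frame in reader:
        if frame["pts"] > end_s:
            break
        # Seeking may land slightly before start_s, so filter explicitly.
        if frame["pts"] >= start_s:
            yield frame
```

With a suitable build, `frames_between(open_reader("video.mp4"), 2.0, 5.0)` would stream frames one at a time, whereas `load_clip_legacy` materialises the whole range at once. The file name is a placeholder.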

Furthermore, we also have three backends for video reading.

  1. pyav -- naive extension of pyAV capabilities
  2. video_reader -- our own C++ implementation that allows video IO to be torchscriptable. If the JIT requirement is dropped, it might be deprecated despite its minor speed improvements over pyav.
  3. GPU -- highly experimental and not yet properly tested. Maintenance and further development will depend on the demand from customers and the community.

The overall goal in 2022 is to migrate all APIs (and prototype datasets) to the VideoReader API, and hopefully deprecate read_video as much as possible.

Related tasks include (will be updated):

Currently known issues and enhancements needed

Probably the biggest issue plaguing video is installation (see #4260 for some reference). If a user wants the ffmpeg or GPU backends and support for the VideoReader API, they need to install torchvision from source, and in the case of GPU also download proprietary drivers from NVIDIA. This process should be properly documented until a better/alternative solution is found.
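
Since a compiled backend may be missing from a given install, it can help to probe for it before relying on it. The following is a hedged sketch: `backend_available` is a hypothetical helper, and it assumes that `torchvision.set_video_backend` either raises or warns and leaves the backend unchanged when the requested backend was not built, which has varied between releases.

```python
import warnings


def backend_available(name: str) -> bool:
    """Return True if torchvision accepts `name` as its video backend."""
    try:
        import torchvision
    except (ImportError, OSError):
        # torchvision is missing or its native libraries failed to load.
        return False

    previous = torchvision.get_video_backend()
    try:
        with warnings.catch_warnings():
            # Some releases only warn when the backend is unavailable.
            warnings.simplefilter("ignore")
            torchvision.set_video_backend(name)
        # If the call was a no-op, the active backend will not match.
        return torchvision.get_video_backend() == name
    except (RuntimeError, ValueError):
        # Unknown backend name, or a backend that was not compiled in.
        return False
    finally:
        # Restore whatever backend was active before the probe.
        torchvision.set_video_backend(previous)
```

The probe is mainly useful for `"video_reader"`. Selecting `"pyav"` is expected to succeed even when the `av` package is absent, because that dependency is only checked at decode time.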

Due to the lack of users, real-world bug reports have been scarce. Here is a (non-exhaustive) list of known issues and their progress, sorted by topic, with additional comments in italics where applicable.

General

video_reader backend and VideoReader API

GPU decoding issues and enhancements (note, these are low-pri due to lack of developers and road-map changes so we'll be relatively slow in fixing these):

Archived feature requests

cc @datumbox for visibility
