Description
There have been many developments over the last couple of months, with a big push in 2022H1 to get things closed up (mainly by @prabhat00155 and @datumbox). Here I'll try to summarize the current state of things.
Features (current, in-dev)
At the moment, torchvision has two APIs one can use for video reading:
- read_video video API (stable) -- this is a legacy video-reading solution that we're looking to move away from. However, due to external use, we continue to support and patch it. It supports the pyav and video_reader backends.
- VideoReader fine-grained API (prototype, New video API Proposal #2660) -- we're moving towards this as a goal for 2022. The API itself is finished; however, due to issues with various backends, it still remains unused (see the installation issue below). It supports the video_reader and GPU backends.
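To make the difference between the two APIs concrete, here is a minimal sketch, not a reference implementation. The helper names `read_clip_legacy` and `frames_between` are made up for illustration, and the file name in the trailing comment is hypothetical. It assumes `VideoReader.seek()` returns the reader and that iterating yields dicts with `"data"` and `"pts"` keys, as described in the torchvision docs.

```python
import itertools


def read_clip_legacy(path, start_s=0.0, end_s=None):
    """Legacy API: decode a whole time range in a single call."""
    # Imported here because it needs torchvision plus a working video backend.
    from torchvision.io import read_video

    frames, _audio, info = read_video(
        path, start_pts=start_s, end_pts=end_s, pts_unit="sec"
    )
    return frames, info


def frames_between(reader, start_s, end_s):
    """Fine-grained API: seek once, then pull frames until end_s (seconds)."""
    selected = []
    in_range = itertools.takewhile(
        lambda frame: frame["pts"] <= end_s, reader.seek(start_s)
    )
    for frame in in_range:
        selected.append(frame["data"])
    return selected


# Usage with the real class (requires a build with the video_reader backend):
#   from torchvision.io import VideoReader
#   reader = VideoReader("clip.mp4", "video")  # hypothetical file
#   frames = frames_between(reader, 2.0, 5.0)
```

The fine-grained helper takes an already constructed reader, so the same loop works with any object that follows the seek-then-iterate protocol.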
Furthermore, we also have three backends for video reading:
- pyav -- naive extension of PyAV capabilities.
- video_reader -- our own C++ implementation that allows video IO to be torchscriptable. If the JIT requirement is dropped, it might be deprecated despite minor speed improvements over pyav.
- GPU -- highly experimental and not yet properly tested. Maintenance and further development will depend on the demand from customers and community.
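For the two CPU backends, selection goes through `torchvision.set_video_backend` and `torchvision.get_video_backend`. Below is a small sketch of a fallback helper. The name `choose_video_backend` and the preference order are assumptions for illustration, not a recommendation from this issue. The module is passed in as an argument so the helper does not import torchvision itself. Depending on the torchvision version, asking for an unavailable backend may raise or only warn, so the sketch handles both.

```python
def choose_video_backend(tv, preferred=("video_reader", "pyav")):
    """Try backends in order and return the one that ends up active.

    `tv` is the imported torchvision module. The preference order is an
    assumption made for this sketch.
    """
    for name in preferred:
        try:
            tv.set_video_backend(name)
        except (RuntimeError, ValueError):
            # Raised by some versions when the backend is not compiled in
            # or the name is not recognised.
            continue
        # Other versions only warn and keep the previous backend,
        # so confirm that the switch actually happened.
        if tv.get_video_backend() == name:
            return name
    return tv.get_video_backend()


# Usage:
#   import torchvision
#   active = choose_video_backend(torchvision)
```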
The overall goal in 2022 is to migrate all APIs (and prototype datasets) to the VideoReader API, and hopefully deprecate read_video as much as possible.
Related tasks include (will be updated):
- Datasets to use new API (Add kinetics dataset to use new video reading API #5250)
- Reference scripts to use new API
Currently known issues and enhancements needed
Probably the biggest issue plaguing video is installation (see #4260 for some reference). If a user wants the ffmpeg or GPU backends and support for the VideoReader API, they need to install torchvision from source, and in the case of GPU also download proprietary drivers from NVIDIA. This process should be properly documented until a better/alternative solution is found.
- Add proper build documentation README section for video backends #3460
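Until that documentation exists, a quick diagnostic can help tell which pieces a given environment actually has. This is a rough sketch under stated assumptions: `video_support_report` is a made-up helper name, `torchvision.io._HAS_VIDEO_OPT` is a private flag that may change or disappear between releases, and finding an `ffmpeg` executable on PATH only hints that a source build could pick up FFmpeg, it does not prove it.

```python
import importlib
import importlib.util
import shutil


def video_support_report():
    """Collect coarse yes/no facts about video support in this environment."""
    report = {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "pyav_installed": importlib.util.find_spec("av") is not None,
        "torchvision_installed": importlib.util.find_spec("torchvision") is not None,
        "video_reader_compiled": False,
    }
    if report["torchvision_installed"]:
        try:
            tv_io = importlib.import_module("torchvision.io")
        except Exception:
            # A broken or mismatched install: leave the flag as False.
            return report
        # Assumption: private flag, True when the C++ video_reader ops were built.
        report["video_reader_compiled"] = bool(
            getattr(tv_io, "_HAS_VIDEO_OPT", False)
        )
    return report


if __name__ == "__main__":
    for key, value in video_support_report().items():
        print(f"{key}: {value}")
```

Pasting the printed output into a bug report makes it easier to tell an installation problem apart from a decoding bug.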
Due to the lack of users, real-world bug reports have been scarce. Here is a (non-exhaustive) list of known issues and their progress, sorted by topic, with additional comments in italics where applicable.
General
- Change CPU decoder output frames to use ITU709 colour space #5245 -- done, but not merged
- Assertion error during dataset creation: Assertion error during kinetics400 validation #4839, UCF101: Dataloader Fail on assertion #4112, Using VideoClips for loading a video dataset #4357, VideoClips with video_reader backend fails at loading clip with idx=0 if clip_length_in_frames=1 #2184, VideoClips Assertion Error #1884
- Mismatch in audio frames returned by pyav and video reader #3986 -- needs revisiting based on latest improvements and bugfixes
video_reader backend and VideoReader API
- new video reading API crash #5419 (can't reproduce -- help welcome)
- read_video_from_file() causes seg fault with Python 3.9 #4430 -- flaky, can't reproduce on all machines
- video_reader test crashes on Windows #4429
- Black band at certain videos #3534 -- suspected issue in FFmpeg, needs revisiting
GPU decoding issues and enhancements (note: these are low priority due to lack of developers and roadmap changes, so we'll be relatively slow in fixing them):
- GPU VideoReader not working #5702
- video classification experiments using GPU decoder #5252
- video classification reference script with GPU decoder support #5251
- GPU decoder refactoring #5148
- Run GPU decoding tests in CI #5147
- Support reading video from memory #5142
- Return pts per frame after video decoding on GPU #5140
Archived feature requests
- FFmpeg-based rescaling and frame rate #3016 -- enhancement we've put on pause due to low adoption
- [Feat] Camera Stream API proposal #2920
- Contribution: select classes in UCF101 dataset #1791
cc @datumbox for visibility