
torchvision.io.read_video() giving memory error #1446


Closed
anandijain opened this issue Oct 10, 2019 · 1 comment

@anandijain

I have a video file that is 480 × 640, 20 fps, and 20,400 frames.

I get a MemoryError when I try to read the video without specifying any endpoints.
I was wondering if there is a way of using io.read_video() that returns a DataLoader, or that can work without loading everything into tensors at once.

Ideally, I could load as much as possible on each read_video() call. It seems cumbersome to have to find, by trial and error, the largest span that doesn't throw a memory error, and then iterate over spans of that size to get each section of the video that I want. A sketch of what I mean follows.
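
(For illustration, a chunked-read loop using read_video's start_pts/end_pts arguments together with read_video_timestamps; the chunk size is arbitrary, and this assumes a torchvision build that exposes both functions:)

import torchvision as tv

fn = "..."  # path to the video, as above

# read_video_timestamps decodes only the timestamps, not the frames,
# so it is much cheaper than read_video.
pts, fps = tv.io.read_video_timestamps(fn)

chunk = 512  # frames per read; tune to the available memory
for i in range(0, len(pts), chunk):
    start = pts[i]
    end = pts[min(i + chunk, len(pts)) - 1]
    # Decode only the frames whose pts lie in [start, end].
    vframes, aframes, info = tv.io.read_video(fn, start_pts=start, end_pts=end)
    # ... process vframes here, then let it be garbage-collected ...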

Code and error:

>>> vid = tv.io.read_video(fn)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/lib/python3.6/site-packages/torchvision/io/video.py", line 200, in read_video
    vframes = torch.as_tensor(np.stack(vframes))
  File "/home/username/.local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 423, in stack
    return _nx.concatenate(expanded_arrays, axis=axis, out=out)
MemoryError

Does anyone have a solution to this?

Thanks!

@fmassa
Member

fmassa commented Oct 14, 2019

@anandijain you are trying to load a big video into memory, and it doesn't fit in your CPU RAM.

You can try using VideoClips to return non-overlapping clips of a fixed size:

class VideoClips(object):
    """
    Given a list of video files, computes all consecutive subvideos of size
    `clip_length_in_frames`, where the distance between each subvideo in the
    same video is defined by `frames_between_clips`.
    If `frame_rate` is specified, it will also resample all the videos to have
    the same frame rate, and the clips will refer to this frame rate.

    Creating this instance the first time is time-consuming, as it needs to
    decode all the videos in `video_paths`. It is recommended that you
    cache the results after instantiation of the class.

    Recreating the clips for different clip lengths is fast, and can be done
    with the `compute_clips` method.

    Arguments:
        video_paths (List[str]): paths to the video files
        clip_length_in_frames (int): size of a clip in number of frames
        frames_between_clips (int): step (in frames) between each clip
        frame_rate (int, optional): if specified, it will resample the video
            so that it has `frame_rate`, and then the clips will be defined
            on the resampled video
        num_workers (int): how many subprocesses to use for data loading.
            0 means that the data will be loaded in the main process. (default: 0)
    """
Here is an example:

from torchvision.datasets.video_utils import VideoClips
video_clips = VideoClips([video_path], clip_length_in_frames=32, frames_between_clips=32)
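
Clips can then be read one at a time with get_clip, so only a single clip is ever held in memory. A small sketch (the exact shape depends on the video):

# Decode one 32-frame clip at a time.
for idx in range(video_clips.num_clips()):
    video, audio, info, video_idx = video_clips.get_clip(idx)
    print(idx, video.shape)  # (32, H, W, 3) with this clip length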

This is what is used internally in the video datasets to return a Dataset compatible with DataLoader:

self.video_clips = VideoClips(
    video_list,
    frames_per_clip,
    step_between_clips,
    frame_rate,
    _precomputed_metadata,
    num_workers=num_workers,
    _video_width=_video_width,
    _video_height=_video_height,
    _video_min_dimension=_video_min_dimension,
    _audio_samples=_audio_samples,
)
self.transform = transform

@property
def metadata(self):
    return self.video_clips.metadata

def __len__(self):
    return self.video_clips.num_clips()

def __getitem__(self, idx):
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
    label = self.samples[video_idx][1]
    if self.transform is not None:
        video = self.transform(video)
    return video, audio, label

(Don't worry about the arguments starting with an underscore; they are private.)

So this should be fairly easy to do once you use VideoClips. One warning: with this approach you'll drop the trailing frames of the video if the total frame count is not divisible by 32 (in this example).
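
Putting the pieces together, here is a minimal self-contained sketch of such a Dataset (the class name and the unlabeled, single-video setup are illustrative, not part of torchvision):

from torch.utils.data import Dataset, DataLoader
from torchvision.datasets.video_utils import VideoClips

class ClipDataset(Dataset):  # illustrative name
    def __init__(self, video_paths, clip_len=32):
        # Non-overlapping, fixed-size clips, as above.
        self.video_clips = VideoClips(
            video_paths,
            clip_length_in_frames=clip_len,
            frames_between_clips=clip_len,
        )

    def __len__(self):
        return self.video_clips.num_clips()

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video  # shape (clip_len, H, W, 3)

loader = DataLoader(ClipDataset(["my_video.mp4"]), batch_size=2)
for batch in loader:
    ...  # batch of 2 clips, shape (2, clip_len, H, W, 3)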
