
torchvision.io.read_video() giving memory error #1446


Closed
anandijain opened this issue Oct 10, 2019 · 1 comment

@anandijain

I have a video file that is 480 × 640, 20 fps, and 20,400 frames.

I get a MemoryError when I try to read the video without specifying any endpoints.
I was wondering if there is a way of using io.read_video() that returns a DataLoader, or that can work without loading everything into tensors at once.

Ideally, I could load as much as possible on each read_video() call. It seems cumbersome to have to find, by trial and error, the largest span that doesn't throw a memory error, and then iterate over spans of that size to get each section of the video that I want. A sketch of what I mean follows.
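
(For illustration, a chunked-read loop using read_video's start_pts/end_pts arguments together with read_video_timestamps; the chunk size is arbitrary, and this assumes a torchvision build that exposes both functions:)

import torchvision as tv

fn = "..."  # path to the video, as above

# read_video_timestamps decodes only the timestamps, not the frames,
# so it is much cheaper than read_video.
pts, fps = tv.io.read_video_timestamps(fn)

chunk = 512  # frames per read; tune to the available memory
for i in range(0, len(pts), chunk):
    start = pts[i]
    end = pts[min(i + chunk, len(pts)) - 1]
    # Decode only the frames whose pts lie in [start, end].
    vframes, aframes, info = tv.io.read_video(fn, start_pts=start, end_pts=end)
    # ... process vframes here, then let it be garbage-collected ...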

Code and error:

>>> vid = tv.io.read_video(fn)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/lib/python3.6/site-packages/torchvision/io/video.py", line 200, in read_video
    vframes = torch.as_tensor(np.stack(vframes))
  File "/home/username/.local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 423, in stack
    return _nx.concatenate(expanded_arrays, axis=axis, out=out)
MemoryError

Does anyone have a solution to this?

Thanks!

@fmassa
Member

fmassa commented Oct 14, 2019

@anandijain you are trying to load a big video into memory, and it doesn't fit in your CPU RAM.

You can try using VideoClips to return non-overlapping clips of a fixed size:

class VideoClips(object):
    """
    Given a list of video files, computes all consecutive subvideos of size
    `clip_length_in_frames`, where the distance between each subvideo in the
    same video is defined by `frames_between_clips`.
    If `frame_rate` is specified, it will also resample all the videos to have
    the same frame rate, and the clips will refer to this frame rate.

    Creating this instance the first time is time-consuming, as it needs to
    decode all the videos in `video_paths`. It is recommended that you
    cache the results after instantiation of the class.

    Recreating the clips for different clip lengths is fast, and can be done
    with the `compute_clips` method.

    Arguments:
        video_paths (List[str]): paths to the video files
        clip_length_in_frames (int): size of a clip in number of frames
        frames_between_clips (int): step (in frames) between each clip
        frame_rate (int, optional): if specified, it will resample the video
            so that it has `frame_rate`, and then the clips will be defined
            on the resampled video
        num_workers (int): how many subprocesses to use for data loading.
            0 means that the data will be loaded in the main process. (default: 0)
    """
Here is an example:

from torchvision.datasets.video_utils import VideoClips
video_clips = VideoClips([video_path], clip_length_in_frames=32, frames_between_clips=32)
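
Clips can then be read one at a time with get_clip, so only a single clip is ever held in memory. A small sketch (the exact shape depends on the video):

# Decode one 32-frame clip at a time.
for idx in range(video_clips.num_clips()):
    video, audio, info, video_idx = video_clips.get_clip(idx)
    print(idx, video.shape)  # (32, H, W, 3) with this clip length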

This is what is used internally in the video datasets to return a Dataset compatible with DataLoader:

self.video_clips = VideoClips(
    video_list,
    frames_per_clip,
    step_between_clips,
    frame_rate,
    _precomputed_metadata,
    num_workers=num_workers,
    _video_width=_video_width,
    _video_height=_video_height,
    _video_min_dimension=_video_min_dimension,
    _audio_samples=_audio_samples,
)
self.transform = transform

@property
def metadata(self):
    return self.video_clips.metadata

def __len__(self):
    return self.video_clips.num_clips()

def __getitem__(self, idx):
    video, audio, info, video_idx = self.video_clips.get_clip(idx)
    label = self.samples[video_idx][1]
    if self.transform is not None:
        video = self.transform(video)
    return video, audio, label

(Don't worry about the arguments starting with an underscore; they are private.)

So this should be fairly easy to do once you use VideoClips. One warning: with this approach you'll drop the trailing frames of the video if the total frame count is not divisible by 32 (in this example).
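
Putting the pieces together, here is a minimal self-contained sketch of such a Dataset (the class name and the unlabeled, single-video setup are illustrative, not part of torchvision):

from torch.utils.data import Dataset, DataLoader
from torchvision.datasets.video_utils import VideoClips

class ClipDataset(Dataset):  # illustrative name
    def __init__(self, video_paths, clip_len=32):
        # Non-overlapping, fixed-size clips, as above.
        self.video_clips = VideoClips(
            video_paths,
            clip_length_in_frames=clip_len,
            frames_between_clips=clip_len,
        )

    def __len__(self):
        return self.video_clips.num_clips()

    def __getitem__(self, idx):
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video  # shape (clip_len, H, W, 3)

loader = DataLoader(ClipDataset(["my_video.mp4"]), batch_size=2)
for batch in loader:
    ...  # batch of 2 clips, shape (2, clip_len, H, W, 3)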
