Reduce GPU memory consumption #3165
After incorporating things like garbage collection, emptying the CUDA cache, and disabling the CUDA caching allocator, the monitored memory usage became stable. Running the following script for a week does not show any sign of a memory leak, and the memory increase between peak and off-peak is about 230-350 MB, which roughly corresponds to the ~300 MB that the plain ffmpeg command consumes.
import os
import gc
import time
from datetime import datetime
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"
import torch
import torchaudio
from torchaudio.io import StreamReader
from torchaudio.io import StreamWriter
# torchaudio.utils.ffmpeg_utils.set_log_level(36)
input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"
def test_both():
r = StreamReader(input)
i = r.get_src_stream_info(r.default_video_stream)
r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")
w = StreamWriter(output)
w.add_video_stream(
height=i.height,
width=i.width,
frame_rate=i.frame_rate,
format="yuv444p",
encoder_format="yuv444p",
encoder="h264_nvenc",
hw_accel="cuda:0",
)
with w.open():
num_frames = 0
for chunk, in r.stream():
num_frames += chunk.size(0)
w.write_video_chunk(0, chunk)
del r
del w
return num_frames
def test_writer():
chunk = torch.randint(
255, (2, 3, 256, 256), dtype=torch.uint8, device=torch.device("cuda"))
w = StreamWriter(io.BytesIO(), format="mp4")
w.add_video_stream(
height=256,
width=256,
frame_rate=30000/1001,
format="yuv444p",
encoder_format="yuv444p",
encoder="h264_nvenc",
hw_accel="cuda:0",
)
with w.open():
num_frames = 0
for _ in range(3000):
num_frames += chunk.size(0)
w.write_video_chunk(0, chunk)
del w
return num_frames
total_num_frames = 0
while True:
t0 = time.monotonic()
num_frames = test_both()
elapsed = time.monotonic() - t0
total_num_frames += num_frames
print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
torch.cuda.empty_cache()
gc.collect()
time.sleep(5) |
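For completeness, the "monitored memory usage" above can be reproduced with a small polling loop run in a separate process. This is a minimal sketch, not the exact tooling used for this test; it assumes the `nvidia-ml-py` package (imported as `pynvml`) is installed and that everything runs on device 0.

```python
import time

from pynvml import (
    nvmlInit,
    nvmlShutdown,
    nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetMemoryInfo,
)

# Poll device-wide GPU memory usage once per second while the soak test runs.
nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)  # device index 0 is an assumption
    while True:
        info = nvmlDeviceGetMemoryInfo(handle)
        print(f"used {info.used / 2**20:.0f} MiB / total {info.total / 2**20:.0f} MiB")
        time.sleep(1)
finally:
    nvmlShutdown()
```

The NVML number is what `nvidia-smi` reports, so it captures the FFmpeg/NVDEC/NVENC allocations that `torch.cuda.memory_allocated()` (PyTorch tensors only) does not.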
Running GPU decoding/encoding with the `ffmpeg` command takes about 300 MB of GPU memory. Running a script like the one above, which involves GPU decode/encode and YUV444P conversion, takes about 1.3 GB (!) of memory (about 200 MB of which is PyTorch CUDA tensors).

It looks like some 700 MB is still occupied even when decoding/encoding is not happening. We need to look into what is consuming so much memory.

600 MB of the active memory might be the device context, which we might be able to reuse between the decoder and the encoder. (#3160)
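One way to start attributing that residual footprint is to snapshot the driver-level usage next to PyTorch's own tensor accounting after each setup step. The sketch below is illustrative rather than taken from this issue: it assumes `nvidia-ml-py` is available, uses a placeholder input file name, and the NVML figure is device-wide, so it includes other processes on the GPU.

```python
import torch
from torchaudio.io import StreamReader
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)  # device index 0 is an assumption

def report(tag):
    # Driver-level usage: includes the CUDA context and NVDEC/NVENC state.
    used = nvmlDeviceGetMemoryInfo(handle).used / 2**20
    # PyTorch-level usage: CUDA tensors only.
    alloc = torch.cuda.memory_allocated() / 2**20
    print(f"{tag}: driver {used:.0f} MiB, torch tensors {alloc:.0f} MiB")

report("baseline")

torch.cuda.init()
report("after CUDA context creation")

# "input.mp4" is a placeholder file name.
r = StreamReader("input.mp4")
r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")
report("after creating the HW decoder")
```

If the big jump happens at decoder creation rather than at CUDA context creation, that would support sharing one context between the decoder and the encoder as suggested in #3160.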