
Reduce GPU memory consumption #3165

Closed
mthrok opened this issue Mar 10, 2023 · 1 comment

mthrok commented Mar 10, 2023

Running GPU decoding/encoding with the ffmpeg command takes about 300 MB of GPU memory.

ffmpeg -hide_banner -y -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid  -i "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4" -c:a copy -c:v h264_nvenc test.mp4
$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
2023/03/10 11:04:18.960, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 35, 34 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:20.984, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 34, 22 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:23.003, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 3 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:25.017, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 3 %, 5 %, 10240 MiB, 7923 MiB, 2154 MiB
2023/03/10 11:04:27.027, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 4 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:29.043, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 3 %, 5 %, 10240 MiB, 7926 MiB, 2151 MiB
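
For reference, a minimal sketch of how that figure can be derived from the nvidia-smi log above; it assumes the `-l 1` CSV output was redirected to a file (the name gpu_log.csv is hypothetical) and that memory.used is the last column of the query:

# Minimal sketch: compute peak minus baseline memory.used from the nvidia-smi CSV log.
# Assumes the `nvidia-smi ... --format=csv -l 1` output above was redirected to
# "gpu_log.csv" (hypothetical name); memory.used is the last column of the query.
def memory_delta_mib(path="gpu_log.csv"):
    used = []
    with open(path) as f:
        for line in f:
            fields = [s.strip() for s in line.split(",")]
            if not fields or not fields[-1].endswith(" MiB"):
                continue  # skip the CSV header and malformed lines
            used.append(int(fields[-1].split()[0]))  # "1785 MiB" -> 1785
    return max(used) - min(used)

print(f"peak - baseline: {memory_delta_mib()} MiB")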

Running the following script, which involves GPU decoding/encoding and YUV444P conversion, takes about 1.3 GB (!) of GPU memory (about 200 MB of which is the PyTorch CUDA Tensor).
It looks like some 700 MB is still occupied even while decoding/encoding is not happening. We need to look into what is consuming so much memory.

The 600 MB of active memory might be due to the device context, which we might be able to reuse between the decoder and encoder. (#3160)

import time
from datetime import datetime

import torchaudio
from torchaudio.io import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test():
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    time.sleep(10)
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
nvidia-smi output while the script above is running:

2023/03/10 11:06:34.270, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 18 %, 18 %, 10240 MiB, 8292 MiB, 1785 MiB // BEFORE LAUNCH
2023/03/10 11:06:36.283, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 14 %, 26 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:06:38.297, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 4 %, 58 %, 10240 MiB, 8289 MiB, 1788 MiB
2023/03/10 11:06:40.322, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 35, 52 %, 10 %, 10240 MiB, 7023 MiB, 3054 MiB // ENCODE/DECODE
2023/03/10 11:06:42.346, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 60 %, 11 %, 10240 MiB, 7023 MiB, 3054 MiB
2023/03/10 11:06:44.358, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 56 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:46.378, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 57 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:48.389, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 53 %, 10 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:50.407, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 58 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:52.421, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 59 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:54.445, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 7 %, 2 %, 10240 MiB, 7623 MiB, 2454 MiB // SLEEP
2023/03/10 11:06:56.464, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P3, 4, 4, 37, 0 %, 1 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:06:58.477, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P5, 4, 4, 36, 1 %, 12 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:00.489, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 33 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:02.502, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 25 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:04.564, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 5 %, 22 %, 10240 MiB, 7155 MiB, 2922 MiB // ENCODE/DECODE
2023/03/10 11:07:06.580, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 60 %, 11 %, 10240 MiB, 7004 MiB, 3073 MiB
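
As a rough check of the device-context hypothesis above, here is a minimal sketch (not part of the reproduction script; exact numbers vary by driver and GPU) that measures how much memory.used grows when a single CUDA context is created in a fresh Python process:

import subprocess

import torch

def used_mib():
    # Query the driver-level memory usage of GPU 0 via nvidia-smi.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0])

before = used_mib()
torch.ones(1, device="cuda:0")  # forces creation of the CUDA context in this process
torch.cuda.synchronize()
after = used_mib()
print(f"approximate CUDA context overhead: {after - before} MiB")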

mthrok commented Apr 3, 2023

After incorporating garbage collection, emptying the CUDA cache, and disabling the CUDA caching allocator, the monitored memory usage became stable. Running the following script for a week does not show any sign of a memory leak, and the memory increase between peak and off-peak is about 230-350 MB, which roughly corresponds to the memory consumption of the equivalent ffmpeg command, so I think the memory usage is fine.

import io
import os
import gc
import time
from datetime import datetime

os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"

import torch
import torchaudio
from torchaudio.io import StreamReader
from torchaudio.io import StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)


input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test_both():
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    del r
    del w
    return num_frames


def test_writer():
    chunk = torch.randint(
        255, (2, 3, 256, 256), dtype=torch.uint8, device=torch.device("cuda"))

    w = StreamWriter(io.BytesIO(), format="mp4")
    w.add_video_stream(
        height=256,
        width=256,
        frame_rate=30000/1001,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for _ in range(3000):
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    del w
    return num_frames    


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test_both()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
    torch.cuda.empty_cache()
    gc.collect()
    time.sleep(5)
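
For completeness, a small helper (not in the original script; log_torch_memory is an illustrative name) that can be called next to torch.cuda.empty_cache() in the loop above to separate memory held by PyTorch tensors from the driver/context memory that only shows up in nvidia-smi:

def log_torch_memory(tag=""):
    # torch.cuda.memory_allocated / memory_reserved report only what the PyTorch
    # allocator holds; with PYTORCH_NO_CUDA_MEMORY_CACHING=1 both should stay near
    # zero between iterations, so any residual usage reported by nvidia-smi is
    # context/driver overhead rather than leaked tensors.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] torch allocated: {alloc:.1f} MiB, reserved: {reserved:.1f} MiB")

# e.g. call log_torch_memory("after empty_cache") right after torch.cuda.empty_cache()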

mthrok closed this as completed Apr 3, 2023