
Reduce GPU memory consumption #3165

Closed
mthrok opened this issue Mar 10, 2023 · 1 comment

mthrok commented Mar 10, 2023

Running GPU decoding/encoding with the ffmpeg command takes about 300 MB of GPU memory.

ffmpeg -hide_banner -y -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid  -i "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4" -c:a copy -c:v h264_nvenc test.mp4
$ nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1
2023/03/10 11:04:18.960, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 35, 34 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:20.984, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 34, 22 %, 19 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:04:23.003, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 3 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:25.017, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 3 %, 5 %, 10240 MiB, 7923 MiB, 2154 MiB
2023/03/10 11:04:27.027, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 4 %, 6 %, 10240 MiB, 7926 MiB, 2151 MiB
2023/03/10 11:04:29.043, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 3 %, 5 %, 10240 MiB, 7926 MiB, 2151 MiB
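
For reference, a minimal sketch of how that figure can be derived from the nvidia-smi log above; it assumes the `-l 1` CSV output was redirected to a file (the name gpu_log.csv is hypothetical) and that memory.used is the last column of the query:

# Minimal sketch: compute peak minus baseline memory.used from the nvidia-smi CSV log.
# Assumes the `nvidia-smi ... --format=csv -l 1` output above was redirected to
# "gpu_log.csv" (hypothetical name); memory.used is the last column of the query.
def memory_delta_mib(path="gpu_log.csv"):
    used = []
    with open(path) as f:
        for line in f:
            fields = [s.strip() for s in line.split(",")]
            if not fields or not fields[-1].endswith(" MiB"):
                continue  # skip the CSV header and malformed lines
            used.append(int(fields[-1].split()[0]))  # "1785 MiB" -> 1785
    return max(used) - min(used)

print(f"peak - baseline: {memory_delta_mib()} MiB")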

Running the following script, which involves GPU decoding/encoding and YUV444P conversion, takes about 1.3 GB (!) of GPU memory (about 200 MB of which is the PyTorch CUDA Tensor).
It looks like some 700 MB is still occupied even while decoding/encoding is not happening. We need to look into what is consuming so much memory.

The 600 MB of active memory might be due to the device context, which we might be able to reuse between the decoder and encoder. (#3160)

import time
from datetime import datetime

import torchaudio
from torchaudio.io import StreamReader, StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)

input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test():
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    return num_frames


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    time.sleep(10)
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
nvidia-smi output while the script above is running:

2023/03/10 11:06:34.270, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 18 %, 18 %, 10240 MiB, 8292 MiB, 1785 MiB // BEFORE LAUNCH
2023/03/10 11:06:36.283, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 14 %, 26 %, 10240 MiB, 8292 MiB, 1785 MiB
2023/03/10 11:06:38.297, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 33, 4 %, 58 %, 10240 MiB, 8289 MiB, 1788 MiB
2023/03/10 11:06:40.322, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 35, 52 %, 10 %, 10240 MiB, 7023 MiB, 3054 MiB // ENCODE/DECODE
2023/03/10 11:06:42.346, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 60 %, 11 %, 10240 MiB, 7023 MiB, 3054 MiB
2023/03/10 11:06:44.358, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 56 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:46.378, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 38, 57 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:48.389, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 53 %, 10 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:50.407, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 58 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:52.421, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 59 %, 11 %, 10240 MiB, 7024 MiB, 3053 MiB
2023/03/10 11:06:54.445, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 39, 7 %, 2 %, 10240 MiB, 7623 MiB, 2454 MiB // SLEEP
2023/03/10 11:06:56.464, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P3, 4, 4, 37, 0 %, 1 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:06:58.477, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P5, 4, 4, 36, 1 %, 12 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:00.489, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 33 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:02.502, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P8, 4, 4, 36, 0 %, 25 %, 10240 MiB, 7623 MiB, 2454 MiB
2023/03/10 11:07:04.564, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 37, 5 %, 22 %, 10240 MiB, 7155 MiB, 2922 MiB // ENCODE/DECODE
2023/03/10 11:07:06.580, NVIDIA GeForce RTX 3080, 00000000:61:00.0, 517.02, P2, 4, 4, 40, 60 %, 11 %, 10240 MiB, 7004 MiB, 3073 MiB
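
As a rough check of the device-context hypothesis above, here is a minimal sketch (not part of the reproduction script; exact numbers vary by driver and GPU) that measures how much memory.used grows when a single CUDA context is created in a fresh Python process:

import subprocess

import torch

def used_mib():
    # Query the driver-level memory usage of GPU 0 via nvidia-smi.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0])

before = used_mib()
torch.ones(1, device="cuda:0")  # forces creation of the CUDA context in this process
torch.cuda.synchronize()
after = used_mib()
print(f"approximate CUDA context overhead: {after - before} MiB")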

mthrok commented Apr 3, 2023

After incorporating garbage collection, emptying the CUDA cache, and disabling the CUDA caching allocator, the monitored memory usage became stable. Running the following script for a week does not show any sign of a memory leak, and the memory increase between peak and off-peak is about 230-350 MB, which roughly corresponds to the memory consumption of the equivalent ffmpeg command, so I think the memory usage is fine.

import io
import os
import gc
import time
from datetime import datetime

os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "1"

import torch
import torchaudio
from torchaudio.io import StreamReader
from torchaudio.io import StreamWriter

# torchaudio.utils.ffmpeg_utils.set_log_level(36)


input = "NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4"
output = "foo.mp4"

def test_both():
    r = StreamReader(input)
    i = r.get_src_stream_info(r.default_video_stream)
    r.add_video_stream(1, decoder="h264_cuvid", hw_accel="cuda:0")

    w = StreamWriter(output)
    w.add_video_stream(
        height=i.height,
        width=i.width,
        frame_rate=i.frame_rate,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for chunk, in r.stream():
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    del r
    del w
    return num_frames


def test_writer():
    chunk = torch.randint(
        255, (2, 3, 256, 256), dtype=torch.uint8, device=torch.device("cuda"))

    w = StreamWriter(io.BytesIO(), format="mp4")
    w.add_video_stream(
        height=256,
        width=256,
        frame_rate=30000/1001,
        format="yuv444p",
        encoder_format="yuv444p",
        encoder="h264_nvenc",
        hw_accel="cuda:0",
    )

    with w.open():
        num_frames = 0
        for _ in range(3000):
            num_frames += chunk.size(0)
            w.write_video_chunk(0, chunk)
    del w
    return num_frames    


total_num_frames = 0
while True:
    t0 = time.monotonic()
    num_frames = test_both()
    elapsed = time.monotonic() - t0
    total_num_frames += num_frames
    print(f"{datetime.now()}: {elapsed} [sec], {num_frames} frames ({total_num_frames})")
    torch.cuda.empty_cache()
    gc.collect()
    time.sleep(5)
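
For completeness, a small helper (not in the original script; log_torch_memory is an illustrative name) that can be called next to torch.cuda.empty_cache() in the loop above to separate memory held by PyTorch tensors from the driver/context memory that only shows up in nvidia-smi:

def log_torch_memory(tag=""):
    # torch.cuda.memory_allocated / memory_reserved report only what the PyTorch
    # allocator holds; with PYTORCH_NO_CUDA_MEMORY_CACHING=1 both should stay near
    # zero between iterations, so any residual usage reported by nvidia-smi is
    # context/driver overhead rather than leaked tensors.
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] torch allocated: {alloc:.1f} MiB, reserved: {reserved:.1f} MiB")

# e.g. call log_torch_memory("after empty_cache") right after torch.cuda.empty_cache()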

mthrok closed this as completed Apr 3, 2023