
Models are not being properly unloaded and VRAM is not freed #1442

Open
@Baquara

Description

Expected Behavior

Based on issue #302, I expected the model to be unloaded with the following function:


import gc

from llama_cpp import llama_free_model


def unload_model():
    global llm
    llama_free_model(llm)
    # Delete the model object
    del llm
    llm = None  # Ensure no reference remains

    # Explicitly invoke the garbage collector
    gc.collect()

    return {"message": "Model unloaded successfully"}

However, there are two problems here:

1 - Calling llama_free_model on the llm object (loaded in the usual way) results in this:

Traceback (most recent call last):
  File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
    llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type

'llm' is generated with this:

llm = Llama(
    model_path=model_path,
    chat_handler=chat_handler,
    n_gpu_layers=gpu_layers,
    n_ctx=n_ctx
)
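For context, llama_free_model is a low-level ctypes binding that expects a raw llama_model pointer, whereas llm here is the high-level Llama wrapper object, which is why ctypes rejects the argument. Below is a minimal sketch of an unload path that stays at the high-level API, assuming a recent llama-cpp-python release that exposes Llama.close() (older releases may not have it):

import gc

from llama_cpp import Llama


def unload_model(llm: Llama) -> None:
    # Newer llama-cpp-python releases expose close(), which releases the
    # underlying llama.cpp model/context handles. Assumption: the installed
    # version has it; otherwise this branch is simply skipped.
    if hasattr(llm, "close"):
        llm.close()
    # Drop the Python-side reference and collect the wrapper object.
    del llm
    gc.collect()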

2 - Even after deleting the object, setting it to None, and invoking garbage collection, the VRAM is still not freed. The VRAM only gets cleared after I kill the app along with all of its processes and threads.
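One way to narrow this down is to check whether the Python wrapper itself is actually being collected, since any lingering reference (for example from a chat handler or a server thread) keeps the native model alive. A small diagnostic sketch, with the model path as a placeholder:

import gc
import weakref

from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", n_gpu_layers=-1)  # placeholder path
ref = weakref.ref(llm)

del llm
gc.collect()

# If this prints True, the wrapper was collected and any remaining VRAM usage
# comes from the still-running process's CUDA context; if it prints False,
# something is still holding a reference to the model.
print("wrapper collected:", ref() is None)

Note that even when the model itself is freed, the CUDA driver context created by the process can keep some memory resident until the process exits, so nvidia-smi may not drop all the way back to the pre-load figure.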

Current Behavior

1 - llama_free_model does not work.
2 - Garbage collection does not free up VRAM.

Environment and Context

I tried this on both an Arch Linux setup with an RTX 3090 and a Windows laptop with an eGPU. The problem was consistent across both operating systems and hardware setups.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

AMD Ryzen 7 2700 Eight-Core Processor
NVIDIA GeForce RTX 3090

  • Operating System, e.g. for Linux:

Arch Linux 6.8.9-arch1-1
Windows 11

Python 3.12.3
GNU Make 4.4.1
g++ (GCC) 13.2.1 20240417

Failure Information (for bugs)

Traceback (most recent call last):
  File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
    llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type

Steps to Reproduce


  1. Perform a fresh install of llama-cpp-python with CUDA support
  2. Write a code snippet to load the model as usual
  3. Try to use llama_free_model to unload the model, or delete the model object and invoke garbage collection
  4. Keep the app running afterwards and check VRAM with nvidia-smi (a minimal script is sketched below)
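A minimal reproduction along these lines, with the model path and settings as placeholders:

import gc
import time

from llama_cpp import Llama

# Placeholder path and settings; any GGUF model that fits in VRAM will do.
llm = Llama(model_path="./models/model.gguf", n_gpu_layers=-1, n_ctx=2048)

# Attempt to unload.
del llm
gc.collect()

# Keep the process alive and watch nvidia-smi in another terminal:
# the VRAM used by the model remains allocated until the process exits.
time.sleep(300)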

Labels

bug (Something isn't working)
