Description
Expected Behavior
From issue #302, I expected the model to be unloaded with the following function:
import gc

from llama_cpp.llama_cpp import llama_free_model

def unload_model():
    global llm
    llama_free_model(llm)
    # Delete the model object
    del llm
    llm = None  # ensure no reference remains
    # Explicitly invoke the garbage collector
    gc.collect()
    return {"message": "Model unloaded successfully"}
However, there are two problems here:
1 - Calling llama_free_model on the llm object (which is loaded the conventional way, through the high-level Llama class) results in this:
Traceback (most recent call last):
File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type
'llm' is generated with this:
llm = Llama(
    model_path=model_path,
    chat_handler=chat_handler,
    n_gpu_layers=gpu_layers,
    n_ctx=n_ctx
)
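The type mismatch makes sense once you note that llama_free_model is a raw ctypes binding that expects a llama_model_p pointer, while llm here is the high-level Python Llama wrapper around it. The same failure can be sketched with plain ctypes (Linux-only illustration; libc's free and c_void_p merely stand in for llama_free_model and llama_model_p):

import ctypes

libc = ctypes.CDLL(None)                 # handle to the current process (Linux)
libc.free.argtypes = [ctypes.c_void_p]   # one pointer argument, like llama_free_model

try:
    libc.free(object())                  # a plain Python object is not a pointer...
except ctypes.ArgumentError as e:
    print(e)                             # ...so ctypes raises "argument 1: TypeError: wrong type"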
2 - Even after deleting the object, setting the name to None, and invoking the garbage collector, the VRAM is still not freed. The VRAM only gets cleared after I kill the app along with all of its processes and threads.
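That observation suggests the only workaround I have found reliable: host the model in a separate worker process, so the driver reclaims the VRAM when that process exits. A rough sketch (the queue protocol and model path are my own scaffolding, not part of llama-cpp-python):

import multiprocessing as mp

def _model_worker(model_path, prompts, replies):
    # Import inside the child so the CUDA context belongs to this process only
    from llama_cpp import Llama
    llm = Llama(model_path=model_path, n_gpu_layers=-1)
    for prompt in iter(prompts.get, None):  # None is the shutdown sentinel
        replies.put(llm(prompt, max_tokens=32))

if __name__ == "__main__":
    prompts, replies = mp.Queue(), mp.Queue()
    worker = mp.Process(target=_model_worker,
                        args=("/path/to/model.gguf", prompts, replies))
    worker.start()
    prompts.put("Hello")
    print(replies.get())
    prompts.put(None)  # shut the worker down...
    worker.join()      # ...and the VRAM is released with the process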
Current Behavior
1 - llama_free_model does not work on the high-level Llama object.
2 - Garbage collection does not free the VRAM.
Environment and Context
I tried this on both an Arch Linux setup with an RTX 3090 and a Windows laptop with an eGPU. The problem was consistent across both operating systems and hardware setups.
- Physical (or virtual) hardware you are using, e.g. for Linux:
AMD Ryzen 7 2700 Eight-Core Processor
NVIDIA GeForce RTX 3090
- Operating System, e.g. for Linux:
Arch Linux 6.8.9-arch1-1
Windows 11
Python 3.12.3
GNU Make 4.4.1
g++ (GCC) 13.2.1 20240417
Failure Information (for bugs)
Traceback (most recent call last):
File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type
Steps to Reproduce
- Perform a fresh install of llama-cpp-python with CUDA support
- Write a code snippet to load the model as usual
- Try to use llama_free_model to unload the model, or delete the model object and invoke garbage collection (a combined sketch follows this list)
- Keep the app running afterwards and check VRAM with nvidia-smi
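Putting the steps together, a minimal repro looks like this (the model path is a placeholder; n_gpu_layers=-1 offloads every layer):

import gc
import time

from llama_cpp import Llama
from llama_cpp.llama_cpp import llama_free_model

llm = Llama(model_path="/path/to/model.gguf", n_gpu_layers=-1)

# Attempt 1: llama_free_model rejects the high-level wrapper outright
try:
    llama_free_model(llm)
except Exception as e:
    print(type(e).__name__, e)  # ArgumentError: argument 1: TypeError: wrong type

# Attempt 2: drop every reference and force a collection
del llm
gc.collect()

# Keep the process alive and watch nvidia-smi -- the VRAM stays allocated
time.sleep(300)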