Closed
Description
After disposing everything, GPU memory is still not fully freed. Running the same code a second time crashes with a message that it cannot allocate enough memory. Some allocations are never released from GPU memory.
You can verify the issue by calling ggml_backend_cuda_get_device_memory(0, out freemem, out totalmem) before and after loading, using, and disposing a llama model. Comparing the two freemem values shows how much memory is still allocated afterwards.
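A minimal repro sketch of the measurement described above, in C#. This assumes a P/Invoke binding for the native ggml function is available (the DllImport library name "ggml" and the model-loading calls in the comment are placeholders; adjust them to your LLamaSharp build). It requires a CUDA-capable GPU to run.

```csharp
using System;
using System.Runtime.InteropServices;

class GpuMemCheck
{
    // Native ggml function: void ggml_backend_cuda_get_device_memory(int device, size_t* free, size_t* total)
    // The library name "ggml" is an assumption; point DllImport at your native binary.
    [DllImport("ggml", CallingConvention = CallingConvention.Cdecl)]
    static extern void ggml_backend_cuda_get_device_memory(int device, out nuint free, out nuint total);

    static nuint FreeMem()
    {
        // Query free memory on CUDA device 0
        ggml_backend_cuda_get_device_memory(0, out var free, out _);
        return free;
    }

    static void Main()
    {
        var before = FreeMem();

        // Load, use, and dispose a model here, e.g. (hypothetical usage):
        //   using var model = LLamaWeights.LoadFromFile(parameters);
        //   ... run inference ...
        // then let all wrappers be disposed before measuring again.

        var after = FreeMem();

        // If disposal released everything, this should be close to zero.
        Console.WriteLine($"Still allocated after dispose: {(long)before - (long)after} bytes");
    }
}
```

Any difference between the two readings that persists after all objects are disposed is memory that was never returned to the device.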
See also: SciSharp/LLamaSharp#575