Description
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 5519 (a682474)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 5 3600 + RTX 5090
Models
Qwen3 32B q5
Problem description & steps to reproduce
./llama-server -m ~/llm/models/Qwen3-32B-Q5_K_S.gguf -c 16384 -ngl 999 --host 0.0.0.0 --port 5000 --jinja --api-key
This is how I run the program. The issue happens every so often, and in the limited attempts I tried I could not reproduce it with llama-cli.
First Bad Commit
No response
Relevant log output
terminate called after throwing an instance of 'std::runtime_error'
what(): Invalid diff: '<think>Okay, the user mentioned that Docker is taking up a lot of space and they want to delete unused volumes. Now they're saying that something else might be using all the storage and they don't know if it's Docker. I need to help them figure out what's consuming their disk space.