Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use #13812
Comments
This might be due to a malformed Jinja template in unsloth's quant, xref: https://huggingface.co/unsloth/Qwen3-4B-GGUF/discussions/4

In case they update their GGUF, this is the one I'm using:

```
$ openssl sha256 ~/.cache/llama.cpp/unsloth_Qwen3-4B-GGUF_Qwen3-4B-Q8_0.gguf
SHA2-256(/home/bjorn/.cache/llama.cpp/unsloth_Qwen3-4B-GGUF_Qwen3-4B-Q8_0.gguf)= eed555233267a33c7e8ee31682762cc7751b3f6d224039086e0e846f05fffa5d
```
Hi there, I might have a similar problem with the unsloth models. Running the tutorial here works: https://docs.unsloth.ai/basics/devstral-how-to-run-and-fine-tune#possible-vision-support. In general, vision models work on my machine with the latest llama.cpp implementations: https://simonwillison.net/2025/May/10/llama-cpp-vision/

For the unsloth image section (which might be tool usage? I am not 100% sure in this environment) I get an exception thrown when I want to ask it something about a screenshot. The same screenshot works with the

Here is the exception for unsloth-devstral using the experimental vision part:
Also in the startup of the server, I see this note:
I am not seeing this issue in the

Another issue might be that

was not downloading the mmproj.gguf files. I added them manually, but there is no corresponding .json for that file. Maybe this is also a problem in my case. Thanks for the help! :)
FYI, I could "fix" my issue by downloading the model again like this:

without the

But I still have the

issue with this model.
Name and Version
```
$ /build/llama.cpp-debug/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3090)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen 9 7950X 16-Core Processor)
load_backend: failed to find ggml_backend_init in /build/llama.cpp-debug/bin/libggml-cuda.so
load_backend: failed to find ggml_backend_init in /build/llama.cpp-debug/bin/libggml-rpc.so
load_backend: failed to find ggml_backend_init in /build/llama.cpp-debug/bin/libggml-cpu.so
version: 5498 (6f180b9)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
```
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 7950X + 3090
Models
Qwen3-4B
Problem description & steps to reproduce
For certain tool-use requests to the /v1/chat/completions endpoint, I get an uncaught exception.
To reproduce this, run `run.sh` from this ephemeral repo: https://github.com/bjodah/bug-reproducer-llamacpp-partial-parse/tree/main
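For orientation only, here is a hypothetical sketch of the kind of tool-use request that exercises this code path; the tool name `get_weather` and its schema are made up, and the actual failing payloads are the ones produced by `run.sh` above:

```sh
# Hypothetical minimal tool-use request (made-up get_weather tool) against the
# server started with the command shown under "Relevant log output" (port 11034).
curl -s http://localhost:11034/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the weather in Stockholm right now?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
          }
        }
      }
    ]
  }'
```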
First Bad Commit
I have not bisected this.
Relevant log output
Output from `llama-server --log-file /logs/llamacpp-Qwen3-4B.log --port 11034 --hf-repo unsloth/Qwen3-4B-GGUF:Q8_0 --n-gpu-layers 999 --jinja --cache-type-k q8_0 --ctx-size 32768 --samplers 'top_k;dry;min_p;temperature;top_p' --min-p 0.005 --top-p 0.97 --top-k 40 --temp 0.7 --dry-multiplier 0.7 --dry-allowed-length 4 --dry-penalty-last-n 2048 --presence-penalty 0.05 --frequency-penalty 0.005 --repeat-penalty 1.01 --repeat-last-n 16`
GDB debugging session
So the exception is thrown from llama.cpp/common/chat.cpp, line 1923 (at commit 6f180b9).
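For reference, a minimal sketch of one way to obtain such a backtrace under GDB, assuming the same debug build path and server flags as in the sections above (`catch throw` stops at the point where the exception is raised):

```sh
# Sketch: run the debug build under GDB and break on any C++ throw.
# Binary path and flags are taken from the version/log sections above;
# trim or extend the flags as needed.
gdb -ex 'catch throw' -ex run --args \
  /build/llama.cpp-debug/bin/llama-server \
    --port 11034 --hf-repo unsloth/Qwen3-4B-GGUF:Q8_0 \
    --n-gpu-layers 999 --jinja --ctx-size 32768
# When GDB stops at the throw, `bt` shows the frame at common/chat.cpp:1923.
```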
Since it is throwing instead of returning e.g. HTTP status code 500, I guess this constitutes a bug?
That `<think>\n\n` part before `<tool_call>` looks suspicious, no?