[BUG]: BoolQ Example throws error #1120
Comments
Also, I was in here because I'm still trying to wrap my head around the KV slots with batching. I removed the small batches and reworked it so that if the KV cache gets exhausted we slowly back off the number of concurrent prompts, and I added a bit of a visualization using the KV cache debug view. I'm still working through the pros and cons of this approach versus explicitly creating the batches before processing. My approach is pretty naive but functional: if I get a NoKvSlot result, I first remove the largest conversation and requeue it, then I reduce the maximum number of concurrent prompts. The commit is here: phil-scott-78@b4219be. I could create a new example or update the existing BoolQ one if there is any interest.

Screen.Recording.2025-03-02.132053.mp4
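For readers skimming the thread, here is a minimal sketch of the backoff idea described above, not the actual commit. The types and members (`Conversation`, `TokenCount`, `OriginalPrompt`, `StartConversation`, `RunInferenceAsync`, `NoKvSlotException`) are stand-ins for whatever the real executor exposes; only the control flow is the point.

```csharp
// Hypothetical sketch: shrink concurrency when the KV cache runs out of slots.
var pending = new Queue<string>(allPrompts);   // prompts waiting to be scheduled
var active  = new List<Conversation>();        // conversations currently in the batch
var maxConcurrent = 16;                        // start optimistic, shrink under pressure

while (pending.Count > 0 || active.Count > 0)
{
    // Top up the batch until we hit the current concurrency cap.
    while (active.Count < maxConcurrent && pending.Count > 0)
        active.Add(StartConversation(pending.Dequeue()));

    try
    {
        await RunInferenceAsync(active);
    }
    catch (NoKvSlotException)
    {
        // Naive recovery: evict the conversation holding the most KV cells,
        // requeue its prompt, and lower the cap so we don't immediately
        // run out of slots again.
        var largest = active.OrderByDescending(c => c.TokenCount).First();
        active.Remove(largest);
        pending.Enqueue(largest.OriginalPrompt);
        largest.Dispose();

        maxConcurrent = Math.Max(1, maxConcurrent - 1);
    }
}
```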
I had a hunch this was due to a bad grammar, so I rewrote it to allow any text at the end of the answer. Then I rewrote the validation logic to:

    var str = _decoder.Read().Trim().ToLowerInvariant();
    var result = str.StartsWith("true") || str.StartsWith("yes");

This seems to work around the issue. I'm not sure if this is a bug in llama.cpp, or if it is just a malformed grammar.

Edit: Never mind, it got most of the way through the test set and then failed with the same assertion.
Interesting, it's failing right away with the model I'm using (Qwen 2.5 7B). I wonder if allowing extra whitespace gives the successful runs more tokens, which gives Accept a chance to succeed, but with the original grammar, or with a model that doesn't add whitespace when allowed, there is only the one token, which leads to the sampler failing.
Description
The BoolQ example fails with an error in llama.cpp:

    GGML_ASSERT(!grammar->stacks.empty())

This happens with both the GreedySamplingPipeline and also if I change it to DefaultSamplingPipeline. An easy fix is to just remove the Accept call.

Reproduction Steps
Just give the sample a run. Any model, any settings will trigger it.
Environment & Configuration
Known Workarounds
Remove the Accept call
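For context, a rough sketch of where the workaround lands in the token loop, assuming a LLamaSharp-style Sample/Accept pair on the pipeline; the exact names, signatures, and variables here are assumptions for illustration, not the actual example code.

```csharp
// Hypothetical sampling loop; names and signatures are illustrative only.
var token = sampler.Sample(executor.Context.NativeHandle, conversation.GetSampleIndex());

// Workaround: skip feeding the sampled token back into the pipeline. With the
// grammar-constrained sampler, calling Accept once the grammar has reached its
// final state appears to trip GGML_ASSERT(!grammar->stacks.empty()) in llama.cpp.
// sampler.Accept(token);

conversation.Prompt(token);
```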