
[BUG]: BoolQ Example throws error #1120

Open

phil-scott-78 opened this issue Mar 2, 2025 · 3 comments

Comments

phil-scott-78 (Contributor) commented Mar 2, 2025

Description

The BoolQ example fails with an error in llama.cpp: GGML_ASSERT(!grammar->stacks.empty()). This happens with both the GreedySamplingPipeline and the DefaultSamplingPipeline.

An easy fix is to just remove the Accept call.
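
For context, the failing spot is the token loop in the example: sample a token under the grammar, feed it back to the conversation, and tell the pipeline to accept it. A rough sketch of that shape, using hypothetical stand-in types rather than the real LLamaSharp signatures:

// Hypothetical stand-ins, not the real LLamaSharp types or signatures; this only
// shows where the Accept call sits relative to sampling in the example's loop.
using System;

interface IGrammarPipeline
{
    int Sample();           // pick the next token, constrained by the GBNF grammar
    void Accept(int token); // advance the grammar's internal state with that token
}

static class AnswerLoop
{
    public static void Run(IGrammarPipeline pipeline, Action<int> feed, Func<int, bool> isEnd)
    {
        while (true)
        {
            var token = pipeline.Sample();
            feed(token);

            // This is the Accept call the workaround removes. The assertion text
            // suggests the grammar's stacks are already empty once the answer is
            // complete, so accepting a further token trips the llama.cpp assert.
            pipeline.Accept(token);

            if (isEnd(token))
                break;
        }
    }
}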

Reproduction Steps

Just give the sample a run. Any model, any settings will trigger it.

Environment & Configuration

  • Operating system: Windows 11
  • .NET runtime version: .NET 9
  • LLamaSharp version: master
  • CUDA version (if you are using cuda backend): 12
  • CPU & GPU device: NVidia 4080 Super

Known Workarounds

Remove the Accept call

phil-scott-78 (Contributor, Author)
Also, I was in here because I'm still trying to wrap my head around KV slots with batching. I removed the small batches and reworked the example so that if the KV cache gets exhausted we slowly back off the number of concurrent prompts, and I added a bit of visualization using the KV cache debug view.

I'm still working through the pros and cons of this approach versus explicitly creating the batches before processing. My approach is pretty naive but functional: if I get a NoKvSlot, I first remove the largest conversation and requeue it, then I reduce the max number of concurrent prompts.
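
Sketched out, the back-off is roughly this shape (a simplified sketch with hypothetical types; the real code is in the commit below and uses LLamaSharp's BatchedExecutor and Conversation):

// Simplified sketch of the back-off idea with hypothetical types; it only shows
// the scheduling, and the inferBatch delegate stands in for the actual inference.
using System;
using System.Collections.Generic;
using System.Linq;

class NoKvSlotException : Exception { }

class Prompt
{
    public Prompt(string text) { Text = text; }
    public string Text { get; }
    public int TokenCount => Text.Length; // stand-in for a real token count
}

class Scheduler
{
    private readonly Queue<Prompt> _pending;
    private readonly List<Prompt> _active = new();
    private int _maxConcurrent;

    public Scheduler(IEnumerable<Prompt> prompts, int maxConcurrent)
    {
        _pending = new Queue<Prompt>(prompts);
        _maxConcurrent = maxConcurrent;
    }

    public void Step(Action<IReadOnlyList<Prompt>> inferBatch)
    {
        // Top up to the current concurrency limit.
        while (_active.Count < _maxConcurrent && _pending.Count > 0)
            _active.Add(_pending.Dequeue());

        if (_active.Count == 0)
            return;

        try
        {
            inferBatch(_active);
        }
        catch (NoKvSlotException)
        {
            // KV cache exhausted: requeue the largest conversation,
            // then lower the concurrency limit so we back off slowly.
            var largest = _active.OrderByDescending(p => p.TokenCount).First();
            _active.Remove(largest);
            _pending.Enqueue(largest);
            _maxConcurrent = Math.Max(1, _maxConcurrent - 1);
        }
    }
}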

Commit is here - phil-scott-78@b4219be

I could create a new example or update the existing BoolQ if there is any interest.

Screen.Recording.2025-03-02.132053.mp4

martindevans (Member) commented Mar 3, 2025

I had a hunch this was due to a bad grammar, so I rewrote it to:

root ::= (" ")? ("true" | "false" | "yes" | "no") [^\n]+ "\n"

i.e. allow any text at the end of the answer.
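
As a sanity check, that rule is roughly equivalent to this regex (my own translation of the GBNF, not anything taken from llama.cpp):

// Rough regex equivalent of the rule above, just to sanity-check what it accepts.
using System;
using System.Text.RegularExpressions;

class GrammarCheck
{
    static readonly Regex Pattern = new(@"\A ?(?:true|false|yes|no)[^\n]+\n\z");

    static void Main()
    {
        Console.WriteLine(Pattern.IsMatch(" true, the passage says so\n")); // True
        Console.WriteLine(Pattern.IsMatch("no it does not\n"));             // True
        Console.WriteLine(Pattern.IsMatch("true\n")); // False: [^\n]+ needs at least one character after the keyword
    }
}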

Then I rewrote the validation logic to:

var str = _decoder.Read().Trim().ToLowerInvariant();
var result = str.StartsWith("true") || str.StartsWith("yes");

This seems to work around the issue. I'm not sure if this is a bug in llama.cpp or if it's just a malformed grammar.

Edit: Never mind, it got most of the way through the test set and then failed with the same assertion.

phil-scott-78 (Contributor, Author)
Interesting, it's failing right away with the model I'm using (Qwen 2.5 7B). I wonder if allowing extra whitespace gives the successful runs more tokens, which gives Accept a chance to succeed, while with a grammar or a model that doesn't add whitespace when allowed there is only the one token, which leads to the sampler failing.
