
[BUG]: BoolQ Example throws error #1120

Open

phil-scott-78 opened this issue Mar 2, 2025 · 3 comments

Comments

phil-scott-78 (Contributor) commented Mar 2, 2025

Description

The BoolQ example fails with an error in llama.cpp: GGML_ASSERT(!grammar->stacks.empty()). This happens with both the GreedySamplingPipeline and the DefaultSamplingPipeline.

An easy fix is to just remove the Accept call.
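
For context, the failing spot is the token loop in the example: sample a token under the grammar, feed it back to the conversation, and tell the pipeline to accept it. A rough sketch of that shape, using hypothetical stand-in types rather than the real LLamaSharp signatures:

// Hypothetical stand-ins, not the real LLamaSharp types or signatures; this only
// shows where the Accept call sits relative to sampling in the example's loop.
using System;

interface IGrammarPipeline
{
    int Sample();           // pick the next token, constrained by the GBNF grammar
    void Accept(int token); // advance the grammar's internal state with that token
}

static class AnswerLoop
{
    public static void Run(IGrammarPipeline pipeline, Action<int> feed, Func<int, bool> isEnd)
    {
        while (true)
        {
            var token = pipeline.Sample();
            feed(token);

            // This is the Accept call the workaround removes. The assertion text
            // suggests the grammar's stacks are already empty once the answer is
            // complete, so accepting a further token trips the llama.cpp assert.
            pipeline.Accept(token);

            if (isEnd(token))
                break;
        }
    }
}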

Reproduction Steps

Just give the sample a run. Any model, any settings will trigger it.

Environment & Configuration

  • Operating system: Windows 11
  • .NET runtime version: .NET 9
  • LLamaSharp version: master
  • CUDA version (if you are using cuda backend): 12
  • CPU & GPU device: NVidia 4080 Super

Known Workarounds

Remove the Accept call

phil-scott-78 (Contributor, Author)
Also, I was in here because I'm still trying to wrap my head around KV slots with batching. I removed the small batches and reworked the example so that if the KV cache gets exhausted we slowly back off the number of concurrent prompts, and I added a bit of visualization using the KV cache debug view.

I'm still working through the pros and cons of this approach versus explicitly creating the batches before processing. My approach is pretty naive but functional: if I get a NoKvSlot, I first remove the largest conversation and requeue it, then I reduce the max number of concurrent prompts.
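
Sketched out, the back-off is roughly this shape (a simplified sketch with hypothetical types; the real code is in the commit below and uses LLamaSharp's BatchedExecutor and Conversation):

// Simplified sketch of the back-off idea with hypothetical types; it only shows
// the scheduling, and the inferBatch delegate stands in for the actual inference.
using System;
using System.Collections.Generic;
using System.Linq;

class NoKvSlotException : Exception { }

class Prompt
{
    public Prompt(string text) { Text = text; }
    public string Text { get; }
    public int TokenCount => Text.Length; // stand-in for a real token count
}

class Scheduler
{
    private readonly Queue<Prompt> _pending;
    private readonly List<Prompt> _active = new();
    private int _maxConcurrent;

    public Scheduler(IEnumerable<Prompt> prompts, int maxConcurrent)
    {
        _pending = new Queue<Prompt>(prompts);
        _maxConcurrent = maxConcurrent;
    }

    public void Step(Action<IReadOnlyList<Prompt>> inferBatch)
    {
        // Top up to the current concurrency limit.
        while (_active.Count < _maxConcurrent && _pending.Count > 0)
            _active.Add(_pending.Dequeue());

        if (_active.Count == 0)
            return;

        try
        {
            inferBatch(_active);
        }
        catch (NoKvSlotException)
        {
            // KV cache exhausted: requeue the largest conversation,
            // then lower the concurrency limit so we back off slowly.
            var largest = _active.OrderByDescending(p => p.TokenCount).First();
            _active.Remove(largest);
            _pending.Enqueue(largest);
            _maxConcurrent = Math.Max(1, _maxConcurrent - 1);
        }
    }
}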

Commit is here - phil-scott-78@b4219be

I could create a new example or update the existing BoolQ if there is any interest.

Screen.Recording.2025-03-02.132053.mp4

martindevans (Member) commented Mar 3, 2025

I had a hunch this was due to a bad grammar, so I rewrote it to:

root ::= (" ")? ("true" | "false" | "yes" | "no") [^\n]+ "\n"

i.e. allow any text at the end of the answer.
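
As a sanity check, that rule is roughly equivalent to this regex (my own translation of the GBNF, not anything taken from llama.cpp):

// Rough regex equivalent of the rule above, just to sanity-check what it accepts.
using System;
using System.Text.RegularExpressions;

class GrammarCheck
{
    static readonly Regex Pattern = new(@"\A ?(?:true|false|yes|no)[^\n]+\n\z");

    static void Main()
    {
        Console.WriteLine(Pattern.IsMatch(" true, the passage says so\n")); // True
        Console.WriteLine(Pattern.IsMatch("no it does not\n"));             // True
        Console.WriteLine(Pattern.IsMatch("true\n")); // False: [^\n]+ needs at least one character after the keyword
    }
}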

Then I rewrote the validation logic to:

var str = _decoder.Read().Trim().ToLowerInvariant();
var result = str.StartsWith("true") || str.StartsWith("yes");

This seems to work around the issue. I'm not sure if this is a bug in llama.cpp or if it's just a malformed grammar.

Edit: Never mind, it got most of the way through the test set and then failed with the same assertion.

phil-scott-78 (Contributor, Author)
Interesting, it's failing right away with the model I'm using (Qwen 2.5 7B). I wonder if allowing extra whitespace gives the successful runs more tokens, which gives Accept a chance to succeed, while with a grammar or a model that doesn't add whitespace when allowed there is only the one token, which leads to the sampler failing.
