Eval bug: Command A only outputs 88888888 with -fa #12441
Comments
I can confirm that as soon as Command A is run with flash attention enabled, the model only outputs a sequence of 8s. I tried various other parameters and different quantizations, but as soon as "-fa" is introduced, the issue presents itself.
Can you confirm that #12688 fixes the issue?
@ggerganov I tried the fix with the Q4KL-GGUF from Bartowski. It does produce text now, but only gibberish.
Pushed a fix - does it work now?
I can confirm the fix works now. Thanks for your quick responses!
Name and Version
version: 4908 (a53f7f7)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin24.3.0
Operating systems
Mac
GGML backends
Metal
Hardware
M3 Max
Models
https://huggingface.co/bartowski/CohereForAI_c4ai-command-a-03-2025-GGUF/tree/main/CohereForAI_c4ai-command-a-03-2025-Q6_K
https://huggingface.co/lmstudio-community/c4ai-command-a-03-2025-GGUF/blob/main/c4ai-command-a-03-2025-Q6_K-00001-of-00003.gguf
Problem description & steps to reproduce
./llama-cli -m CohereForAI_c4ai-command-a-03-2025-Q6_K-00001-of-00003.gguf --no-mmap -fa -c 1024 -n 64 -if -no-cnv --in-prefix "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>" --in-suffix "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
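To narrow the problem down to flash attention, the same run can be repeated with and without `-fa` and the tail of each output checked for the repeated-character symptom. The following is only a rough sketch, not part of the original report: the helper names `build_cmd`, `ends_with_single_char_run`, and `compare` are hypothetical, the `-p` prompt replaces the interactive prefix/suffix setup used above, and `-fa` is assumed to be a plain on/off switch as in the build listed under "Name and Version".

```python
import subprocess


def build_cmd(binary, model, flash_attn, prompt="Hello", n_predict=64, ctx=1024):
    """Build a llama-cli argument list (hypothetical helper).

    Uses flags from the report, plus -p for a fixed, non-interactive prompt.
    """
    cmd = [
        binary, "-m", model,
        "--no-mmap",
        "-c", str(ctx),
        "-n", str(n_predict),
        "-no-cnv",
        "-p", prompt,
    ]
    if flash_attn:
        # Assumption: -fa takes no value in this build.
        cmd.append("-fa")
    return cmd


def ends_with_single_char_run(text, min_len=16):
    """Return True if the last min_len non-whitespace characters are identical.

    Only the tail is inspected because the prompt may be echoed first.
    """
    stripped = "".join(text.split())
    if len(stripped) < min_len:
        return False
    return len(set(stripped[-min_len:])) == 1


def compare(binary, model):
    """Run the model with and without -fa and report which runs look degenerate."""
    results = {}
    for flash_attn in (False, True):
        proc = subprocess.run(
            build_cmd(binary, model, flash_attn),
            capture_output=True, text=True, check=True,
        )
        results[flash_attn] = ends_with_single_char_run(proc.stdout)
    return results
```

Calling `compare("./llama-cli", "<path to the GGUF file>")` would return a dictionary keyed by whether `-fa` was passed; a result of `{False: False, True: True}` would match the behavior described in this report.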
First Bad Commit
No response
Relevant log output