Misc. bug: Something recently has broken the -ot option to override model tensor buffers - causes CUDA crash (#12798)
Comments
Does it work if you build with CUDA graphs disabled (
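The inline code reference in the comment above is cut off in this capture, so the exact flag is unknown; the names below are assumptions based on ggml's CMake options and runtime environment checks. CUDA graphs can typically be disabled either at build time or per run, roughly like this:

```shell
# Build-time: compile with the CUDA graphs feature off (assumed CMake option)
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_GRAPHS=OFF
cmake --build build --config Release

# Run-time: keep the feature compiled in but disable it for a single run
# (assumed environment variable; model path is illustrative)
GGML_CUDA_DISABLE_GRAPHS=1 ./build/bin/llama-cli -m model.gguf -p "hello"
```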
Yeah, this fixes it - thanks! Tested on both
@agray3 any ideas? I think some pointers are not being updated correctly. I can reproduce this reliably with deepseek-v2-lite when running with compute-sanitizer. This case should disable CUDA graphs completely, since there are multiple different graphs.
Hi, I’m on vacation at the moment - will take a look and work out a fix when I’m back at my laptop late this week.
imatrix computation without any special arguments seems to be affected by this issue as well. Compiling llama.cpp using
Local imatrix computation:
imatrix computation using an RPC server:
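The log outputs referenced above are omitted from this capture. For reference, a local imatrix run and an RPC-offloaded run look roughly like this (model, calibration data, and host values are illustrative, and the `--rpc host:port` syntax is an assumption about the current CLI):

```shell
# Local imatrix computation (illustrative paths)
./build/bin/llama-imatrix -m model.gguf -f calibration-data.txt -o imatrix.dat

# The same computation, offloading work to an RPC server (assumed flag syntax)
./build/bin/llama-imatrix -m model.gguf -f calibration-data.txt -o imatrix.dat \
  --rpc 192.168.1.10:50052
```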
The issue is that the new
@slaren this case was previously using CUDA graphs, but only for the first few tokens and then the
Thanks @agray3. It definitely would be better to support these nodes since some models use
@slaren we already had reports of regressions from users due to this - I've now made the tweaks to re-enable CUDA graphs for these node types at #12970
Name and Version
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli, llama-server
Command line
Problem description & steps to reproduce
Something recently seems to have broken the option to override model tensor buffers added in #11397:
It successfully processes the prompt, seems to write a single token and then crashes with this:
The BF16 version of deepseek-v2-lite gets the same problem. The Q8_0 of deepseek-r1 gets the same problem.
First Bad Commit
Unsure, but recent.
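For context, the tensor-buffer override option from #11397 takes a comma-separated list of `<pattern>=<buffer type>` pairs. A typical invocation that keeps MoE expert tensors in host memory while offloading the rest looks roughly like this (the model path and pattern are illustrative, not the reporter's actual command):

```shell
# Keep expert tensors on the CPU, offload everything else to the GPU
./build/bin/llama-cli -m deepseek-v2-lite.gguf -ngl 99 \
  -ot "exps=CPU" -p "hello"
```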
Relevant log output
Using:
gives this as last few sections: