Mistral 7b crashes permanently with GPU #1326
Comments
As I just noted on #1319, I'm still seeing errors which I think are related to that bug even in v0.2.59.
Thanks a lot for your feedback! I will look into this.
@rsoika thanks, I'll keep this open, just trying to repro now. Question about the log you linked to:
So is the segfault issue resolved, but now it's outputting invalid values in the logprobs?
@abetlen I did not link a log file. At the moment I just added the
And this seems to solve all problems. I run my app in a Docker image with the following build script:
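A hedged sketch of such a build step (not the exact script from this report; the cuBLAS flag was current around v0.2.59, newer releases use -DGGML_CUDA=on instead):

```bash
# Hedged sketch, not the reporter's exact script: build the llama-cpp-python
# wheel from source with CUDA (cuBLAS) support inside the nvidia/cuda image.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install --no-cache-dir llama-cpp-python==0.2.59
```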
Maybe this helps you.
Thanks, that also helps! And I tagged you by mistake, sorry about that! I meant the log @riedgar-ms posted in the other issue.
OK, finally I also cleaned up my Dockerfile and I do indeed only build the llama-cpp-python code for my GPU. No other additional libs are needed; everything is included in the nvidia/cuda image. So I think this is how a minimalist Dockerfile should look:
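A hedged reconstruction of what such a minimalist Dockerfile could look like, assuming the nvidia/cuda:12.1.1-devel-ubuntu22.04 base image mentioned below; the package list, the version pin, and the app.py entry point are illustrative assumptions, not the reporter's exact file:

```dockerfile
# Hedged sketch of a minimal CUDA build image for llama-cpp-python.
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

# Python toolchain and a compiler (conservative; the devel image already
# ships the CUDA toolkit)
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip build-essential && \
    rm -rf /var/lib/apt/lists/*

# Compile llama-cpp-python against CUDA (cuBLAS flag used around v0.2.59;
# newer releases use -DGGML_CUDA=on)
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1
RUN pip3 install --no-cache-dir llama-cpp-python==0.2.59

WORKDIR /app
COPY . /app
# Hypothetical entry point standing in for the actual application
CMD ["python3", "app.py"]
```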
I've added logits_all=True, i.e. llama_cpp.Llama(model_path=model, logits_all=True, **kwargs). However, on Windows and macOS I'm getting an AccessViolation/segfault. Both of those are on Python 3.12. Ubuntu is not segfaulting, but gives the "probability contains inf, nan or <0" error instead.
I'm currently running the Ubuntu test on Python 3.12, to see if that does the same thing.
Update: Ubuntu on Python 3.12 gives the same "probability contains inf, nan or <0" error as Ubuntu on Python 3.10.
Facing similar issues with Command R+ & Miqu on a GPU offload setup. On Python 3.11 oobabooga, I'm getting the above "probability contains inf, nan or <0" error after the initial prompt eval too, but somehow it works if I retry. To be exact: nan error, then it works, then nan error again, then it works, in an alternating pattern if I keep sending new messages (regenerating the current message doesn't seem to run into any issues). Once in a while, it segfaults instead.
EDIT: the logits_all workaround works but increases the VRAM usage for context significantly.
EDIT 2: oobabooga/text-generation-webui@3e3a7c4 is an interesting commit. Normally I would investigate further or provide more detailed logs, but I haven't the time.
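For context, a minimal sketch of the workaround being discussed; the model path and prompt are placeholders, and logits_all=True is also what is needed to request per-token logprobs, which appears to be the source of the reported VRAM trade-off:

```python
# Hedged sketch of the logits_all workaround discussed in this thread;
# paths, prompt, and parameter values are placeholders, not from the reports.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=30,   # GPU offload as in the original setup
    logits_all=True,   # reported to avoid the inf/nan probability error,
                       # at the cost of noticeably higher VRAM use for context
)

out = llm("[INST] Say hello. [/INST]", max_tokens=16, logprobs=5)
print(out["choices"][0]["text"])
```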
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
I run into a problem running llama-cpp-python with Mistral 7B on GPU/CUDA.
Only when I use small prompts, like in the following example, does my mistral-7b-instruct-v0.2.Q4_K_M.gguf model work (the outcome is a normal completion).
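A minimal sketch of such a small-prompt call; the prompt text and most parameter values are assumptions, only n_gpu_layers=30 reflects the setup described below:

```python
# Hedged sketch of a small-prompt run that works on the reported setup;
# prompt and n_ctx are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=30,   # offload most layers to the GTX 1080
    n_ctx=2048,        # assumed context size
)

out = llm("[INST] What is the capital of France? [/INST]", max_tokens=32)
print(out["choices"][0]["text"])
```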
Current Behavior
But if I try more complex prompts the model crashes with:
Then the only solution seems to be to reduce the parameter n_gpu_layers from a value of 30 to only 10. Other parameters like n_ctx and n_batch can also cause a crash. This all only happens when I use the GPU. Without the GPU the program runs slowly, but without any crashes.
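A hedged sketch of that workaround; the reduction of n_gpu_layers from 30 to 10 is taken from the report above, while the concrete n_ctx and n_batch values are assumptions:

```python
# Hedged sketch of the reported workaround: less GPU offload and smaller
# context/batch settings avoid the crash on this setup.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=10,   # reduced from 30, per the report
    n_ctx=1024,        # assumed smaller context window
    n_batch=256,       # assumed smaller batch size
)
```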
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
My hardware is a CPU Intel Core i7-7700 + GeForce GTX 1080. My program runs in a Docker container based on nvidia/cuda:12.1.1-devel-ubuntu22.04.
$ lscpu
$ uname -a
Linux imixs-ai 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
How can I provide more useful information about the crash?