Wizard Coder 15b Support? #1901
WizardCoder-15B-1.0.ggmlv3.q5_1.bin works fine for me using the starcoder ggml example: https://github.com/ggerganov/ggml/tree/master/examples/starcoder. Llama.cpp doesn't support it yet.
but it is not llama.cpp ;)
Can anyone explain how to use another model, such as WizardVicuna, with privateGPT? Is that model supported?
I cannot make it work with starcoder.cpp. I downloaded the 4-bit ggml model from Hugging Face, but it gives a ggml error. Error: More information: I have 16 GB RAM, and the model is about 11 GB, so it should probably fit into memory, if that was the issue? This may not be the place to ask, but since you said you can run it, can you give me some help or a reference for what is going on?
Are you monitoring memory use when you run starcoder? Running the 14.3 GB Q5_1 with 32 GB of RAM: from that, it seems pretty likely you are running out of memory. I don't think any of the mmap magic in llama.cpp has made it into ggml yet.
Thanks for the reply. Yes, it seems the model does not fit into memory. I assumed it would fit into RAM since the file is smaller, but apparently that is not the case with ggml. Good to know, thanks!
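The arithmetic in this exchange can be sketched as a quick pre-flight check. This is a rough illustration, not how ggml actually allocates memory: the 2 GiB overhead figure for working buffers is a guess, and the key assumption (stated above) is that without mmap the loader reads the whole weight file into RAM.

```python
GIB = 1024 ** 3

def fits_in_ram(model_bytes, total_ram_bytes, overhead_bytes=2 * GIB):
    """Rough pre-flight check for a ggml model loaded without mmap.

    The entire weight file is read into memory, plus working buffers
    (KV cache, scratch space). The 2 GiB overhead here is a guess,
    not a measured value.
    """
    return model_bytes + overhead_bytes <= total_ram_bytes

# The 14.3 GB Q5_1 file on a 16 GB machine: does not fit.
print(fits_in_ram(14.3 * GIB, 16 * GIB))   # False
# The same file with 32 GB of RAM: fits.
print(fits_in_ram(14.3 * GIB, 32 * GIB))   # True
```

This matches the reports in the thread: the 11 GB Q4 file is borderline on 16 GB once overhead and the rest of the system are accounted for, while 32 GB runs the Q5_1 comfortably.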
When will it be supported? That model is better for coding than anything else available offline so far.
WizardCoder 15B is not a LLaMA-family model; its graph has several nodes that differ from LLaMA models.
@spikespiegel I cobbled together basic mmap (and gpu) support for the starcoder example if you'd like to test: There is probably something wrong with it, but it seems to run ok for me on a system with 16GB of ram.
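The difference mmap makes can be sketched in a few lines. This illustrates the OS mechanism, not the actual patch: a memory-mapped file is paged in on demand, and the kernel can evict clean pages under pressure, so a process can open a weight file larger than free RAM without exhausting memory up front.

```python
import mmap

def map_weights(path):
    """Map a weight file into the address space instead of read()ing it.

    Pages are faulted in lazily on first access, and the kernel can
    drop clean (read-only) pages under memory pressure, so opening a
    file larger than free RAM does not immediately exhaust memory.
    """
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Slicing the map touches only the pages actually accessed:
# mm = map_weights("WizardCoder-15B-1.0.ggmlv3.q5_1.bin")
# magic = mm[:4]
```

This is why the mmap-enabled build above can run on a 16 GB system where a plain read-everything loader fails.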
I have tried running the GGML version of it, but it gives this error:

```
main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --repeat_penalty 1.2 --instruct --color --memory_f32 -m WizardCoder-15B-1.0.ggmlv3.q4_0.bin
main: build = 686 (ac3b886)
main: seed = 1686975019
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4050 Laptop GPU
llama.cpp: loading model from WizardCoder-15B-1.0.ggmlv3.q4_0.bin
error loading model: missing tok_embeddings.weight
llama_init_from_file: failed to load model
```
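That error is llama.cpp's LLaMA-architecture loader rejecting a model whose tensor layout it doesn't know, not a corrupt file. One quick sanity check is reading the file's 4-byte magic; the magic values below are taken from the ggml/llama.cpp sources of that era and should be treated as an assumption. Note that a recognized magic still doesn't make the file loadable here: in the log above the loader got past the container check and failed when it looked up LLaMA tensor names such as tok_embeddings.weight, which a starcoder-family graph does not contain.

```python
import struct

# Container magics used by ggml-era files (assumed from the
# ggml/llama.cpp sources contemporary with this thread).
GGML_MAGICS = {
    0x67676D6C: "ggml (unversioned; used by the ggml-repo examples)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-friendly; llama.cpp ggml v3)",
}

def identify_container(path):
    """Return a human-readable name for the file's 4-byte magic."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return GGML_MAGICS.get(magic, f"unknown (0x{magic:08x})")
```

So the practical takeaway from the thread stands: run WizardCoder GGML files through the starcoder example, not llama.cpp's main.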