Closed
Description
When trying to run the 13B model the following output is given:

```
main: seed = 1678543550
llama_model_load: loading model from './models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './models/13B/ggml-model-q4_0.bin'
```
I have followed the commands in the README to quantize the model, i.e.:

```shell
python3 convert-pth-to-ggml.py models/13B/ 1
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
```
I am using a M1 MacBook Pro. Any thoughts on how to resolve this issue?
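For context on this class of error: a "wrong size in model file" failure typically means the binaries were built against a different file layout than the one the converter wrote, so the loader computes tensor sizes from hyperparameters that no longer line up with the bytes on disk. The sketch below is a hypothetical, heavily simplified header check (the real ggml format has more fields and versioning); `GGML_MAGIC` matches the ASCII string "ggml" used by early ggml files, and the two hyperparameters are taken from the log above:

```python
import struct

# "ggml" in ASCII, the magic number at the start of early ggml model files.
GGML_MAGIC = 0x67676D6C

def read_header(buf: bytes) -> dict:
    """Parse a minimal mock header: magic, then n_vocab and n_embd.

    Illustrative only -- the real loader reads many more fields, and a
    converter/loader version mismatch makes these reads go out of sync,
    producing errors like the 'wrong size' one in this issue.
    """
    magic, n_vocab, n_embd = struct.unpack("<iii", buf[:12])
    if magic != GGML_MAGIC:
        raise ValueError("not a ggml file (or a newer, incompatible format)")
    return {"n_vocab": n_vocab, "n_embd": n_embd}

# Build a mock header using the hyperparameters from the error log above.
buf = struct.pack("<iii", GGML_MAGIC, 32000, 5120)
print(read_header(buf))  # {'n_vocab': 32000, 'n_embd': 5120}
```

If the header parses but tensor sizes still mismatch, the usual cause (as confirmed below) is stale binaries rather than a corrupt download.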
shaunabanana commented on Mar 11, 2023
I just realized that I had been using binaries (`quantize` and `main`) compiled from a previous version. Recompiling solved the issue. Thank you for your awesome work!