
[Feature request?]: Running larger models without quantization. #118


Closed
acheong08 opened this issue Mar 14, 2023 · 6 comments


Comments

@acheong08

Current error

[1]    11624 segmentation fault (core dumped)  ./llama -m ./models/13B/ggml-model-f16.bin -p  -t 8 --temp 0.5 --top_p 1 
@acheong08
Author

Everything works when quantized.

@gjmulder
Collaborator

Current error

[1]    11624 segmentation fault (core dumped)  ./llama -m ./models/13B/ggml-model-f16.bin -p  -t 8 --temp 0.5 --top_p 1 

Check whether you have enough memory:

Size  Precision  Approx. RAM req.
13B   fp16       25GB
30B   fp16       62GB
65B   fp16       122GB

You can add swap on Linux, but even on NVMe storage it will be very slow due to the random access patterns of the code.
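The table's figures can be roughly reproduced with a back-of-envelope calculation: each fp16 weight takes 2 bytes, so weights alone need about parameter-count × 2 bytes. A minimal sketch (the function name and the weights-only simplification are illustrative assumptions, not values taken from llama.cpp; the real process also needs some overhead for the context, which is why the table's numbers run slightly higher):

```python
def approx_ram_gib(n_params: float, bytes_per_weight: float) -> float:
    """Weights-only RAM estimate in GiB; ignores KV cache and activations."""
    return n_params * bytes_per_weight / 2**30

# fp16 stores each weight in 2 bytes.
for billions in (13, 30, 65):
    gib = approx_ram_gib(billions * 1e9, 2)
    print(f"{billions}B fp16 needs roughly {gib:.0f} GiB for weights alone")
```

This prints estimates a few GiB under the table above, with the gap covered by runtime overhead.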

@acheong08
Author

Even my primary memory + swap is not enough.

@gjmulder
Collaborator

As per the comments from @MarkSchmidty in issue #53 use the 4bit quantized models to reduce your memory requirements by approx. 4X. The loss in quality should be negligible.
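The approx. 4X figure follows directly from the bit widths: 16-bit weights shrink to about 4 bits each. (In practice 4-bit quantization schemes also store per-block scale factors alongside the weights, so the real ratio is slightly under 4X; the sketch below ignores that and is an illustrative assumption, not llama.cpp's exact layout.)

```python
def approx_ram_gib(n_params: float, bits_per_weight: float) -> float:
    """Weights-only RAM estimate in GiB for a given bit width."""
    return n_params * bits_per_weight / 8 / 2**30

fp16 = approx_ram_gib(13e9, 16)
q4 = approx_ram_gib(13e9, 4)
print(f"13B: fp16 ~{fp16:.1f} GiB vs 4-bit ~{q4:.1f} GiB "
      f"({fp16 / q4:.0f}x smaller)")
```

Under this simplification, a 13B model drops from roughly 24 GiB to roughly 6 GiB of weights.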

@acheong08
Author

Yes, the quantized models work perfectly for me. I just wanted to test the other versions to see how they perform.

@gjmulder
Collaborator

Slowly 😄

If there's a specific model size and prompt you need, I can compare the 4-bit to the f16. I'm currently exploring different option permutations in issue #69.

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
Fix UnicodeDecodeError permanently