Add support for new k-quants quantization format #301
llama.cpp now supports the new k-quants quantization formats, which achieve good model perplexity even at aggressive quantization levels. See ggml-org/llama.cpp#1684. We should support these new quantization formats as well.
Comments
Is this actually not supported yet? Either way, it's probably worth keeping this in mind: ggml-org/llama.cpp#1919. The solution there was to add checks in the quantization tool to prevent trying to quantize those tensors (or models) with k-quants, but a project using GGML at a lower level would need to verify that the rows and columns are divisible by the k-quants super-block size (`QK_K`, 256).
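As a rough illustration, a check along those lines might look like the following in Rust. This is a sketch only: `QK_K = 256` is the super-block size from llama.cpp's `k_quants.h`, and the helper name is made up.

```rust
/// Super-block size used by the k-quants formats (256 in llama.cpp's k_quants.h).
const QK_K: usize = 256;

/// Returns true if a tensor with the given dimensions can be quantized with
/// one of the k-quants formats. (Hypothetical helper, not an existing API.)
fn can_use_k_quants(rows: usize, cols: usize) -> bool {
    rows % QK_K == 0 && cols % QK_K == 0
}
```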
Is someone already working on this?
I think the first thing you'd need to do is check whether the llama.cpp binding that llm depends on is compiled with k-quants. If it is, you probably don't have to do much more than add the k-quants types to the enums where quantization types are currently listed, as in the sketch below. In the quantization tool (and maybe when loading models, too) you'd have to check that the tensor rows/columns are divisible by `QK_K`. As long as the binding is built against a new enough version of GGML and gets compiled with k-quants, this might be a simple change.
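For illustration, extending the quantization-type enum might look roughly like this. The enum name and existing variants are hypothetical stand-ins for whatever the binding actually defines; only the k-quants variants at the end are new.

```rust
/// Hypothetical quantization-type enum; the pre-existing variants stand in
/// for whatever the binding already lists.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationType {
    F16,
    Q4_0,
    Q4_1,
    Q8_0,
    // New k-quants formats from ggml-org/llama.cpp#1684:
    Q2K,
    Q3K,
    Q4K,
    Q5K,
    Q6K,
}

impl QuantizationType {
    /// True for the formats that require tensor dimensions divisible by QK_K.
    pub fn is_k_quant(self) -> bool {
        matches!(self, Self::Q2K | Self::Q3K | Self::Q4K | Self::Q5K | Self::Q6K)
    }
}
```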
@KerfuffleV2 thanks for your input. I might be mistaken/misunderstanding, but I think llm only depends on `ggml`, not on llama.cpp itself.
Sorry, I might have been incorrect there. I thought I saw a pull a while back saying something like it was now depending on llama.cpp. You could look at the approach I use here: https://github.com/KerfuffleV2/ggml-sys-bleedingedge (well, actually two approaches). One is just to build the old-style way and include the k-quants stuff unless it's disabled by a feature. The other is to use CMake to build, which makes things like compiling with CUDA simpler (although it doesn't help with determining how to link). For the latter approach, I was able to get a working build going.
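A sketch of the first approach, assuming the ggml sources are vendored and compiled with the `cc` crate and that the Cargo feature is named `k-quants`. All paths here are illustrative, and the `GGML_USE_K_QUANTS` define is taken from llama.cpp's build at the time; verify both against the vendored sources.

```rust
// build.rs: compile ggml, optionally including the k-quants sources.
fn main() {
    let mut build = cc::Build::new();
    build.file("ggml/src/ggml.c"); // illustrative path to the vendored sources

    // Cargo exposes enabled features to build scripts as environment
    // variables of the form CARGO_FEATURE_<NAME>.
    if std::env::var("CARGO_FEATURE_K_QUANTS").is_ok() {
        build.file("ggml/src/k_quants.c"); // assumed location of the k-quants sources
        build.define("GGML_USE_K_QUANTS", None);
    }

    build.compile("ggml");
}
```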
@nightscape If we want to support the k-quants we probably have to wrap the k-quants sources from llama.cpp as well.
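The extra FFI surface might look something like the declaration below. The signature just mirrors the pattern of ggml's existing `ggml_quantize_*` functions; treat the exact name and signature as assumptions to check against the vendored `k_quants.h` rather than taking them from this sketch.

```rust
use std::os::raw::{c_int, c_void};

extern "C" {
    // Assumed to follow the same shape as ggml_quantize_q4_0 and friends;
    // verify against k_quants.h before relying on it.
    fn ggml_quantize_q4_K(
        src: *const f32,
        dst: *mut c_void,
        n: c_int,
        k: c_int,
        hist: *mut i64,
    ) -> usize;
}
```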
See #326