This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Add support for new k-quants quantization format #301

Closed
LLukas22 opened this issue Jun 7, 2023 · 7 comments · Fixed by #326
Labels
issue:enhancement New feature or request meta:help-wanted Extra attention is needed

Comments

@LLukas22
Contributor

LLukas22 commented Jun 7, 2023

llama.cpp now supports the new k-quants quantization formats, which achieve good model perplexity even at aggressive quantization levels. See ggml-org/llama.cpp#1684.

We should also support these new quantization formats.

@KerfuffleV2
Contributor

Is this actually not supported yet?

Either way, probably should keep this in mind: ggml-org/llama.cpp#1919

The solution there was to add checks in the quantization tool to prevent trying to quantize those tensors (or models) with k-quants, but a project using GGML at a lower level would need to verify that the tensor rows and columns are divisible by QK_K.
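
To make the constraint concrete, here is a minimal Rust sketch of that kind of check. The function name and the choice to test both dimensions are only illustrative, and QK_K is hardcoded to ggml's default of 256:

```rust
/// Illustrative only: k-quants pack values into super-blocks of QK_K elements,
/// so (per the comment above) a tensor's dimensions need to be divisible by
/// QK_K before a k-quant type can be used for that tensor.
const QK_K: usize = 256; // upstream ggml's default super-block size

fn tensor_supports_k_quants(n_rows: usize, n_cols: usize) -> bool {
    n_rows % QK_K == 0 && n_cols % QK_K == 0
}

fn main() {
    // A 4096 x 4096 weight matrix is fine (4096 = 16 * 256)...
    assert!(tensor_supports_k_quants(4096, 4096));
    // ...but a tensor with a 1000-wide dimension is not.
    assert!(!tensor_supports_k_quants(4096, 1000));
}
```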

@nightscape
Contributor

Is someone already working on this?
I'm a Rust noob and would probably have to chicken out when it gets complicated, but I could at least get it started 😃
I would start by taking all the #ifdef GGML_USE_K_QUANTS sections from llama.cpp, having GPT-4 convert them to Rust, and trying to fit them into the llm codebase.

@KerfuffleV2
Contributor

I think the first thing you'd need to do is check whether the llama.cpp binding that llm depends on is compiled with k-quants. If it is, you probably don't have to do anything more than add the k-quants types to the enums where quantization types are currently listed. In the case of the quantization tool (and maybe model loading as well) you'd have to check that the tensor rows/columns are divisible by QK_K (you can possibly just hardcode 256 there).

As long as the binding part is against a new enough version of GGML and it's getting compiled with k-quants, this might be a simple change.
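
As a rough illustration of the "just add the types to the enums" part, something like the sketch below. The enum and variant names are hypothetical stand-ins for whatever llm actually defines, and the block sizes should be double-checked against ggml.h:

```rust
/// Hypothetical sketch of extending a quantization-format enum with the
/// k-quant variants. Names and the mapping to ggml's type ids must match
/// the ggml revision the crate is built against.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QuantizationType {
    // existing formats (abridged)
    Q4_0,
    Q4_1,
    Q8_0,
    // new k-quant formats
    Q2_K,
    Q3_K,
    Q4_K,
    Q5_K,
    Q6_K,
}

impl QuantizationType {
    /// Block size a tensor row must be divisible by for this format.
    fn block_size(self) -> usize {
        match self {
            QuantizationType::Q4_0
            | QuantizationType::Q4_1
            | QuantizationType::Q8_0 => 32,
            // all k-quant formats use QK_K = 256 super-blocks
            _ => 256,
        }
    }
}

fn main() {
    assert_eq!(QuantizationType::Q4_K.block_size(), 256);
}
```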

@nightscape
Contributor

@KerfuffleV2 thanks for your input. I might be mistaken/misunderstanding, but I think llm only depends on ggml.c and reimplements the file loading from llama.cpp.
At least I'm finding places where the quantization types are listed as enums or constants.
@LLukas22 @philpax could you enlighten us?

@KerfuffleV2
Contributor

> I might be mistaken/misunderstanding, but I think llm only depends on ggml.c

Sorry, I might have been incorrect there. I thought I saw a pull request a while back saying something like it now depends on llama.cpp.

You could possibly look at the approach I use here: https://github.com/KerfuffleV2/ggml-sys-bleedingedge - well, actually two approaches. One is to build the old-style way and include the k-quants stuff unless it's disabled by a feature. The other is to use cmake to build, which makes things like compiling with CUDA simpler (although it doesn't help with determining how to link). For the latter approach, I was able to get a ggml_static target added, so you actually don't have to link against llama.cpp when you only need ggml (but it will require linking with C++ because of the BLAS stuff).
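
A build.rs sketch of the "old style" approach, assuming the ggml sources are vendored under a ggml/ directory and a hypothetical no-k-quants Cargo feature for opting out (file names and defines need to match the ggml revision actually vendored):

```rust
// build.rs (sketch, not the actual llm build script)
fn main() {
    let mut build = cc::Build::new();
    build.file("ggml/ggml.c").include("ggml");

    // Compile the k-quants kernels unless the (hypothetical) opt-out feature
    // is enabled, mirroring llama.cpp's GGML_USE_K_QUANTS switch.
    if std::env::var("CARGO_FEATURE_NO_K_QUANTS").is_err() {
        build.file("ggml/k_quants.c");
        build.define("GGML_USE_K_QUANTS", None);
    }

    build.compile("ggml");
}
```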

@LLukas22
Contributor Author

@nightscape If we want to support the k-quants we probably have to wrap k_quants.h in a similar way to ggml.h (see here), and we probably have to compile ggml with k_quants enabled in the build.rs file. If that's implemented, we have to extend our enums and extend the quantization logic to call the correct wrapped functions.
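
For the wrapping part, a sketch of what hand-written bindings could look like. The prototype is modeled on the existing ggml_quantize_* declarations and must be verified against the actual k_quants.h; the safe wrapper name and histogram size are made up for this example:

```rust
// Sketch of a hand-written FFI declaration for one k-quant quantization entry
// point. The exact signature must be checked against k_quants.h before use.
use std::os::raw::{c_int, c_void};

extern "C" {
    // Assumed prototype: quantize `n` floats from `src` into `dst`,
    // update the histogram, and return the number of bytes written.
    fn ggml_quantize_q4_K(
        src: *const f32,
        dst: *mut c_void,
        n: c_int,
        k: c_int,
        hist: *mut i64,
    ) -> usize;
}

/// Illustrative safe wrapper the quantization logic could call once the
/// quantization-type enum knows about the new format.
pub fn quantize_q4_k(src: &[f32], dst: &mut [u8], hist: &mut [i64; 16]) -> usize {
    unsafe {
        ggml_quantize_q4_K(
            src.as_ptr(),
            dst.as_mut_ptr() as *mut c_void,
            src.len() as c_int,
            src.len() as c_int,
            hist.as_mut_ptr(),
        )
    }
}
```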

nightscape mentioned this issue Jun 22, 2023
@nightscape
Contributor

See #326
