Closed
Description
Add Q2_0 and Q2_1 quantization support to ggml:

- Follow the existing Q4_0 and Q4_1 implementations
- Implement reference scalar quantization and dequantization routines (a rough scalar sketch follows after this description)
- I suspect we might have to use QK == 16 in this case to compensate for further accuracy losses
- Add SIMD support for a specific architecture - investigate best strategy to perform the ggml_vec_dot_q2() computation
- No need to implement ggml_vec_mad_q2() - these will be deprecated soon
- Compute perplexity scores

The expected model sizes for 7B and QK == 16 are:

- Q2_0 - 3.2 GB

For QK == 32 we have:

- Q2_0 - 2.4 GB
- Q2_1 - 3.2 GB
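Back-of-the-envelope, these numbers follow from the bits per weight, assuming one fp32 scale per block (plus one fp32 min per block for Q2_1):

- Q2_0, QK == 16: (16*2 + 32) bits per 16 weights = 4 bits/weight -> 7B * 4 / 8 bytes ~ 3.2 GB
- Q2_0, QK == 32: (32*2 + 32) bits per 32 weights = 3 bits/weight -> 7B * 3 / 8 bytes ~ 2.4 GB
- Q2_1, QK == 32: (32*2 + 2*32) bits per 32 weights = 4 bits/weight -> ~ 3.2 GB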
Before you send me papers that show 2-bit quantization does not work - no need. I want to have this supported anyway. I have something in mind. The efforts needed to add this support are so small that there is no reason not to do it.
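For illustration, here is a minimal scalar sketch of what the reference routines could look like, following the structure of the existing Q4_0/Q4_1 reference code. The block_q2_0/block_q2_1 layouts, the QK == 16 value, and the function names are assumptions made for the sketch, not actual ggml code:

```c
// Minimal sketch only - hypothetical layouts and names, not the ggml implementation.
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define QK 16

typedef struct {
    float   d;            // scale
    uint8_t qs[QK / 4];   // 2-bit quants, 4 per byte
} block_q2_0;

typedef struct {
    float   d;            // scale
    float   m;            // min
    uint8_t qs[QK / 4];   // 2-bit quants, 4 per byte
} block_q2_1;

// Q2_1 reference: per block, map [min, max] onto the 4 levels 0..3 (Q4_1 does the same with 16 levels)
static void quantize_row_q2_1_reference(const float * x, block_q2_1 * y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        float min = x[i*QK];
        float max = x[i*QK];
        for (int l = 1; l < QK; l++) {
            if (x[i*QK + l] < min) min = x[i*QK + l];
            if (x[i*QK + l] > max) max = x[i*QK + l];
        }

        const float d  = (max - min) / 3.0f;          // 3 = 2^2 - 1 quantization steps
        const float id = d != 0.0f ? 1.0f/d : 0.0f;

        y[i].d = d;
        y[i].m = min;

        for (int l = 0; l < QK; l += 4) {
            uint8_t b = 0;
            for (int j = 0; j < 4; j++) {
                int q = (int) roundf((x[i*QK + l + j] - min) * id);
                if (q < 0) q = 0;
                if (q > 3) q = 3;
                b |= (uint8_t) q << (2*j);            // pack 4 x 2-bit quants per byte
            }
            y[i].qs[l/4] = b;
        }
    }
}

static void dequantize_row_q2_1(const block_q2_1 * x, float * y, int k) {
    assert(k % QK == 0);
    const int nb = k / QK;

    for (int i = 0; i < nb; i++) {
        for (int l = 0; l < QK; l += 4) {
            for (int j = 0; j < 4; j++) {
                const int q = (x[i].qs[l/4] >> (2*j)) & 0x3;
                y[i*QK + l + j] = q * x[i].d + x[i].m;  // reconstruct: q * scale + min
            }
        }
    }
}
```

Q2_0 would be the same minus the per-block min (quantizing around zero), and ggml_vec_dot_q2() could reuse the same unpacking loop over two quantized rows.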
Activity
dakennedyd commented on Mar 24, 2023
No 3-bit support?
ggerganov commented on Mar 24, 2023
I don't think I can implement it efficiently, but if anyone wants to give it a try - sure
Green-Sky commented on Mar 24, 2023
65B using 32gig ram anyone? 😆
prusnak commented on Mar 24, 2023
I came up with a script that's able to compute RMS for various quantization methods - maybe it will come in handy for experimenting: https://gist.github.com/prusnak/f54f8f33503458ca1aa9883f71897072
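Since the same check might be handy inside ggml itself, here is the core idea in C (this is not the script above, which is Python, and the function name is made up): quantize a row, dequantize it back, and compute the RMS of the round-trip error.

```c
// RMS of the quantization round-trip error - minimal sketch, not prusnak's script.
#include <math.h>
#include <stddef.h>

// x: original weights, y: result of quantize followed by dequantize, n: element count
static double rms_error(const float * x, const float * y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double diff = (double) x[i] - (double) y[i];
        sum += diff * diff;
    }
    return sqrt(sum / (double) n);
}
```

Comparing this number across quantization variants gives a quick proxy before running full perplexity.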
sw commented on Mar 25, 2023
Go home Q2, you're drunk ;-)
This is cherry-picked; often it goes to babbling numbers right away.
Q3 seems decent:
Both are very slow because I haven't found a good way to use AVX2 yet. Perplexity would probably take days if not weeks.
I used float for the scale in Q2 and FP16 in Q3, so the model files actually are the same size:
For Q2 I deviated slightly from the standard calculation of the factors. If you want a zero value and symmetry in the positive and negative range, that would have left only 3 values (-1, 0, +1). Instead, I calculate the signed maximum (= the value of largest magnitude, without applying fabsf), then assign the value -2 to that maximum. The sign of the shared scaling factor is adjusted to give the right sign of the result. Without this modification, I couldn't get Q2 to output any semblance of English.

Code here: https://github.com/sw/llama.cpp/tree/q2q3
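A minimal sketch of that scale choice (not the code from the linked branch; the helper name is made up):

```c
// Pick the scale so that the element with the largest magnitude (sign kept) maps exactly
// to -2, which lets Q2 use all four levels {-2, -1, 0, +1}. Sketch only.
#include <math.h>

static float q2_scale_signed_max(const float * x, int n) {
    float max = 0.0f;                // signed maximum: largest |x[i]|, sign preserved
    for (int i = 0; i < n; i++) {
        if (fabsf(x[i]) > fabsf(max)) {
            max = x[i];
        }
    }
    return max / -2.0f;              // the sign of the scale carries the sign of the extreme value
}

// quantize: q = clamp(round(x[i] / d), -2, +1); dequantize: x[i] is reconstructed as q * d
```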
sw commented on Mar 27, 2023
Updated my branch with AVX optimizations, probably far from perfect.
Still quite slow...
Q2:
Q3:
CamiloMM commented on Mar 31, 2023
Not nearly enough, we need support for 1-bit signed floats.
Interpause commented on Apr 2, 2023
Swap that out for 1 qubit and now we're talking.
prusnak commented on Apr 2, 2023
I think the best model size and performance will be achieved when 0-bit quantization is used.
Lolagatorade commented on Apr 12, 2023
Mhmm possibly -1...
ggerganov commented on Jun 24, 2023
Thanks to K-quants this is now available
MrMage commented on Jun 26, 2023
Have there been any new insights into the quality of 2-bit quantization? I.e. does that approach produce reasonable results now?
Green-Sky commented on Jun 26, 2023
@MrMage pure q2 will never be good, but the k-quants use a mixture with some q2 to achieve reasonable results. Check out how LLAMA_FTYPE_MOSTLY_Q2_K is composed here: #1684

neelr commented on Nov 17, 2023
https://arxiv.org/abs/2307.13304