Misc. bug: Quantizing OLMo models with imatrix failing on some sizes #11764

@bartowski1182

Description

Name and Version

version 4585

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-quantize

Command line

./llama-quantize --imatrix /models/OLMo-2-1124-7B-Instruct-GGUF/allenai_OLMo-2-1124-7B-Instruct.imatrix /models/OLMo-2-1124-7B-Instruct-GGUF/allenai_OLMo-2-1124-7B-Instruct-f32.gguf /models/OLMo-2-1124-7B-Instruct-GGUF/allenai_OLMo-2-1124-7B-Instruct-Q5_K_M.gguf Q5_K_M

Problem description & steps to reproduce

Without an imatrix, I don't get any issues.

Quantizing OLMo-2 7B to Q5_K_M, Q5_K_S, Q4_K_M, Q4_K_S, and Q2_K with an imatrix results in:

blk.7.attn_q.weight - [ 4096,  4096,     1,     1], type =    f32, converting to q4_K .. ggml_validate_row_data: found nan value at block 48
ggml_validate_row_data: found nan value at block 16
blk.7.attn_q.weight - [ 4096,  4096,     1,     1], type =    f32, converting to q5_K .. ggml_validate_row_data: found nan value at block 48
ggml_validate_row_data: found nan value at block 16
blk.7.attn_q.weight - [ 4096,  4096,     1,     1], type =    f32, converting to q2_K .. ggml_validate_row_data: found nan value at block 48
ggml_validate_row_data: found nan value at block 16

All other sizes quantize without issue.
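
For context, my best guess at the mechanism (an assumption, not a confirmed diagnosis): with an imatrix, block scales are computed from weighted sums, and if every weight in a block comes out zero or degenerate, a 0/0 can turn the scale into NaN, which then poisons the whole block. A toy C sketch of that hazard follows; weighted_scale is made up for illustration and is not llama.cpp's actual code:

#include <math.h>
#include <stdio.h>

// Toy imatrix-weighted scale, loosely modeled on how k-quants weight
// values during quantization. NOT llama.cpp's actual code -- just a
// sketch of the 0/0 hazard: if every weight in a block is zero, both
// sums are zero and the scale becomes NaN, which then propagates into
// every quantized value of that block.
static float weighted_scale(const float *x, const float *w, int n) {
    float sumwx = 0.0f, sumw = 0.0f;
    for (int i = 0; i < n; ++i) {
        sumwx += w[i] * fabsf(x[i]);
        sumw  += w[i];
    }
    return sumwx / sumw; // 0.0f/0.0f == NaN when all weights are zero
}

int main(void) {
    float x[4] = { 0.5f, -1.0f, 0.25f, 2.0f };
    float w[4] = { 0.0f }; // e.g. activations that never hit these rows
    float s = weighted_scale(x, w, 4);
    printf("scale = %f, isnan = %d\n", s, isnan(s) ? 1 : 0);
    return 0;
}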

Additionally, the 13B model fails in a different way on IQ2_M and IQ2_S:

[  95/ 443]                  blk.8.attn_q.weight - [ 5120,  5120,     1,     1], type =    f32, converting to iq2_xs .. /llama.cpp/ggml/src/ggml-quants.c:3279: fatal error
Oops: found point 4 not on grid: 0 1 0 0 0 0 0 0
libggml-base.so(+0x159cb)[0x72a78fe039cb]
libggml-base.so(ggml_abort+0x15f)[0x72a78fe03d6f]
libggml-base.so(+0x3bcbb)[0x72a78fe29cbb]
libggml-base.so(quantize_iq2_xs+0x81)[0x72a78fe45691]
libggml-base.so(ggml_quantize_chunk+0x371)[0x72a78fe12431]
libllama.so(+0xeaa35)[0x72a78ffaca35]
libllama.so(llama_model_quantize+0xf4)[0x72a78ffae094]
./llama-quantize(+0x17d6a)[0x60055e597d6a]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x72a78f8b5d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x72a78f8b5e40]
./llama-quantize(+0x18c25)[0x60055e598c25]

All other sizes have no issues.
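
To help narrow down whether the NaNs are already present in the F32 source tensors or only appear after quantization, here is a minimal standalone scanner modeled on the per-block reports from ggml_validate_row_data above. It assumes a block size of 256 (QK_K in llama.cpp's k-quants); scan_row and the planted NaN are purely illustrative:

#include <math.h>
#include <stdio.h>

// Assumed block size, matching QK_K in llama.cpp's k-quants.
#define QK_K 256

// Scan a row of f32 data already in memory and report blocks that
// contain NaN/Inf, analogous to ggml_validate_row_data's output.
static void scan_row(const float *row, long n, const char *name) {
    for (long b = 0; b * QK_K < n; ++b) {
        for (long i = b * QK_K; i < (b + 1) * QK_K && i < n; ++i) {
            if (isnan(row[i]) || isinf(row[i])) {
                printf("%s: bad value %f at block %ld (index %ld)\n",
                       name, row[i], b, i);
                break; // one report per block is enough
            }
        }
    }
}

int main(void) {
    float row[2 * QK_K] = { 0.0f };
    row[QK_K + 7] = NAN; // plant a NaN in block 1 to demo the report
    scan_row(row, 2 * QK_K, "blk.7.attn_q.weight");
    return 0;
}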

I've uploaded both the F32 conversions and the imatrix files here:

https://huggingface.co/bartowski/PleaseIgnore_uploaded_for_testing
