Description
Name and Version
built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CPU
Hardware
13th Gen Intel(R) Core(TM) i9-13900H
Models
DeepSeek-V2-Lite-Q4_K_M
Problem description & steps to reproduce
Usage: ./llama-simple -m $Model_Path/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
Using git bisect, I found that the program stops working properly starting with commit 3d82dbc. The feature involved was introduced in #12332. This change directly causes an overflow on my CPU when computing “ffn-moe-gate”.
Unfortunately, I am not familiar with this feature. Could anyone fix this bug? @Srihari-mcw @ggerganov
First Bad Commit
Relevant log output
repack: repack tensor blk.0.attn_kv_a_mqa.weight with q4_K_8x8
repack: repack tensor blk.0.attn_kv_b.weight with q4_K_8x8
repack: repack tensor blk.0.attn_output.weight with q4_K_8x8
......
......
llama-simple: ~/workspace/github/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:7669: ggml_compute_forward_silu_f32: Assertion `!isinf(x)' failed.
Aborted (core dumped)