Eval bug: Program not working properly due to new features of "repack Q4_K tensor" #12528

Closed
@Yangxiaoz

Description

Name and Version

built with cc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CPU

Hardware

13th Gen Intel(R) Core(TM) i9-13900H

Models

DeepSeek-V2-Lite-Q4_K_M

Problem description & steps to reproduce

Usage: ./llama-simple -m $Model_Path/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf

Using git bisect, I found that the program stops working properly starting with commit 3d82dbc. The feature in that commit was introduced in #12332. It directly causes an overflow error on my CPU when computing "ffn-moe-gate".
Unfortunately, I am not familiar with this feature. Could anyone fix this bug? @Srihari-mcw @ggerganov

First Bad Commit

3d82dbc

Relevant log output

repack: repack tensor blk.0.attn_kv_a_mqa.weight with q4_K_8x8
repack: repack tensor blk.0.attn_kv_b.weight with q4_K_8x8
repack: repack tensor blk.0.attn_output.weight with q4_K_8x8
......
......

llama-simple: ~/workspace/github/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:7669: ggml_compute_forward_silu_f32: Assertion `!isinf(x)' failed.
Aborted (core dumped)
