
Misc. bug: ggml_cann_rms_norm causes CANN kernel crash when running llama-bench. #15330

Description

@yuchuan-cao

Name and Version

./build/bin/llama-cli --version
version: 6150 (b3e1666)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for aarch64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-bench

Command line

llama-bench -fa 0 -n 0 -p 512 -r 50 -m {qwen2_based_model} -ngl 99

Problem description & steps to reproduce

Description

When running llama-bench on an Ascend 910B NPU (in an autodl container), the CANN kernel crashes in ggml_cann_rms_norm.

Error code: 507035.
Error message: EZ9999 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28...

The error message indicates a crash inside a kernel operator. After bisecting through the history of ggml-cann commits, I found that the problem first occurs in #14002, which replaced ggml_cann_async_memset with aclnnInplaceZero for creating zero-filled tensors.
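For context, aclnn single-operator calls are two-phase: a GetWorkspaceSize call that plans the kernel from the tensor's declared shape, followed by the launch on a stream. Below is a minimal standalone sketch of that pattern, based on the documented aclnnInplaceZero interface rather than llama.cpp's aclnn_zero wrapper (header path assumed; error handling omitted):

#include <acl/acl.h>
#include <aclnnop/aclnn_zero.h>  // assumed header for aclnnInplaceZero

// Zero a device tensor in place. The kernel derives how much memory to
// write from the *declared* shape of `self`, so the backing buffer must
// be large enough for every element of that shape -- which is the crux
// of this report.
static void zero_tensor(aclTensor* self, aclrtStream stream) {
    uint64_t workspace_size = 0;
    aclOpExecutor* executor = nullptr;
    aclnnInplaceZeroGetWorkspaceSize(self, &workspace_size, &executor);

    void* workspace = nullptr;
    if (workspace_size > 0) {
        aclrtMalloc(&workspace, workspace_size, ACL_MEM_MALLOC_HUGE_FIRST);
    }
    aclnnInplaceZero(workspace, workspace_size, executor, stream);
}

Unlike a plain memset over a byte count, the operator trusts the aclTensor's shape metadata, so a shape/buffer mismatch becomes an out-of-bounds device write.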

After debugging, I found that in ggml_cann_rms_norm there might be a memory issue when the tensor acl_rstd is zero-initialized.

// The buffer is sized for ne[1] * ne[2] * ne[3] elements (ne[0] omitted)...
size_t zero_tensor_n_bytes =
        src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
// ...but the tensor is declared with the full four-dimensional shape
// src->ne, i.e. ne[0] * ne[1] * ne[2] * ne[3] elements.
aclTensor* acl_rstd =
        aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
                   src->ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),
                   ggml_element_size(src));

acl_rstd is backed by a buffer of src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src) bytes, but it is declared with shape {src->ne[0], src->ne[1], src->ne[2], src->ne[3]}, which contains ne[0] times more elements than the buffer holds. I suppose the resulting out-of-bounds write is what crashes the kernel.
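If that diagnosis is right, the allocation and the declared shape simply need to agree. Since rstd in RMS norm holds one value per row, one plausible fix is to collapse the first dimension of the declared shape rather than grow the buffer. A sketch only (rstd_ne is a name introduced here for illustration; whether aclnnRmsNorm accepts an rstd of this shape would need to be confirmed):

// Hypothetical fix sketch: declare acl_rstd with a shape that matches the
// allocated buffer -- one element per row, so ne[0] collapses to 1.
int64_t rstd_ne[GGML_MAX_DIMS] = {1, src->ne[1], src->ne[2], src->ne[3]};
size_t zero_tensor_n_bytes =
        src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
aclTensor* acl_rstd =
        aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
                   rstd_ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),
                   ggml_element_size(src));

The alternative direction, sizing the buffer over all four dimensions, would also remove the mismatch, but it allocates ne[0] times more memory than rstd semantically needs.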

Steps to Reproduce

Using the pp512 test as an example:

cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_CANN=ON 
cmake --build build --config Debug
./build/bin/llama-bench -fa 0 -n 0 -p 512 -r 50 -m (your qwen2 model path) -ngl 99

Others

I found that llama-cli and test-backend-ops do not seem to be affected by this issue.

First Bad Commit

#14002

Relevant log output

Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30      in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000ffffb08f66fc in ggml_print_backtrace () at /root/llama.cpp/ggml/src/ggml.c:196
196             waitpid(child_pid, NULL, 0);
#2  0x0000ffffb08f68d4 in ggml_abort (file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=69, fmt=0xffffb0a4cea0 "CANN error") at /root/llama.cpp/ggml/src/ggml.c:230
230             ggml_print_backtrace();
#3  0x0000ffffb0a2ef94 in ggml_cann_error (stmt=0xffffb0a4e7e0 "aclrtSynchronizeStream(cann_ctx->stream())", func=0xffffb0a4e7c0 "ggml_backend_cann_synchronize", file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=2075, msg=0xaaab114ea528 "EZ9999: Inner Error!\nEZ9999: [PID: 165866] 2025-08-15-03:15:49.571.649 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28, error code = 0x"...) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:69
69          GGML_ABORT("CANN error");
#4  0x0000ffffb0a34e3c in ggml_backend_cann_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2075
2075        ACL_CHECK(aclrtSynchronizeStream(cann_ctx->stream()));
#5  0x0000ffffb090f480 in ggml_backend_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-backend.cpp:306
306         backend->iface.synchronize(backend);
#6  0x0000ffffb0913d00 in ggml_backend_sched_synchronize (sched=0xaaab0e4a1860) at /root/llama.cpp/ggml/src/ggml-backend.cpp:1595
1595            ggml_backend_synchronize(sched->backends[i]);
#7  0x0000ffffb10672e4 in llama_context::synchronize (this=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:374
374         ggml_backend_sched_synchronize(sched.get());
#8  0x0000ffffb106e4c4 in llama_synchronize (ctx=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:2400
2400        ctx->synchronize();
#9  0x0000aaaad615f5bc in test_prompt (ctx=0xaaab0c10f3c0, n_prompt=512, n_batch=2048, n_threads=192) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1787
1787        llama_synchronize(ctx);
#10 0x0000aaaad615ff30 in main (argc=13, argv=0xffffcdc95ba8) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1955
1955                    bool res = test_prompt(ctx, t.n_prompt, t.n_batch, t.n_threads);
[Inferior 1 (process 165866) detached]
Aborted (core dumped)
