Name and Version
./build/bin/llama-cli --version
version: 6150 (b3e1666)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for aarch64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
llama-bench -fa 0 -n 0 -p 512 -r 50 -m {qwen2_based_model} -ngl 99
Problem description & steps to reproduce
Description
When running llama-bench on an Ascend 910B NPU (in an autodl container), the CANN kernel crashes in ggml_cann_rms_norm.
Error code: 507035.
Error message: EZ9999 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28...
The error message indicates a kernel operator crash. After bisecting the commit history of ggml-cann, I found that the problem first appears in #14002, which replaced ggml_cann_async_memset with aclnnInplaceZero to create zero tensors.
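If I understand the change correctly, the old ggml_cann_async_memset zeroed a fixed number of bytes, whereas the aclnnInplaceZero path zeroes every element described by the tensor's shape, so a shape larger than the backing buffer now writes out of bounds. A standalone illustration of the size gap, with made-up qwen2-style dimensions (3584 x 512 is an assumption, not taken from this report):

// Illustration only: bytes allocated vs. bytes a shape-driven zero would touch.
#include <cstdint>
#include <cstdio>

int main() {
    int64_t ne[4] = {3584, 512, 1, 1};  // hypothetical {ne[0], ne[1], ne[2], ne[3]}
    size_t elem = sizeof(float);        // f32 element size

    size_t buffer_bytes = ne[1] * ne[2] * ne[3] * elem;          // what gets allocated (ne[0] omitted)
    size_t shape_bytes  = ne[0] * ne[1] * ne[2] * ne[3] * elem;  // what a shape-driven zero touches

    printf("allocated %zu bytes, shape-driven zero touches %zu bytes\n",
           buffer_bytes, shape_bytes);
    return 0;
}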
After debugging, I found that in ggml_cann_rms_norm there might be a memory issue when zeroing the tensor acl_rstd:
size_t zero_tensor_n_bytes =
    src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);  // buffer sized WITHOUT ne[0]
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
aclTensor* acl_rstd =
    aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
               src->ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),  // but shaped with the full src->ne
               ggml_element_size(src));
acl_rstd is backed by a buffer of src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src) bytes, but is created with the full shape {src->ne[0], src->ne[1], src->ne[2], src->ne[3]}. I suppose this mismatch makes the zeroing kernel write past the end of the buffer and crash; a possible fix is sketched below.
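A minimal sketch of one possible fix, assuming acl_rstd is meant to hold one rstd value per row, so that the first dimension of its shape should be 1 to match the allocated buffer (acl_rstd_ne is a name I introduce here; this mirrors the existing aclnn_zero call and is not a verified patch):

// Hypothetical fix sketch: the shape's element count now matches the allocation.
// One rstd value per row => first dimension is 1 instead of src->ne[0].
int64_t acl_rstd_ne[] = {1, src->ne[1], src->ne[2], src->ne[3]};
size_t zero_tensor_n_bytes =
    src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
aclTensor* acl_rstd =
    aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
               acl_rstd_ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),
               ggml_element_size(src));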
Steps to Reproduce
Using the pp512 test as an example:
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_CANN=ON
cmake --build build --config Debug
./build/bin/llama-bench -fa 0 -n 0 -p 512 -r 50 -m (your qwen2 model path) -ngl 99
Others
I found that llama-cli and test-backend-ops do not seem to be affected by this issue.
First Bad Commit
#14002
Relevant log output
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0 0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x0000ffffb08f66fc in ggml_print_backtrace () at /root/llama.cpp/ggml/src/ggml.c:196
196 waitpid(child_pid, NULL, 0);
#2 0x0000ffffb08f68d4 in ggml_abort (file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=69, fmt=0xffffb0a4cea0 "CANN error") at /root/llama.cpp/ggml/src/ggml.c:230
230 ggml_print_backtrace();
#3 0x0000ffffb0a2ef94 in ggml_cann_error (stmt=0xffffb0a4e7e0 "aclrtSynchronizeStream(cann_ctx->stream())", func=0xffffb0a4e7c0 "ggml_backend_cann_synchronize", file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=2075, msg=0xaaab114ea528 "EZ9999: Inner Error!\nEZ9999: [PID: 165866] 2025-08-15-03:15:49.571.649 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28, error code = 0x"...) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:69
69 GGML_ABORT("CANN error");
#4 0x0000ffffb0a34e3c in ggml_backend_cann_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2075
2075 ACL_CHECK(aclrtSynchronizeStream(cann_ctx->stream()));
#5 0x0000ffffb090f480 in ggml_backend_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-backend.cpp:306
306 backend->iface.synchronize(backend);
#6 0x0000ffffb0913d00 in ggml_backend_sched_synchronize (sched=0xaaab0e4a1860) at /root/llama.cpp/ggml/src/ggml-backend.cpp:1595
1595 ggml_backend_synchronize(sched->backends[i]);
#7 0x0000ffffb10672e4 in llama_context::synchronize (this=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:374
374 ggml_backend_sched_synchronize(sched.get());
#8 0x0000ffffb106e4c4 in llama_synchronize (ctx=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:2400
2400 ctx->synchronize();
#9 0x0000aaaad615f5bc in test_prompt (ctx=0xaaab0c10f3c0, n_prompt=512, n_batch=2048, n_threads=192) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1787
1787 llama_synchronize(ctx);
#10 0x0000aaaad615ff30 in main (argc=13, argv=0xffffcdc95ba8) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1955
1955 bool res = test_prompt(ctx, t.n_prompt, t.n_batch, t.n_threads);
[Inferior 1 (process 165866) detached]
Aborted (core dumped)