Name and Version
./build/bin/llama-cli --version
version: 6150 (b3e1666)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for aarch64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-bench
Command line
llama-bench -fa 0 -n 0 -p 512 -r 50 -m {qwen2_based_model} -ngl 99
Problem description & steps to reproduce
Description
When running llama-bench on an Ascend 910B NPU (in an autodl container), the CANN kernel crashes in ggml_cann_rms_norm.
Error code: 507035.
Error message: EZ9999 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28...
The error message indicates a kernel operator crash. After bisecting the commit history of ggml-cann, I found that the problem first appears in #14002, which replaced ggml_cann_async_memset with aclnnInplaceZero to create zero tensors.
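If I understand the change correctly, the old ggml_cann_async_memset zeroed a fixed number of bytes, whereas the aclnnInplaceZero path zeroes every element described by the tensor's shape, so a shape larger than the backing buffer now writes out of bounds. A standalone illustration of the size gap, with made-up qwen2-style dimensions (3584 x 512 is an assumption, not taken from this report):

// Illustration only: bytes allocated vs. bytes a shape-driven zero would touch.
#include <cstdint>
#include <cstdio>

int main() {
    int64_t ne[4] = {3584, 512, 1, 1};  // hypothetical {ne[0], ne[1], ne[2], ne[3]}
    size_t elem = sizeof(float);        // f32 element size

    size_t buffer_bytes = ne[1] * ne[2] * ne[3] * elem;          // what gets allocated (ne[0] omitted)
    size_t shape_bytes  = ne[0] * ne[1] * ne[2] * ne[3] * elem;  // what a shape-driven zero touches

    printf("allocated %zu bytes, shape-driven zero touches %zu bytes\n",
           buffer_bytes, shape_bytes);
    return 0;
}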
After debugging, I found that in ggml_cann_rms_norm there might be a memory issue when zeroing the tensor acl_rstd:
size_t zero_tensor_n_bytes =
    src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);  // buffer sized WITHOUT ne[0]
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
aclTensor* acl_rstd =
    aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
               src->ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),  // but shaped with the full src->ne
               ggml_element_size(src));
acl_rstd is backed by a buffer of src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src) bytes, but is created with the full shape {src->ne[0], src->ne[1], src->ne[2], src->ne[3]}. I suppose this mismatch makes the zeroing kernel write past the end of the buffer and crash; a possible fix is sketched below.
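A minimal sketch of one possible fix, assuming acl_rstd is meant to hold one rstd value per row, so that the first dimension of its shape should be 1 to match the allocated buffer (acl_rstd_ne is a name I introduce here; this mirrors the existing aclnn_zero call and is not a verified patch):

// Hypothetical fix sketch: the shape's element count now matches the allocation.
// One rstd value per row => first dimension is 1 instead of src->ne[0].
int64_t acl_rstd_ne[] = {1, src->ne[1], src->ne[2], src->ne[3]};
size_t zero_tensor_n_bytes =
    src->ne[1] * src->ne[2] * src->ne[3] * ggml_element_size(src);
ggml_cann_pool_alloc zero_tensor_allocator(ctx.pool(), zero_tensor_n_bytes);
aclTensor* acl_rstd =
    aclnn_zero(ctx, zero_tensor_allocator.get(), zero_tensor_n_bytes,
               acl_rstd_ne, GGML_MAX_DIMS, ggml_cann_type_mapping(src->type),
               ggml_element_size(src));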
Steps to Reproduce
Using the pp512 test as an example:
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_CANN=ON
cmake --build build --config Debug
./build/bin/llama-bench -fa 0 -n 0 -p 512 -r 50 -m (your qwen2 model path) -ngl 99
Others
I found that llama-cli and test-backend-ops do not seem to be affected by this issue.
First Bad Commit
#14002
Relevant log output
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0 0x0000ffffb04a6800 in __GI___wait4 (pid=<optimized out>, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x0000ffffb08f66fc in ggml_print_backtrace () at /root/llama.cpp/ggml/src/ggml.c:196
196 waitpid(child_pid, NULL, 0);
#2 0x0000ffffb08f68d4 in ggml_abort (file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=69, fmt=0xffffb0a4cea0 "CANN error") at /root/llama.cpp/ggml/src/ggml.c:230
230 ggml_print_backtrace();
#3 0x0000ffffb0a2ef94 in ggml_cann_error (stmt=0xffffb0a4e7e0 "aclrtSynchronizeStream(cann_ctx->stream())", func=0xffffb0a4e7c0 "ggml_backend_cann_synchronize", file=0xffffb0a4ceb0 "/root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp", line=2075, msg=0xaaab114ea528 "EZ9999: Inner Error!\nEZ9999: [PID: 165866] 2025-08-15-03:15:49.571.649 The error from device(chipId:1, dieId:0), serial number is 112, there is an aivec error exception, core id is 28, error code = 0x"...) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:69
69 GGML_ABORT("CANN error");
#4 0x0000ffffb0a34e3c in ggml_backend_cann_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp:2075
2075 ACL_CHECK(aclrtSynchronizeStream(cann_ctx->stream()));
#5 0x0000ffffb090f480 in ggml_backend_synchronize (backend=0xaaab0e4994e0) at /root/llama.cpp/ggml/src/ggml-backend.cpp:306
306 backend->iface.synchronize(backend);
#6 0x0000ffffb0913d00 in ggml_backend_sched_synchronize (sched=0xaaab0e4a1860) at /root/llama.cpp/ggml/src/ggml-backend.cpp:1595
1595 ggml_backend_synchronize(sched->backends[i]);
#7 0x0000ffffb10672e4 in llama_context::synchronize (this=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:374
374 ggml_backend_sched_synchronize(sched.get());
#8 0x0000ffffb106e4c4 in llama_synchronize (ctx=0xaaab0c10f3c0) at /root/llama.cpp/src/llama-context.cpp:2400
2400 ctx->synchronize();
#9 0x0000aaaad615f5bc in test_prompt (ctx=0xaaab0c10f3c0, n_prompt=512, n_batch=2048, n_threads=192) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1787
1787 llama_synchronize(ctx);
#10 0x0000aaaad615ff30 in main (argc=13, argv=0xffffcdc95ba8) at /root/llama.cpp/tools/llama-bench/llama-bench.cpp:1955
1955 bool res = test_prompt(ctx, t.n_prompt, t.n_batch, t.n_threads);
[Inferior 1 (process 165866) detached]
Aborted (core dumped)