Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Feature Description
Add Ascend NPU as a new backend.
Motivation
Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.
CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.
PyTorch has officially announced support for the Ascend NPU (through the PrivateUse1 dispatch key); please see the PrivateUse1 tutorial here.
This feature would add a new backend to llama.cpp, allowing users with Ascend NPUs to run model inference with llama.cpp.
Possible Implementation
Currently, the community provides a convenient backend access mechanism. The Ascend NPU is a CUDA-like device, so I plan to use the CUDA implementation as a reference to build the Ascend NPU backend.
Because of the large workload, I plan to complete this feature in multiple stages. First, I will focus on build support, backend registration, and device runtime functionality. I will also add a new test file to validate backend registration, memory allocation, tensor operations, and other functionality.
Next, I will proceed to implement tensor operators and validate them.
Afterward, I will work on performance optimization, including split-tensor support.
See also: very first commit #6035.