@@ -53,14 +53,19 @@ We have tested PowerInfer on the following platforms:
- x86-64 CPU (with AVX2 instructions) on Linux
- x86-64 CPU and NVIDIA GPU on Linux
- Apple M-series chips on macOS (as we have not yet optimized for Mac, the performance improvement is not significant for now)
-
+ And new features coming soon:
+
+ - Mistral-7B model
+ - Online fine-grained FFN offloading to GPU
+ - Metal backend for sparse inference on macOS
+
## Getting Started
- - [Installation](##setup--installation)
- - [Model Weights](##model-weights)
+ - [Installation](#setup-and-installation)
+ - [Model Weights](#model-weights)
- ## Setup & Installation
+ ## Setup and Installation
### Get the Code
```bash
@@ -70,12 +75,7 @@ cd PowerInfer
### Build
In order to build PowerInfer, you have two different options. These commands should be run from the root directory of the project.
- Using `make` on Linux or macOS:
- ```bash
- make
- ```
-
- Using `CMake`:
+ Using `CMake` on Linux or macOS:
* If you have one GPU:
```bash
cmake -S . -B build -DLLAMA_CUBLAS=ON
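# A sketch of the usual follow-up step, not shown in this diff: compile after
# configuring. The `build` directory and `--config Release` flag are assumed
# from standard CMake usage rather than taken from PowerInfer's docs.
cmake --build build --config Release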
@@ -130,6 +130,7 @@ PowerInfer achieves up to 11x and 8x speedup for FP16 and INT4 models!
We will release the code and data in the following order; please stay tuned!
- [x] Release core code of PowerInfer, supporting Llama-2 and Falcon-40B.
+ - [ ] Support Mistral-7B
- [ ] Release perplexity evaluation code
- [ ] Support Metal for Mac
- [ ] Release code for OPT models