You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source/getting_started/cpu-installation.rst
+5-5Lines changed: 5 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -5,11 +5,11 @@ Installation with CPU
5
5
6
6
vLLM initially supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16. vLLM CPU backend supports the following vLLM features:
7
7
8
-
- Tensor Parallel (``-tp = N``)
9
-
- Quantization (``INT8 W8A8, AWQ``)
10
-
11
-
.. note::
12
-
More advanced features on `chunked-prefill`, `prefix-caching` and `FP8 KV cache` are under development and will be available soon.
0 commit comments