# llama-cpp-python - MacOS Install with Metal GPU


**(1) Make sure you have Xcode installed... at least the command-line tools**
```
# check the path of your Xcode install
xcode-select -p

# if Xcode is installed, this returns something like
# /Applications/Xcode-beta.app/Contents/Developer

# if Xcode is missing, install it... it takes ages
xcode-select --install
```

**(2) Install a conda distribution for Apple Silicon MacOS (Miniforge), which supports the Metal GPU**
```
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```

**(3) Make a conda environment**
```
conda create -n llama python=3.9.16
conda activate llama
```

**(4) Install the LATEST llama-cpp-python... which, as of just today, happily supports MacOS Metal GPU**
 *(you need Xcode installed in order for pip to build/compile the C++ code)*
```
pip uninstall llama-cpp-python -y
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
pip install 'llama-cpp-python[server]'

# you should now have llama-cpp-python v0.1.62 installed
pip list | grep llama-cpp-python
# llama-cpp-python 0.1.62
```
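
As a quick smoke test that the freshly built wheel actually loads (a minimal check, not part of the original steps), try importing the package:
```
# verify the compiled library imports cleanly
python3 -c "import llama_cpp; print('llama_cpp imported OK')"
```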

**(5) Download a v3 ggml llama/vicuna/alpaca model**
 - **ggmlv3**
 - file name ends with **q4_0.bin** - indicating it is 4-bit quantized, with quantisation method 0

https://huggingface.co/vicuna/ggml-vicuna-13b-1.1/blob/main/ggml-vic13b-q4_0.bin
https://huggingface.co/vicuna/ggml-vicuna-13b-1.1/blob/main/ggml-vic13b-uncensored-q4_0.bin
https://huggingface.co/TheBloke/LLaMa-7B-GGML/blob/main/llama-7b.ggmlv3.q4_0.bin
https://huggingface.co/TheBloke/LLaMa-13B-GGML/blob/main/llama-13b.ggmlv3.q4_0.bin
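
Note the `blob` links above point at Hugging Face's HTML preview pages; to fetch the raw file from a terminal, swap `blob` for `resolve` (shown here for the 7B model, assuming that is the one you want):
```
# /resolve/ serves the raw binary, /blob/ serves a web page
wget https://huggingface.co/TheBloke/LLaMa-7B-GGML/resolve/main/llama-7b.ggmlv3.q4_0.bin
```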


**(6) Run the llama-cpp-python API server with MacOS Metal GPU support**
```
# set the path to your ggml model
# make sure it is ggml v3
# make sure it is q4_0
export MODEL=[path to your llama.cpp ggml models]/[ggml-model-name].q4_0.bin
python3 -m llama_cpp.server --model $MODEL --n_gpu_layers 1
```

***Note:** If you omit `--n_gpu_layers 1` then the CPU will be used*
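
With the server up, you can exercise it from a second terminal. A minimal sketch, assuming the default bind of `localhost:8000` and the OpenAI-compatible `/v1/completions` endpoint that `llama_cpp.server` exposes:
```
# send a test completion request to the running server
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Q: Name the planets in the solar system. A: ", "max_tokens": 64}'
```
The server is built on FastAPI, so interactive API docs should also be available at http://localhost:8000/docs.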