CUBLAS and CLBLAST builds on Windows #463
#182 might help, but if it doesn't, I can't help you further, as I use Ubuntu.
If your issue is specific to
Thanks for the quick reply! I will try that tomorrow.
This worked for me (Win10 + AMD GPU):

```bat
conda install -c conda-forge clblast
set CMAKE_ARGS="-DLLAMA_CLBLAST=on" && set FORCE_CMAKE=1 && set LLAMA_CLBLAST=1 && pip install llama-cpp-python --no-cache-dir
```
```
(base) C:\WINDOWS\system32>python
Python 3.10.12 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1031'
ggml_opencl: device FP16 support: true
```

and when loading the model:

```
>>> llm = Llama(model_path='./../models/nous-hermes-13b.ggmlv3.q4_K_M.bin', n_gpu_layers=8)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
```
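`BLAS = 1` in that last line is the key signal that the wheel was compiled against a BLAS backend. If you want to check it without loading a model, the low-level bindings expose llama.cpp's `llama_print_system_info()`; a minimal sketch, assuming your version re-exports it at package level (the ctypes bindings in llama-cpp-python do):

```python
# Sketch: verify the wheel is a BLAS-enabled build without loading a model.
# Assumption: llama_print_system_info() is re-exported by the package
# (true for the low-level ctypes bindings) and returns a bytes flag string.
import llama_cpp

info = llama_cpp.llama_print_system_info().decode("utf-8")
print(info)  # same "AVX = 1 | ... | BLAS = 1 | ..." string as above

if "BLAS = 1" not in info:
    print("CPU-only wheel: rebuild with FORCE_CMAKE=1 and --no-cache-dir")
```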
I will say that this worked well for me; I can make a CUBLAS and a CLBLAST version, and it's fine on Windows. Caveats:
For the installation steps and the solution that produced this result, see user jllllllllll's post in: Problem to install llama-cpp-python on Windows 10 with GPU NVidia Support CUBlast, BLAS = 0 #721
Please write instructions for making CUBLAS and CLBLAST builds on Windows. I have spent about half a day on this without any success. My current attempt for CUBLAS is the following bat file:
```bat
SET CUDAFLAGS="-arch=all -lcublas" && SET LLAMA_CUBLAS=1 && SET CMAKE_ARGS="-DLLAMA_CUBLAS=on" && SET FORCE_CMAKE=1 && pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
pause
pip uninstall pydantic
pip install "pydantic==1.*"
```
And for CLBLAST:
```bat
SET LLAMA_CLBLAST=1 && SET CMAKE_ARGS="-DLLAMA_CLBLAST=on" && SET FORCE_CMAKE=1 && pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
pause
pip uninstall pydantic
pip install "pydantic==1.*"
```
Somehow it doesn't like pydantic v2.* and I had to downgrade it.
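One thing worth ruling out in these one-liners: in cmd.exe, `SET X=1 && ...` stores the trailing space as part of the value, and `SET CMAKE_ARGS="-D..."` stores the quotes literally, so the build backend may not see the flags you intended. A sketch of a workaround that sidesteps cmd quoting entirely by driving pip from Python with an explicit environment (the env-var names are the ones llama-cpp-python's build documents; the package extras are copied from the bat files above):

```python
# Sketch: reinstall with a clean environment, avoiding cmd.exe quoting
# pitfalls. Assumption: the build reads CMAKE_ARGS/FORCE_CMAKE from the
# environment, as the llama-cpp-python README describes.
import os
import subprocess
import sys

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"  # or "-DLLAMA_CLBLAST=on"
env["FORCE_CMAKE"] = "1"

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "llama-cpp-python[server]",
     "--force-reinstall", "--upgrade", "--no-cache-dir"],
    env=env,
)
```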
Neither of them seems to work. When I run

```bat
python -m llama_cpp.server --model c:\ai\llama\Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin --n_gpu_layers 100 --use_mmap 0
```

all layers are loaded into RAM.
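A quick way to tell a build problem from a server-flag problem is to load the model directly with verbose output; a minimal sketch, assuming the model path from the command above:

```python
# Sketch: check GPU offloading directly, bypassing the server wrapper.
# Assumption: the model path below exists (taken from the command above).
from llama_cpp import Llama

llm = Llama(
    model_path=r"c:\ai\llama\Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin",
    n_gpu_layers=100,  # request full offload
    verbose=True,      # llama.cpp prints its load log to stderr
)
# In the log, look for "BLAS = 1" in the system-info line and an
# "offload" line reporting layers moved to the GPU. "BLAS = 0" means
# pip reused a cached CPU-only wheel; reinstall with FORCE_CMAKE=1
# and --no-cache-dir.
```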