CUBLAS and CLBLAST builds on Windows #463


Open
goodglitch opened this issue Jul 9, 2023 · 6 comments
Labels
documentation (Improvements or additions to documentation) · duplicate (This issue or pull request already exists) · windows (A Windoze-specific issue)

Comments


goodglitch commented Jul 9, 2023

Please write instructions for how to make CUBLAS and CLBLAST builds on Windows. I have spent about half a day on this without any success. My current attempt for CUBLAS is the following bat file:

SET CUDAFLAGS="-arch=all -lcublas" && SET LLAMA_CUBLAS=1 && SET CMAKE_ARGS="-DLLAMA_CUBLAS=on" && SET FORCE_CMAKE=1 && pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
pause
pip uninstall pydantic
pip install "pydantic==1.*"

And for CLBLAST:

SET LLAMA_CLBLAST=1 && SET CMAKE_ARGS="-DLLAMA_CLBLAST=on" && SET FORCE_CMAKE=1 && pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir
pause
pip uninstall pydantic
pip install "pydantic==1.*"

Somehow it doesn't like pydantic v2.*, so I had to downgrade it.

Neither of them seems to work. When I run

python -m llama_cpp.server --model c:\ai\llama\Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin --n_gpu_layers 100 --use_mmap 0

all layers are loaded into RAM.
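One possible culprit in the bat files above is a cmd.exe quirk: SET VAR="value" stores the quotes as part of the value, and SET VAR=1 && ... stores a trailing space, so the build may never see the intended CMake flags. Below is a minimal sketch that sidesteps cmd quoting by driving pip from Python; the package name and pip flags are taken from the commands above, and the rest is an untested assumption:

# Sketch: set the build variables in-process instead of via cmd's SET,
# avoiding the quote-and-trailing-space quirks of `SET X="y" && ...`.
import os
import subprocess
import sys

env = os.environ.copy()
env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"  # note: no surrounding quotes
env["FORCE_CMAKE"] = "1"

subprocess.run(
    [sys.executable, "-m", "pip", "install", "llama-cpp-python[server]",
     "--force-reinstall", "--upgrade", "--no-cache-dir"],
    env=env,
    check=True,  # raise if the build or install fails
)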

gjmulder added the documentation and windows labels on Jul 9, 2023
gjmulder (Contributor) commented Jul 9, 2023

#182 might help, but if it doesn't, I can't help you as I use Ubuntu. llama.cpp is not well supported on Windows, as it was written for macOS and ported to Linux.

gjmulder added the duplicate label on Jul 9, 2023
gjmulder (Contributor) commented Jul 9, 2023

If your issue is specific to pydantic, #457 may be relevant.

goodglitch (Author) commented

> #182 might help, but if it doesn't, I can't help you as I use Ubuntu. llama.cpp is not well supported on Windows, as it was written for macOS and ported to Linux.

Thanks for the quick reply! I will try that tomorrow.

Kolyan1414 commented

This worked for me (Win10 + AMD GPU):

  • Ensure you have cmake installed.
  • Install the Python clblast package using conda (for some reason installation of pyclblast failed for me, so I used conda):
conda install -c conda-forge clblast
  • Then just install llama-cpp-python, setting the required environment variables:
set CMAKE_ARGS="-DLLAMA_CLBLAST=on" && set FORCE_CMAKE=1 && set LLAMA_CLBLAST=1 && pip install llama-cpp-python --no-cache-dir
  • You can optionally add the --verbose argument to verify that it found and used BLAS.
  • Check that it works:
(base) C:\WINDOWS\system32>python
Python 3.10.12 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1031'
ggml_opencl: device FP16 support: true

and when loading the model:

>>> llm = Llama(model_path='./../models/nous-hermes-13b.ggmlv3.q4_K_M.bin', n_gpu_layers=8)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

Note | BLAS = 1 | in the output, which confirms BLAS support was compiled in.

  • You probably also need to add GGML_OPENCL_DEVICE=... to your system variables to specify which device to use by default (in case you have several); see the sketch below.
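A minimal sketch of doing this from Python instead of system variables, assuming the ggml OpenCL backend reads GGML_OPENCL_DEVICE when it initializes (the model path is the one from the session above):

# Set the variable before llama_cpp initializes its OpenCL context,
# i.e. before loading the model.
import os
os.environ["GGML_OPENCL_DEVICE"] = "0"  # index of the device to use

from llama_cpp import Llama
llm = Llama(model_path='./../models/nous-hermes-13b.ggmlv3.q4_K_M.bin', n_gpu_layers=8)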

earonesty (Contributor) commented

I will say that this worked well for me: I can make both a CUBLAS and a CLBLAST version, and it's fine on Windows.

Caveats:

  • CLBLAST will work with NVIDIA, but won't use FP16 (because of bad NVIDIA OpenCL drivers).
  • CLBLAST doesn't do multi-GPU (I wish it could span my AMD and NVIDIA cards; that would be cool!).
  • CUBLAS tensor splitting is... idiosyncratic. I'm not sure how to get it to "just auto-split" evenly and "only if needed", but manually setting the splits always seems to work (see the sketch after this list).
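A sketch of setting the splits manually, assuming a CUBLAS build of llama-cpp-python whose Llama constructor accepts tensor_split (per-GPU proportions); the model path and ratios here are purely illustrative:

from llama_cpp import Llama

# tensor_split gives per-device proportions: [0.6, 0.4] puts roughly 60%
# of the offloaded tensors on GPU 0 and 40% on GPU 1.
llm = Llama(
    model_path='./models/nous-hermes-13b.ggmlv3.q4_K_M.bin',  # illustrative
    n_gpu_layers=100,         # offload as many layers as will fit
    tensor_split=[0.6, 0.4],
)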

ForwardForward commented

For the installation steps and the solution that worked, see user jllllllllll's post:

Problem to install llama-cpp-python on Windows 10 with GPU NVidia Support CUBlast, BLAS = 0 #721
#721 (comment)
