
Fixed CUBLAS DLL load issues on Windows #225

Merged 1 commit on May 17, 2023

Conversation

aneeshjoy (Contributor)

Description:

This change fixes DLL load issues when CUBLAS is enabled on Windows.

Symptoms:

Traceback (most recent call last):
  File "path redacted...\venv\lib\site-packages\llama_cpp\llama_cpp.py", line 55, in _load_shared_library
    return ctypes.CDLL(str(_lib_path))
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'path redacted...\venv\Lib\site-packages\llama_cpp\llama.dll' (or one of its dependencies). Try using the full path with constructor syntax.

Root cause:
Since Python 3.8, Python on Windows no longer searches the directories listed in the PATH environment variable when resolving DLL dependencies. The CUDA libraries are therefore not found while loading llama.dll, which produces the "module or one of its dependencies not found" error.

See the Python 3.8 changelog entry on Windows DLL loading.
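
A quick way to confirm the diagnosis (a hypothetical sketch; the DLL path below is a placeholder): with the default winmode on Python 3.8 and later the load fails even when PATH contains the CUDA bin directory, while winmode=0 restores the legacy LoadLibrary search order, which still consults PATH.

import ctypes

lib_path = r"C:\path\to\venv\Lib\site-packages\llama_cpp\llama.dll"  # placeholder

# On Python 3.8+ the default winmode excludes PATH from dependency
# resolution, so this raises FileNotFoundError when the CUDA DLLs are
# reachable only through PATH.
try:
    ctypes.CDLL(lib_path)
except FileNotFoundError as exc:
    print("default search failed:", exc)

# winmode=0 falls back to the legacy search order, which does include
# PATH, so the same load can succeed.
lib = ctypes.CDLL(lib_path, winmode=0)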

Fix:
Add the CUDA bin directory (derived from the CUDA_PATH environment variable) to the DLL search path via os.add_dll_directory().
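
A minimal sketch of the approach (the function name mirrors _load_shared_library from llama_cpp.py; the exact subdirectories registered are assumptions based on a typical CUDA toolkit install):

import ctypes
import os
import pathlib
import sys

def _load_shared_library(lib_base_name: str):
    # Resolve the bundled DLL next to this module.
    base_path = pathlib.Path(__file__).parent.resolve()
    lib_path = base_path / f"{lib_base_name}.dll"

    if sys.platform == "win32" and sys.version_info >= (3, 8):
        # Since 3.8, ctypes no longer consults PATH when resolving DLL
        # dependencies; directories must be registered explicitly.
        os.add_dll_directory(str(base_path))
        if "CUDA_PATH" in os.environ:
            # cublas64_*.dll and the other CUDA runtime DLLs live under
            # %CUDA_PATH%\bin on typical installs.
            os.add_dll_directory(os.path.join(os.environ["CUDA_PATH"], "bin"))

    return ctypes.CDLL(str(lib_path))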

Result:

llama.cpp: loading model from .....redacted.....
llama_model_load_internal: format     = ggjt v2 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  72.75 KB
llama_model_load_internal: mem required  = 5809.34 MB (+ 1026.00 MB per state)
llama_model_load_internal: [cublas] offloading 24 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 2895 MB
llama_init_from_file: kv self size  =  256.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

abetlen (Owner) commented May 17, 2023

@aneeshjoy I see the Python changelog entry; that makes sense. However, is bin the right path? In any case, I've merged this; hope it helps.

Labels: build, windows