CUBLAS compiled but not working with batch_size = 512 #113
@gjmulder what does the utilisation look like when running cuBLAS with the llama.cpp examples?
Currently exploring different batch sizes using …
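A minimal sketch of one way to run such a sweep with llama-cpp-python, assuming the high-level `Llama` API; the model path, prompt, and batch sizes here are placeholders, not from this thread:

```python
# Hedged sketch: time prompt ingestion across several n_batch values
# to see where cuBLAS starts to pay off. Model path is a placeholder.
import time
from llama_cpp import Llama

PROMPT = "Describe the moon. " * 50  # long prompt so batching matters

for n_batch in (8, 32, 128, 256, 512):
    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin",
                n_ctx=512, n_batch=n_batch, verbose=False)
    t0 = time.time()
    llm(PROMPT, max_tokens=1)  # max_tokens=1 so the timing is dominated by prompt eval
    print(f"n_batch={n_batch}: {time.time() - t0:.2f}s")
    del llm  # free the model before the next run
```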
See #131. It looks to be an issue with …
Hi, an update: I compiled per the instructions for GPU support. No errors. But when running the model in a Jupyter notebook, I clearly see the model load with … Did I do something wrong, or do we have an issue?
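If the wheel was built without cuBLAS, rebuilding usually fixes this. A hedged sketch of forcing a rebuild from Python, following the `CMAKE_ARGS`/`FORCE_CMAKE` convention in the project README (the exact CMake flag has changed across versions):

```python
# Hedged sketch: force pip to rebuild llama-cpp-python from source
# with the cuBLAS backend enabled. Flag names follow the README of
# this era and may differ in later versions.
import os
import subprocess
import sys

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"  # enable the cuBLAS backend
env["FORCE_CMAKE"] = "1"                 # rebuild instead of reusing a cached wheel

subprocess.check_call(
    [sys.executable, "-m", "pip", "install",
     "--force-reinstall", "--no-cache-dir", "llama-cpp-python"],
    env=env,
)
```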
@Free-Radical Try with …
Did not work, …
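One quick way to tell whether the installed build actually has a BLAS backend, assuming the low-level bindings expose llama.cpp's system-info call (they did around this version):

```python
# Hedged sketch: print llama.cpp's system info through the low-level
# bindings and look for "BLAS = 1". "BLAS = 0" means the wheel was
# built without cuBLAS and needs a rebuild.
import llama_cpp

info = llama_cpp.llama_print_system_info().decode("utf-8")
print(info)  # e.g. "... BLAS = 1 | SSE3 = 1 | ..."
if "BLAS = 1" not in info:
    print("This build has no BLAS backend; rebuild with cuBLAS enabled.")
```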
I am trying to run the server with CUBLAS compiled in. I've upped `n_batch` to 512, and reduced `n_ctx` to 512:

CUBLAS v12:

Both are changed to `512` for optimal performance w/ `alpaca-lora-65B-GGML`:

Build seems fine:

Looks installed correctly:

BLAS enabled. `n_ctx = 512`, so it looks like it has picked up my changes to `__main__.py`:

`nvidia-smi` reports python PID `#105410`, same as the server above:

But no GPU utilisation when I hit the API 😞
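To catch the utilisation, it helps to know that the cuBLAS path of this era only kicked in for batched prompt evaluation, so any GPU activity is a short spike during prompt ingestion rather than sustained load during generation. A hedged sketch of provoking and observing that spike against the server; the host, port, and endpoint assume the server's OpenAI-compatible defaults:

```python
# Hedged sketch: send a long prompt to the llama-cpp-python server while
# polling nvidia-smi, to catch the brief cuBLAS spike during prompt eval.
import json
import subprocess
import threading
import urllib.request

def poll_gpu(stop: threading.Event) -> None:
    # Sample GPU utilisation once a second until the request completes.
    while not stop.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
        print("GPU util:", out.stdout.strip())
        stop.wait(1.0)

stop = threading.Event()
threading.Thread(target=poll_gpu, args=(stop,), daemon=True).start()

# A long prompt makes the batched (cuBLAS-accelerated) eval visible.
payload = {"prompt": "Lorem ipsum " * 200, "max_tokens": 16}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # assumed default host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(resp.read()[:200])
stop.set()
```

If utilisation never moves even on a long prompt, the build most likely does not have cuBLAS linked in, regardless of what the install log said.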