does not compile on CUDA 10 anymore #4123
The Makefile needs to be modified, because CUDA 10's nvcc has neither the --forward-unknown-to-host-compiler option nor -arch=native.
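A version guard in the Makefile would let the build skip the unsupported flags on old toolkits. A minimal sketch (the version-parsing recipe and the 11.6 cutoff are my assumptions; -arch=native only appeared in later 11.x releases, and the fallback architecture must match your GPU):

```make
# Sketch: detect the CUDA release and only pass flags this nvcc understands.
# CUDA 10's nvcc has neither --forward-unknown-to-host-compiler nor
# -arch=native, so fall back to an explicit compute architecture there.
CUDA_VERSION := $(shell nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p')
ifeq ($(shell awk 'BEGIN { print ($(CUDA_VERSION) < 11.6) }'),1)
    NVCCFLAGS += -arch=compute_62   # e.g. Jetson TX2; pick your GPU's arch
else
    NVCCFLAGS += --forward-unknown-to-host-compiler -arch=native
endif
```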
I second this with CUDA 11 on Ubuntu 22.04. I did not succeed in installing CUDA 12 on my Ubuntu 22.04, so I am stuck with 11. I work around it with:

```make
ifdef WEICON_BROKEN
    NVCCFLAGS += -arch=compute_86
else
    NVCCFLAGS += -arch=native
endif
```
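With that guard in place, make LLAMA_CUBLAS=1 WEICON_BROKEN=1 pins the architecture to compute_86 (presumably an RTX 30-series card) instead of relying on -arch=native; the compute capability is the commenter's, so adjust it to match your own GPU.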
So there is hope for me building this on Windows 8.1 with cuBLAS?
Hi, I tried to use your patch to compile on my Nvidia Jetson Nano, but I'm getting some new errors because of it. The Jetson runs CUDA 10.2; any idea what is wrong?
I applied the fix from ggml-org/whisper.cpp#1018, but I still get the same error:

```
user@ubuntu:~/llama.cpp$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/usr/local/cuda-10.2/targets/aarch64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=armv8.3-a
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/usr/local/cuda-10.2/targets/aarch64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=armv8.3-a -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --compiler-options=" " -use_fast_math -arch=compute_62 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-10.2/targets/aarch64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --compiler-options=" " -use_fast_math -arch=compute_62 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(6947): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7923): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7957): error: identifier "CUBLAS_COMPUTE_16F" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_00007758_00000000-6_ggml-cuda.cpp1.ii".
Makefile:457: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
```
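For context: cublasComputeType_t and the CUBLAS_COMPUTE_* constants were only introduced with CUDA 11; on CUDA 10, cublasGemmEx still took a plain cudaDataType_t. The whisper.cpp fix linked above boils down to aliasing the new names on old toolkits, roughly like this (a sketch of the idea, not the exact patch):

```c
// Shim for pre-CUDA-11 toolkits: map the CUDA 11 cuBLAS compute-type names
// back to the cudaDataType_t values that the older cublasGemmEx() accepted.
#include <cuda_runtime.h>
#include <cublas_v2.h>

#if CUDART_VERSION < 11000
#define cublasComputeType_t cudaDataType_t
#define CUBLAS_COMPUTE_16F  CUDA_R_16F
#define CUBLAS_COMPUTE_32F  CUDA_R_32F
#endif
```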
0001-fix-old-jetson-compile-error.patch
It looks like a promising patch, thanks! I can only test it in the new year, unfortunately, but I'll let you know the results then.
Nice, that patch does fix the compile issue. However, something else is up: it actually dies at various lines. Hmm, I'll check past revisions or something.
Okay, it's broken since bcc0eb4, which is the "per-layer KV cache + quantum K cache" update.
I tried the patch on my Nano (Ubuntu 18, CUDA 10.2), but it doesn't work for me. I believe I have the setup right; I also updated gcc and g++ to version 8. Any idea what is going wrong?
I tested it on a Jetson TX2 and compiled gcc 8.5 myself. Do not use gcc 8 from the apt source; it does not work. I have submitted the content of the patch to the repository, so you can compile directly using the latest code.
Thanks a lot for the help! I am not sure what repository you mean, though. Do you have one with the correctly compiled gcc 8.5? I gave it a quick try myself, but it has some parameters I am not sure how to set.
This takes a very, very long time and a lot of space.
UPDATE: Managed to compile now! I installed gcc 8.5 from source, needed to export that gcc installation for make, and removed one line in the Makefile to get rid of an error.
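The exact export didn't make it into the comment; assuming gcc 8.5 was installed with the default /usr/local prefix, it would look something like this:

```sh
# Hypothetical paths: point the build at the self-built gcc 8.5 instead of
# the distro's gcc 7, then rebuild with cuBLAS enabled.
export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++
make clean
make LLAMA_CUBLAS=1
```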
Your cc is gcc 7, not gcc 8.
Any updates?
I just got my TX2 working with the latest commit on the master branch (a33e6a0, Feb 26 2024). The following is what I have done.

```
llama_print_timings: load time = 15632.56 ms
```
Hmm, it does compile with CUDA 10.2 (but not with CUDA 10.1, which I previously used). I didn't even bother compiling a proper gcc; I just disabled the version check in /cuda-toolkit/targets/x86_64-linux/include/crt/host_config.h. Then I first compiled ggml-cuda.cu by hand like so:
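The exact command wasn't preserved; going by the NVCCFLAGS in the Jetson build log earlier in the thread, it was presumably something along these lines, with the architecture flag adjusted for the GPU at hand:

```sh
# Presumed manual compile of ggml-cuda.cu; flags borrowed from the log above,
# substitute the compute capability of your own card for compute_62.
nvcc --compiler-options=" " -use_fast_math -arch=compute_62 \
     -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 \
     -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 \
     -c ggml-cuda.cu -o ggml-cuda.o
```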
And continued with make LLAMA_CUBLAS=1 as usual. |
2bf8d0f broke it on CUDA 10.2.
@whoreson this is getting a bit tiresome. Are you going to ask people to harass me over this again? Let's be clear: I have no interest in supporting ancient versions of CUDA. If this is important to you, you are welcome to fix it yourself and open a PR.
I have no intention of supporting CUDA 10. As slaren said, if you want it supported you are free to put in the effort yourself, and I will then happily review your PRs.
This issue was closed because it has been inactive for 14 days since being marked as stale.
This version (81bc) can be compiled successfully on Win10 + VS2019 + NVIDIA Toolkit 10.2 with the help of the following patch:
I made it work on Ubuntu 18.04 (Jetson Nano, CUDA 10.2) with gcc 8.5.
Ever since this got merged: https://github.com/ggerganov/llama.cpp/pull/3370