does not compile on CUDA 10 anymore #4123


Closed
whoreson opened this issue Nov 18, 2023 · 26 comments

@whoreson
Contributor

Ever since this got merged:
https://github.com/ggerganov/llama.cpp/pull/3370

@whoreson
Contributor Author

The Makefile needs to be modified, because CUDA 10's nvcc has neither the --forward-unknown-to-host-compiler option nor -arch=native.
cuda10.patch
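A minimal sketch of gating those flags on the nvcc major version, for anyone patching their own Makefile. The version string is hard-coded here so the snippet runs without a CUDA toolkit; a real Makefile would parse `nvcc --version`, and compute_62 is only a placeholder for your card's actual compute capability:

```shell
# CUDA 10's nvcc predates --forward-unknown-to-host-compiler and -arch=native,
# so only add them when the toolkit is new enough.
NVCC_VER="10.2"         # stand-in; really: nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p'
MAJOR="${NVCC_VER%%.*}" # major version, e.g. 10
if [ "$MAJOR" -ge 11 ]; then
    NVCCFLAGS="--forward-unknown-to-host-compiler -arch=native"
else
    NVCCFLAGS="-arch=compute_62"   # placeholder: set to your GPU's compute capability
fi
echo "$NVCCFLAGS"
```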

@WeirdConstructor
Contributor

I second this with CUDA 11 on Ubuntu 22.04. I did not succeed in installing CUDA 12 on my Ubuntu 22.04 machine, so I am stuck with 11.
I added the following ugly hack to my Makefile, which seems to work on my system:

ifdef WEICON_BROKEN
	NVCCFLAGS += -arch=compute_86
else
	NVCCFLAGS += -arch=native
endif
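If you would rather not hard-code compute_86, newer drivers let nvidia-smi report the card's compute capability, which maps directly onto an -arch value. A rough sketch; note the query flag needs a reasonably recent nvidia-smi, and the capability is hard-coded below so the snippet runs without a GPU:

```shell
# With a GPU present, something like:
#   cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
cap="8.6"                                        # hard-coded stand-in
arch="compute_$(printf '%s' "$cap" | tr -d '.')" # "8.6" -> "compute_86"
echo "NVCCFLAGS += -arch=$arch"                  # what you'd put in the Makefile instead of -arch=native
```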

@Ph0rk0z

Ph0rk0z commented Nov 18, 2023

So there is hope for me building this on Windows 8.1 with cuBLAS?

@rvandernoort

The Makefile needs to be modified, because CUDA 10's nvcc has neither the --forward-unknown-to-host-compiler option nor -arch=native. cuda10.patch

Hi, I tried to use your patch to compile on my Nvidia Jetson Nano, but I'm getting some new errors because of it. The Jetson runs CUDA 10.2; any idea what is wrong?

make LLAMA_CUBLAS=1
I llama.cpp build info: 
I UNAME_S:   Linux
I UNAME_P:   aarch64
I UNAME_M:   aarch64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native 
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation 
I NVCCFLAGS: --compiler-options="  " -use_fast_math -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS:   -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib 
I CC:        cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX:       g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native    -c ggml.c -o ggml.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c llama.cpp -o llama.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c common/sampling.cpp -o sampling.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c common/grammar-parser.cpp -o grammar-parser.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c common/build-info.cpp -o build-info.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native  -Wno-array-bounds -Wno-format-truncation  -c common/console.cpp -o console.o
nvcc --compiler-options="  " -use_fast_math -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(5970): error: identifier "CUBLAS_TF32_TENSOR_OP_MATH" is undefined

ggml-cuda.cu(6617): error: identifier "CUBLAS_COMPUTE_16F" is undefined

ggml-cuda.cu(7552): error: identifier "CUBLAS_COMPUTE_16F" is undefined

ggml-cuda.cu(7586): error: identifier "CUBLAS_COMPUTE_16F" is undefined

4 errors detected in the compilation of "/tmp/tmpxft_00002e62_00000000-6_ggml-cuda.cpp1.ii".
Makefile:440: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1

@chenxuuu

I applied the fix from ggml-org/whisper.cpp#1018, but I still get the same error:

user@ubuntu:~/llama.cpp$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S:   Linux
I UNAME_P:   aarch64
I UNAME_M:   aarch64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/usr/local/cuda-10.2/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=armv8.3-a
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/usr/local/cuda-10.2/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=armv8.3-a  -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --compiler-options="  " -use_fast_math -arch=compute_62 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS:   -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-10.2/targets/aarch64-linux/lib
I CC:        cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX:       g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

nvcc --compiler-options="  " -use_fast_math -arch=compute_62 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(6947): error: identifier "CUBLAS_COMPUTE_16F" is undefined

ggml-cuda.cu(7923): error: identifier "CUBLAS_COMPUTE_16F" is undefined

ggml-cuda.cu(7957): error: identifier "CUBLAS_COMPUTE_16F" is undefined

3 errors detected in the compilation of "/tmp/tmpxft_00007758_00000000-6_ggml-cuda.cpp1.ii".
Makefile:457: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1

@FantasyGmm
Contributor

FantasyGmm commented Dec 21, 2023

0001-fix-old-jetson-compile-error.patch
This patch should be useful. I checked the CUDA 10 documentation and modified some of the code. I now have it building successfully on a TX2 (Ubuntu 18, JetPack 4, CUDA 10), and the performance is quite good.
You will need to compile and install GCC 8, the latest GCC supported by CUDA 10; that resolves the C-compiler part of the errors.

@ggml-org ggml-org deleted a comment from whoreson Dec 21, 2023
@rvandernoort

It looks like a promising patch, thanks! I can only test this in the new year, unfortunately, but I'll let you know the results then.

@whoreson
Contributor Author

Nice, that patch does fix the compile issue. However, something else is up:

current device: 0
GGML_ASSERT: ggml-cuda.cu:8498: !"cuBLAS error"

But it actually dies at various lines. Hmm I'll check past revisions or something.

@whoreson
Contributor Author

Okay, it's broken since bcc0eb4

Which is the "per-layer KV cache + quantum K cache" update.

@he-man86

he-man86 commented Jan 4, 2024

0001-fix-old-jetson-compile-error.patch This patch should be useful. I checked the CUDA 10 documentation and modified some of the code. I now have it building successfully on a TX2 (Ubuntu 18, JetPack 4, CUDA 10), and the performance is quite good. You will need to compile and install GCC 8, the latest GCC supported by CUDA 10; that resolves the C-compiler part of the errors.

I tried the patch on my Nano (Ubuntu 18 with CUDA 10.2), but it doesn't work for me. I believe my setup is OK; I also updated gcc and g++ to version 8.

Any ideas what is going wrong?

@FantasyGmm
Contributor

FantasyGmm commented Jan 4, 2024


I tested it on a Jetson TX2 and compiled GCC 8.5 myself. Do not use the gcc-8 from the apt sources; it does not work. I have submitted the contents of the patch to the repository, so you can build directly from the latest code.

@he-man86

he-man86 commented Jan 4, 2024


Thanks a lot for the help! I'm not sure which repository you mean, though. Do you have one with a correctly compiled GCC 8.5? I gave building it myself a quick try, but there are some parameters I'm not sure how to set.

@FantasyGmm
Contributor

FantasyGmm commented Jan 5, 2024


sudo tar -zvxf gcc-8.5.0.tar.gz --directory=/usr/local/
cd /usr/local/gcc-8.5.0
./contrib/download_prerequisites
mkdir build
cd build
sudo ../configure -enable-checking=release -enable-languages=c,c++
make -j6
sudo make install
gcc -v

This takes a very long time and a lot of disk space; you can delete the gcc source folder after make install.

@rvandernoort

rvandernoort commented Jan 11, 2024

UPDATE: Managed to compile now! I needed to point make at the new gcc installation:

export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++

I've installed gcc 8.5 from source

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/aarch64-unknown-linux-gnu/8.5.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../configure -enable-checking=release -enable-languages=c,c++
Thread model: posix
gcc version 8.5.0 (GCC)

and after removing this line in the Makefile to get rid of an error (#MK_CXXFLAGS += -mcpu=native) and using CUDA_DOCKER_ARCH=sm_52, I still get the following error, similar to the one with cmake that I've described in #3880:

make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_52
I llama.cpp build info: 
I UNAME_S:   Linux
I UNAME_P:   aarch64
I UNAME_M:   aarch64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion 
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=sm_52 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 
I LDFLAGS:   -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/local/cuda/targets/aarch64-linux/lib -L/usr/lib/wsl/lib 
I CC:        cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX:       g++ (GCC) 8.5.0

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion    -c ggml.c -o ggml.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c llama.cpp -o llama.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/sampling.cpp -o sampling.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/grammar-parser.cpp -o grammar-parser.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/build-info.cpp -o build-info.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/console.cpp -o console.o
nvcc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=sm_52 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128  -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(598): warning: function "warp_reduce_sum(half2)" was declared but never referenced

ggml-cuda.cu(619): warning: function "warp_reduce_max(half2)" was declared but never referenced

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion    -c ggml-alloc.c -o ggml-alloc.o
cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion    -c ggml-backend.c -o ggml-backend.o
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion     -c ggml-quants.c -o ggml-quants.o
ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
ggml-quants.c:403:27: error: invalid initializer
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s8_x2  vld1q_s8_x2
                           ^
ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: invalid initializer
 #define ggml_vld1q_s8_x2  vld1q_s8_x2
                           ^
ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
ggml-quants.c:3757:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
ggml-quants.c:3758:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
ggml-quants.c:3759:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:4365:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
                                    ^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:4383:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
ggml-quants.c:4385:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:5244:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c:5246:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                     ^
ggml-quants.c:5253:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                     ^
ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:5840:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
                                    ^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:5848:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
ggml-quants.c:5849:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                                              ^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
ggml-quants.c:403:27: error: invalid initializer
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
ggml-quants.c:6506:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
ggml-quants.c:6520:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_u8_x4  vld1q_u8_x4
                           ^
ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: invalid initializer
 #define ggml_vld1q_u8_x4  vld1q_u8_x4
                           ^
ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
ggml-quants.c:6522:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                                        ^~~~~~~~~~~~~~~~
ggml-quants.c:6547:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                     ^
ggml-quants.c: In function ‘ggml_vec_dot_iq2_xxs_q8_K’:
ggml-quants.c:7264:17: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8b = ggml_vld1q_s8_x4(q8); q8 += 64;
                 ^
cc1: some warnings being treated as errors
Makefile:552: recipe for target 'ggml-quants.o' failed
make: *** [ggml-quants.o] Error 1

@FantasyGmm
Contributor

UPDATE: Managed to compile now! Needed to export the gcc installation for make by: […]

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/build-info.cpp -o build-info.o

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread   -Wno-array-bounds -Wno-format-truncation -c common/console.cpp -o console.o

nvcc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=sm_52 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128  -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -c ggml-cuda.cu -o ggml-cuda.o

ggml-cuda.cu(598): warning: function "warp_reduce_sum(half2)" was declared but never referenced



ggml-cuda.cu(619): warning: function "warp_reduce_max(half2)" was declared but never referenced



cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion    -c ggml-alloc.c -o ggml-alloc.o

cc  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion    -c ggml-backend.c -o ggml-backend.o

cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -mcpu=native -Wdouble-promotion     -c ggml-quants.c -o ggml-quants.o

ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:

ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]

 #define ggml_vld1q_s16_x2 vld1q_s16_x2

                           ^

ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’

         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);

                                         ^~~~~~~~~~~~~~~~~

ggml-quants.c:403:27: error: invalid initializer

 #define ggml_vld1q_s16_x2 vld1q_s16_x2

                           ^

ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’

         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);

                                         ^~~~~~~~~~~~~~~~~

ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]

 #define ggml_vld1q_s8_x2  vld1q_s8_x2

                           ^

ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’

             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:406:27: error: invalid initializer

 #define ggml_vld1q_s8_x2  vld1q_s8_x2

                           ^

ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’

             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’

         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\

                 ^

ggml-quants.c:3757:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’

             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);

             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’

         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\

                 ^

ggml-quants.c:3758:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’

             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);

             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’

         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\

                 ^

ggml-quants.c:3759:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’

             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);

             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:4365:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’

         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);

                                    ^~~~~~~~~~~~~~~~

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:4383:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]

 #define ggml_vld1q_s8_x4  vld1q_s8_x4

                           ^

ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’

             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;

                                                ^~~~~~~~~~~~~~~~

ggml-quants.c:407:27: error: invalid initializer

 #define ggml_vld1q_s8_x4  vld1q_s8_x4

                           ^

ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’

             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;

                                                ^~~~~~~~~~~~~~~~

ggml-quants.c:407:27: error: invalid initializer

 #define ggml_vld1q_s8_x4  vld1q_s8_x4

                           ^

ggml-quants.c:4385:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’

             const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;

                                                ^~~~~~~~~~~~~~~~

ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:5244:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c:5246:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’

             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;

                     ^

ggml-quants.c:5253:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’

             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;

                     ^

ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:5840:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’

         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);

                                    ^~~~~~~~~~~~~~~~

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:5848:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c:407:27: error: invalid initializer

 #define ggml_vld1q_s8_x4  vld1q_s8_x4

                           ^

ggml-quants.c:5849:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’

             const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;

                                              ^~~~~~~~~~~~~~~~

ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:

ggml-quants.c:403:27: error: invalid initializer

 #define ggml_vld1q_s16_x2 vld1q_s16_x2

                           ^

ggml-quants.c:6506:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’

         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);

                                         ^~~~~~~~~~~~~~~~~

ggml-quants.c:404:27: error: invalid initializer

 #define ggml_vld1q_u8_x2  vld1q_u8_x2

                           ^

ggml-quants.c:6520:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’

             ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]

 #define ggml_vld1q_u8_x4  vld1q_u8_x4

                           ^

ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’

             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:405:27: error: invalid initializer

 #define ggml_vld1q_u8_x4  vld1q_u8_x4

                           ^

ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’

             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:407:27: error: invalid initializer

 #define ggml_vld1q_s8_x4  vld1q_s8_x4

                           ^

ggml-quants.c:6522:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’

             ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;

                                        ^~~~~~~~~~~~~~~~

ggml-quants.c:6547:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’

             q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;

                     ^

ggml-quants.c: In function ‘ggml_vec_dot_iq2_xxs_q8_K’:

ggml-quants.c:7264:17: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’

             q8b = ggml_vld1q_s8_x4(q8); q8 += 64;

                 ^

cc1: some warnings being treated as errors

Makefile:552: recipe for target 'ggml-quants.o' failed

make: *** [ggml-quants.o] Error 1

Your cc is GCC 7, not GCC 8.

@oiwn
Copy link

oiwn commented Feb 26, 2024

Any updates?

@otaGran
Copy link

otaGran commented Feb 26, 2024

I just got my TX2 working with the latest commit of the master branch (a33e6a0, 2024-02-26). Here is what I did.

  1. A factory reset of the TX2 to JetPack 4.6.4, the last version that still supports the Jetson TX2. JetPack 4.6.4 provides CUDA 10.2 and GCC 7.

  2. Enable all six cores by sudo nvpmodel -m 0, use jetson-fan-ctl to keep the fan running, and jetson-stats to monitor the usage.

  3. Compile and install GCC 8.5 following @FantasyGmm's guide. I have made a copy here:

    wget https://bigsearcher.com/mirrors/gcc/releases/gcc-8.5.0/gcc-8.5.0.tar.gz
    sudo tar -zvxf gcc-8.5.0.tar.gz --directory=/usr/local/
    cd /usr/local/gcc-8.5.0
    ./contrib/download_prerequisites
    mkdir build
    cd build
    sudo ../configure -enable-checking=release -enable-languages=c,c++
    make -j6
    sudo make install
    
  4. Set the correct gcc/g++

    export CC=/usr/local/bin/gcc
    export CXX=/usr/local/bin/g++
    
  5. Changed the line in Makefile from

    MK_NVCCFLAGS  += -O3
    

    to

    MK_NVCCFLAGS += -maxrregcount=80
    

    The original -O3 will cause nvcc to report an error of "nvcc fatal : redefinition of argument 'optimize'."

    The -maxrregcount=80 is a workaround for the error too many resources for launch during the inference. I'm not a CUDA expert, the number 80 is from this link.

  6. make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=sm_62 -j 6

  7. When running llama.cpp, I still need -ngl 33 (using llama-2-7b) to explicitly offload all layers to the Jetson TX2 GPU.

./main -m llama-2-7b.Q4_0.gguf -ngl 33 -c 256 -b 512 -n 128 --keep 48

llama_print_timings: load time = 15632.56 ms
llama_print_timings: sample time = 3.56 ms / 24 runs ( 0.15 ms per token, 6735.90 tokens per second)
llama_print_timings: prompt eval time = 13273.26 ms / 145 tokens ( 91.54 ms per token, 10.92 tokens per second)
llama_print_timings: eval time = 5457.13 ms / 23 runs ( 237.27 ms per token, 4.21 tokens per second)
llama_print_timings: total time = 32417.66 ms / 168 tokens

@whoreson
Copy link
Contributor Author

whoreson commented Mar 2, 2024

Hmm, it does compile with CUDA 10.2 (but not with CUDA 10.1 which I previously used). I didn't even bother compiling a proper gcc, just disabled the version check in /cuda-toolkit/targets/x86_64-linux/include/crt/host_config.h

Then first compiled ggml-cuda.cu by hand like so:

~/cuda-10.2/cuda-toolkit/bin/nvcc --compiler-options="" -use_fast_math -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 --use_fast_math --compiler-options="-I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wmissing-declarations -Wno-unused-function -Wno-multichar -Wno-format-truncation -Wno-array-bounds -pthread    -Wno-pedantic -march=native -mtune=native " -c ggml-cuda.cu -o ggml-cuda.o

And continued with make LLAMA_CUBLAS=1 as usual.

@whoreson
Copy link
Contributor Author

2bf8d0f broke it on CUDA 10.2
@slaren @JohannesGaessler

@slaren
Copy link
Member

slaren commented Mar 21, 2024

@whoreson this is getting a bit tiresome. Are you going to ask people to harass me over this again? Let's be clear: I have no interest in supporting ancient versions of CUDA. If this is important for you, you are welcome to fix it yourself and open a PR.

@JohannesGaessler
Copy link
Collaborator

I have no intention to support CUDA 10. As slaren said, if you want it supported you are free to put in the effort yourself and I will then happily review your PRs.

@whoreson

This comment was marked as off-topic.

@whoreson

This comment was marked as off-topic.

@github-actions github-actions bot added the stale label Apr 22, 2024
Copy link
Contributor

github-actions bot commented May 7, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@zhaofeng-shu33
Copy link

zhaofeng-shu33 commented Mar 13, 2025

Version 81bc (dating back to 2023-12-07) compiles successfully on Win10 + VS2019 + NVIDIA Toolkit 10.2 with the help of the following patch for ggml-cuda.cu.

#include <cuda_fp16.h>
+#define CUBLAS_TF32_TENSOR_OP_MATH CUBLAS_TENSOR_OP_MATH
+#define CUBLAS_COMPUTE_16F CUDA_R_16F
+#define CUBLAS_COMPUTE_32F CUDA_R_32F

The -DLLAMA_CUBLAS=ON option needs to be given.

@anuragdogra2192
Copy link

anuragdogra2192 commented Mar 24, 2025

I made version 81bc work on a Jetson Nano (Ubuntu 18.04) with CUDA 10.2 and GCC 8.5,
by defining the below before including <cuda_runtime.h> in ggml-cuda.cu:

#if CUDA_VERSION < 11000
#define CUBLAS_TF32_TENSOR_OP_MATH CUBLAS_TENSOR_OP_MATH
#define CUBLAS_COMPUTE_16F CUDA_R_16F
#define CUBLAS_COMPUTE_32F CUDA_R_32F
#endif
