Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Please provide a detailed written description of what you were trying to do, and what you expected llama.cpp to do.
Hi! I'm trying to compile llama.cpp on an NVIDIA Jetson Nano 2GB with cuBLAS, because I want to use its CUDA cores, but I'm running into compilation issues.
Current Behavior
Please provide a detailed written description of what llama.cpp did, instead.
Both the make and the cmake build methods result in various errors.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 1
Model name: Cortex-A57
Stepping: r1p1
CPU max MHz: 1479,0000
CPU min MHz: 102,0000
BogoMIPS: 38.40
L1d cache: 32K
L1i cache: 48K
L2 cache: 2048K
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32
- Operating System, e.g. for Linux:
$ uname -a
Linux rover-NVIDIA-JETSON 4.9.337-tegra #1 SMP PREEMPT Thu Jun 8 21:19:14 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.6.9
$ make --version
GNU Make 4.1
$ g++ --version
g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
$ cmake --version
cmake version 3.28.0-rc4
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
Failure Information (for bugs)
Please help provide information about the failure / bug.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- make
- make LLAMA_CUBLAS=1
- make LLAMA_CUBLAS=1 with -arch=compute_53
- make LLAMA_CUBLAS=1 with -arch=compute_53 and -mcpu removed
- cmake .. followed by cmake --build . --config Release
- cmake .. -DLLAMA_CUBLAS=ON followed by cmake --build . --config Release
All of these fail; full logs below.
Failure Logs
$ make
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS:
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native -c ggml-quants.c -o ggml-quants.o
ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:3680:41: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
^
{ }
ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:406:27: error: invalid initializer
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3723:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3725:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
ggml-quants.c:3727:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:4353:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:4371:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:4373:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5273:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:5291:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
ggml-quants.c:5300:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5918:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:5926:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:5927:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
ggml-quants.c:6627:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
ggml-quants.c:6629:43: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
^
{ }
ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
ggml-quants.c:6641:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:405:27: error: invalid initializer
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
ggml-quants.c:6643:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
ggml-quants.c:6686:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^
cc1: some warnings being treated as errors
Makefile:542: recipe for target 'ggml-quants.o' failed
make: *** [ggml-quants.o] Error 1
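From what I can tell, the vld1q_s16_x2 / vld1q_u8_x2 / vld1q_s8_x2 / vld1q_u8_x4 / vld1q_s8_x4 NEON intrinsics were only added to aarch64 GCC in a release newer than 7.5, which is why GCC treats them here as implicitly declared functions returning int. As a sketch of a possible local workaround (my assumption, not an upstream fix), the ggml_vld1q_* macros in ggml-quants.c could be replaced with small inline helpers that issue plain consecutive loads:

#include <arm_neon.h>

/* Emulate the missing _x2/_x4 multi-register load intrinsics with
   consecutive single-register loads (functionally equivalent, possibly
   slightly slower). The matching "#define ggml_vld1q_* ..." lines would
   have to be removed so these helpers take over. */
static inline int16x8x2_t ggml_vld1q_s16_x2(const int16_t * p) {
    int16x8x2_t r;
    r.val[0] = vld1q_s16(p);
    r.val[1] = vld1q_s16(p + 8);
    return r;
}

static inline uint8x16x2_t ggml_vld1q_u8_x2(const uint8_t * p) {
    uint8x16x2_t r;
    r.val[0] = vld1q_u8(p);
    r.val[1] = vld1q_u8(p + 16);
    return r;
}

static inline int8x16x2_t ggml_vld1q_s8_x2(const int8_t * p) {
    int8x16x2_t r;
    r.val[0] = vld1q_s8(p);
    r.val[1] = vld1q_s8(p + 16);
    return r;
}

static inline uint8x16x4_t ggml_vld1q_u8_x4(const uint8_t * p) {
    uint8x16x4_t r;
    r.val[0] = vld1q_u8(p);
    r.val[1] = vld1q_u8(p + 16);
    r.val[2] = vld1q_u8(p + 32);
    r.val[3] = vld1q_u8(p + 48);
    return r;
}

static inline int8x16x4_t ggml_vld1q_s8_x4(const int8_t * p) {
    int8x16x4_t r;
    r.val[0] = vld1q_s8(p);
    r.val[1] = vld1q_s8(p + 16);
    r.val[2] = vld1q_s8(p + 32);
    r.val[3] = vld1q_s8(p + 48);
    return r;
}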
$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
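As far as I can tell, nvcc only learned -arch=native in a newer CUDA release, so under CUDA 10.2 the GPU architecture has to be spelled out. The Nano's Maxwell GPU is compute capability 5.3, so something like this in the Makefile (a sketch; compute_53 emits PTX only, while sm_53 emits device code directly):

# Makefile sketch: replace -arch=native in NVCCFLAGS with an explicit target
NVCCFLAGS += -arch=sm_53
# or, to embed both PTX and SASS:
# NVCCFLAGS += -gencode arch=compute_53,code=sm_53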
Setting -arch to compute_53 in the Makefile and re-running:
$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -mcpu=native
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -mcpu=native -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : 'cpu=native': expected a number
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
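My guess is that nvcc is parsing the forwarded -mcpu=native as its own -m (machine width) option, which only accepts 32 or 64; --forward-unknown-to-host-compiler does not help because nvcc believes it knows -m. If that is right, hiding the flag from nvcc via -Xcompiler might work (an untested sketch on my side):

# Makefile sketch: pass -mcpu=native only to the host compiler, not to nvcc
NVCCFLAGS += -Xcompiler -mcpu=native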
I could not find what number nvcc expected for this cpu option, so I removed -mcpu=native entirely; the build then gets further but fails in ggml-cuda.cu:
$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: aarch64
I UNAME_M: aarch64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation "
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
I CXX: g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -c ggml.c -o ggml.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c llama.cpp -o llama.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/common.cpp -o common.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/sampling.cpp -o sampling.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/grammar-parser.cpp -o grammar-parser.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/build-info.cpp -o build-info.o
g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-array-bounds -Wno-format-truncation -c common/console.cpp -o console.o
nvcc --forward-unknown-to-host-compiler -use_fast_math -arch=compute_53 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation " -c ggml-cuda.cu -o ggml-cuda.o
ggml-cuda.cu(5933): error: identifier "CUBLAS_TF32_TENSOR_OP_MATH" is undefined
ggml-cuda.cu(6578): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7514): error: identifier "CUBLAS_COMPUTE_16F" is undefined
ggml-cuda.cu(7548): error: identifier "CUBLAS_COMPUTE_16F" is undefined
4 errors detected in the compilation of "/tmp/tmpxft_00001bc3_00000000-6_ggml-cuda.cpp1.ii".
Makefile:439: recipe for target 'ggml-cuda.o' failed
make: *** [ggml-cuda.o] Error 1
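These identifiers only exist from CUDA 11 onward: the cublasComputeType_t values such as CUBLAS_COMPUTE_16F, and CUBLAS_TF32_TENSOR_OP_MATH, are not in the CUDA 10.2 cuBLAS headers, which still use cudaDataType_t and CUBLAS_TENSOR_OP_MATH. A compatibility shim along these lines near the top of ggml-cuda.cu might map them back (my assumption; I have not verified that every call site behaves identically):

#include <cuda_runtime.h>

#if CUDART_VERSION < 11000
// cuBLAS before CUDA 11 takes cudaDataType_t where the compute type goes
#define CUBLAS_COMPUTE_16F CUDA_R_16F
#define CUBLAS_COMPUTE_32F CUDA_R_32F
// TF32 does not exist before Ampere/CUDA 11; fall back to plain tensor-op math
#define CUBLAS_TF32_TENSOR_OP_MATH CUBLAS_TENSOR_OP_MATH
#endif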
$ cmake ..
-- cuBLAS found
-- Using CUDA architectures: 53
GNU ld (GNU Binutils for Ubuntu) 2.30
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done (0.4s)
-- Generating done (0.9s)
-- Build files have been written to: /home/rover/llama.cpp/build
$ cmake --build . --config Release
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:3679:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3680:41: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
^
{ }
/home/rover/llama.cpp/ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3716:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: invalid initializer
#define ggml_vld1q_s8_x2 vld1q_s8_x2
^
/home/rover/llama.cpp/ggml-quants.c:3718:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3723:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3725:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3708:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
^
/home/rover/llama.cpp/ggml-quants.c:3727:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:4353:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:4371:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4372:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:4373:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5273:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:5291:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
/home/rover/llama.cpp/ggml-quants.c:5300:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
^
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5918:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:5926:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:5927:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
#define ggml_vld1q_s16_x2 vld1q_s16_x2
^
/home/rover/llama.cpp/ggml-quants.c:6627:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:6629:43: warning: missing braces around initializer [-Wmissing-braces]
const ggml_int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
^
{ }
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
#define ggml_vld1q_u8_x2 vld1q_u8_x2
^
/home/rover/llama.cpp/ggml-quants.c:6641:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: invalid initializer
#define ggml_vld1q_u8_x4 vld1q_u8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6642:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
#define ggml_vld1q_s8_x4 vld1q_s8_x4
^
/home/rover/llama.cpp/ggml-quants.c:6643:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:6686:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
^
cc1: some warnings being treated as errors
CMakeFiles/ggml.dir/build.make:117: recipe for target 'CMakeFiles/ggml.dir/ggml-quants.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
CMakeFiles/Makefile2:623: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2
$ cmake .. -DLLAMA_CUBLAS=ON
-- cuBLAS found
-- Using CUDA architectures: 53
GNU ld (GNU Binutils for Ubuntu) 2.30
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Configuring done (0.3s)
-- Generating done (0.3s)
-- Build files have been written to: /home/rover/llama.cpp/build
$ cmake --build . --config Release
[ 1%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
[identical ggml-quants.c errors as in the previous cmake build, elided]
make: *** [all] Error 2
This looks similar to the error in #3880.
I'm not sure what else to try, or whether this setup is even supported: the NVIDIA Jetson Nano runs Ubuntu 18.04 with CUDA 10.2, which is old, but I cannot upgrade it. I would really appreciate help figuring this out; if I need to provide more information, let me know!
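One avenue I have not tried yet (a sketch, under the assumption that g++-8 from the Bionic repositories is accepted as a host compiler by the CUDA 10.2 nvcc): building with GCC 8, which I believe added the missing vld1q_*_x2/_x4 intrinsics on aarch64:

$ sudo apt install gcc-8 g++-8
$ make LLAMA_CUBLAS=1 CC=gcc-8 CXX=g++-8
# possibly also needed so nvcc uses the same host compiler:
# NVCCFLAGS += -ccbin g++-8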