
Can't compile "llama.cpp/ggml-quants.c" #3880


Closed
4 tasks done
ByerRA opened this issue Nov 1, 2023 · 20 comments

Comments

@ByerRA

ByerRA commented Nov 1, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

While attempting to compile llama.cpp, I encountered several warnings and errors while compiling the "llama.cpp/ggml-quants.c" file. Because warnings are treated as errors ("cc1: some warnings being treated as errors"), the compile fails.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               1
Model name:          Cortex-A57
Stepping:            r1p1
CPU max MHz:         1479.0000
CPU min MHz:         102.0000
BogoMIPS:            38.40
L1d cache:           32K
L1i cache:           48K
L2 cache:            2048K
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32
  • Operating System, e.g. for Linux:

$ uname -a

Linux dev 4.9.337-tegra #1 SMP PREEMPT Thu Jun 8 21:19:14 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
  • SDK version, e.g. for Linux:

$ python3 --version
Python 3.7.9

$ cmake --version
cmake version 3.28.20231031-g9c106e3

$ g++ --version
g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

Failure Information (for bugs)

Please help provide information about the failure / bug.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ mkdir build
$ cd build
$ cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_MPI=ON
$ cmake --build . --config Release

Failure Logs

rbyer@dev:~$ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 11791, done.
remote: Counting objects: 100% (3309/3309), done.
remote: Compressing objects: 100% (356/356), done.
remote: Total 11791 (delta 3093), reused 3084 (delta 2953), pack-reused 8482
Receiving objects: 100% (11791/11791), 13.73 MiB | 11.99 MiB/s, done.
Resolving deltas: 100% (8204/8204), done.
rbyer@dev:~$ cd llama.cpp
rbyer@dev:~/llama.cpp$ mkdir build
rbyer@dev:~/llama.cpp$ cd build
rbyer@dev:~/llama.cpp/build$ cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_MPI=ON
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda-10.2/targets/aarch64-linux/include (found version "10.2.300")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 10.2.300
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- Found MPI_C: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- MPI found
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (9.6s)
-- Generating done (0.4s)
-- Build files have been written to: /home/rbyer/llama.cpp/build
rbyer@dev:~/llama.cpp/build$ cmake --build . --config Release
[  1%] Built target BUILD_INFO
[  2%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  3%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  4%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  5%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:3577:36: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);
                                    ^~~~~~~~~~~~
                                    vld1q_s16
/home/rbyer/llama.cpp/ggml-quants.c:3577:36: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3578:36: warning: missing braces around initializer [-Wmissing-braces]
         const int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
                                    ^
                                     {                                                                                                      }
/home/rbyer/llama.cpp/ggml-quants.c:3614:41: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
             const uint8x16x2_t q2bits = vld1q_u8_x2(q2); q2 += 32;
                                         ^~~~~~~~~~~
                                         vld1q_u32
/home/rbyer/llama.cpp/ggml-quants.c:3614:41: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3616:35: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
             int8x16x2_t q8bytes = vld1q_s8_x2(q8); q8 += 32;
                                   ^~~~~~~~~~~
                                   vld1q_s32
/home/rbyer/llama.cpp/ggml-quants.c:3616:35: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3621:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3623:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3625:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:4251:31: error: invalid initializer
         uint8x16x2_t qhbits = vld1q_u8_x2(qh);
                               ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:4269:41: error: invalid initializer
             const uint8x16x2_t q3bits = vld1q_u8_x2(q3); q3 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:4270:43: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
             const int8x16x4_t q8bytes_1 = vld1q_s8_x4(q8); q8 += 64;
                                           ^~~~~~~~~~~
                                           vld1q_s64
/home/rbyer/llama.cpp/ggml-quants.c:4270:43: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:4271:43: error: invalid initializer
             const int8x16x4_t q8bytes_2 = vld1q_s8_x4(q8); q8 += 64;
                                           ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:5171:41: error: invalid initializer
             const uint8x16x2_t q4bits = vld1q_u8_x2(q4); q4 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5189:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rbyer/llama.cpp/ggml-quants.c:5198:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:5816:31: error: invalid initializer
         uint8x16x2_t qhbits = vld1q_u8_x2(qh);
                               ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5824:41: error: invalid initializer
             const uint8x16x2_t q5bits = vld1q_u8_x2(q5); q5 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5825:41: error: invalid initializer
             const int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:6525:36: error: invalid initializer
         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);
                                    ^~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6527:38: warning: missing braces around initializer [-Wmissing-braces]
         const int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
                                      ^
                                       {                                                            }
/home/rbyer/llama.cpp/ggml-quants.c:6539:35: error: invalid initializer
             uint8x16x2_t qhbits = vld1q_u8_x2(qh); qh += 32;
                                   ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6540:35: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
             uint8x16x4_t q6bits = vld1q_u8_x4(q6); q6 += 64;
                                   ^~~~~~~~~~~
                                   vld1q_u64
/home/rbyer/llama.cpp/ggml-quants.c:6540:35: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:6541:35: error: invalid initializer
             int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64;
                                   ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6584:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8bytes = vld1q_s8_x4(q8); q8 += 64;
                     ^
cc1: some warnings being treated as errors
CMakeFiles/ggml.dir/build.make:117: recipe for target 'CMakeFiles/ggml.dir/ggml-quants.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
CMakeFiles/Makefile2:647: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2
rbyer@dev:~/llama.cpp/build$
@AutonomicPerfectionist
Contributor

-DLLAMA_MPI=ON

I see you're trying to build with MPI support; that was broken around a month ago, and I've been working on a fix in #3334. It doesn't seem to be the cause of your issues here, but I figured I'd point it out so you aren't surprised when it blows up after running.

@themanyone

themanyone commented Dec 22, 2023

Okay. So my solution was to implement the missing declarations myself, loading each chunk of 16 8-bit integers into its own 128-bit NEON register and gathering the chunks in a struct. Here's the patch.

diff --git a/ggml-quants.c b/ggml-quants.c
index 0e8163a..dfad15d 100644
--- a/ggml-quants.c
+++ b/ggml-quants.c
@@ -394,17 +394,42 @@ inline static ggml_int8x16x4_t ggml_vld1q_s8_x4(const int8_t * ptr) {
 
 #else
 
+typedef struct {
+    int8x16_t val[4];
+} ggml_int8x16x4_t;
+
+#define ggml_vld1q_s8_x4(ptr) ({ \
+    ggml_int8x16x4_t result; \
+    result.val[0] = vld1q_s8(ptr); \
+    result.val[1] = vld1q_s8(ptr + 16); \
+    result.val[2] = vld1q_s8(ptr + 32); \
+    result.val[3] = vld1q_s8(ptr + 48); \
+    result; \
+})
+
+typedef struct {
+    uint8x16_t val[4];
+} ggml_uint8x16x4_t;
+
+#define ggml_vld1q_u8_x4(ptr) ({ \
+    ggml_uint8x16x4_t result; \
+    result.val[0] = vld1q_u8(ptr); \
+    result.val[1] = vld1q_u8(ptr + 16); \
+    result.val[2] = vld1q_u8(ptr + 32); \
+    result.val[3] = vld1q_u8(ptr + 48); \
+    result; \
+})
+
 #define ggml_int16x8x2_t  int16x8x2_t
 #define ggml_uint8x16x2_t uint8x16x2_t
 #define ggml_uint8x16x4_t uint8x16x4_t
 #define ggml_int8x16x2_t  int8x16x2_t
 #define ggml_int8x16x4_t  int8x16x4_t
-
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
-#define ggml_vld1q_u8_x4  vld1q_u8_x4
+//#define ggml_vld1q_u8_x4  vld1q_u8_x4
 #define ggml_vld1q_s8_x2  vld1q_s8_x2
-#define ggml_vld1q_s8_x4  vld1q_s8_x4
+//#define ggml_vld1q_s8_x4  vld1q_s8_x4
 
 #endif
 #endif
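
The same trick extends to the two-vector loads if the compiler also lacks vld1q_s16_x2 / vld1q_u8_x2 / vld1q_s8_x2 (GCC 7 reports those as missing too, per the logs above). A sketch for one of them, untested, following the same statement-expression pattern — the other two are analogous:

typedef struct {
    int16x8_t val[2];
} ggml_int16x8x2_t;

#define ggml_vld1q_s16_x2(ptr) ({ \
    ggml_int16x8x2_t result; \
    result.val[0] = vld1q_s16(ptr);       /* first 8 int16 lanes */ \
    result.val[1] = vld1q_s16((ptr) + 8); /* next 8 int16 lanes  */ \
    result; \
})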

@lndshrk504

Hi, I am encountering this issue as well when trying to build on a 4GB Jetson Nano. Is there a fix or patch yet?

@paulohm2

paulohm2 commented Jan 6, 2024

Same here. Any thoughts would be appreciated.

@lndshrk504

I have found this (#4123), which suggests installing gcc 8.5 from source; I haven't finished trying it yet. 🤷🏻‍♂️🤷🏻‍♂️

@paulohm2

paulohm2 commented Jan 6, 2024

Compiling gcc 8.5 from source right now. Will let everybody know if it works in a couple of hours.

@paulohm2

paulohm2 commented Jan 7, 2024

After a couple of tweaks, I managed to make this work. Be sure to:

  1. compile gcc 8.5 from the source code at https://ftp.gnu.org/gnu/gcc/gcc-8.5.0/ (a build sketch follows below)
  2. after make install, make sure that your gcc symbolic link at /usr/bin points to the right file
  3. I tried using make but it didn't work; use cmake with the following options:
    cmake .. -DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  4. I had to tweak CMakeFiles/ggml.dir/flags.make to include the -fPIC compiler option, otherwise ld returns an error
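
For step 1, a minimal sketch of a typical out-of-tree GCC build (paths and the -j count are illustrative, not necessarily the exact commands I ran; expect the build to take hours on a Jetson):

$ wget https://ftp.gnu.org/gnu/gcc/gcc-8.5.0/gcc-8.5.0.tar.gz
$ tar xf gcc-8.5.0.tar.gz && cd gcc-8.5.0
$ ./contrib/download_prerequisites
$ mkdir build && cd build
$ ../configure --enable-languages=c,c++ --disable-multilib
$ make -j4
$ sudo make install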

@paulohm2

paulohm2 commented Jan 8, 2024

(quoting the build steps from my previous comment)

Please also note that you may have to run your CUDA-enabled programs as sudo (some programs running as a regular user were unable to detect my CUDA setup, don't ask me why).

@lndshrk504

lndshrk504 commented Jan 9, 2024

Can you please tell me where you put the -fPIC compilation flag in that one file? I am getting errors when following your steps, after compiling gcc 8.5 and setting that gcc in my path. I put -fPIC as the first option for C_FLAGS in the ggml.dir/flags.make file.

Terminal Output:

[ 15%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 16%] Linking CXX static library libcommon.a
[ 16%] Built target common
[ 17%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 18%] Linking CXX executable ../bin/test-quantize-fns
/usr/bin/ld: ../libllama.a(ggml-cuda.cu.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stdout@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: ../libllama.a(ggml-cuda.cu.o)(.text+0x234): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stdout@@GLIBC_2.17'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
tests/CMakeFiles/test-quantize-fns.dir/build.make:103: recipe for target 'bin/test-quantize-fns' failed
make[2]: *** [bin/test-quantize-fns] Error 1
CMakeFiles/Makefile2:1542: recipe for target 'tests/CMakeFiles/test-quantize-fns.dir/all' failed
make[1]: *** [tests/CMakeFiles/test-quantize-fns.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2

@paulohm2

paulohm2 commented Jan 9, 2024

Try editing this file after running cmake:

build/CMakeFiles/ggml.dir/flags.make

...
C_FLAGS = -fPIC ...
...
CUDA_FLAGS = -fPIC ...
...
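
(A one-liner sketch to apply both edits from the build directory, assuming the flags.make layout above — note that re-running cmake regenerates the file and undoes this:)

$ sed -i 's/^C_FLAGS = /C_FLAGS = -fPIC /;s/^CUDA_FLAGS = /CUDA_FLAGS = -fPIC /' CMakeFiles/ggml.dir/flags.make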

@lndshrk504

It worked for me!! Thank you. BTW, the second place to put -fPIC is CUDA_FLAGS. I had never seen a flags.make file before, so I was confused.

Thanks again!

@rvandernoort

Hey, I'm also trying to compile on an Nvidia Jetson, but without MPI; however, if I use this method I still get the same errors in ggml-quants.c.o. Any idea what to do next?

gcc is 8.5, self-compiled

cmake .. -DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-10.2/bin/nvcc
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.17.1") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Found CUDAToolkit: /usr/local/cuda/targets/aarch64-linux/include (found version "10.2.300") 
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 10.2.300
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- CUDA host compiler is GNU 8.5.0

-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (6.4s)
-- Generating done (0.4s)
-- Build files have been written to: /home/rover/llama.cpp/build
# CMAKE generated file: DO NOT EDIT!
# Generated by "Unix Makefiles" Generator, CMake Version 3.28

# compile C with /usr/bin/cc
# compile CUDA with /usr/local/cuda-10.2/bin/nvcc
C_DEFINES = -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600

C_INCLUDES = -I/home/rover/llama.cpp/. -isystem /usr/local/cuda/targets/aarch64-linux/include

C_FLAGS = -fPIC -O3 -DNDEBUG -std=gnu11 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread

CUDA_DEFINES = -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_USE_CUBLAS -DK_QUANTS_PER_ITERATION=2 -D_GNU_SOURCE -D_XOPEN_SOURCE=600

CUDA_INCLUDES = -I/home/rover/llama.cpp/. -isystem /usr/local/cuda/targets/aarch64-linux/include

CUDA_FLAGS = -fPIC -O3 -DNDEBUG -std=c++11 "--generate-code=arch=compute_52,code=[compute_52,sm_52]" "--generate-code=arch=compute_61,code=[compute_61,sm_61]" "--generate-code=arch=compute_70,code=[compute_70,sm_70]" -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -use_fast_math -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi" -Xcompiler -pthread
cmake --build . --config Release
[  1%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  2%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  3%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  4%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3725:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3749:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q2bits = ggml_vld1q_u8_x2(q2); q2 += 32;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s8_x2  vld1q_s8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:406:27: error: invalid initializer
 #define ggml_vld1q_s8_x2  vld1q_s8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:3751:40: note: in expansion of macro ‘ggml_vld1q_s8_x2’
             ggml_int8x16x2_t q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rover/llama.cpp/ggml-quants.c:3757:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rover/llama.cpp/ggml-quants.c:3758:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:3743:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rover/llama.cpp/ggml-quants.c:3759:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:4365:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
                                    ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:4383:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q3bits = ggml_vld1q_u8_x2(q3); q3 += 32;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:4384:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_1 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:4385:48: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes_2 = ggml_vld1q_s8_x4(q8); q8 += 64;
                                                ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:5244:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q4bits = ggml_vld1q_u8_x2(q4); q4 += 32;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:5246:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rover/llama.cpp/ggml-quants.c:5253:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:5840:36: note: in expansion of macro ‘ggml_vld1q_u8_x2’
         ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh);
                                    ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:5848:46: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             const ggml_uint8x16x2_t q5bits = ggml_vld1q_u8_x2(q5); q5 += 32;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:5849:46: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             const ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                                              ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:403:27: error: invalid initializer
 #define ggml_vld1q_s16_x2 vld1q_s16_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:6506:41: note: in expansion of macro ‘ggml_vld1q_s16_x2’
         const ggml_int16x8x2_t q8sums = ggml_vld1q_s16_x2(y[i].bsums);
                                         ^~~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:404:27: error: invalid initializer
 #define ggml_vld1q_u8_x2  vld1q_u8_x2
                           ^
/home/rover/llama.cpp/ggml-quants.c:6520:40: note: in expansion of macro ‘ggml_vld1q_u8_x2’
             ggml_uint8x16x2_t qhbits = ggml_vld1q_u8_x2(qh); qh += 32;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
 #define ggml_vld1q_u8_x4  vld1q_u8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:405:27: error: invalid initializer
 #define ggml_vld1q_u8_x4  vld1q_u8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:6521:40: note: in expansion of macro ‘ggml_vld1q_u8_x4’
             ggml_uint8x16x4_t q6bits = ggml_vld1q_u8_x4(q6); q6 += 64;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:407:27: error: invalid initializer
 #define ggml_vld1q_s8_x4  vld1q_s8_x4
                           ^
/home/rover/llama.cpp/ggml-quants.c:6522:40: note: in expansion of macro ‘ggml_vld1q_s8_x4’
             ggml_int8x16x4_t q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                                        ^~~~~~~~~~~~~~~~
/home/rover/llama.cpp/ggml-quants.c:6547:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8bytes = ggml_vld1q_s8_x4(q8); q8 += 64;
                     ^
/home/rover/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_iq2_xxs_q8_K’:
/home/rover/llama.cpp/ggml-quants.c:7264:17: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8b = ggml_vld1q_s8_x4(q8); q8 += 64;
                 ^
cc1: some warnings being treated as errors
CMakeFiles/ggml.dir/build.make:117: recipe for target 'CMakeFiles/ggml.dir/ggml-quants.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
CMakeFiles/Makefile2:697: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2

@paulohm2

Try using gcc 8.5, as posted above

@paulohm2

THIS

(quoting the build steps from my earlier comment)

@rvandernoort

gcc is 8.5, self-compiled

I have done all that already, but still get the errors

gcc --version
gcc (GCC) 8.5.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@paulohm2

paulohm2 commented Jan 11, 2024 via email

@rvandernoort

Alright, thanks, that got me further, but unfortunately there is still an error.

export CC=/usr/local/bin/gcc
export CXX=/usr/local/bin/g++
cmake .. -DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-10.2/bin/nvcc
-- The C compiler identification is GNU 8.5.0
-- The CXX compiler identification is GNU 8.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/local/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.17.1") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Found CUDAToolkit: /usr/local/cuda/targets/aarch64-linux/include (found version "10.2.300") 
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 10.2.300
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- CUDA host compiler is GNU 8.5.0

-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (7.0s)
-- Generating done (0.4s)
-- Build files have been written to: /home/rover/llama.cpp/build
cmake --build . --config Release
[  1%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  2%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  3%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  4%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
[  5%] Building CUDA object CMakeFiles/ggml.dir/ggml-cuda.cu.o
/home/rover/llama.cpp/ggml-cuda.cu(626): error: identifier "__hmax2" is undefined

/home/rover/llama.cpp/ggml-cuda.cu(5462): error: identifier "__hmax2" is undefined

/home/rover/llama.cpp/ggml-cuda.cu(5474): error: identifier "__hmax" is undefined

/home/rover/llama.cpp/ggml-cuda.cu(5481): error: identifier "__hmax" is undefined

4 errors detected in the compilation of "/tmp/tmpxft_00003545_00000000-10_ggml-cuda.compute_70.cpp1.ii".
CMakeFiles/ggml.dir/build.make:131: recipe for target 'CMakeFiles/ggml.dir/ggml-cuda.cu.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-cuda.cu.o] Error 1
CMakeFiles/Makefile2:697: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2

@paulohm2

paulohm2 commented Jan 11, 2024 via email

@ggerganov
Member

The hmax issue will be fixed with #4862
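
(For anyone patching locally before that lands: a sketch of the kind of fallback that can unblock CUDA 10.2 builds in the meantime — illustrative, not the actual #4862 patch — guarded so it only applies where cuda_fp16.h lacks these intrinsics:)

#include <cuda_fp16.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11000
// CUDA 10.x ships cuda_fp16.h without __hmax/__hmax2; emulate them via float,
// which also works on pre-5.3 architectures lacking native half comparisons.
static __device__ __forceinline__ half __hmax(const half a, const half b) {
    return __half2float(a) > __half2float(b) ? a : b;
}
static __device__ __forceinline__ half2 __hmax2(const half2 a, const half2 b) {
    return __halves2half2(__hmax(__low2half(a),  __low2half(b)),
                          __hmax(__high2half(a), __high2half(b)));
}
#endif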

@github-actions github-actions bot added the stale label Mar 19, 2024
Contributor

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024