Dramatic increase of the C++ dlls size #1125


Open

zsogitbe opened this issue Mar 8, 2025 · 8 comments


@zsogitbe
Contributor

zsogitbe commented Mar 8, 2025

Description

Does anyone know why the size of our C++ DLLs (CUDA) has suddenly increased to nearly 700 MB, compared to around 60 MB previously? I've also noticed some new files, including ggml-cuda.dll, which is over 600 MB. Could this be due to incorrect dynamic linking, instead of statically linking only the necessary code?

Thank you in advance for your help!

@martindevans
Member

martindevans commented Mar 8, 2025

Based on some of the comments in your thread on the main repo (here), I started a test run with `GGML_CUDA_COMPRESSION_MODE=size`, here.

Edit: the run failed, but for unrelated reasons: some files now have new names, so the copy paths are wrong. Overall it looks like a success. What do you think?
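
(For anyone wanting to reproduce that test locally, a minimal sketch of the configure step, assuming llama.cpp's CMake options at the time of this thread; the compression mode maps to nvcc's `--compress-mode` and reportedly needs a recent CUDA toolkit:)

```sh
# Configure a llama.cpp CUDA build with size-optimized device-code compression.
# GGML_CUDA_COMPRESSION_MODE accepts "size" per this thread; other values and
# exact toolkit requirements may vary by llama.cpp version.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_COMPRESSION_MODE=size
cmake --build build --config Release
```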

@zsogitbe
Contributor Author

zsogitbe commented Mar 9, 2025

Thank you, Martin. I am a bit hesitant because I’m confused about how ggerganov still manages to maintain the former file sizes (~60 MB for the DLL), while we are seeing hundreds of MB more (ggml-org/llama.cpp#12267). I don’t believe he is using the `GGML_CUDA_COMPRESSION_MODE=size` option. Before considering this option, I’d like to understand why this discrepancy is happening.

Additionally, if we decide to add the `GGML_CUDA_COMPRESSION_MODE=size` option, we’ll need to conduct benchmarks to ensure that performance isn’t negatively impacted. I’m not entirely convinced that this option only excludes unneeded components; it might affect runtime performance if the excluded components are essential for specific tasks or optimizations. We should proceed cautiously here.

Overall, the file sizes in your test appear very small, but you don’t have CUDA. Could you clarify where you’re sourcing the CUDA 12 DLLs/distributables for LLamaSharp?

@zsogitbe
Contributor Author

zsogitbe commented Mar 9, 2025

I believe I've identified the issue. I now have a smaller ggml-cuda.dll (51 MB for the Windows Release with one architecture, and 157 MB with two architectures). The issue seems to stem from the `-arch=native` option. NVCC doesn't support this option, but it appears the code requires it for some reason. I had previously removed it, but I’ve now added it back. Additionally, the architecture(s) need to be defined manually (e.g., `CMAKE_CUDA_ARCHITECTURES="61;89"`), while ensuring that `-arch=native` remains present in the CMake script.
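
(A minimal sketch of the configure step described above; the architecture numbers are just examples, for Pascal and Ada GPUs respectively:)

```sh
# Pin the CUDA architectures explicitly so the resulting fat binary only
# carries device code for the compute capabilities you actually target.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;89"
cmake --build build --config Release
```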

@martindevans
Member

> Overall, the file sizes in your test appear very small, but you don’t have CUDA. Could you clarify where you’re sourcing the CUDA 12 DLLs/distributables for LLamaSharp?

They are in that build, search for "MB" and that'll highlight the relevant items:

ggml-cuda-bin-linux-cublas-cu11.7.1-x64.so : 175MB
ggml-cuda-bin-linux-cublas-cu12.8.0-x64.so : 113MB
ggml-cuda-bin-win-cublas-cu11.7.1-x64.dll : 174MB
ggml-cuda-bin-win-cublas-cu12.8.0-x64.dll : 113MB

> `-arch=native`

I don't think we can use this in the hosted build system. If I understand it correctly, that will build for whatever GPU architecture the build host has, which isn't suitable for binaries that are going to be used anywhere else, of course!
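
(For illustration, a sketch of the difference, assuming an nvcc recent enough to accept `-arch=native`; `kernel.cu` is a hypothetical source file:)

```sh
# -arch=native compiles only for the GPU detected on the build host,
# so the binary may not run on other GPUs:
nvcc -arch=native kernel.cu -o kernel_host_only

# For redistributable binaries, enumerate the target architectures instead:
nvcc -gencode arch=compute_61,code=sm_61 \
     -gencode arch=compute_89,code=sm_89 \
     kernel.cu -o kernel_multi_arch
```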

@sangyuxiaowu
Contributor

My libggml-cuda.so on a linux-arm64 NVIDIA Jetson device (CUDA 11) is 53 MB.


@zsogitbe
Contributor Author

> They are in that build, search for "MB" and that'll highlight the relevant items:
>
> ggml-cuda-bin-linux-cublas-cu11.7.1-x64.so : 175MB
> ggml-cuda-bin-linux-cublas-cu12.8.0-x64.so : 113MB
> ggml-cuda-bin-win-cublas-cu11.7.1-x64.dll : 174MB
> ggml-cuda-bin-win-cublas-cu12.8.0-x64.dll : 113MB
>
> `-arch=native`
>
> I don't think we can use this in the hosted build system. If I understand it correctly, that will build for whatever GPU architecture the build host has, which isn't suitable for binaries that are going to be used anywhere else, of course!

Martin, the primary issue with the build is that the `-arch=native` option does not exist in NVCC, which constitutes a bug. Consequently, the NVIDIA Dev Kit's default architecture is used, likely corresponding to compute capability 50 (or close to that, given it dates from around 2014).

Additionally, the build options for Windows are flawed, leading to excessively large DLL sizes (several hundred MB in my case). Since I only use plain CUDA and not cuBLAS, I cannot judge whether your 113 MB is reasonable. My guess is that cuBLAS adds approximately 70 MB (making it around 51 MB for a clean CUDA build), or the increased size may be due to buggy build options (the latter is more likely, because with cuBLAS the DLL should, if anything, be smaller, I'd guess).

In any case, I recommend against using these pre-built DLLs. Instead, build your own from the default options, adding CUDA and the architecture(s) manually. Be careful when changing CMake options, because many will result in buggy, very large DLLs (which is what happened to me). One critical CMake setting to include is `CMAKE_CUDA_ARCHITECTURES="<semicolon-separated list of compute architectures>"`; see the sketch below.

The current official LLamaSharp distribution ships a CUDA 12 DLL of more than 300 MB, which is definitely a BUG as well.
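
(Putting that advice together, a hedged sketch of a self-built configuration; the architecture list is illustrative, not LLamaSharp's official build script:)

```sh
# Optionally, query the compute capability of the GPUs you need to support
# (recent drivers only; older nvidia-smi versions lack the compute_cap field):
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Then configure from the default options, enabling CUDA and pinning
# exactly those architectures:
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;89" -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```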

@zsogitbe
Contributor Author

zsogitbe commented Apr 3, 2025

IMPORTANT NEWS: The latest master of llama.cpp fixes the BUG with the DLL sizes, and also the CUDA graphs problem (ggml-org/llama.cpp#12152) that was causing regular crashes in LLamaSharp.

Unfortunately, `llama_model_params` was changed in llama.cpp (an extra parameter, `const struct llama_model_tensor_buft_override * tensor_buft_overrides;`), so simply using the latest version of llama.cpp will not work. We will need @martindevans's help to update to the latest version of llama.cpp.

@martindevans
Member

That's great news! I've just started off a run here to see if it creates more reasonably sized binaries.
