Dramatic increase of the C++ dlls size #1125


Open

zsogitbe opened this issue Mar 8, 2025 · 8 comments


@zsogitbe
Contributor

zsogitbe commented Mar 8, 2025

Description

Does anyone know why the size of our C++ DLLs (CUDA) has suddenly increased to nearly 700 MB, compared to around 60 MB previously? I've also noticed some new files, including ggml-cuda.dll, which is over 600 MB. Could this be due to incorrect dynamic linking, instead of statically linking only the necessary code?

Thank you in advance for your help!

@martindevans
Member

martindevans commented Mar 8, 2025

Based on some of the comments in your thread on the main repo (here), I started a test run with `GGML_CUDA_COMPRESSION_MODE=size`, here.

Edit: the run failed, but for unrelated reasons: some files now have new names, so the copy paths are wrong. Overall it looks like a success. What do you think?
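
(For anyone wanting to reproduce that test locally, a minimal sketch of the configure step, assuming llama.cpp's CMake options at the time of this thread; the compression mode maps to nvcc's `--compress-mode` and reportedly needs a recent CUDA toolkit:)

```sh
# Configure a llama.cpp CUDA build with size-optimized device-code compression.
# GGML_CUDA_COMPRESSION_MODE accepts "size" per this thread; other values and
# exact toolkit requirements may vary by llama.cpp version.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_COMPRESSION_MODE=size
cmake --build build --config Release
```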

@zsogitbe
Contributor Author

zsogitbe commented Mar 9, 2025

Thank you, Martin. I am a bit hesitant because I’m confused about how ggerganov still manages to maintain the former file sizes (~60 MB for the DLL), while we are seeing hundreds of MB more (ggml-org/llama.cpp#12267). I don’t believe he is using the `GGML_CUDA_COMPRESSION_MODE=size` option. Before considering this option, I’d like to understand why this discrepancy is happening.

Additionally, if we decide to add the `GGML_CUDA_COMPRESSION_MODE=size` option, we’ll need to conduct benchmarks to ensure that performance isn’t negatively impacted. I’m not entirely convinced that this option only excludes unneeded components; it might affect runtime performance if the excluded components are essential for specific tasks or optimizations. We should proceed cautiously here.

Overall, the file sizes in your test appear very small, but you don’t have CUDA. Could you clarify where you’re sourcing the CUDA 12 DLLs/distributables for LLamaSharp?

@zsogitbe
Contributor Author

zsogitbe commented Mar 9, 2025

I believe I've identified the issue. I now have a smaller ggml-cuda.dll (51 MB for the Windows Release with one architecture, and 157 MB with two architectures). The issue seems to stem from the `-arch=native` option. NVCC doesn't support this option, but it appears the code requires it for some reason. I had previously removed it, but I’ve now added it back. Additionally, the architecture(s) need to be defined manually (e.g., `CMAKE_CUDA_ARCHITECTURES="61;89"`), while ensuring that `-arch=native` remains present in the CMake script.
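
(A minimal sketch of the configure step described above; the architecture numbers are just examples, for Pascal and Ada GPUs respectively:)

```sh
# Pin the CUDA architectures explicitly so the resulting fat binary only
# carries device code for the compute capabilities you actually target.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;89"
cmake --build build --config Release
```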

@martindevans
Member

> Overall, the file sizes in your test appear very small, but you don’t have CUDA. Could you clarify where you’re sourcing the CUDA 12 DLLs/distributables for LLamaSharp?

They are in that build, search for "MB" and that'll highlight the relevant items:

ggml-cuda-bin-linux-cublas-cu11.7.1-x64.so : 175MB
ggml-cuda-bin-linux-cublas-cu12.8.0-x64.so : 113MB
ggml-cuda-bin-win-cublas-cu11.7.1-x64.dll : 174MB
ggml-cuda-bin-win-cublas-cu12.8.0-x64.dll : 113MB

> `-arch=native`

I don't think we can use this in the hosted build system. If I understand it correctly, that will build for whatever GPU architecture the build host has, which isn't suitable for binaries that are going to be used anywhere else, of course!
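
(For illustration, a sketch of the difference, assuming an nvcc recent enough to accept `-arch=native`; `kernel.cu` is a hypothetical source file:)

```sh
# -arch=native compiles only for the GPU detected on the build host,
# so the binary may not run on other GPUs:
nvcc -arch=native kernel.cu -o kernel_host_only

# For redistributable binaries, enumerate the target architectures instead:
nvcc -gencode arch=compute_61,code=sm_61 \
     -gencode arch=compute_89,code=sm_89 \
     kernel.cu -o kernel_multi_arch
```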

@sangyuxiaowu
Contributor

My libggml-cuda.so on a linux-arm64 NVIDIA Jetson device (CUDA 11) is 53 MB.


@zsogitbe
Contributor Author

> They are in that build, search for "MB" and that'll highlight the relevant items:
>
> ggml-cuda-bin-linux-cublas-cu11.7.1-x64.so : 175MB
> ggml-cuda-bin-linux-cublas-cu12.8.0-x64.so : 113MB
> ggml-cuda-bin-win-cublas-cu11.7.1-x64.dll : 174MB
> ggml-cuda-bin-win-cublas-cu12.8.0-x64.dll : 113MB
>
> `-arch=native`
>
> I don't think we can use this in the hosted build system. If I understand it correctly, that will build for whatever GPU architecture the build host has, which isn't suitable for binaries that are going to be used anywhere else, of course!

Martin, the primary issue with the build is that the `-arch=native` option does not exist in NVCC, which constitutes a bug. Consequently, the NVIDIA Dev Kit's default architecture is used, likely corresponding to compute capability 50 (or close to that, given it dates from around 2014).

Additionally, the build options for Windows are flawed, leading to excessively large DLL sizes (several hundred MB in my case). Since I only use plain CUDA and not cuBLAS, I cannot judge whether your 113 MB is reasonable. My guess is that cuBLAS adds approximately 70 MB (making it around 51 MB for a clean CUDA build), or the increased size may be due to buggy build options (the latter is more likely, because with cuBLAS the DLL should, if anything, be smaller, I'd guess).

In any case, I recommend against using these pre-built DLLs. Instead, build your own from the default options, adding CUDA and the architecture(s) manually. Be careful when changing CMake options, because many will result in buggy, very large DLLs (which is what happened to me). One critical CMake setting to include is `CMAKE_CUDA_ARCHITECTURES="<semicolon-separated list of compute architectures>"`; see the sketch below.

The current official LLamaSharp distribution ships a CUDA 12 DLL of more than 300 MB, which is definitely a BUG as well.
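
(Putting that advice together, a hedged sketch of a self-built configuration; the architecture list is illustrative, not LLamaSharp's official build script:)

```sh
# Optionally, query the compute capability of the GPUs you need to support
# (recent drivers only; older nvidia-smi versions lack the compute_cap field):
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# Then configure from the default options, enabling CUDA and pinning
# exactly those architectures:
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="61;89" -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```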

@zsogitbe
Contributor Author

zsogitbe commented Apr 3, 2025

IMPORTANT NEWS: The latest master of llama.cpp fixes the BUG with the DLL sizes, and also the CUDA graphs problem (ggml-org/llama.cpp#12152) that was causing regular crashes in LLamaSharp.

Unfortunately, `llama_model_params` was changed in llama.cpp (an extra parameter, `const struct llama_model_tensor_buft_override * tensor_buft_overrides;`), so simply using the latest version of llama.cpp will not work. We will need @martindevans's help to update to the latest version of llama.cpp.

@martindevans
Member

That's great news! I've just started off a run here to see if it creates more reasonably sized binaries.
