Dramatic increase of the C++ dlls size #1125
Comments
Thank you, Martin. I am a bit hesitant because I'm confused about how ggerganov still manages to keep the former file sizes (~60 MB for the DLL) while we are seeing hundreds of MB more (ggml-org/llama.cpp#12267). I don't believe he is using the […]. Additionally, if we decide to add the […]. Overall, the file sizes in your test appear very small, but you don't have CUDA. Could you clarify where you're sourcing the CUDA 12 DLLs/redistributables for LLamaSharp? (Some inline code spans in this comment were lost in the page capture and are marked […].)
I believe I've identified the issue. I now have a smaller ggml-cuda.dll (51 MB for the Windows Release build with one architecture, 157 MB with two architectures). The problem seems to stem from the -arch=native option: NVCC rejected it in my environment, yet the build scripts appear to require it for some reason. I had previously removed it, but I've now added it back. In addition, the architecture(s) need to be defined manually (e.g., CMAKE_CUDA_ARCHITECTURES="61;89") while keeping -arch=native present in the CMake script.
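The configure step described above might look roughly like the following. This is a hedged sketch, not the project's official build recipe: the option name GGML_CUDA and the source/build paths are assumptions, and CMAKE_CUDA_ARCHITECTURES="61;89" is just the example architecture pair from the comment (Pascal GTX 10xx and Ada RTX 40xx).

```shell
# Sketch: configure llama.cpp's CUDA backend with explicitly pinned
# GPU architectures instead of letting the build guess (or use -arch=native).
# GGML_CUDA and the paths below are assumptions for illustration.
cmake -B build -S . \
  -DGGML_CUDA=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="61;89"
cmake --build build --config Release
```

Each entry added to CMAKE_CUDA_ARCHITECTURES embeds another copy of the device code in the fat binary, which matches the observed jump from 51 MB (one architecture) to 157 MB (two).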
They are in that build; search for "MB" and that will highlight the relevant items:
I don't think we can use this in the hosted build system. If I understand it correctly, that will build for whatever architecture the build host has, which of course isn't suitable for binaries that are going to be used anywhere else!
Martin, the primary issue with the build is that the […]. Additionally, the build options for Windows are flawed, leading to excessively large DLL sizes, in my case several hundred MB. Since I only use plain CUDA and not cuBLAS, I cannot judge whether your 113 MB is reasonable. My guess is that either the inclusion of cuBLAS adds approximately 70 MB (on top of roughly 51 MB for a plain CUDA build), or the increased size is due to buggy build options (the latter is more likely, because with cuBLAS I would actually expect the DLL to be smaller). In any case, I recommend against using these pre-built DLLs. Instead, build your own from the default options and add CUDA and the architecture(s) manually. Be careful when changing CMake options, because many of them result in buggy, very large DLLs (this is what happened to me). One critical CMake setting to include is: […]. The current official LLamaSharp distribution ships a CUDA 12 DLL of more than 300 MB, and that is definitely a BUG as well. (Some inline code spans in this comment were lost in the page capture and are marked […].)
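One way to sanity-check the claim above, that mis-set build options inflate the DLL with redundant device code, is to list the device-code images embedded in the binary. `cuobjdump` ships with the CUDA Toolkit; the DLL path below is an assumption for illustration.

```shell
# List the per-architecture ELF images embedded in the CUDA fat binary.
# Each extra compute architecture adds another full copy of the device code,
# so an unexpectedly long list here explains an unexpectedly large DLL.
cuobjdump --list-elf build/bin/ggml-cuda.dll
```

If the output shows images for many more architectures than you intend to target, the size bloat is coming from the architecture selection rather than from cuBLAS or the host code.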
IMPORTANT NEWS: the latest master of llama.cpp fixes the BUG with the DLL sizes and also the CUDA graphs problem (ggml-org/llama.cpp#12152) that was causing regular crashes in LLamaSharp. Unfortunately, 'llama_model_params' was changed in llama.cpp (extra parameter […]).
That's great news! I've just started off a run here to see if it creates more reasonably sized binaries. |
Description
Does anyone know why the size of our C++ DLLs (CUDA) has suddenly increased to nearly 700 MB, compared to around 60 MB previously? I've also noticed some new files, including ggml-cuda.dll, which is over 600 MB. Could this be due to incorrect dynamic linking, instead of statically linking only the necessary code?
Thank you in advance for your help!
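The linking question in the description can be checked directly: on Windows, `dumpbin` (available in a Visual Studio Developer Command Prompt) lists the DLLs a binary links against at runtime. The file name below is taken from the description; run this from the directory containing the DLL.

```shell
# Show which DLLs ggml-cuda.dll depends on at load time.
# If the CUDA runtime / cuBLAS appear here, they are linked dynamically;
# if not, their code has been pulled into the DLL itself, which would
# account for part of the size growth.
dumpbin /dependents ggml-cuda.dll
```

Note, though, that per the comments above the dominant cause of the ~600 MB size is more likely the embedded device code for many GPU architectures than static host-side linking.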