CUDA: compress-mode size #12029
That's quite a lot; I didn't realize that the build with all supported archs had gotten so bad. In the Windows releases it seems to be 500M, so it's not that bad, but still pretty bad. I am not exactly sure what the downsides of enabling this option may be, so it would be preferable if this was optional. Enabling it by default should be OK, though.
And so it is for linux. Even before 12.8 it was compressing by default. Either with a
They say it costs startup time, which I think would be OK for almost all ML use cases that use CUDA anyway. I just hope the cost is not paid on every kernel launch. I don't have a setup right now where I can test that myself, so if anyone can help here, that would be nice. OK, I will make it a ggml option and enable it by default. Or should I make the option a string and just pass that through? (none, speed, balance, size)
Yes, that sounds good to me.
cuda 12.8 added the option to specify stronger compression for binaries.
d7580f2 to 6cdc5d3
cuda 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
# - speed (nvcc's default)
# - balance
# - size
list(APPEND CUDA_FLAGS -compress-mode=${GGML_CUDA_COMPRESSION_MODE})
Should this be --compress-mode instead? #12325
According to the CUDA documentation both are accepted, and I chose the single dash because the other options next to it also use a single dash.
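For readers following the review thread, here is a minimal sketch of how the string option discussed above could be wired up. Only `GGML_CUDA_COMPRESSION_MODE`, `CUDA_FLAGS`, the `-compress-mode` flag, and the four mode names come from this conversation; the cache-variable declaration, the help text, and the version guard built on `CUDAToolkit_VERSION` are assumptions for illustration and may differ from the merged patch.

```cmake
# Sketch only: the declaration and the version guard are assumptions,
# not copied from the patch.
find_package(CUDAToolkit REQUIRED)

# String option defaulting to "size", as agreed in the discussion.
set(GGML_CUDA_COMPRESSION_MODE "size" CACHE STRING
    "CUDA binary compression mode (none, speed, balance, size); needs CUDA 12.8+")
# Offer the accepted values as a drop-down in cmake-gui / ccmake.
set_property(CACHE GGML_CUDA_COMPRESSION_MODE PROPERTY STRINGS none speed balance size)

# Older nvcc versions do not know the flag, so only pass it for CUDA >= 12.8.
if (CUDAToolkit_VERSION VERSION_GREATER_EQUAL "12.8")
    list(APPEND CUDA_FLAGS -compress-mode=${GGML_CUDA_COMPRESSION_MODE})
endif()
```

With a setup like this, a user who prefers faster startup over a smaller binary could configure with `-DGGML_CUDA_COMPRESSION_MODE=speed`.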
If somebody has this error, a pair of CUDA 12.8 and GCC 12 solved my issue. This Ubuntu shell script helped me to set things up.
The
The
There is one thing, that by default my nvcc points to version 11, so just typing
Which was likely the error: I tried to compile various versions with GCC-11, but for CUDA 12.8 I needed GCC-12.
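One way to avoid picking up the wrong default `nvcc` or host compiler is to pin both explicitly at configure time. The following is a sketch of a CMake initial-cache script, not the commenter's script: the file name and both paths are hypothetical and depend on where CUDA 12.8 and GCC 12 are installed on the machine.

```cmake
# cuda128-gcc12.cmake (hypothetical file name)
# Use as: cmake -C cuda128-gcc12.cmake -B build
# Both paths below are assumptions; adjust them to the local installation.

# Point CMake at the CUDA 12.8 nvcc instead of whatever `nvcc` is first on PATH.
set(CMAKE_CUDA_COMPILER "/usr/local/cuda-12.8/bin/nvcc" CACHE FILEPATH "CUDA compiler")

# Host compiler that nvcc should use for the host side of CUDA sources.
set(CMAKE_CUDA_HOST_COMPILER "/usr/bin/g++-12" CACHE FILEPATH "Host compiler for nvcc")
```

Both variables have to be set before the first configure of a build directory, so an existing build directory should be removed or recreated when changing them.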
This patch sets the CUDA compression mode to "size" for CUDA >= 12.8.
CUDA 12.8 added the option to specify stronger compression for binaries.
I ran some tests in CI with the new ubuntu 12.8 docker image:
89-real arch:
In this scenario, it appears it is not compressing by default at all?
60;61;70;75;80 arches:
I did not test the runtime load overhead this should incur. But for most ggml-cuda use cases, the processes are usually long(er) lived, so the trade-off seems reasonable to me.