"async" CUDA<n> uploads appear to be non-concurrent #1472

Open
lee-b opened this issue Apr 8, 2025 · 1 comment

Comments


lee-b commented Apr 8, 2025

Describe the Issue

Loading models split across GPUs isn't concurrent... or at least, it doesn't appear to be. From the "async uploads" log entries, it seems they are intended to be.

Additional Information:

Loading a model across 2x 3090 with cuda and cublas enabled, I see:

koboldcpp-1 | load_all_data: using async uploads for device CUDA0, buffer type CUDA0, backend CUDA0
koboldcpp-1 | ..................................................load_all_data: using async uploads for device CUDA1, buffer type CUDA1, backend CUDA1

However, the second upload appears to BEGIN only after the first, rather than starting at the same time and finishing in whichever order the two actually complete.

Is this intended behaviour? Maybe the log messages are misleading.


Vladonai commented Apr 8, 2025

I have noticed this behavior too.
