"async" CUDA<n> uploads appear to be non-concurrent #1472

Open
lee-b opened this issue Apr 8, 2025 · 1 comment

Comments


lee-b commented Apr 8, 2025

Describe the Issue

Loading models split across GPUs isn't concurrent... or at least, it doesn't appear to be. From the "async uploads" log entries, it seems they are intended to be.

Additional Information:

Loading a model across 2x 3090 with cuda and cublas enabled, I see:

koboldcpp-1 | load_all_data: using async uploads for device CUDA0, buffer type CUDA0, backend CUDA0
koboldcpp-1 | ..................................................load_all_data: using async uploads for device CUDA1, buffer type CUDA1, backend CUDA1

However, the second upload appears to BEGIN only after the first, rather than starting at the same time and finishing in whichever order the two actually complete.

Is this intended behaviour? Maybe the log messages are misleading.


Vladonai commented Apr 8, 2025

I have noticed this behavior too.
