Describe the Issue
Loading a model split across GPUs isn't concurrent, or at least it doesn't appear to be. The "async uploads" log entries suggest it is intended to be.
Additional Information:
Loading a model across 2x RTX 3090 with CUDA and cuBLAS enabled, I see:
koboldcpp-1 | load_all_data: using async uploads for device CUDA0, buffer type CUDA0, backend CUDA0
koboldcpp-1 | ..................................................load_all_data: using async uploads for device CUDA1, buffer type CUDA1, backend CUDA1
However, the second upload appears to BEGIN after the first finishes, rather than both starting at the same time and completing in whichever order they actually finish.
Is this intended behaviour? Maybe the log messages are misleading.
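For context, the behaviour I expected from "async uploads" is that every device's upload is kicked off before waiting on any of them, so the copies overlap. A minimal sketch of that pattern (hypothetical illustration only, not the actual ggml/koboldcpp loader; `upload_to_device` is a stand-in for the per-device host-to-GPU copy):

```python
import threading
import time

def upload_to_device(device_id, done_order):
    # Stand-in for the per-device host-to-GPU copy (hypothetical).
    time.sleep(0.1)
    done_order.append(device_id)

def load_all_devices(num_devices=2):
    # Start every device's upload first, then wait on all of them,
    # so the copies overlap instead of running back to back.
    done_order = []
    threads = [threading.Thread(target=upload_to_device, args=(d, done_order))
               for d in range(num_devices)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return done_order, elapsed
```

With two overlapped 0.1 s uploads, total wall time stays near 0.1 s; a back-to-back run would take roughly 0.2 s, which matches what the log output above seems to show.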