Multi-threaded issue with CUDA compiled library 1.5.4 #1814
Does it work with the latest master?
I can confirm this still happens on master.

[30-01-2024 09:09:12:174] [INFO ] [GGML_ASSERT: ggml-cuda.cu:7579: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])]

Given the way the allocations work at the moment (line 7579 in ggml-cuda.cu), if the first call's allocation is freed before the second call's, you are always going to hit this, right? With a single audio file at a time there is obviously no issue, but in a multi-threaded situation... Is there something I'm missing in the call to whisper_full_with_state that would help with this? I previously have not had to do anything special - 1.5.1 was good.
Just to make sure - you are using a different whisper_state for each thread?
Yes. We retrieve a context by model type, and then a state per model in each thread that calls whisper_full_with_state:

```cpp
struct whisper_context* ctx   = (struct whisper_context*)whisperInt->getContextByModel(modelTxtName);
struct whisper_state*   state = (struct whisper_state*)whisperInt->getWhisperStateByModel(modelTxtName);

int res = whisper_full_with_state(ctx, state, wparams, (const float*)samples, count);

whisperInt->freeWhisperState(state);
```

Construction / free of the state from the whisperInt object:

```cpp
void* WhisperInt::getContextByModel(std::string model);
void* WhisperInt::getWhisperStateByModel(std::string model);
void  WhisperInt::freeWhisperState(void* state);
```
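For comparison, here is a minimal self-contained sketch of the same pattern against the stock whisper.h API (one shared whisper_context, one whisper_state per thread). The model path and audio buffers are placeholders, and error handling is omitted:

```cpp
#include <thread>
#include <vector>
#include "whisper.h"

// Each thread owns its own whisper_state; the whisper_context is shared.
static void transcribe(struct whisper_context * ctx,
                       const float * samples, int n_samples) {
    struct whisper_state * state = whisper_init_state(ctx);

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    whisper_full_with_state(ctx, state, wparams, samples, n_samples);

    whisper_free_state(state);
}

int main() {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
        "models/ggml-base.en.bin", whisper_context_default_params()); // placeholder path

    std::vector<float> a(16000 * 5, 0.0f); // placeholder audio: 5 s of silence @ 16 kHz
    std::vector<float> b(16000 * 5, 0.0f);

    std::thread t1(transcribe, ctx, a.data(), (int) a.size());
    std::thread t2(transcribe, ctx, b.data(), (int) b.size());
    t1.join();
    t2.join();

    whisper_free(ctx);
    return 0;
}
```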
Hi, I also hit the same problem. It can be reproduced by passing --processors or by calling whisper_full_parallel, as sketched below.
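For reference, whisper_full_parallel internally runs n_processors states concurrently, which exercises the same code path as the multi-threaded whisper_full_with_state calls above. A minimal sketch (model path and audio are placeholders):

```cpp
#include <vector>
#include "whisper.h"

int main() {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
        "models/ggml-base.en.bin", whisper_context_default_params()); // placeholder path

    std::vector<float> samples(16000 * 30, 0.0f); // placeholder: 30 s of audio @ 16 kHz

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // Two processors were reported to be enough to trigger the pool
    // assert on a CUDA build of v1.5.4.
    whisper_full_parallel(ctx, wparams, samples.data(), (int) samples.size(), 2);

    whisper_free(ctx);
    return 0;
}
```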
As you suggested, disabling the vmm allocator should fix the assert, but there are so many globals in the CUDA backend that are not synchronized that I can only imagine that if it works at all, it will be by chance. |
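To illustrate what "not synchronized" means here, a purely illustrative sketch (not the actual ggml-cuda code or its eventual fix) of a bump-style pool whose bookkeeping is guarded by a mutex. Note that a lock only removes the data races on the shared counters; it does not restore the LIFO free order discussed above:

```cpp
#include <cstddef>
#include <mutex>

struct pool {
    std::mutex mtx;          // guards the shared bookkeeping below
    char   storage[1 << 20]; // backing memory (stand-in for device memory)
    size_t used = 0;

    void * alloc(size_t size) {
        std::lock_guard<std::mutex> lock(mtx);
        void * ptr = storage + used;
        used += size;
        return ptr;
    }

    void free_last(void * ptr, size_t size) {
        std::lock_guard<std::mutex> lock(mtx);
        (void) ptr;
        used -= size;
        // Even with the lock, allocations interleaved across two threads
        // can still be freed out of LIFO order - synchronization and
        // ordering are separate problems.
    }
};
```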
Ok, you can track the state of [...]. Sorry for the inconvenience - you might want to stick with whisper.cpp v1.5.1 for now.
No problemo. What you've done thus far is awesome. |
The CUDA backend should be thread safe now. |
I've recently compiled the 1.5.4 library with CUBLAS and am having an issue when running multiple whisper_full_with_state() calls concurrently.
I did not have this issue with the 1.5.1 library.
I re-compiled with DEBUG_CUDA_MALLOC and got the following output:
[29-01-2024 16:04:58:258] [INFO ] [cuda pool[0]: allocated 7680000 bytes at 302000000, used: 7680000]
[29-01-2024 16:04:58:263] [INFO ] [cuda pool[0]: allocated 7680000 bytes at 302753000, used: 15360000]
[29-01-2024 16:04:58:267] [INFO ] [cuda pool[0]: freed 7680000 bytes at 302000000]
[29-01-2024 16:04:58:272] [INFO ] [GGML_ASSERT: ggml-cuda.cu:6742: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])]
I can see it has failed on the assert because it is not de-allocating in reverse order. The pool is tracked "per device", but there is only one actual device (card) here. In this case, I had two instances running and the one executed first finished first.
Is this some sort of "virtual" device, where each call of whisper_full_with_state needs to specify a separate device, or is this something with the memory allocation?
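To make the failure mode concrete, here is a minimal sketch of the interleaving shown in the log above, with simplified stand-ins for the g_cuda_pool_* globals in ggml-cuda.cu:

```cpp
#include <cassert>
#include <cstddef>

char   pool_base[32 << 20]; // stand-in for the per-device pool
size_t pool_used = 0;

void * pool_alloc(size_t size) {
    void * ptr = pool_base + pool_used;
    pool_used += size;
    return ptr;
}

void pool_free(void * ptr, size_t size) {
    pool_used -= size;
    // The invariant behind the assert: frees must happen in strict
    // reverse (LIFO) order of allocation.
    assert(ptr == (void *) (pool_base + pool_used));
}

int main() {
    void * a = pool_alloc(7680000); // call 1: "allocated ... at 302000000"
    void * b = pool_alloc(7680000); // call 2: "allocated ... at 302753000"
    pool_free(a, 7680000);          // call 1 finishes first: "freed ... at 302000000"
                                    // -> the assert fires, as in the log
    (void) b;
    return 0;
}
```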
I also noticed that libcuda.so is missing at link time if you don't have the NVIDIA driver installed. I don't have a GPU in the build host, so I had to copy the library onto it manually.