Description
I've recently compiled the 1.5.4 library with CUBLAS and I'm having an issue when running multiple whisper_full_with_state() calls concurrently.
I did not previously have this issue with the 1.5.1 library.
I re-compiled with DEBUG_CUDA_MALLOC and it produced the following output:
[29-01-2024 16:04:58:258] [INFO ] [cuda pool[0]: allocated 7680000 bytes at 302000000, used: 7680000]
[29-01-2024 16:04:58:263] [INFO ] [cuda pool[0]: allocated 7680000 bytes at 302753000, used: 15360000]
[29-01-2024 16:04:58:267] [INFO ] [cuda pool[0]: freed 7680000 bytes at 302000000]
[29-01-2024 16:04:58:272] [INFO ] [GGML_ASSERT: ggml-cuda.cu:6742: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])]
I can see it failed on the assert because the buffers are not being de-allocated in reverse order. The pool is tracked "per device", but there is only one actual device (card). In this case, I had two instances running and the one that started first also finished first.
Is this some sort of "virtual" device, where each call of whisper_full_with_state() needs to specify a separate device, or is this an issue with the memory allocation itself?
I also noticed that libcuda.so is missing at link time if the driver isn't installed. The build host doesn't have a GPU, so I had to copy the library onto it manually.
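For what it's worth, the CUDA toolkit ships a linker-only stub of libcuda.so precisely for driverless build hosts, so copying the real driver library shouldn't be necessary (paths below assume a default toolkit install; the exact build invocation will depend on your setup):

```shell
# The toolkit provides a stub libcuda.so for linking on hosts
# without the NVIDIA driver or a GPU:
ls /usr/local/cuda/lib64/stubs/libcuda.so

# Point the linker at the stubs directory when building, e.g.:
export LDFLAGS="-L/usr/local/cuda/lib64/stubs"
```

The stub satisfies the link-time dependency only; at run time the real driver's libcuda.so still has to be present on the target machine.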