Description
Issue
Apologies if this is a known issue! I looked and couldn't find an existing one 😄
When audio shorter than 1 s is passed to whisper.cpp, Whisper silently fails to transcribe it, regardless of the model or hardware acceleration (tested with the small and medium models, both CPU-only and with cuBLAS).
No error is returned. It looks like all sampling is skipped (0 ms is reported for the sample, encode, decode, and prompt times), and the transcript is empty.
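Because nothing signals the failure, a program embedding whisper.cpp may want to catch this case itself before transcribing. The following is a minimal sketch under stated assumptions: the audio is already decoded to 16 kHz mono float samples (the format whisper.cpp consumes), and the ~1 s threshold is taken from the behaviour reported here (the exact boundary at 1 s was not tested). `check_clip_length` and `MIN_DURATION_MS` are hypothetical names, not part of whisper.cpp.

```c
#include <stdio.h>

/* whisper.cpp consumes 16 kHz mono float PCM. */
#define SAMPLE_RATE 16000

/* Assumption: threshold from this issue, where clips under ~1 s
   produced no transcript. */
#define MIN_DURATION_MS 1000

/* Hypothetical guard (not a whisper.cpp API): reports a too-short clip
   explicitly instead of silently getting an empty transcript.
   Returns 0 if the clip is long enough, -1 otherwise. */
int check_clip_length(size_t n_samples) {
    const size_t min_samples = (size_t)SAMPLE_RATE * MIN_DURATION_MS / 1000; /* 16000 */
    if (n_samples < min_samples) {
        fprintf(stderr, "input is %zu ms long; clips under %d ms return no transcript\n",
                n_samples * 1000 / SAMPLE_RATE, MIN_DURATION_MS);
        return -1;
    }
    return 0;
}
```

For example, a 0.8 s clip like `jfk-short.wav` is 0.8 × 16000 = 12800 samples, so `check_clip_length(12800)` prints a warning and returns -1.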
Reproduction
First, make sure whisper is working by running it on the JFK sample (built with or without cuBLAS):
make clean
make -j
./main --model ./models/ggml-medium.en.bin bindings/go/samples/jfk.wav
- Working! Transcribes correctly.
Now use an external audio editor or ffmpeg to trim the audio to less than a second, and run again:
ffmpeg -i bindings/go/samples/jfk.wav -to 00:00:00.8 jfk-short.wav
./main --model models/ggml-medium.en.bin ./jfk-short.wav
- Not working. The mel time is a few ms, but the encode, decode, prompt, and batchd times all show 0 ms, and there is no transcript. I tested multiple clips to make sure it wasn't just this one, and I can reproduce it consistently.
If you pad the audio to more than a second, transcripts appear again:
ffmpeg -i jfk-short.wav -af "apad=pad_dur=1" jfk-padded.wav
./main --model models/ggml-medium.en.bin ./jfk-padded.wav
- Working again! Shows the (cut off) transcript.
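The same padding workaround can be applied in-process before the samples are handed to whisper.cpp. This is a sketch only: `pad_to_min_duration` is a hypothetical helper, not a whisper.cpp function, and it assumes 16 kHz mono float PCM. Like the `apad` filter above, it appends silence (zeros) after the original audio.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: whisper.cpp input is 16 kHz mono float PCM. */
#define SAMPLE_RATE 16000

/* Hypothetical workaround helper (not part of whisper.cpp).
   Returns a newly allocated copy of `samples`, zero-padded so it is at
   least `min_ms` milliseconds long (min_ms must be > 0). Clips that are
   already long enough are copied unchanged. The new length is written
   to *out_n. The caller frees the result; returns NULL if allocation fails. */
float *pad_to_min_duration(const float *samples, size_t n, int min_ms, size_t *out_n) {
    size_t min_n = (size_t)min_ms * SAMPLE_RATE / 1000;
    size_t total = n > min_n ? n : min_n;

    /* calloc zero-fills, so everything after the copied samples is silence. */
    float *out = calloc(total, sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    if (n > 0) {
        memcpy(out, samples, n * sizeof *out);
    }
    *out_n = total;
    return out;
}
```

For example, padding the 0.8 s clip (12800 samples) with `min_ms = 1100` yields 1100 × 16 = 17600 samples. A small margin above 1 s avoids depending on the exact cutoff, which this report does not pin down.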
System Info
Tested on Ubuntu 22.04 (amd64), both with and without cuBLAS enabled.