Audio less than 1s long silently fails all transcription #1603

@isaac-mcfadyen

Issue

Apologies if this is a known issue! I looked and couldn't find an existing one 😄

When I pass audio shorter than 1 s to whisper.cpp, it silently fails to transcribe it, regardless of the model or hardware acceleration (tested with small and medium, on both CPU-only and cuBLAS builds).

No error is returned. It looks like all processing is skipped (0 ms is listed for the sample, encode, decode, and prompt times), and the transcript comes back empty.

Reproduction

First, make sure whisper is working by running it on the JFK sample:

  • make clean
  • make -j (with or without cuBLAS)
  • ./main --model ./models/ggml-medium.en.bin bindings/go/samples/jfk.wav
    • Working! Transcribes correctly.

Now use an external audio editor or ffmpeg to trim the audio to less than a second and run again:

  • ffmpeg -i bindings/go/samples/jfk.wav -to 00:00:00.8 jfk-short.wav
  • ./main --model models/ggml-medium.en.bin ./jfk-short.wav
    • Not working. Mel shows some time (a few ms), but the encode, decode, prompt, and batchd times all show 0 ms, and there's no transcript. I tested multiple clips to make sure it wasn't just this one, and I can reproduce it consistently.
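
To check whether a given clip falls into the failing range before handing it to whisper.cpp, something like the following could work. This is a minimal sketch using only Python's standard `wave` module. The names `wav_duration_seconds`, `is_too_short`, and `MIN_SECONDS` are made up for illustration, and the 1.0 s threshold is just the value observed in this report, not a documented whisper.cpp constant.

```python
import wave

# Threshold observed in this report; an assumption, not a whisper.cpp constant.
MIN_SECONDS = 1.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_too_short(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True if the clip is shorter than the threshold seen in this issue."""
    return wav_duration_seconds(path) < min_seconds
```

For the trimmed clip above, `is_too_short("jfk-short.wav")` should flag it, since it was cut to 0.8 s.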

If you pad the audio to more than a second, transcripts appear again:

  • ffmpeg -i jfk-short.wav -af "apad=pad_dur=1" jfk-padded.wav
  • ./main --model models/ggml-medium.en.bin ./jfk-padded.wav
    • Working again! Shows the (cut off) transcript.
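
For anyone who needs the same workaround without ffmpeg, here is a rough sketch that appends trailing silence with Python's standard `wave` module. Note that it differs from the `apad=pad_dur=1` command above: that one always adds one second of silence, while this pads only up to a minimum total duration. The function name `pad_wav_to_min_duration` and the 1.1 s default are assumptions for illustration (chosen to sit just above the 1 s mark seen in this report).

```python
import wave


def pad_wav_to_min_duration(src: str, dst: str, min_seconds: float = 1.1) -> int:
    """Copy src to dst, appending silence until it lasts at least min_seconds.

    Returns the number of silent frames appended (0 if already long enough).
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())

    # Number of frames the output needs, rounded to the nearest frame.
    target_frames = int(min_seconds * params.framerate + 0.5)
    missing = max(0, target_frames - params.nframes)

    # Signed PCM (16-bit and wider) is silent at zero; 8-bit WAV is unsigned,
    # so its silence value is 0x80.
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (missing * params.nchannels * params.sampwidth)

    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(frames + silence)
    return missing
```

Usage would be along the lines of `pad_wav_to_min_duration("jfk-short.wav", "jfk-padded.wav")`, then running `./main` on the padded file as before.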

System Info

Tested on Ubuntu 22.04, amd64, both with and without cuBLAS enabled.

Labels

enhancement, question