When testing the large v3 model with word-by-word transcript output, I see that whenever the audio ends in silence, the model always appends "Thank you" to the end of the transcript.
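
One way to check for this in the output is sketched below. It is only a rough illustration, not part of any library: it assumes the word-level output is a list of dicts with `word`, `start`, and `end` keys (times in seconds), and that the time at which real speech ends (`speech_end`) is already known from somewhere else, such as a VAD pass or manual inspection. The function name `split_trailing_words` and the `tolerance` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed shape of one word entry: {"word": str, "start": float, "end": float}
Word = Dict[str, object]


def split_trailing_words(
    words: List[Word], speech_end: float, tolerance: float = 0.2
) -> Tuple[List[Word], List[Word]]:
    """Split words into (kept, suspect).

    `suspect` holds the trailing run of words that start later than
    `speech_end + tolerance`, i.e. words placed inside the final silence.
    """
    cut = len(words)
    # Walk backwards from the last word while words start after speech ended.
    while cut > 0 and float(words[cut - 1]["start"]) > speech_end + tolerance:
        cut -= 1
    return words[:cut], words[cut:]


# Made-up example data: speech stops at about 1.0 s, then silence.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
    {"word": "Thank", "start": 3.1, "end": 3.4},
    {"word": "you", "start": 3.4, "end": 3.6},
]

kept, suspect = split_trailing_words(words, speech_end=1.0)
print([w["word"] for w in suspect])  # → ['Thank', 'you']
```

Only the trailing run is flagged, so a genuine "thank you" spoken earlier in the recording is left alone.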