When transcribing a Hindi audio file, I get invalid Unicode characters in the output.

<img width="753" alt="Screenshot 2023-12-29 at 8 29 09 PM" src="https://github.com/ggerganov/whisper.cpp/assets/7852108/340f9bab-4299-4103-9055-fa5a9db4e989">

I have seen this with many Hindi files. I used the whisper-large-v2 model for inference on CPU, and I see the same issue when running inference on GPU. My guess is that the model's token output (BPE-encoded) is not being correctly mapped to Unicode characters.