Description
Currently in llama.cpp, convert.py assumes there is a tokenizer.model file in the model path. This seems to work for any model that uses a SentencePiece tokenizer, but nothing else.
Hugging Face's tokenizers library is neat and provides more options than SentencePiece. It would be really great if ggml supported any tokenizer from Hugging Face. I believe this means it would expect merges.txt and vocab.json in the model path.
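To make the request concrete, here is a rough sketch of the kind of check a conversion script could do before loading a tokenizer. This is not the actual convert.py code: the function name detect_tokenizer_format and the returned labels are hypothetical, and it only looks at the three file names mentioned above.

```python
from pathlib import Path
from typing import Optional


def detect_tokenizer_format(model_dir: str) -> Optional[str]:
    """Guess the tokenizer format from the files present in model_dir.

    Hypothetical helper for illustration only; the labels are made up.
    """
    path = Path(model_dir)

    # SentencePiece models ship a single tokenizer.model file.
    if (path / "tokenizer.model").is_file():
        return "sentencepiece"

    # Assumption: a BPE tokenizer is stored as a vocab.json / merges.txt pair.
    if (path / "vocab.json").is_file() and (path / "merges.txt").is_file():
        return "bpe"

    # Neither layout was found, so the caller has to decide what to do.
    return None
```

A converter could call this once on the model path and fall back to the existing SentencePiece path when it returns "sentencepiece", so current models keep working.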