feature request - support huggingface tokenizer #1764

Currently in llama.cpp, `convert.py` assumes there is a `tokenizer.model` file in the model path. This seems to work for any model that uses a `sentencepiece` tokenizer, but for nothing else.
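For illustration, here is a minimal sketch of how a conversion script could choose a tokenizer loader based on which files are present, instead of assuming `tokenizer.model`. The function name `detect_tokenizer_format` and its return labels are hypothetical and are not part of the current `convert.py`.

```python
from pathlib import Path


def detect_tokenizer_format(model_dir: str) -> str:
    """Hypothetical helper: report which tokenizer files a model directory provides."""
    path = Path(model_dir)
    # What convert.py handles today: a sentencepiece model file.
    if (path / "tokenizer.model").is_file():
        return "sentencepiece"
    # Hugging Face "fast" tokenizers serialize everything into one JSON file.
    if (path / "tokenizer.json").is_file():
        return "hf-tokenizer-json"
    # GPT-2 style byte-level BPE: vocabulary and merge rules in two files.
    if (path / "vocab.json").is_file() and (path / "merges.txt").is_file():
        return "hf-bpe"
    raise FileNotFoundError(f"no known tokenizer files in {model_dir}")
```

Checking `tokenizer.model` first keeps the existing sentencepiece path as the default, so models that work today would behave the same.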

Hugging Face's `tokenizers` library is neat and provides more options than `sentencepiece`. It would be really great if ggml supported any tokenizer from Hugging Face. For BPE tokenizers, I believe this means it would need to read `merges.txt` and `vocab.json` (or the combined `tokenizer.json` that many models ship with).
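As a rough sketch of what reading those two files could involve (assuming the GPT-2 style layout: `vocab.json` maps token strings to ids, and each non-header line of `merges.txt` is one space-separated merge pair), a loader might look like this. The name `load_hf_bpe` is hypothetical, and the sketch assumes token ids are contiguous from 0.

```python
import json
from pathlib import Path


def load_hf_bpe(model_dir: str):
    """Hypothetical loader for a GPT-2 style vocab.json + merges.txt pair."""
    path = Path(model_dir)

    # vocab.json: {"token string": id, ...}
    with open(path / "vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)

    # Order tokens by id so that list index == token id (assumes ids are contiguous).
    tokens = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

    merges = []
    with open(path / "merges.txt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            # Skip blank lines and the optional "#version: ..." header.
            if not line or line.startswith("#version"):
                continue
            left, right = line.split(" ", 1)
            merges.append((left, right))

    # A merge's rank is its position in merges.txt.
    ranks = {pair: i for i, pair in enumerate(merges)}
    return tokens, ranks
```

During BPE encoding, the adjacent pair with the lowest rank is merged first, so the ordered `ranks` table (not just the vocabulary) would have to be carried into the converted model.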
