feature request - support huggingface tokenizer #1764

Closed
@keunwoochoi

currently in llama.cpp, convert.py assumes there is a tokenizer.model file in the model path. this seems to work for any model that uses a sentencepiece tokenizer, but nothing else.

huggingface's tokenizers library is neat and provides more options than sentencepiece. it would be really great if ggml supported any tokenizer from huggingface. i believe this means the converter would need to read merges.txt and vocab.json instead of tokenizer.model.
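
to make the request concrete, here is a rough sketch of what reading those two files could look like. it is not code from convert.py: `load_bpe_files` is a hypothetical helper name, and it assumes the usual BPE layout where vocab.json maps each token string to an integer id and merges.txt has one space-separated pair per line after an optional `#version` header.

```python
import json
from pathlib import Path


def load_bpe_files(model_dir):
    """Hypothetical helper: read vocab.json and merges.txt from model_dir."""
    model_dir = Path(model_dir)

    # vocab.json is assumed to map token string -> integer id
    with open(model_dir / "vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)

    # sort tokens by id; list index equals token id only if the ids
    # are contiguous and start at 0 (an assumption, not checked here)
    tokens = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

    merges = []
    with open(model_dir / "merges.txt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            # skip blank lines and the optional "#version" header only,
            # since a real merge rule may itself start with "#"
            if not line or line.startswith("#version"):
                continue
            left, right = line.split(" ")
            merges.append((left, right))

    return tokens, merges
```

a converter would still have to decide how to store the merge rules in the ggml file, which this sketch does not address.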
