Implement non-greedy tokenizer that tries to maximize token lengths #242
Conversation
Although I haven't examined the code, I've tested it on several prompts and can already conclude that this patch allows Llama to write in French.
@@ -846,6 +846,7 @@ int main(int argc, char ** argv) {
     std::vector<float> logits;

     // tokenize the prompt
+    params.prompt.insert(0, 1, ' ');
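As an aside, here is a minimal C++ sketch of what "non-greedy, maximize token lengths" can mean: try the longest vocabulary match at each position and backtrack to shorter matches only if the rest of the input can't be tokenized. This is an illustration under assumptions, not the actual patch; the names tokenize_longest, vocab, and max_token_len are made-up stand-ins, not llama.cpp symbols.

#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Try the longest vocabulary entry first at each position; backtrack to
// shorter ones only if the remainder of the text cannot be tokenized.
static bool tokenize_longest(const std::string & text, size_t pos,
                             const std::unordered_map<std::string, int> & vocab,
                             size_t max_token_len,
                             std::vector<int> & out) {
    if (pos == text.size()) {
        return true; // consumed the whole input
    }
    for (size_t len = std::min(max_token_len, text.size() - pos); len > 0; --len) {
        auto it = vocab.find(text.substr(pos, len));
        if (it == vocab.end()) {
            continue; // no vocab entry of this length at this position
        }
        out.push_back(it->second);
        if (tokenize_longest(text, pos + len, vocab, max_token_len, out)) {
            return true;
        }
        out.pop_back(); // backtrack: this choice left an untokenizable tail
    }
    return false;
}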
Is the space meant to be a separate token? I noticed that it often gets fused with the first user-provided token.
It should be fused with the first token! This is how the original Python llama code parses it.
I can dig out more details if you want.
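To illustrate the fusion (a toy example, not taken from the patch): with a SentencePiece-style vocabulary, where word-initial pieces are stored with a leading space, the inserted space lets the first prompt word match one of those entries and come out as a single token. This reuses the hypothetical tokenize_longest sketch above with a made-up vocabulary.

// Toy usage of the sketch above; the vocabulary values are invented.
std::unordered_map<std::string, int> vocab = {
    {" Hello", 1}, {" world", 2}, {"He", 3}, {"llo", 4}, {" ", 5},
};
std::string prompt = "Hello world";
prompt.insert(0, 1, ' '); // the line added in this patch

std::vector<int> ids;
tokenize_longest(prompt, 0, vocab, /*max_token_len=*/6, ids);
// ids == {1, 2}: the leading space fuses with "Hello" into the single
// token " Hello". Without the inserted space, "Hello" would fall back
// to the pieces "He" + "llo" (ids {3, 4, 2}).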
Merge it if the results look OK.
I won't be able to have a detailed look in the next few days.
…gml-org#242)

* Implement non-greedy tokenizer that tries to maximize token lengths
* Insert single space in front of the prompt - this is to match original llama tokenizer behavior

---------

Co-authored-by: Jakub Horak <[email protected]>
No description provided.