Add phi 3 chat template #6857

Merged (2 commits, Apr 24, 2024)
9 changes: 9 additions & 0 deletions llama.cpp
@@ -17257,6 +17257,15 @@ static int32_t llama_chat_apply_template_internal(
if (add_ass) {
ss << "<|start_header_id|>assistant<|end_header_id|>\n\n";
}
} else if (tmpl == "phi3" || (tmpl.find("<|assistant|>") != std::string::npos && tmpl.find("<|end|>") != std::string::npos )) {
// Phi 3
for (auto message : chat) {
std::string role(message->role);
ss << "<|" << role << "|>\n" << trim(message->content) << "<|end|>\n";
}
if (add_ass) {
ss << "<|assistant|>\n";
}
} else {
// template not supported
return -1;
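For reference, a minimal standalone sketch of what the new branch produces (message contents are made up; trim() here is a stand-in for the helper used in llama.cpp):

#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for llama.cpp's trim(): strip leading/trailing whitespace.
static std::string trim(const std::string & s) {
    const size_t b = s.find_first_not_of(" \t\n");
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(" \t\n");
    return s.substr(b, e - b + 1);
}

int main() {
    // role/content pairs, mirroring llama_chat_message
    const std::vector<std::pair<std::string, std::string>> chat = {
        {"system", "You are a helpful assistant"},
        {"user",   "Hello"},
    };
    std::ostringstream ss;
    for (const auto & m : chat) {
        ss << "<|" << m.first << "|>\n" << trim(m.second) << "<|end|>\n";
    }
    ss << "<|assistant|>\n"; // add_ass: leave the prompt open for the model
    std::cout << ss.str();
    // Prints:
    // <|system|>
    // You are a helpful assistant<|end|>
    // <|user|>
    // Hello<|end|>
    // <|assistant|>
}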
4 changes: 4 additions & 0 deletions tests/test-chat-template.cpp
@@ -49,6 +49,8 @@ int main(void) {
"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true %}{% set loop_messages = messages %}{% set system_message = 'You are Command-R, a brilliant, sophisticated, AI-assistant trained to assist human users by providing thorough responses. You are trained by Cohere.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% if system_message != false %}{{ '<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>' + system_message + '<|END_OF_TURN_TOKEN|>' }}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|START_OF_TURN_TOKEN|><|USER_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}{% elif message['role'] == 'assistant' %}{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' }}{% endif %}",
// Llama-3
"{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}",
// Phi-3
"{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + ' ' + message['content'] + '<|end|> ' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|> ' }}{% else %}{{ eos_token }}{% endif %}"
@ngxson (Collaborator) · May 22, 2024

@tristandruyen This template uses a space character ' ' after each special token, but in your cpp version you used a new line "\n". Can you check which is correct?

@tristandruyen (Contributor, Author) · May 23, 2024

According to the Chat Format documented in the Hugging Face README, there should be a \n after <|user|> and <|end|>:

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/README.md#chat-format
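For illustration (made-up content), the newline placement described in that comment yields turns like:

<|user|>
Hello<|end|>
<|assistant|>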

@tristandruyen (Contributor, Author) · May 23, 2024

While they do not explicitly state that there should be a \n after <|assistant|>, the model nearly always generates a \n as its first token when it is left out...

@ngxson (Collaborator)

> According to the Chat Format documented in the Hugging Face README, there should be a \n after <|user|> and <|end|>

It's best to look at the code directly rather than the README. Human copy-paste can introduce mistakes.

In the same repo that you linked, you can see the jinja template code here:

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer_config.json#L119

This file has '<|user|>' + '\n', which confirms that there should be a new line after special tokens.

However, that does not explain why your jinja version does not have it. Remember, jinja templates are deterministic, meaning your cpp code needs to follow 100% of what is coded inside the template. This means we can fix this PR in one of two ways: (1) correct the jinja template to have \n, or (2) correct the cpp implementation to output ' ' instead of a new line.

@tristandruyen (Contributor, Author) · May 23, 2024

It seems there are even more versions of the template. I guess I had an outdated tokenizer_config.json when making my GGUFs, because I just took the jinja template from the GGUF metadata of my recent phi3 quants, which should be a copy of what was in the tokenizer_config.json, shouldn't it?

I think I found the reason: the Hugging Face GUI for displaying GGUF metadata seems to replace \n with whitespace for some weird reason. I'll update the template with the correct one.

@ngxson (Collaborator)

Edit: there are indeed 2 versions, but they all use a new line.

I think it's best to somehow scan all templates in the existing phi-3 family, then make a fix PR. Without careful research, I'm afraid there will be more confusion in the future.

@ngxson (Collaborator)

> I think I found the reason: the Hugging Face GUI for displaying GGUF metadata seems to replace \n with whitespace for some weird reason.

I'd agree that there is a design flaw in the jinja templates (in general). Since '\n' is wrapped inside a JSON string, it should strictly be written as '\\n'. But for some reason jinja doesn't care about this mistake (that's why it gets converted to a space in the GGUF viewer on Hugging Face).

Unfortunately, this is not something we can fix on our side.
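To make the escaping point concrete, a small C++ sketch contrasting the two encodings (the JSON fragments are hypothetical):

#include <cstdio>

int main() {
    // As found in tokenizer_config.json: the JSON escape \n decodes to a
    // *raw* newline character inside the jinja string literal '...'.
    const char * as_written = R"json("chat_template": "{{ '<|user|>' + '\n' }}")json";
    // Strict alternative: \\n in JSON decodes to the two characters \ and n,
    // so the decoded jinja source contains the escape sequence '\n' instead.
    const char * strict     = R"json("chat_template": "{{ '<|user|>' + '\\n' }}")json";
    std::printf("%s\n%s\n", as_written, strict);
}

Jinja treats the raw newline and the '\n' escape as the same string, but a viewer that collapses whitespace will display the raw-newline form as a space.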

@tristandruyen (Contributor, Author)

I updated the PR :)

@ngxson (Collaborator)

@tristandruyen Please have a look at the complete list of phi-3 templates: https://gist.github.com/ngxson/38d8d89fb5d856187bcc70e78b1cc4f5

@tristandruyen (Contributor, Author) · May 23, 2024

Thanks for the list.
So it seems to me like these are 4 different ways to get basically the same result.
There are some differences, but AFAIK they should not matter here:

  • the bos token for small & mini should be handled by tokenizer.ggml.add_bos_token
  • the eos token for mini should be handled likewise by add_eos_token

So the current C++ code seems to do the correct thing, in my understanding.

I think adding all variants as tests would be good; the current matching via <|assistant|> and <|end|> should work for all of them.

};
std::vector<std::string> expected_output = {
// teknium/OpenHermes-2.5-Mistral-7B
@@ -77,6 +79,8 @@ int main(void) {
"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>You are a helpful assistant<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>Hi there<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Who are you<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>I am an assistant<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Another question<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
// Llama 3
"<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nHi there<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWho are you<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI am an assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nAnother question<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
// Phi 3
"<|system|>\nYou are a helpful assistant<|end|>\n<|user|>\nHello<|end|>\n<|assistant|>\nHi there<|end|>\n<|user|>\nWho are you<|end|>\n<|assistant|>\nI am an assistant<|end|>\n<|user|>\nAnother question<|end|>\n<|assistant|>\n",
};
std::vector<char> formatted_chat(1024);
int32_t res;
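For reference, the public entry point exercised by this test is llama_chat_apply_template. A minimal sketch of producing a Phi 3 prompt through it (assuming the llama.h signature at the time of this PR; no model is needed when a template string is passed explicitly):

#include <cstdint>
#include <string>
#include <vector>
#include "llama.h"

int main() {
    const llama_chat_message msgs[] = {
        {"system", "You are a helpful assistant"},
        {"user",   "Hello"},
    };
    std::vector<char> buf(1024);
    // "phi3" selects the new branch directly; a full jinja template
    // containing <|assistant|> and <|end|> would match it as well.
    const int32_t n = llama_chat_apply_template(
        nullptr, "phi3",
        msgs, 2,
        true,            // add_ass: append "<|assistant|>\n"
        buf.data(), buf.size());
    if (n < 0) return 1; // -1 means the template is not supported
    const std::string prompt(buf.data(), n);
    // prompt == "<|system|>\nYou are a helpful assistant<|end|>\n"
    //           "<|user|>\nHello<|end|>\n<|assistant|>\n"
}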