
Conversation

simon-mo
Collaborator

Related to #4180

Some models use the eos_token_id field (Optional[Union[int, list[int]]]) in generation_config.json:
https://huggingface.co/docs/transformers/v4.39.3/en/main_classes/text_generation#transformers.GenerationConfig

This PR loads the generation config, reads eos_token_id if the model supplies it, and injects the value(s) into stop_token_ids in the sampling params. Notably, this does not change eos_token_id in the sampling params or in the tokenizer config.

One example is DBRX. Meta Llama 3 might use the generation config to reconcile the difference between <|end_of_text|> and <|eot_id|>.
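Below is a minimal sketch of the behavior described above, for illustration only: it is not the PR's actual code, and generation_config_eos_ids is a hypothetical helper. It assumes only transformers.GenerationConfig.from_pretrained, which reads generation_config.json.

```python
from typing import Optional, Union

from transformers import GenerationConfig


def generation_config_eos_ids(model: str) -> list[int]:
    """Return eos_token_id from generation_config.json, normalized to a list."""
    try:
        config = GenerationConfig.from_pretrained(model)
    except OSError:
        return []  # the checkpoint ships no generation_config.json
    eos: Optional[Union[int, list[int]]] = config.eos_token_id
    if eos is None:
        return []
    return [eos] if isinstance(eos, int) else list(eos)


# Injection point: extend (never replace) whatever the request supplied.
user_stop_token_ids = [42]
stop_token_ids = sorted(
    set(user_stop_token_ids)
    | set(generation_config_eos_ids("meta-llama/Meta-Llama-3-8B-Instruct"))
)
```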

Testing

Because this is model dependent, I have performed manual testing:

  1. Run Meta Llama 3 8B Instruct and observe that the end-of-turn token (<|eot_id|>) is not respected:
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-ca00059831714382b0104ca1cb7e407d","object":"chat.completion","created":1713481143,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm happy to help with any questions or tasks you have.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm a large language model, trained on a massive dataset of text from the internet, books, and other sources. I can understand and respond to natural language input, and I'm constantly learning and improving my abilities.\n\nI can help with a wide range of tasks, such as:\n\n* Answering questions on various topics, from science and history to entertainment and culture\n* Generating text, such as articles, stories, or emails\n* Translating text from one language to another\n* Summarizing long pieces of text into shorter, more digestible versions\n* Offering suggestions and ideas for creative projects or problems you're trying to solve\n* Even just chatting with you and engaging in conversation!\n\nWhat do you need help with today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nI'm excited to hear"},"logprobs":null,"finish_reason":"length","stop_reason":null}],"
  2. Change the eos_token_id field in the generation config of the HF model:
-   "eos_token_id": 128001,
+   "eos_token_id": [128001,128009],
  3. Run the same query:
~$ curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "max_tokens": 256
  }'
{"id":"cmpl-bf80caf7d899446fa9e148d1714b0552","object":"chat.completion","created":1713481243,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"I'm your helpful assistant! I'm an AI designed to assist and support you in various ways. I can help with tasks, answer questions, provide information, and even engage in conversations. My purpose is to make your life easier and more efficient, so feel free to ask me anything or tell me what's on your mind! What can I help you with today?"},"logprobs":null,"finish_

@simon-mo simon-mo enabled auto-merge (squash) April 18, 2024 23:27
@premg16

premg16 commented Apr 19, 2024

I am running vLLM from the Docker image and facing the same issue. What should I do?

@simon-mo
Collaborator Author

For now, you can add stop_token_ids as part of your request parameters; see #4180 (comment).
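For example, a request along these lines should stop at Llama 3's <|eot_id|> (token id 128009); this sketch assumes the server accepts vLLM's stop_token_ids extension in the OpenAI-compatible request body:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Who are you?"}],
        "max_tokens": 256,
        "stop_token_ids": [128009],  # vLLM extra parameter, not standard OpenAI
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```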

To avoid this extra step, the model checkpoint's generation config needs to be updated, which is pending on the HF side.
