Contributor

@jacobthebanana jacobthebanana commented Feb 9, 2025

This PR allows the user to specify a "trigger token" that must be generated before xgrammar starts constraining the output for structured decoding. For example, when generating with R1-like models, the end-of-thought token </think> can serve as the trigger token, as shown in the example in the added unit test.
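
A minimal sketch of the gating idea described above, not the code in this PR. It assumes a logits processor is a callable taking the generated token IDs and the logits for the next step; the class name `TriggerGatedProcessor`, the `inner` callable, and the use of plain Python lists instead of tensors are all hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical processor signature: (generated token IDs, logits) -> logits.
LogitsFn = Callable[[Sequence[int], List[float]], List[float]]


class TriggerGatedProcessor:
    """Leave logits untouched until the trigger token has been generated."""

    def __init__(self, inner: LogitsFn, trigger_token_id: int) -> None:
        self.inner = inner
        self.trigger_token_id = trigger_token_id

    def __call__(self, generated_ids: Sequence[int],
                 logits: List[float]) -> List[float]:
        ids = list(generated_ids)
        if self.trigger_token_id not in ids:
            # Still in the free-form (e.g. reasoning) phase: no constraint.
            return logits
        # Only tokens after the first trigger are shown to the inner
        # (grammar) processor, so it starts from a clean state.
        start = ids.index(self.trigger_token_id) + 1
        return self.inner(ids[start:], logits)
```

With `</think>` mapped to a single token ID, wrapping a grammar processor this way would leave the reasoning text unconstrained and constrain only what follows.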

Additional work might be required to:

  • Extend this logic to the other logits processor options:
    • outlines
    • lm-format-enforcer
  • Allow the user to specify strings consisting of more than one token (e.g., JSON Output: or \boxed in math prompts) as the trigger for structured decoding.
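
One possible way to support multi-token triggers is to look for the trigger's token IDs as a contiguous run in the generated IDs. The sketch below is illustrative only: `find_trigger_end` is a hypothetical helper, and it assumes the trigger string always tokenizes to the same ID sequence, which may not hold for every tokenizer or context. Matching on the detokenized text would be an alternative.

```python
from typing import Optional, Sequence


def find_trigger_end(generated_ids: Sequence[int],
                     trigger_ids: Sequence[int]) -> Optional[int]:
    """Return the index just past the first occurrence of trigger_ids.

    Returns None while the full trigger has not been generated yet.
    """
    ids = list(generated_ids)
    trigger = list(trigger_ids)
    if not trigger:
        # An empty trigger means constraints apply from the very start.
        return 0
    for start in range(len(ids) - len(trigger) + 1):
        if ids[start:start + len(trigger)] == trigger:
            return start + len(trigger)
    return None
```

Tokens from the returned index onward are the ones a grammar matcher would need to consume.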

FIX #12619

I was not aware of #12955 from Saturday morning before I started working on this PR on Sunday. I apologize to @gaocegege if this PR partially overlaps with their contribution. From what I understand, the main difference between the two PRs is the handling of batch_size in xgrammar_decoding, in case more than one generation stream is sent through this logits processor at a time. It is unclear, though, whether that would ever be the case in the current setup.
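
To illustrate why batch handling matters, here is a rough sketch, not taken from either PR, of applying the constraint per row when several sequences pass through together. `gate_batch` and `constrain` are hypothetical names, and plain lists stand in for tensors. The point is that each sequence may reach its trigger token at a different step, so the decision has to be made row by row.

```python
from typing import Callable, List, Sequence


def gate_batch(batch_ids: Sequence[Sequence[int]],
               batch_logits: Sequence[List[float]],
               trigger_token_id: int,
               constrain: Callable[[List[float]], List[float]]
               ) -> List[List[float]]:
    """Apply `constrain` only to rows that have already emitted the trigger."""
    if len(batch_ids) != len(batch_logits):
        raise ValueError("batch_ids and batch_logits must have the same length")
    out: List[List[float]] = []
    for ids, logits in zip(batch_ids, batch_logits):
        if trigger_token_id in ids:
            out.append(constrain(list(logits)))
        else:
            # This row is still before its trigger: copy logits unchanged.
            out.append(list(logits))
    return out
```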

github-actions bot commented Feb 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@jacobthebanana
Contributor Author

Output from unit test:

$ python \
-m pytest \
--maxfail=1 \
--disable-warnings \
-sv tests/entrypoints/llm/test_guided_generate.py::test_guided_json_for_reasoning
Prompt: 'Solve 8x + 7 = -23. Summarize your steps in JSON format. <think>', 
Generated text: '\nI need to solve the equation 8x + 7 = -23. First, I\'ll subtract 7 from both sides to isolate the term with the variable. This gives me 8x = -30.\n\nNext, I\'ll divide both sides by 8 to solve for x, resulting in x = -30/8, which simplifies to x = -15/4.\n</think>{\n\n  "steps": [\n    {\n      "explanation": "To solve the equation 8x + 7 = -23 for x, follow these steps.",\n      "output": "Calculate the initial step by isolating the term with x.\\n\\nSubtract 7 from both sides:\\n8x + 7 - 7 = -23 - 7\\n8x = -30\\n\\nThen, solve for x by dividing both sides by 8:\\n8x/8 = -30/8\\nx = -15/4"\n    }\n  ]\n  ,\n  "final_answer": "x = -15/4"\n}'
Reasoning output: "\nI need to solve the equation 8x + 7 = -23. First, I'll subtract 7 from both sides to isolate the term with the variable. This gives me 8x = -30.\n\nNext, I'll divide both sides by 8 to solve for x, resulting in x = -30/8, which simplifies to x = -15/4.\n", 
Structured output: '{\n\n  "steps": [\n    {\n      "explanation": "To solve the equation 8x + 7 = -23 for x, follow these steps.",\n      "output": "Calculate the initial step by isolating the term with x.\\n\\nSubtract 7 from both sides:\\n8x + 7 - 7 = -23 - 7\\n8x = -30\\n\\nThen, solve for x by dividing both sides by 8:\\n8x/8 = -30/8\\nx = -15/4"\n    }\n  ]\n  ,\n  "final_answer": "x = -15/4"\n}'

(new lines added for readability)
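
For readers who want to reproduce the split between "Reasoning output" and "Structured output" shown above, a small sketch follows. It is not the test's actual code; `split_reasoning` is a hypothetical helper that assumes the end-of-thought marker appears literally as `</think>` in the generated text and that everything after it is a single JSON document.

```python
import json
from typing import Any, Optional, Tuple


def split_reasoning(text: str,
                    end_token: str = "</think>") -> Tuple[str, Optional[Any]]:
    """Split generated text into (reasoning, parsed structured output)."""
    reasoning, sep, rest = text.partition(end_token)
    if not sep:
        # The model never closed its reasoning block; nothing to parse.
        return text, None
    # json.loads raises an error if the constrained part is not valid JSON.
    return reasoning, json.loads(rest)
```

If constrained decoding worked as intended, the second element should always parse without raising.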

@jacobthebanana
Contributor Author

Closing this PR in favor of #12955

Development

Successfully merging this pull request may close these issues.

[Feature]: Only apply Guided/Structured grammar after reasoning steps in Reasoning models