[Model][3/N] Automatic conversion of CrossEncoding model #20168
Conversation
Signed-off-by: wang.yuqi <[email protected]>
Summary of Changes
Hello @noooop, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request, part of a series, focuses on enabling automatic conversion of `ForCausalLM` models to support `ForSequenceClassification` tasks. Specifically, it adds the necessary code to allow the Gemma model to be automatically adapted for sequence classification, expanding its utility within the framework.
Highlights
- Model Conversion: Implemented automatic conversion for the Gemma model, allowing `GemmaForCausalLM` to function as `GemmaForSequenceClassification` by leveraging the `as_seq_cls_model` adapter.
- Model Registration: Registered the newly created `GemmaForSequenceClassification` class within the `_MODELS` registry, making it discoverable and usable by the system for sequence classification tasks.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
This pull request introduces automatic conversion of CrossEncoding models and adds GemmaForSequenceClassification to the model registry. The changes involve modifying gemma.py and registry.py to support this new functionality.
@DarkLight1337 @maxdebayser @22quinn Points 1-2 below add some background information on how the problem arose; points 3-5 are issues that need to be discussed to reach the best solution.
From a quick search, it seems that …

Do we even need …
```python
def _get_preferred_task(
    self,
    architectures: list[str],
    supported_tasks: set[_ResolvedTask],
) -> Optional[_ResolvedTask]:
    model_id = self.model
    if get_pooling_config(model_id, self.revision):
        return "embed"
    if self.registry.is_cross_encoder_model(architectures):  # <- here
        return "classify"
```

Modifying `inspect_model_cls` and `_get_preferred_task` is extremely complex; let's try not to touch them. This makes it impossible for us to completely remove `is_cross_encoder_model`.
Let's first discuss two fundamental issues.
Since any model can be converted into a classification model via …
Although any model can be converted into a classification model via `as_seq_cls_model`, the corresponding weights must actually come from a `*ForSequenceClassification` model; otherwise, an error will occur during the loading phase. `as_seq_cls_model` avoids duplicating code by not requiring each `ForCausalLM` to implement its own `ForSequenceClassification`.
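The loading-phase constraint can be sketched like this; `score.weight` is an assumed checkpoint key for the classification head, chosen for illustration only:

```python
def load_seq_cls_weights(state_dict):
    """Reject checkpoints that lack a classification head.

    Any *ForCausalLM can be wrapped into a classification class, but the
    weights must actually come from a *ForSequenceClassification model.
    """
    if "score.weight" not in state_dict:
        raise ValueError(
            "checkpoint has no classification head; weights must come "
            "from a *ForSequenceClassification model")
    return state_dict

# A checkpoint with a head loads; one without raises during loading.
ok = load_seq_cls_weights(
    {"model.embed_tokens.weight": [0.0], "score.weight": [0.0]})
```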
Sorry, I meant that if the architecture name is …

now basically so …

Yes. I think scoring can also be considered a type of classification task. We can adjust the naming in another PR.

I look forward to hearing your thoughts.
Let me see if I got things right:

Theoretically, this (series of) PR could allow all `*ForCausalLM` models to automatically gain a `*ForSequenceClassification` implementation.

Using an implicit `as_seq_cls_model` would cause a circular reference; the explicit conversion from `*ForCausalLM` to `*ForSequenceClassification` makes things easier. If we want to load `GemmaForSequenceClassification` using an implicit `as_seq_cls_model`, the routing looks like this. The code runs to `_get_preferred_task`:

```python
def _get_preferred_task(
    self,
    architectures: list[str],
    supported_tasks: set[_ResolvedTask],
) -> Optional[_ResolvedTask]:
    model_id = self.model
    if get_pooling_config(model_id, self.revision):
        return "embed"
    if self.registry.is_cross_encoder_model(architectures):  # <- here
        return "classify"
    if self.registry.is_transcription_model(architectures):
        return "transcription"
```

At this point we are still inferring what the task is, so we don't know it yet. But as long as we don't know that the task is "classify", we cannot implicitly run `as_seq_cls_model(GemmaForCausalLM)` to make `is_cross_encoder_model(architectures)` return true. So the implicit `as_seq_cls_model` would cause a circular reference, and the explicit conversion from `*ForCausalLM` to `*ForSequenceClassification` makes things easier, e.g.:

```python
GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)
```

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),
```
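A minimal, self-contained sketch of why explicit registration breaks the cycle (the registry and predicates below are stubs, not vLLM's real code): once the converted class is registered under its own architecture name, `is_cross_encoder_model` becomes a plain lookup that needs no prior knowledge of the task.

```python
# Stub registry: explicit conversion gives the converted class its own key.
REGISTRY = {"GemmaForCausalLM", "GemmaForSequenceClassification"}

def is_cross_encoder_model(architectures):
    # A plain name lookup: no task inference required, hence no cycle.
    return any(a in REGISTRY and a.endswith("ForSequenceClassification")
               for a in architectures)

def get_preferred_task(architectures, has_pooling_config=False):
    # Mirrors the resolution order in _get_preferred_task.
    if has_pooling_config:
        return "embed"
    if is_cross_encoder_model(architectures):
        return "classify"
    return "generate"
```

With the implicit scheme, the lookup for `GemmaForSequenceClassification` could only succeed after `as_seq_cls_model` had run, which in turn would require already knowing the task is "classify".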
1. We cannot implicitly run `as_seq_cls_model`, otherwise it causes a circular reference through `is_cross_encoder_model`.

Previously:

vllm/vllm/model_executor/model_loader/utils.py
Lines 244 to 250 in dec197e

We hope to use `as_seq_cls_model` implicitly after this PR, but it needs `is_cross_encoder_model` for `get_preferred_task`. Modifying `inspect_model_cls` and `_get_preferred_task` is extremely complex; let's try not to touch them.
2. What is the actual purpose of `is_cross_encoder_model`?

Pooling now divides into three tasks. Among them, "reward" is very easy to distinguish from "embed" and "classify", so we can exclude it first. After #19978, the purpose of `is_cross_encoder_model` is to ensure the correct scoring calculation method is used.
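Schematically, the two scoring paths differ as below; this is a sketch of the general cross-encoder vs. bi-encoder distinction, not vLLM's implementation, and `is_cross_encoder_model` is what routes a request to the first path:

```python
import math

def score_cross_encoder(classify_fn, query, doc):
    # Cross-encoder: one forward pass over the concatenated pair.
    return classify_fn(query + " [SEP] " + doc)

def score_bi_encoder(embed_fn, query, doc):
    # Embedding model: encode each side, then cosine similarity.
    q, d = embed_fn(query), embed_fn(doc)
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm
```

Picking the wrong path silently yields wrong scores, which is exactly the failure mode described above.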
Redirecting `*ForSequenceClassification` to `*ForCausalLM` makes things complicated, e.g.:

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForCausalLM"),
```

In that case, `*ForCausalLM` might be used for both "classify" and "embed", so we need to pass the task parameter to distinguish them. If parsing fails, an incorrect calculation method will be used and wrong results will be obtained.

The explicit (not implicit and automatic) conversion from `*ForCausalLM` to `*ForSequenceClassification` makes things easier, e.g.:

```python
"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),
```

All problems are solved effortlessly.
3. Make the explicit conversion code look good.

Use a more hacky way, e.g. … Using the above syntax sugar would automatically perform the explicit conversion. I am not very satisfied with the methods above and look forward to better ones.
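One possible shape for such syntax sugar, assuming the registry value may carry a converter to apply lazily. This is purely hypothetical: `resolve`, the three-tuple form, and the toy converter are invented for illustration and are not in the PR.

```python
def as_seq_cls_model(causal_lm_cls):
    # Toy converter stand-in: retarget the class to the classify task.
    name = causal_lm_cls.__name__.replace(
        "ForCausalLM", "ForSequenceClassification")
    return type(name, (causal_lm_cls,), {"task": "classify"})

class GemmaForCausalLM:
    task = "generate"

# The conversion stays visible in the table but is applied automatically.
_MODELS = {
    "GemmaForCausalLM": ("gemma", GemmaForCausalLM, None),
    "GemmaForSequenceClassification": (
        "gemma", GemmaForCausalLM, as_seq_cls_model),
}

def resolve(architecture):
    _module, base_cls, convert = _MODELS[architecture]
    return convert(base_cls) if convert else base_cls
```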
4. This PR actually allows every `ForCausalLM` to support a corresponding `ForSequenceClassification`. Do we really need to list out all auto-converted architectures? Shall we consider the above hacky way?
5. Should we retire the "classify" task because its naming is slightly inconsistent with actual usage?

vllm/vllm/entrypoints/llm.py
Lines 1310 to 1324 in 3c545c0

In this situation, `is_cross_encoder_model` and `task == "classify"` are the same.
Essential Elements of an Effective PR Description Checklist

- Update `supported_models.md` and `examples` for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update