[Model][3/N] Automatic conversion of CrossEncoding model #20168

Status: Draft, wants to merge 1 commit into main.
Conversation


@noooop noooop commented Jun 27, 2025

1. We cannot run as_seq_cls_model implicitly; otherwise it causes a circular reference through is_cross_encoder_model.

  • previous:

    model_cls, arch = ModelRegistry.resolve_model_cls(architectures)
    if model_config.task == "embed":
        model_cls = as_embedding_model(model_cls)
    elif model_config.task == "classify":
        model_cls = as_classification_model(model_cls)
    elif model_config.task == "reward":
        model_cls = as_reward_model(model_cls)

  • After this PR, we would like to use as_seq_cls_model implicitly:

    model_cls, arch = ModelRegistry.resolve_model_cls(architectures)
    if model_config.task == "embed":
        model_cls = as_embedding_model(model_cls)
    elif model_config.task == "classify":
        model_cls = as_seq_cls_model(model_cls)
    elif model_config.task == "reward":
        model_cls = as_reward_model(model_cls)
  • but _ModelRegistry.is_cross_encoder_model does not account for implicit conversion:
    @property
    def is_cross_encoder(self) -> bool:
        return self.registry.is_cross_encoder_model(self.architectures)  <- here
    def is_cross_encoder_model(
        self,
        architectures: Union[str, list[str]],
    ) -> bool:
        model_cls, _ = self.inspect_model_cls(architectures)  <- here
        return model_cls.supports_cross_encoding
@dataclass(frozen=True)
class _LazyRegisteredModel(_BaseRegisteredModel):
    """
    Represents a model that has not been imported in the main process.
    """
    module_name: str
    class_name: str

    # Performed in another process to avoid initializing CUDA
    def inspect_model_cls(self) -> _ModelInfo:
        return _run_in_subprocess(
            lambda: _ModelInfo.from_model_cls(self.load_model_cls()))  <- here 

    def load_model_cls(self) -> type[nn.Module]:
        mod = importlib.import_module(self.module_name)
        return getattr(mod, self.class_name)
    @staticmethod
    def from_model_cls(model: type[nn.Module]) -> "_ModelInfo":
        return _ModelInfo(
            architecture=model.__name__,
            is_text_generation_model=is_text_generation_model(model),
            is_pooling_model=True,  # Can convert any model into a pooling model
            supports_cross_encoding=supports_cross_encoding(model),   <- here  # expected as_seq_cls_model(GemmaForCausalLM), but this is actually GemmaForCausalLM
            supports_multimodal=supports_multimodal(model),
            supports_pp=supports_pp(model),
            has_inner_state=has_inner_state(model),
            is_attention_free=is_attention_free(model),
            is_hybrid=is_hybrid(model),
            supports_transcription=supports_transcription(model),
            supports_v0_only=supports_v0_only(model),
            has_noops=has_noops(model),
        )
  • When we tried to add a task parameter to registry.is_cross_encoder_model:
    def _get_preferred_task(
        self,
        architectures: list[str],
        supported_tasks: set[_ResolvedTask],
    ) -> Optional[_ResolvedTask]:
        model_id = self.model
        if get_pooling_config(model_id, self.revision):
            return "embed"
        if self.registry.is_cross_encoder_model(architectures):  <- here
            return "classify"
        if self.registry.is_transcription_model(architectures):
            return "transcription"

But _get_preferred_task itself needs to call is_cross_encoder_model to decide the task.

  • A circular reference occurred

Modifying inspect_model_cls and _get_preferred_task is extremely complex; let's try not to touch them.

2. What is the actual purpose of is_cross_encoder_model?

Pooling is now divided into three tasks:

"pooling": ["embed", "classify", "reward"],

Among them, "reward" is easy to distinguish from "embed" and "classify", so we can exclude it first.

after #19978

Users are allowed to use the score API when task_option == "embed", or for *ForSequenceClassification models with num_labels == 1.

  • For "embed", the score is the cosine similarity of the two embeddings.
  • For "classify" with num_labels == 1, the score comes from the classification head.

So the purpose of is_cross_encoder_model is to select the correct scoring method.
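As an illustration only (these function names are hypothetical, not vLLM's actual implementation), the two scoring methods can be sketched as:

```python
import math


def embedding_score(emb1: list[float], emb2: list[float]) -> float:
    """Score for an "embed" model: cosine similarity of the two pooled embeddings."""
    dot = sum(a * b for a, b in zip(emb1, emb2))
    norm1 = math.sqrt(sum(a * a for a in emb1))
    norm2 = math.sqrt(sum(b * b for b in emb2))
    return dot / (norm1 * norm2)


def cross_encoding_score(logit: float) -> float:
    """Score for a "classify" model with num_labels == 1: sigmoid of the single
    classification-head logit produced for the concatenated sentence pair."""
    return 1.0 / (1.0 + math.exp(-logit))
```

If the task is resolved to the wrong one, the wrong function runs on the wrong kind of model output, which is exactly why is_cross_encoder_model matters.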

Redirecting *ForSequenceClassification to *ForCausalLM makes things complicated.

e.g.

"GemmaForSequenceClassification": ("gemma", "GemmaForCausalLM"),

Here, *ForCausalLM might be used for both classify and embed, so a task parameter is needed to distinguish them. If the task is resolved incorrectly, the wrong scoring method is used and wrong results are returned.

Explicit (rather than implicit, automatic) conversion of *ForCausalLM to *ForSequenceClassification makes things easier.

e.g.

GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)

"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),

All problems are solved effortlessly.
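For illustration, the explicit-conversion idea can be sketched with a toy adapter (a minimal sketch; DummyForCausalLM and the adapter body are hypothetical stand-ins, not vLLM's real as_seq_cls_model):

```python
def as_seq_cls_model_sketch(causal_lm_cls: type) -> type:
    """Build a sequence-classification class on top of a causal LM class.

    The returned class advertises cross-encoding support and carries a
    *ForSequenceClassification name, so registry lookups keyed on the
    architecture string resolve to the converted class directly.
    """
    name = causal_lm_cls.__name__.replace("ForCausalLM",
                                          "ForSequenceClassification")
    return type(name, (causal_lm_cls,), {"supports_cross_encoding": True})


class DummyForCausalLM:  # stand-in for e.g. GemmaForCausalLM
    supports_cross_encoding = False


# Explicit conversion, done once when the model module is imported:
DummyForSequenceClassification = as_seq_cls_model_sketch(DummyForCausalLM)
```

Because the conversion happens at import time, registry inspection sees the converted class, so supports_cross_encoding is already correct without knowing the task.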

3. Make the explicit conversion code look good

  • Solution 1
GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)
  • Solution 2
class GemmaForSequenceClassification(as_seq_cls_model(GemmaForCausalLM)):
    pass
  • Solution 3
    use a hackier way, e.g.
"GemmaForSequenceClassification": ("gemma", "as_seq_cls_model+GemmaForCausalLM"),

This syntactic sugar would perform the explicit conversion automatically.

  • Solution 4
    I am not fully satisfied with the methods above and look forward to better ones.
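Solution 3's "adapter+ClassName" syntax could be resolved during lazy-registry lookup roughly like this (a sketch under assumed names; ADAPTERS, resolve_class_spec, and FakeGemmaForCausalLM are hypothetical):

```python
# Hypothetical adapter table; real code would map to as_seq_cls_model etc.
ADAPTERS = {
    "as_seq_cls_model": lambda cls: type(
        cls.__name__.replace("ForCausalLM", "ForSequenceClassification"),
        (cls,),
        {"supports_cross_encoding": True},
    ),
}


def resolve_class_spec(class_spec: str, module_attrs: dict) -> type:
    """Resolve "adapter+ClassName" specs; plain class names pass through."""
    if "+" in class_spec:
        adapter_name, class_name = class_spec.split("+", 1)
        return ADAPTERS[adapter_name](module_attrs[class_name])
    return module_attrs[class_spec]


class FakeGemmaForCausalLM:  # stand-in for the real module attribute
    pass


cls = resolve_class_spec("as_seq_cls_model+FakeGemmaForCausalLM",
                         {"FakeGemmaForCausalLM": FakeGemmaForCausalLM})
```

The upside is that registry entries stay declarative strings; the downside is a stringly-typed mini-language inside the registry.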

4. This PR actually allows every *ForCausalLM to support a corresponding *ForSequenceClassification.

Do we really need to list all auto-converted architectures?

Shall we consider the hacky approach above?

5. Should we retire the classify task because its naming is slightly inconsistent with actual usage?

  • is_cross_encoder_model is used to automatically distinguish between "embed" and "classify" tasks, in _get_preferred_task.
    def _get_preferred_task(
        self,
        architectures: list[str],
        supported_tasks: set[_ResolvedTask],
    ) -> Optional[_ResolvedTask]:
        model_id = self.model
        if get_pooling_config(model_id, self.revision):
            return "embed"
        if self.registry.is_cross_encoder_model(architectures):  <- here
            return "classify"
  • is_cross_encoder_model is used to distinguish between _cross_encoding_score and _embedding_score in LLM.score.

vllm/vllm/entrypoints/llm.py

Lines 1310 to 1324 in 3c545c0

if self.llm_engine.model_config.is_cross_encoder:
    return self._cross_encoding_score(tokenizer, input_text_1,
                                      input_text_2,
                                      truncate_prompt_tokens, use_tqdm,
                                      lora_request,
                                      prompt_adapter_request)
else:
    return self._embedding_score(
        tokenizer,
        input_text_1,  # type: ignore[arg-type]
        input_text_2,  # type: ignore[arg-type]
        truncate_prompt_tokens,
        use_tqdm,
        lora_request,
        prompt_adapter_request)

In this situation, is_cross_encoder_model and task == "classify" are equivalent.



@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, part of a series, focuses on enabling automatic conversion of ForCausalLM models to support ForSequenceClassification tasks. Specifically, it adds the necessary code to allow the Gemma model to be automatically adapted for sequence classification, expanding its utility within the framework.

Highlights

  • Model Conversion: Implemented automatic conversion for the Gemma model, allowing GemmaForCausalLM to function as GemmaForSequenceClassification by leveraging the as_seq_cls_model adapter.
  • Model Registration: Registered the newly created GemmaForSequenceClassification class within the _MODELS registry, making it discoverable and usable by the system for sequence classification tasks.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces automatic conversion of CrossEncoding models and adds GemmaForSequenceClassification to the model registry. The changes involve modifying gemma.py and registry.py to support this new functionality.

@noooop (Contributor Author) commented Jun 27, 2025

@DarkLight1337 @maxdebayser @22quinn

Points 1-2 add some background information on how the problem arose.

  1. We cannot run as_seq_cls_model implicitly; otherwise it causes a circular reference through is_cross_encoder_model.
  2. What is the actual purpose of is_cross_encoder_model?

Points 3-5 are issues that need to be discussed to reach the best solution.

  3. Make the explicit conversion code look good.
  4. This PR actually allows every *ForCausalLM to support a corresponding *ForSequenceClassification.
     Do we really need to list all auto-converted architectures?
  5. Should we retire the classify task, because its naming is slightly inconsistent with actual usage?

@DarkLight1337 (Member):

From a quick search, it seems that ModelConfig.is_cross_encoder is only used to switch between using the pooling results directly to get the score (assuming that pooled outputs are classification scores) vs. applying cosine similarity on the pooling results (assuming that the pooled outputs are embeddings).

@maxdebayser (Contributor):

Do we even need is_cross_encoder anymore? If "score" is just an API and no longer a task, then when "score" is called:

  • if the model task is embed, the prompts will be passed individually to the model and the results will be computed with cosine similarity
  • if the model task is classify, the prompts will be passed pairwise to the tokenizer and the result will be the score returned by the model on the tokenizer output.

@noooop (Contributor Author) commented Jun 28, 2025

Do we even need is_cross_encoder anymore? If "score" is just an API and no longer a task, then when "score" is called:

  • is_cross_encoder_model is used to automatically distinguish between "embed" and "classify" tasks, in _get_preferred_task.
    def _get_preferred_task(
        self,
        architectures: list[str],
        supported_tasks: set[_ResolvedTask],
    ) -> Optional[_ResolvedTask]:
        model_id = self.model
        if get_pooling_config(model_id, self.revision):
            return "embed"
        if self.registry.is_cross_encoder_model(architectures):  <- here
            return "classify"

Modifying inspect_model_cls and _get_preferred_task is extremely complex; let's try not to touch them.

This makes it impossible for us to completely remove is_cross_encoder_model.

@noooop (Contributor Author) commented Jun 30, 2025

Let's first discuss two fundamental issues.

  1. Is it better to use explicit (not implicit & automatic) conversion from *ForCausalLM to *ForSequenceClassification to avoid modifying inspect_model_cls and _get_preferred_task?
  2. Should we rename the current "classify" task because its naming is slightly inconsistent with actual usage?

@DarkLight1337 (Member) commented Jun 30, 2025

Since any model can be converted into a classification model via as_seq_cls_model, if you let the adapter support cross encoding, then I think we should consider any model to support cross-encoding without having to check is_cross_encoder. During inference, we can use is_cross_encoder to check the architecture that is finally used in order to switch between the different modes of Score API.

@noooop (Contributor Author) commented Jun 30, 2025

Although any model can be converted into a classification model via as_seq_cls_model, the corresponding weights must actually come from a *ForSequenceClassification model; otherwise an error occurs during the loading phase.

as_seq_cls_model actually avoids duplicating code by not having each ForCausalLM implement ForSequenceClassification.

@DarkLight1337 (Member):

the corresponding weights must actually be from a *ForSequenceClassification model; otherwise, an error will occur during the loading phase.

Sorry I mean that if the architecture name is *ForSequenceClassification, we can assume it supports cross encoding

@noooop (Contributor Author) commented Jun 30, 2025

Sorry I mean that if the architecture name is *ForSequenceClassification, we can assume it supports cross encoding

Now, task == "classify", an architecture name matching *ForSequenceClassification, and is_cross_encoder being True are basically all equivalent.

So the task name "classify" is slightly inconsistent with actual usage.

@DarkLight1337 (Member):

Yes. I think scoring can also be considered as a type of classification task. We can adjust the naming in another PR

@noooop (Contributor Author) commented Jun 30, 2025

Let's first discuss two fundamental issues.

  1. Is it better to use explicit (not implicit & automatic) conversion from *ForCausalLM to *ForSequenceClassification to avoid modifying inspect_model_cls and _get_preferred_task?
  2. Should we rename the current "classify" task because its naming is slightly inconsistent with actual usage?

@maxdebayser

I look forward to hearing your thoughts.

@maxdebayser (Contributor):

Let me see if I got things right:

  1. Automatic conversion is required so that the user can load a model and pass --task classify, right?
  2. So if we require explicit conversion, that wouldn't work, right?

The modifying inspect_model_cls and _get_preferred_task are extremely complex, let's try not to touch them.

I'm not sure why this is the case. If we can treat cross encoding as a special case of classification, we could drop the if self.registry.is_cross_encoder_model check and just rely on the ForSequenceClassification pattern to determine that the preferred task is classify. Or not? Perhaps I'm not seeing all the ramifications here.
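The name-pattern idea mentioned here could be sketched as (a hypothetical helper, not existing vLLM code):

```python
def prefers_classify(architectures: list[str]) -> bool:
    # Treat any *ForSequenceClassification architecture as a classify model,
    # without importing or inspecting the model class at all.
    return any(arch.endswith("ForSequenceClassification")
               for arch in architectures)
```

This avoids the subprocess-based class inspection entirely, at the cost of relying on the architecture naming convention.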

@noooop (Contributor Author) commented Jul 1, 2025

@maxdebayser

  1. Automatic conversion is required so that the user can load a model and pass --task classify, right?

Theoretically, this (series of) PR could allow all *ForCausalLM models to automatically have *ForSequenceClassification implementation.

  2. So if we require explicit conversion, that wouldn't work, right?

Using implicit as_seq_cls_model would cause a circular reference.

The explicit conversion *ForCausalLM to *ForSequenceClassification will make things easier.


If we want to load GemmaForSequenceClassification, and using implicit as_seq_cls_model.

The routing is like this.

"GemmaForSequenceClassification": ("gemma", "GemmaForCausalLM"),

The code runs to _get_preferred_task

    def _get_preferred_task(
        self,
        architectures: list[str],
        supported_tasks: set[_ResolvedTask],
    ) -> Optional[_ResolvedTask]:
        model_id = self.model
        if get_pooling_config(model_id, self.revision):
            return "embed"
        if self.registry.is_cross_encoder_model(architectures):  <- here
            return "classify"
        if self.registry.is_transcription_model(architectures):
            return "transcription"

At this point we are still trying to infer the task, so we don't know it yet.

Since we don't yet know that the task is classify, we cannot implicitly run as_seq_cls_model(GemmaForCausalLM) to make is_cross_encoder_model(architectures) return true.

Using implicit as_seq_cls_model would cause a circular reference.

The explicit conversion *ForCausalLM to *ForSequenceClassification will make things easier.

e.g.

GemmaForSequenceClassification = as_seq_cls_model(GemmaForCausalLM)

"GemmaForSequenceClassification": ("gemma", "GemmaForSequenceClassification"),

  3. I think we shouldn't delete is_cross_encoder_model, in case we encounter a cross-encoder model in the future whose architecture name isn't *ForSequenceClassification.
