Conversation

@FFFfff1FFFfff (Contributor) commented Aug 16, 2025

Purpose:

This PR adds automatic support for Sentence-Transformers Dense projection layers in vLLM, enabling proper handling of models that require dimension transformation (e.g., 1024→1792) during embedding generation.
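
For context, an ST Dense module is essentially a learned linear projection (optionally with a bias and an activation) applied after pooling. A minimal sketch of the transformation, using the 1024→1792 example above (DenseProjector is a hypothetical name; a real layer would be configured from the model's Dense config, e.g. 2_Dense/config.json):

    import torch
    import torch.nn as nn

    # Hypothetical sketch of an ST-style Dense projection (not vLLM's code).
    class DenseProjector(nn.Module):
        def __init__(self, in_features: int = 1024, out_features: int = 1792):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features, dtype=torch.float32)
            self.activation = nn.Identity()  # ST configs may specify e.g. Tanh

        def forward(self, pooled: torch.Tensor) -> torch.Tensor:
            return self.activation(self.linear(pooled.to(torch.float32)))

    projector = DenseProjector()
    print(projector(torch.randn(2, 1024)).shape)  # torch.Size([2, 1792])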

Resolves the following issues:

  • Missing Dense projection functionality for ST models in vLLM
  • Incorrect output dimensions (1024 instead of expected 1792 for models like TencentBAC/Conan-embedding-v1)
  • Numerical inconsistency with the HuggingFace Sentence-Transformers implementation

Key Modifications

  • pooler.py: Enhanced EmbeddingPoolerHead with projector support, device sync, and dimension validation
  • adapters.py: Added _load_st_projector() to detect and load Dense layers from ST models
  • bert.py: Integrated ST projector support in BERT embedding models
  • config.py: Added get_hf_file_bytes() utility for loading model files

New Version Improvements

  • Simplified code: removed the complex token-encoding logic and improved error handling with specific exceptions
  • Enhanced testing: Added test_embed_models_mteb for MTEB compatibility validation
  • Numerical stability: Explicit float32 handling for projection operations

Test
python -m pytest tests/models/language/pooling/test_st_projector.py -v

Test Result

tests/models/language/pooling/test_st_projector.py::test_embed_models_mteb PASSED
tests/models/language/pooling/test_st_projector.py::test_st_projector_loading PASSED
tests/models/language/pooling/test_st_projector.py::test_compare_with_hf_dimensions PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_numerical_similarity PASSED
tests/models/language/pooling/test_st_projector.py::test_embedding_quality_checks PASSED


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for Sentence-Transformers Dense projection layers, a crucial enhancement for handling a broader range of embedding models. The implementation is well-structured, incorporating new helper functions for loading weights and validating configurations, and includes a comprehensive test suite. I've identified one critical issue concerning the handling of empty inputs which could lead to a server crash. Please see the detailed comment below.

@mergify bot added the ci/build label Aug 16, 2025

@DarkLight1337 (Member):
cc @noooop

@DarkLight1337 (Member):
/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for Sentence-Transformers Dense projection layers, which is a valuable addition for handling a wider range of embedding models. The implementation appears robust, and the new tests are comprehensive, covering loading, dimensionality, numerical similarity, and quality checks. However, I've identified a critical issue in vllm/model_executor/layers/pooler.py where a change inadvertently alters the behavior of reward models by replacing RewardPoolerHead with EmbeddingPoolerHead for the 'encode' task. This could lead to incorrect outputs for reward modeling tasks and needs to be addressed.

Comment on lines 607 to +608:

        elif pooler_config.task == "encode":
    -       head = RewardPoolerHead()
    +       head = EmbeddingPoolerHead()  # no projector
@gemini-code-assist (bot), critical:

This change replaces RewardPoolerHead with EmbeddingPoolerHead for the encode task. This appears to be a regression, as the encode task is used by reward models, which typically output unnormalized scores (often processed with sigmoid or softmax). RewardPoolerHead correctly handles this. In contrast, EmbeddingPoolerHead is designed for embedding models and applies normalization, which is not the desired behavior for reward models. This change will likely break reward model functionality.

Suggested change:

        elif pooler_config.task == "encode":
    -       head = EmbeddingPoolerHead()  # no projector
    +       head = RewardPoolerHead()
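
To make the behavioral difference concrete, here is a toy illustration (not the actual pooler code) of why L2 normalization is wrong for reward scores:

    import torch
    import torch.nn.functional as F

    pooled = torch.tensor([[2.0, -1.0, 0.5]])  # hypothetical raw reward scores

    # Embedding-style head: L2 normalization rescales the scores.
    print(F.normalize(pooled, p=2, dim=-1))  # tensor([[ 0.8729, -0.4364,  0.2182]])

    # Reward-style head: scores are kept raw or passed through e.g. sigmoid.
    print(torch.sigmoid(pooled))  # tensor([[0.8808, 0.2689, 0.6225]])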

@FFFfff1FFFfff (Contributor, Author):
Issue addressed. Ready for review. Thanks!! @DarkLight1337

Comment on lines +63 to +76:

    @pytest.mark.parametrize("model_info", ST_PROJECTOR_MODELS)
    def test_embed_models_mteb(hf_runner, vllm_runner,
                               model_info: EmbedModelInfo) -> None:
        """MTEB test for ST projector models to detect numerical issues."""
        vllm_extra_kwargs: dict[str, Any] = {}
        if model_info.architecture == "BertModel":
            # Ensure BertEmbeddingModel is used for embedding models
            vllm_extra_kwargs["trust_remote_code"] = True

        mteb_test_embed_models(hf_runner,
                               vllm_runner,
                               model_info,
                               vllm_extra_kwargs,
                               atol=MTEB_EMBED_TOL)
Contributor:
Please keep only test_embed_models_mteb; most of the tests in this folder follow that pattern.

    vllm_extra_kwargs: dict[str, Any] = {}
    if model_info.architecture == "BertModel":
        # Ensure BertEmbeddingModel is used for embedding models
        vllm_extra_kwargs["trust_remote_code"] = True
Contributor:
hf_runner and vllm_runner already include trust_remote_code=True, so this block can be dropped.
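
If the fixtures already cover it, the test might reduce to something like the following sketch (assuming vllm_extra_kwargs is optional in mteb_test_embed_models):

    @pytest.mark.parametrize("model_info", ST_PROJECTOR_MODELS)
    def test_embed_models_mteb(hf_runner, vllm_runner,
                               model_info: EmbedModelInfo) -> None:
        """MTEB test for ST projector models to detect numerical issues."""
        mteb_test_embed_models(hf_runner, vllm_runner, model_info,
                               atol=MTEB_EMBED_TOL)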

Comment on lines +465 to +482:

    def _sync_projector_to_ref(self, ref_tensor: torch.Tensor) -> None:
        """Ensure projector is on correct device with float32 dtype."""
        if self.projector is None:
            return

        projector = cast(nn.Module, self.projector)
        try:
            proj_device = next(projector.parameters()).device
            if proj_device != ref_tensor.device:
                projector.to(device=ref_tensor.device, dtype=torch.float32)
                # Ensure all parameters are float32
                for param in projector.parameters():
                    param.data = param.data.to(torch.float32)
        except StopIteration:
            # Empty projector, skip device check
            pass

Contributor:
I don't think this is needed.

Comment on lines +483 to +502:

    def _validate_projector_dimensions(self, ref_tensor: torch.Tensor) -> None:
        """Validate projector input dimensions match pooled output."""
        if self.projector is None:
            return

        projector = cast(nn.Module, self.projector)
        first_linear = None
        for module in projector.modules():
            if isinstance(module, nn.Linear):
                first_linear = module
                break

        if first_linear is not None:
            expected_dim = first_linear.in_features
            actual_dim = ref_tensor.shape[-1]
            if expected_dim != actual_dim:
                raise ValueError(
                    f"Dimension mismatch: Dense projector expects "
                    f"input dim {expected_dim}, but pooled output "
                    f"has dim {actual_dim}")
Contributor:
I don't think there's a need for a dynamic projector-dimensions check.

Comment on lines +509 to +510:

    if isinstance(pooled_data, list) and len(pooled_data) == 0:
        pass  # Skip projection for empty inputs
Contributor:
This situation should not happen.

Comment on lines +46 to +59:

    if weight is None:
        return False

    try:
        with torch.no_grad():
            # Ensure weights are float32 for numerical stability
            linear.weight.copy_(weight.to(torch.float32))
            if linear.bias is not None and bias is not None:
                linear.bias.copy_(bias.to(torch.float32))
        return True
    except RuntimeError as e:
        logger.warning("Failed to load weights into linear layer: %s", e)
        return False
Contributor:
Please use weight_loader; for reference:

    token_ids = [tokenizer.convert_tokens_to_ids(t) for t in tokens]
    score_weight = model.lm_head.weight.data[token_ids]
    param = model.score.weight
    weight_loader = getattr(param, "weight_loader", default_weight_loader)
    weight_loader(param, score_weight)
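
Applied to the Dense layer above, a hedged sketch of what that could look like (the key names "linear.weight"/"linear.bias" assume an ST Dense state dict; the helper name mirrors the PR's):

    import torch
    import torch.nn as nn

    from vllm.model_executor.model_loader.weight_utils import default_weight_loader

    def _load_weights_to_linear(sd: dict[str, torch.Tensor],
                                linear: nn.Linear) -> bool:
        """Route ST Dense weights through vLLM's weight_loader machinery."""
        weight = sd.get("linear.weight")
        if weight is None:
            return False
        loader = getattr(linear.weight, "weight_loader", default_weight_loader)
        loader(linear.weight, weight)
        bias = sd.get("linear.bias")
        if bias is not None and linear.bias is not None:
            loader = getattr(linear.bias, "weight_loader", default_weight_loader)
            loader(linear.bias, bias)
        return True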

Comment on lines +104 to +108:

    use_bias = cfg.get("bias", True)
    # Create linear layer with float32 for numerical stability
    linear = nn.Linear(in_features, out_features, bias=use_bias)
Contributor:
You should set float32 here
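
A sketch of the suggested change, using nn.Linear's dtype argument so the comment matches the actual dtype:

    use_bias = cfg.get("bias", True)
    # Create linear layer directly in float32 for numerical stability
    linear = nn.Linear(in_features, out_features, bias=use_bias,
                       dtype=torch.float32)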

Comment on lines +109 to +141:

    # Try to load weights - first safetensors, then pytorch_model.bin
    weight_loaded = False

    # Try safetensors
    try:
        b = get_hf_file_bytes(f"{folder}/model.safetensors", model_path,
                              revision)
        if b is not None:
            from safetensors.torch import load as st_load
            sd = st_load(b)
            weight_loaded = _load_weights_to_linear(sd, linear)
    except (OSError, ImportError, ValueError) as e:
        logger.debug("Failed to load safetensors from %s: %s", folder, e)

    if not weight_loaded:
        try:
            b = get_hf_file_bytes(f"{folder}/pytorch_model.bin",
                                  model_path, revision)
            if b is not None:
                import io
                sd = torch.load(io.BytesIO(b), map_location="cpu")
                weight_loaded = _load_weights_to_linear(sd, linear)
        except (OSError, torch.serialization.UnpicklingError, RuntimeError,
                ValueError) as e:
            logger.debug("Failed to load pytorch_model.bin from %s: %s",
                         folder, e)

    if not weight_loaded:
        logger.warning("Failed to load weights for Dense layer in %s",
                       folder)

Contributor:
weight_loader will do all of this automatically.

        logger.debug("Failed to read file %s: %s", file_path, e)
        return None

    return None
Contributor:
A blank line is needed at the end.

@FFFfff1FFFfff (Contributor, Author):
Thanks for the review! My server crashed a few days ago, but I’ll get it fixed soon.

mergify bot commented Aug 20, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @FFFfff1FFFfff.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Aug 20, 2025