
Conversation

@Isotr0py
Member
Isotr0py commented Nov 11, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

@DarkLight1337
Member

Can you take a look at why the CI still passed before, and fix it so that failing tests actually fail the CI?

@bigPYJ1151
Member

> Can you take a look at why the CI still passed before, and fix it so that failing tests actually fail the CI?

That's because the CPU CI is set to soft-fail.

@DarkLight1337
Member

The CI didn't even soft fail here: https://buildkite.com/vllm/ci-aws/builds/11064#01931a18-e48d-485d-b357-f5f995bc474f

@bigPYJ1151
Member

Perhaps making cpu_tests() in the test script exit early would fix it (add set -e at the beginning of the function).

@mergify mergify bot added the ci/build label Nov 11, 2024
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py
Member Author

I added an intentionally failing test to the CPU test pipeline; let's see whether the CI catches it now that set -e is added.
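
For reference, such a canary can be as simple as a test that always fails (a hypothetical sketch, not necessarily the exact change made here):

# Hypothetical canary test: it fails unconditionally, so if the CPU CI pipeline
# still reports green after adding it, test failures are not being propagated.
def test_intentional_failure():
    assert False, "intentional failure to verify that the CI reports test errors"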

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 11, 2024
@DarkLight1337
Member

Added ready label to trigger Intel CPU tests

@Isotr0py
Member Author

It seems that adding set -e solves the issue: https://buildkite.com/vllm/ci-aws/builds/11071#01931aef-f91a-4901-a902-753db7a74f76/6-314

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 11, 2024 12:06
@DarkLight1337 DarkLight1337 merged commit 2cebda4 into vllm-project:main Nov 11, 2024
47 of 48 checks passed
@Isotr0py Isotr0py deleted the fix-cpu-enc-dec branch November 11, 2024 14:02
@DarkLight1337
Member

DarkLight1337 commented Nov 12, 2024

Classification model tests are failing now: https://buildkite.com/vllm/ci-aws/builds/11116#01932014-a8a8-4bbf-91a0-6ba08aa7cde8

Looks like the vLLM output is wrong.

@Isotr0py
Member Author

Isotr0py commented Nov 12, 2024

Hmmm, this is odd, because I can't reproduce it with the main branch on my CPU server. Let me try it on another device tomorrow to see whether it is device-related...

Update: I was able to reproduce it once after several runs.

@Isotr0py
Member Author

It seems that the failure only occurs after several runs:

$ pytest --count=20 -x -s -v tests/models/embedding/language/test_cls_models.py::test_classification_models
================================================================================================== FAILURES ==================================================================================================
_________________________________________________________________ test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-5-20] _________________________________________________________________

hf_runner = <class 'tests.conftest.HfRunner'>, vllm_runner = <class 'tests.conftest.VllmRunner'>
example_prompts = ['vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.\n', 'Briefly describe the majo...me.\n', 'Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.\n', ...]
model = '/data/LLM-model/Qwen2.5-1.5B-apeach', dtype = 'float'

    @pytest.mark.parametrize("model", CLASSIFICATION_MODELS)
    @pytest.mark.parametrize("dtype", ["float"])
    def test_classification_models(
        hf_runner,
        vllm_runner,
        example_prompts,
        model: str,
        dtype: str,
    ) -> None:
        with hf_runner(model,
                       dtype=dtype,
                       auto_cls=AutoModelForSequenceClassification) as hf_model:
            hf_outputs = hf_model.classify(example_prompts)
    
        with vllm_runner(model, dtype=dtype) as vllm_model:
            vllm_outputs = vllm_model.classify(example_prompts)
    
        print(hf_outputs, vllm_outputs)
    
        # check logits difference
        for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
            hf_output = torch.tensor(hf_output)
            vllm_output = torch.tensor(vllm_output)
    
>           assert torch.allclose(hf_output, vllm_output, 1e-3)
E           assert False
E            +  where False = <built-in method allclose of type object at 0x776c248678c0>(tensor([0.2645, 0.7355]), tensor([1., 0.]), 0.001)
E            +    where <built-in method allclose of type object at 0x776c248678c0> = torch.allclose

tests/models/embedding/language/test_cls_models.py:39: AssertionError
============================================================================================== warnings summary ==============================================================================================
../../miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335: UserWarning: Device capability of ccl unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/embedding/language/test_cls_models.py::test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-5-20] - assert False
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================= 1 failed, 4 passed, 3 warnings in 61.51s (0:01:01) =============================================================================

@DarkLight1337
Member

Does the failure occur randomly?

@Isotr0py
Member Author

Yes, and it seems that the Intel CPU test on some new PRs is passing now: https://buildkite.com/vllm/ci-aws/builds/11137#0193231d-132d-40a9-9bfe-dfa5a1f05da0

@DarkLight1337
Member

This is odd... can you try setting max_num_seqs=1 and see if it can completely avoid the error? If so, maybe there is some problem with batching.

@Isotr0py
Member Author

Setting max_num_seqs=1 didn't work...

======================================================================================================== FAILURES =========================================================================================================
_______________________________________________________________________ test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-4-20] ________________________________________________________________________

hf_runner = <class 'tests.conftest.HfRunner'>, vllm_runner = <class 'tests.conftest.VllmRunner'>
example_prompts = ['vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.\n', 'Briefly describe the majo...me.\n', 'Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.\n', ...]
model = '/data/LLM-model/Qwen2.5-1.5B-apeach', dtype = 'float'

    @pytest.mark.parametrize("model", CLASSIFICATION_MODELS)
    @pytest.mark.parametrize("dtype", ["float"])
    def test_classification_models(
        hf_runner,
        vllm_runner,
        example_prompts,
        model: str,
        dtype: str,
    ) -> None:
        with hf_runner(model,
                       dtype=dtype,
                       auto_cls=AutoModelForSequenceClassification) as hf_model:
            hf_outputs = hf_model.classify(example_prompts)
    
        with vllm_runner(model, dtype=dtype, max_num_seqs=1) as vllm_model:
            vllm_outputs = vllm_model.classify(example_prompts)
    
        print(hf_outputs, vllm_outputs)
    
        # check logits difference
        for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
            hf_output = torch.tensor(hf_output)
            vllm_output = torch.tensor(vllm_output)
    
>           assert torch.allclose(hf_output, vllm_output, 1e-3)
E           assert False
E            +  where False = <built-in method allclose of type object at 0x794a340678c0>(tensor([0.2645, 0.7355]), tensor([1., 0.]), 0.001)
E            +    where <built-in method allclose of type object at 0x794a340678c0> = torch.allclose

tests/models/embedding/language/test_cls_models.py:39: AssertionError
==================================================================================================== warnings summary =====================================================================================================
../../miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335: UserWarning: Device capability of ccl unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================= short test summary info =================================================================================================
FAILED tests/models/embedding/language/test_cls_models.py::test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-4-20] - assert False
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================================================================== 1 failed, 3 passed, 3 warnings in 64.05s (0:01:04) ====================================================================================

@DarkLight1337
Member

DarkLight1337 commented Nov 13, 2024

Maybe there's something wrong with the softmax? It's really strange that the output is exactly 0 and 1...

@Isotr0py
Member Author

I suspect this is an issue with the test itself, because only this test fails randomly, and running example/offline_inference_embedding.py doesn't seem to output exactly 0 and 1.

Perhaps it's because of the runner order? We run hf_runner before vllm_runner in this test.

@DarkLight1337
Member

> Perhaps it's because of the runner order? We run hf_runner before vllm_runner in this test.

Let's try swapping the order.
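
A sketch of the swap, adapted from the test body shown above (same fixtures, prompts, and model; only the order of the two runners changes):

# Run the vLLM runner first, then the HF runner, to check whether state left
# behind by the HF/IPEX run is what perturbs the vLLM classification output.
with vllm_runner(model, dtype=dtype) as vllm_model:
    vllm_outputs = vllm_model.classify(example_prompts)

with hf_runner(model,
               dtype=dtype,
               auto_cls=AutoModelForSequenceClassification) as hf_model:
    hf_outputs = hf_model.classify(example_prompts)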

@Isotr0py
Member Author

Isotr0py commented Nov 13, 2024

It seems that a suspicious overflow occurs in the score layer when the test fails. Here is the logits tensor from a failing run:

tensor([[ 3.3580e+36, -2.0206e+00],
        [ 3.3580e+36,  1.5207e+00],
        [ 3.3580e+36,  9.7952e-02],
        [ 3.3580e+36,  3.0942e+00],
        [ 3.3580e+36,  3.0767e+00]
        ...
        [ 3.3580e+36,  4.4228e+00],
        [ 3.3580e+36,  4.5115e+00],
        [ 3.3580e+36,  1.6440e+00],
        [ 3.3580e+36, -1.8783e+00]])

The overflow only occurs in the first column, while the second column is normal. Note that the hidden_states from Qwen2Model in the failing case are identical to those in the passing case.
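
For what it's worth, a quick sketch (assuming the classification pooler applies a plain softmax over these score-layer logits; the "sane" logits below are hypothetical) shows how a single overflowed logit yields the exact [1., 0.] probabilities seen in the failing assertion:

import torch

# An overflowed logit saturates the softmax: the result is exactly [1., 0.],
# matching the vllm_output in the failing test above.
overflowed = torch.tensor([3.3580e+36, -2.0206e+00])
print(torch.softmax(overflowed, dim=-1))  # tensor([1., 0.])

# Hypothetical well-behaved logits give probabilities close to the HF
# reference ([0.2645, 0.7355]) instead.
sane = torch.tensor([-1.0227, 0.0])
print(torch.softmax(sane, dim=-1))  # ~ tensor([0.2645, 0.7355])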

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025