
Conversation

@Isotr0py
Member
Isotr0py commented Nov 11, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge

🚀

@DarkLight1337
Member

Can you take a look at why the CI still passed before, and fix it so that failing tests actually fail the CI?

@bigPYJ1151
Member

> Can you take a look at why the CI still passed before, and fix it so that failing tests actually fail the CI?

That's because the CPU CI is set to soft-fail.

@DarkLight1337
Member

The CI didn't even soft fail here: https://buildkite.com/vllm/ci-aws/builds/11064#01931a18-e48d-485d-b357-f5f995bc474f

@bigPYJ1151
Member

Perhaps making cpu_tests() in the test script exit early would fix it (add set -e at the beginning of the function).

@mergify mergify bot added the ci/build label Nov 11, 2024
Signed-off-by: Isotr0py <[email protected]>
@Isotr0py
Member Author

I added an intentionally failing test to the CPU test pipeline; let's see whether the CI catches it now that set -e is added.
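
For reference, such a canary can be as simple as a test that always fails (a hypothetical sketch, not necessarily the exact change made here):

# Hypothetical canary test: it fails unconditionally, so if the CPU CI pipeline
# still reports green after adding it, test failures are not being propagated.
def test_intentional_failure():
    assert False, "intentional failure to verify that the CI reports test errors"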

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 11, 2024
@DarkLight1337
Member

Added ready label to trigger Intel CPU tests

@Isotr0py
Member Author

It seems that adding set -e solves the issue: https://buildkite.com/vllm/ci-aws/builds/11071#01931aef-f91a-4901-a902-753db7a74f76/6-314

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 11, 2024 12:06
@DarkLight1337 DarkLight1337 merged commit 2cebda4 into vllm-project:main Nov 11, 2024
47 of 48 checks passed
@Isotr0py Isotr0py deleted the fix-cpu-enc-dec branch November 11, 2024 14:02
@DarkLight1337
Member

DarkLight1337 commented Nov 12, 2024

Classification model tests are failing now: https://buildkite.com/vllm/ci-aws/builds/11116#01932014-a8a8-4bbf-91a0-6ba08aa7cde8

Looks like the vLLM output is wrong.

@Isotr0py
Member Author

Isotr0py commented Nov 12, 2024

Hmmm, this is odd, because I can't reproduce it with the main branch on my CPU server. Let me try it on another device tomorrow to see whether it is device-related...

Update: I was able to reproduce it once after several runs.

@Isotr0py
Member Author

It seems that the failure only occurs after several runs:

$ pytest --count=20 -x -s -v tests/models/embedding/language/test_cls_models.py::test_classification_models
================================================================================================== FAILURES ==================================================================================================
_________________________________________________________________ test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-5-20] _________________________________________________________________

hf_runner = <class 'tests.conftest.HfRunner'>, vllm_runner = <class 'tests.conftest.VllmRunner'>
example_prompts = ['vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.\n', 'Briefly describe the majo...me.\n', 'Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.\n', ...]
model = '/data/LLM-model/Qwen2.5-1.5B-apeach', dtype = 'float'

    @pytest.mark.parametrize("model", CLASSIFICATION_MODELS)
    @pytest.mark.parametrize("dtype", ["float"])
    def test_classification_models(
        hf_runner,
        vllm_runner,
        example_prompts,
        model: str,
        dtype: str,
    ) -> None:
        with hf_runner(model,
                       dtype=dtype,
                       auto_cls=AutoModelForSequenceClassification) as hf_model:
            hf_outputs = hf_model.classify(example_prompts)
    
        with vllm_runner(model, dtype=dtype) as vllm_model:
            vllm_outputs = vllm_model.classify(example_prompts)
    
        print(hf_outputs, vllm_outputs)
    
        # check logits difference
        for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
            hf_output = torch.tensor(hf_output)
            vllm_output = torch.tensor(vllm_output)
    
>           assert torch.allclose(hf_output, vllm_output, 1e-3)
E           assert False
E            +  where False = <built-in method allclose of type object at 0x776c248678c0>(tensor([0.2645, 0.7355]), tensor([1., 0.]), 0.001)
E            +    where <built-in method allclose of type object at 0x776c248678c0> = torch.allclose

tests/models/embedding/language/test_cls_models.py:39: AssertionError
============================================================================================== warnings summary ==============================================================================================
../../miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335: UserWarning: Device capability of ccl unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================================================== short test summary info ===========================================================================================
FAILED tests/models/embedding/language/test_cls_models.py::test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-5-20] - assert False
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================= 1 failed, 4 passed, 3 warnings in 61.51s (0:01:01) =============================================================================

@DarkLight1337
Member

Does the failure occur randomly?

@Isotr0py
Member Author

Yes, and it seems that the Intel CPU test on some new PRs is passing now: https://buildkite.com/vllm/ci-aws/builds/11137#0193231d-132d-40a9-9bfe-dfa5a1f05da0

@DarkLight1337
Member

This is odd... can you try setting max_num_seqs=1 and see if it can completely avoid the error? If so, maybe there is some problem with batching.

@Isotr0py
Member Author

Setting max_num_seqs=1 didn't work...

======================================================================================================== FAILURES =========================================================================================================
_______________________________________________________________________ test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-4-20] ________________________________________________________________________

hf_runner = <class 'tests.conftest.HfRunner'>, vllm_runner = <class 'tests.conftest.VllmRunner'>
example_prompts = ['vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.\n', 'Briefly describe the majo...me.\n', 'Analyze the impact of the COVID-19 pandemic on global economic structures and future business models.\n', ...]
model = '/data/LLM-model/Qwen2.5-1.5B-apeach', dtype = 'float'

    @pytest.mark.parametrize("model", CLASSIFICATION_MODELS)
    @pytest.mark.parametrize("dtype", ["float"])
    def test_classification_models(
        hf_runner,
        vllm_runner,
        example_prompts,
        model: str,
        dtype: str,
    ) -> None:
        with hf_runner(model,
                       dtype=dtype,
                       auto_cls=AutoModelForSequenceClassification) as hf_model:
            hf_outputs = hf_model.classify(example_prompts)
    
        with vllm_runner(model, dtype=dtype, max_num_seqs=1) as vllm_model:
            vllm_outputs = vllm_model.classify(example_prompts)
    
        print(hf_outputs, vllm_outputs)
    
        # check logits difference
        for hf_output, vllm_output in zip(hf_outputs, vllm_outputs):
            hf_output = torch.tensor(hf_output)
            vllm_output = torch.tensor(vllm_output)
    
>           assert torch.allclose(hf_output, vllm_output, 1e-3)
E           assert False
E            +  where False = <built-in method allclose of type object at 0x794a340678c0>(tensor([0.2645, 0.7355]), tensor([1., 0.]), 0.001)
E            +    where <built-in method allclose of type object at 0x794a340678c0> = torch.allclose

tests/models/embedding/language/test_cls_models.py:39: AssertionError
==================================================================================================== warnings summary =====================================================================================================
../../miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/intel_extension_for_pytorch/transformers/optimize.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

../../miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/pkg_resources/__init__.py:3154: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('sphinxcontrib')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

../../miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335
  /home/c4rbon/miniconda3/envs/vllm/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py:335: UserWarning: Device capability of ccl unspecified, assuming `cpu` and `cuda`. Please specify it via the `devices` argument of `register_backend`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================= short test summary info =================================================================================================
FAILED tests/models/embedding/language/test_cls_models.py::test_classification_models[float-/data/LLM-model/Qwen2.5-1.5B-apeach-4-20] - assert False
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================================================================== 1 failed, 3 passed, 3 warnings in 64.05s (0:01:04) ====================================================================================

@DarkLight1337
Member

DarkLight1337 commented Nov 13, 2024

Maybe there's something wrong with the softmax? It's really strange that the output is exactly 0 and 1...

@Isotr0py
Member Author

I suspect this is an issue with the test itself, because only this test fails randomly, and running example/offline_inference_embedding.py doesn't seem to output exactly 0 and 1.

Perhaps it's because of the runner order? We run hf_runner before vllm_runner in this test.

@DarkLight1337
Member

> Perhaps it's because of the runner order? We run hf_runner before vllm_runner in this test.

Let's try swapping the order.
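
A sketch of the swap, adapted from the test body shown above (same fixtures, prompts, and model; only the order of the two runners changes):

# Run the vLLM runner first, then the HF runner, to check whether state left
# behind by the HF/IPEX run is what perturbs the vLLM classification output.
with vllm_runner(model, dtype=dtype) as vllm_model:
    vllm_outputs = vllm_model.classify(example_prompts)

with hf_runner(model,
               dtype=dtype,
               auto_cls=AutoModelForSequenceClassification) as hf_model:
    hf_outputs = hf_model.classify(example_prompts)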

@Isotr0py
Member Author

Isotr0py commented Nov 13, 2024

It seems that a suspicious overflow occurs in the score layer when the test fails. Here is the logits tensor from a failing run:

tensor([[ 3.3580e+36, -2.0206e+00],
        [ 3.3580e+36,  1.5207e+00],
        [ 3.3580e+36,  9.7952e-02],
        [ 3.3580e+36,  3.0942e+00],
        [ 3.3580e+36,  3.0767e+00]
        ...
        [ 3.3580e+36,  4.4228e+00],
        [ 3.3580e+36,  4.5115e+00],
        [ 3.3580e+36,  1.6440e+00],
        [ 3.3580e+36, -1.8783e+00]])

The overflow only occurs in the first column, while the second column is normal. Note that the hidden_states from Qwen2Model in the failing case are identical to those in the passing case.
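
For what it's worth, a quick sketch (assuming the classification pooler applies a plain softmax over these score-layer logits; the "sane" logits below are hypothetical) shows how a single overflowed logit yields the exact [1., 0.] probabilities seen in the failing assertion:

import torch

# An overflowed logit saturates the softmax: the result is exactly [1., 0.],
# matching the vllm_output in the failing test above.
overflowed = torch.tensor([3.3580e+36, -2.0206e+00])
print(torch.softmax(overflowed, dim=-1))  # tensor([1., 0.])

# Hypothetical well-behaved logits give probabilities close to the HF
# reference ([0.2645, 0.7355]) instead.
sane = torch.tensor([-1.0227, 0.0])
print(torch.softmax(sane, dim=-1))  # ~ tensor([0.2645, 0.7355])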

sumitd2 pushed a commit to sumitd2/vllm that referenced this pull request Nov 14, 2024
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025