[V1] AsyncLLM data parallel #13923
Conversation
How do we test this? I mean, how do we run the server? I think we need two commands, right?
This pull request has merge conflicts that must be resolved before it can be merged.
The DP-related part looks good to me.
cc @robertgshaw2-redhat: I'm not familiar with the frontend processing part; maybe Robert can take a look?
@v-lmn No, for a single node you can run a single command, with …
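For reference, the single-node form is one command, e.g. (model path and port are placeholders; the DP flag matches the one used later in this thread):

vllm serve /path/to/model --port 8000 --data-parallel-size 2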
This pull request has merge conflicts that must be resolved before it can be merged.
Thanks @tlrmchlsmth! I've addressed those comments. I also had to make some additional adjustments to ensure compatibility with @youkaichao's offline multi-node scenario added in #15484.
This pull request has merge conflicts that must be resolved before it can be merged.
local_dp_rank = vllm_config.parallel_config.data_parallel_rank_local

assert dp_size > 1
assert 0 <= local_dp_rank <= dp_rank < dp_size
why do we need this check?
It's not strictly needed, I just thought it might be good here to verify that the config is in a coherent state.
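For intuition, a minimal sketch of why the invariant holds, assuming global DP ranks are assigned contiguously per node (my reading, not stated in this diff):

dp_size = 8          # total DP ranks across all nodes
ranks_per_node = 4
node_index = 1
local_dp_rank = 2    # rank within this node
dp_rank = node_index * ranks_per_node + local_dp_rank  # global rank 6

# The asserted invariant then holds by construction:
assert dp_size > 1
assert 0 <= local_dp_rank <= dp_rank < dp_size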
from vllm.platforms import current_platform
if current_platform.is_cuda_alike():
    from vllm.platforms.cuda import device_id_to_physical_device_id
tp_size = vllm_config.parallel_config.tensor_parallel_size
You can use world_size to be general, not just tp_size.
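To illustrate the suggestion, a self-contained sketch of deriving the per-rank device block from world_size (tensor parallel size times pipeline parallel size) rather than tp_size alone; all values below are placeholders:

tensor_parallel_size = 2
pipeline_parallel_size = 2
world_size = tensor_parallel_size * pipeline_parallel_size
local_dp_rank = 1
# Assumed contiguous device assignment per local DP rank:
first_device = local_dp_rank * world_size
device_ids = list(range(first_device, first_device + world_size))  # [4, 5, 6, 7]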
@njhill @youkaichao Hi, I tried to use DP as shown above with the latest dev version of vLLM, but the error below occurred. Do you have any clue about this? In the same shell environment I can run without DP, e.g. with tp=2, and there are 8 usable GPUs on the machine. Thanks!

$ uv run vllm serve /path/to/model --port 8088 --served-model-name xxx --data-parallel-size 2
...
myhost:2292596:2292596 [0] init.cc:943 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 53000
myhost:2292597:2292597 [0] init.cc:943 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 53000
...
(EngineCore_1 pid=2292597) File "/ephnvme/colin/code/prizetrain/.venv/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 39, in __init__
(EngineCore_0 pid=2292596) self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(EngineCore_1 pid=2292597) self.pynccl_comm = PyNcclCommunicator(
(EngineCore_0 pid=2292596) File "/ephnvme/colin/code/prizetrain/.venv/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 256, in NCCL_CHECK
(EngineCore_1 pid=2292597) ^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=2292596) raise RuntimeError(f"NCCL error: {error_str}")
(EngineCore_1 pid=2292597) File "/ephnvme/colin/code/prizetrain/.venv/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 99, in __init__
(EngineCore_1 pid=2292597) self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
(EngineCore_1 pid=2292597) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=2292596) RuntimeError: NCCL error: invalid usage (run with NCCL_DEBUG=WARN for details)
(EngineCore_1 pid=2292597) File "/ephnvme/colin/code/prizetrain/.venv/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 277, in ncclCommInitRank
(EngineCore_1 pid=2292597) self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(EngineCore_1 pid=2292597) File "/ephnvme/colin/code/prizetrain/.venv/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 256, in NCCL_CHECK
(EngineCore_1 pid=2292597) raise RuntimeError(f"NCCL error: {error_str}")
(EngineCore_1 pid=2292597) RuntimeError: NCCL error: invalid usage (run with NCCL_DEBUG=WARN for details)
@Co1lin I tested this; can you create a separate issue with detailed environment info?
local_unfinished_reqs = self.scheduler.has_unfinished_requests()

if local_unfinished_reqs:
    # 2) Step the engine core.
Considering the presence of WAITING_FOR_REMOTE_KVS and WAITING_FOR_FSM, the condition local_unfinished_reqs = True does not necessarily imply that scheduler_output.total_num_scheduled_tokens > 0. This means that a forward pass may not actually be executed in _process_engine_step -> step -> execute_model, while other cores might still execute a real or dummy forward. @njhill
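To make the distinction concrete, a minimal self-contained sketch (SchedulerOutput here is a stand-in for illustration, not the real class):

from dataclasses import dataclass

@dataclass
class SchedulerOutput:  # minimal stand-in for illustration
    total_num_scheduled_tokens: int

# A request can be unfinished (e.g. WAITING_FOR_REMOTE_KVS or WAITING_FOR_FSM)
# while zero tokens are scheduled this step, so the two signals can disagree:
local_unfinished_reqs = True
scheduler_output = SchedulerOutput(total_num_scheduled_tokens=0)

will_run_forward = scheduler_output.total_num_scheduled_tokens > 0
assert local_unfinished_reqs and not will_run_forward
# This rank skips execute_model while other DP ranks may still run a real
# or dummy forward, which is the mismatch raised above.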
Fixed in #18559
The engine core client starts an engine core process per DP rank and load balances requests between them. A dummy request is sent to idle ranks when the global request count goes from 0 to 1, and when an engine finishes all of its requests it continues in an idle forward loop.
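As a rough sketch of that behavior (names like dp_group.any and dummy_step are hypothetical stand-ins, not this PR's actual API):

def dp_engine_loop(engine, dp_group):
    # Illustrative only: each DP rank steps while any rank in the group
    # still has work, so collective ops stay in lockstep across ranks.
    while True:
        local_busy = engine.has_unfinished_requests()
        global_busy = dp_group.any(local_busy)  # hypothetical group reduction
        if not global_busy:
            break                # the whole DP group is idle
        if local_busy:
            engine.step()        # real forward pass
        else:
            engine.dummy_step()  # idle rank runs a dummy forward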
Working for single node:
I aimed to keep the data parallel logic as isolated as possible (in subclasses of the core engine and client) to avoid adding complexity or overhead to the more common default dp=1 case.
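A sketch of that isolation on the client side (class and attribute names are hypothetical; the point is that DP behavior lives in an override rather than branching in the common path):

class CoreClient:
    # Default dp=1 client: exactly one engine core, no balancing logic.
    def __init__(self, engine):
        self.engine = engine

    def add_request(self, request):
        self.engine.submit(request)


class DPCoreClient(CoreClient):
    # DP client: one engine core per DP rank, least-loaded dispatch.
    def __init__(self, engines):
        self.engines = engines

    def add_request(self, request):
        # Pick the engine core with the fewest in-flight requests.
        target = min(self.engines, key=lambda e: e.num_in_flight)
        target.submit(request)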
Follow-on after this PR: