[P/D] NixlConnector DP fixes #18903
#18576 was merged since I forked. Let me see how it affects the unique engine_id situation; it should be a quick test.
#18576 does not go far enough: DP ranks do not get unique engine IDs. cc @tlrmchlsmth
Signed-off-by: Will Eaton <[email protected]>
Thanks @wseaton
Signed-off-by: Will Eaton <[email protected]> Signed-off-by: amit <[email protected]>
Stacked PR, needs to go in after: #18559
Edit: Not really stacked, these changes can go in on their own, PD+DP just won't work without the scheduling bugfixes.
Changes:
- `KVConnector` `engine_id`s are now generated in the factory if they are unset. In the DP case, this means each thread gets its own unique ID.
- `world_rank` is used instead of `rank` for the Nixl side channel port, to ensure a unique side channel and no port collisions in DP.

The sum of these changes is that DP>1 now works with P/D enabled when using the `NixlConnector`.

Both DP1 and DP2 on the prefill server were tested against DP2 decode.
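To illustrate the two changes listed above, here is a minimal sketch. It is not the code from this PR: `TransferConfig`, `ensure_engine_id`, `world_rank`, `side_channel_port`, and the base port 5600 are hypothetical names and values, and the `dp_rank * tp_size + tp_rank` layout is an assumption about how a global rank could be derived.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical base port, chosen only for this example.
BASE_SIDE_CHANNEL_PORT = 5600


@dataclass
class TransferConfig:
    """Hypothetical stand-in for a KV transfer config object."""
    engine_id: Optional[str] = None


def ensure_engine_id(config: TransferConfig) -> str:
    """Generate an engine ID only when the caller left it unset."""
    if config.engine_id is None:
        config.engine_id = str(uuid.uuid4())
    return config.engine_id


def world_rank(dp_rank: int, tp_size: int, tp_rank: int) -> int:
    """Assumed layout: ranks are numbered contiguously across DP replicas."""
    return dp_rank * tp_size + tp_rank


def side_channel_port(base_port: int, rank: int) -> int:
    """Offset the base port by a rank to get a per-worker port."""
    return base_port + rank


# DP=2, TP=2: per-replica ranks repeat across replicas, world ranks do not.
workers = [(dp, tp) for dp in range(2) for tp in range(2)]
ports_by_rank = [
    side_channel_port(BASE_SIDE_CHANNEL_PORT, tp) for _, tp in workers
]
ports_by_world_rank = [
    side_channel_port(BASE_SIDE_CHANNEL_PORT, world_rank(dp, 2, tp))
    for dp, tp in workers
]
print(ports_by_rank)        # [5600, 5601, 5600, 5601]
print(ports_by_world_rank)  # [5600, 5601, 5602, 5603]
```

In this sketch the per-replica rank yields only two distinct ports for four workers, while the world rank yields four. Likewise, two configs with an unset `engine_id` end up with different IDs, and an explicitly set ID is left untouched.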
Testing used the following configuration: