Shixiaowei02
Collaborator

@Shixiaowei02 Shixiaowei02 commented Jun 19, 2025

In this tech blog, we introduce disaggregated serving in TensorRT-LLM, covering the following topics:

  • Why disaggregated serving is needed and which problems it solves in LLM inference.
  • The architecture and usage of trtllm-serve, Dynamo, and Triton Inference Server: how these components work and how to use them in practice.
  • How KV cache exchange is designed and optimized to improve performance.
  • Performance results, how they were measured, and how to reproduce the DeepSeek R1 benchmarks.

By NVIDIA TensorRT-LLM Team

@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch 3 times, most recently from 47bca8c to bdaa71d, June 19, 2025 06:38
@Shixiaowei02 Shixiaowei02 changed the title from "Blog: Disaggregated Serving in TensorRT-LLM" to "blog: Disaggregated Serving in TensorRT-LLM" Jun 19, 2025
@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch 4 times, most recently from c7d7e78 to c66f85f, June 19, 2025 06:52
Signed-off-by: Shixiaowei02 <[email protected]>
@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch from c66f85f to 012f575, June 19, 2025 07:12
@Shixiaowei02
Collaborator Author

@xmchen1987 I can't add you to the reviewer list, so please review it here as well. Thank you!

@juney-nvidia
Collaborator

/bot run --comment "No need to run full CI"

@tensorrt-cicd
Collaborator

PR_Github #9474 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --comment No need to run full CI

@juney-nvidia
Collaborator

/bot skip --comment "No need to run full CI"

@tensorrt-cicd
Collaborator

PR_Github #9475 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9475 [ skip ] completed with state SUCCESS
Skipping testing for commit 012f575

Signed-off-by: Shixiaowei02 <[email protected]>
@Shixiaowei02 Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch from 0373e55 to fe8839f, June 19, 2025 08:04
@juney-nvidia
Collaborator

/bot skip --comment "No need to run full CI"

@tensorrt-cicd
Collaborator

PR_Github #9490 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9490 [ skip ] completed with state ABORTED

@juney-nvidia
Collaborator

/bot skip --comment "No need to run full CI"

@tensorrt-cicd
Collaborator

PR_Github #9493 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9493 [ skip ] completed with state SUCCESS
Skipping testing for commit fe8839f

@Shixiaowei02 Shixiaowei02 merged commit 9a53e58 into NVIDIA:main Jun 19, 2025
3 checks passed
@Shixiaowei02 Shixiaowei02 deleted the user/xiaoweis/tech_blog branch June 19, 2025 10:02
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
5 participants