blog: Disaggregated Serving in TensorRT-LLM #5353

Shixiaowei02 · 2025-06-19T05:30:19Z

This tech blog introduces Disaggregated Serving in TensorRT-LLM and covers the following topics:

- Why disaggregated serving is needed and what problems it solves in LLM inference.
- The architecture and usage of trtllm-serve, Dynamo, and Triton Inference Server: how these components work and how to use them in practice.
- How KV cache exchange is designed and optimized to improve performance.
- Performance results, how they were measured, and how to reproduce the DeepSeek R1 benchmarks.

By NVIDIA TensorRT-LLM Team

Signed-off-by: Shixiaowei02 <[email protected]>

Shixiaowei02 · 2025-06-19T07:21:21Z

@xmchen1987 I can't add you to the review list, so please also review it. Thank you!

juney-nvidia · 2025-06-19T07:38:04Z

/bot run --comment "No need to run full CI"

tensorrt-cicd · 2025-06-19T07:44:02Z

PR_Github #9474 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --comment No need to run full CI

juney-nvidia · 2025-06-19T07:46:45Z

/bot skip --comment "No need to run full CI"

tensorrt-cicd · 2025-06-19T07:52:30Z

PR_Github #9475 [ skip ] triggered by Bot

tensorrt-cicd · 2025-06-19T07:59:56Z

PR_Github #9475 [ skip ] completed with state SUCCESS
Skipping testing for commit 012f575


juney-nvidia · 2025-06-19T09:25:23Z

/bot skip --comment "No need to run full CI"

tensorrt-cicd · 2025-06-19T09:32:30Z

PR_Github #9490 [ skip ] triggered by Bot

tensorrt-cicd · 2025-06-19T09:39:48Z

PR_Github #9490 [ skip ] completed with state ABORTED

juney-nvidia · 2025-06-19T09:46:42Z

/bot skip --comment "No need to run full CI"

tensorrt-cicd · 2025-06-19T09:52:26Z

PR_Github #9493 [ skip ] triggered by Bot

tensorrt-cicd · 2025-06-19T09:58:25Z

PR_Github #9493 [ skip ] completed with state SUCCESS
Skipping testing for commit fe8839f


Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch 3 times, most recently from 47bca8c to bdaa71d June 19, 2025 06:38

Shixiaowei02 changed the title ~~Blog: Disaggregated Serving in TensorRT-LLM~~ blog: Disaggregated Serving in TensorRT-LLM Jun 19, 2025

Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch 4 times, most recently from c7d7e78 to c66f85f June 19, 2025 06:52

Shixiaowei02 requested review from qiaoxj07, jgangani, Tabrizian, pcastonguay, chuangz0, schetlur-nv, juney-nvidia, Shunkangz and zhengd-nv June 19, 2025 06:55

upload the blog

012f575

Signed-off-by: Shixiaowei02 <[email protected]>

Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch from c66f85f to 012f575 June 19, 2025 07:12

juney-nvidia approved these changes Jun 19, 2025


Shunkangz approved these changes Jun 19, 2025


fix an error

fe8839f

Signed-off-by: Shixiaowei02 <[email protected]>

Shixiaowei02 force-pushed the user/xiaoweis/tech_blog branch from 0373e55 to fe8839f June 19, 2025 08:04

qiaoxj07 approved these changes Jun 19, 2025


Shixiaowei02 merged commit 9a53e58 into NVIDIA:main Jun 19, 2025
3 checks passed

Shixiaowei02 deleted the user/xiaoweis/tech_blog branch June 19, 2025 10:02

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

3b2b421

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

0573262

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

6dea667

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

a533c21

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

d5a469e

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

69289ac

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

55c7a91

Signed-off-by: Shixiaowei02 <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

blog: Disaggregated Serving in TensorRT-LLM (NVIDIA#5353)

30e7b7d

Signed-off-by: Shixiaowei02 <[email protected]>
