
Conversation


@nv-guomingz nv-guomingz commented Jun 15, 2025

This PR makes the following changes:

  • Modify quick-start-guide.md by cherry-picking doc: Include NGC release containers in quick-start-guide.md #5334 and adding explicit instructions for launching the Docker container
  • Adjust the trtllm-serve documentation:
    • Remove the benchmarking section (using genai) from the original trtllm-serve.rst
    • Add an end-to-end example that uses trtllm-serve with benchmark_serving.py to benchmark the Llama 3.1 70B model
    • Add a brief introduction to extra_llm_api_options usage.
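
As a rough illustration of the `extra_llm_api_options` mechanism (a sketch, not taken from this PR's actual docs): the flag points `trtllm-serve` at a YAML file whose keys override LLM API arguments. The key names below mirror options discussed in the review comments on this PR; the values and file name are assumptions.

```yaml
# Hypothetical extra_llm_api_options file (illustrative only).
cuda_graph_max_batch_size: 8      # maximum batch size captured as a CUDA graph
autotuner_enabled: true           # autotuner only takes effect with torch compile
attention_backend: TRTLLM         # attention backend to use
```

It would then be passed along the lines of `trtllm-serve <model> --extra_llm_api_options extra_config.yml` (exact CLI shape assumed).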

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for benchmarking the Llama 3.1 70B model using the OpenAI-compatible API.
    • Updated the Quick Start Guide with detailed Docker usage instructions and clarified deployment steps for online serving.
    • Streamlined documentation by removing in-depth Model Definition API instructions and directing users to dedicated serve command documentation.
    • Improved navigation and organization by updating references and linking to new benchmarking resources.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 4 times, most recently from 6a7dc39 to ff45cc3 Compare June 17, 2025 07:59
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 12 times, most recently from 82426cc to d3e3ca9 Compare June 30, 2025 02:45
@nv-guomingz nv-guomingz marked this pull request as ready for review June 30, 2025 02:45
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 4efa360 to 9f21f45 Compare June 30, 2025 02:59
@nv-guomingz nv-guomingz changed the title doc:trtllm-server doc improvement. [TRTLLM-5990]doc:trtllm-server doc improvement. Jun 30, 2025
@nv-guomingz nv-guomingz requested a review from kaiyux June 30, 2025 06:24
@kaiyux kaiyux requested a review from Copilot June 30, 2025 06:29

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances the TensorRT-LLM documentation by clarifying Quick Start steps, refining the trtllm-serve section, and introducing a dedicated benchmarking guide.

  • Added explicit Docker container launch instructions in the Quick Start Guide
  • Updated trtllm-serve doc to focus on online serving and link to benchmarking
  • Created trtllm-serve-bench.md for an end-to-end performance benchmarking workflow

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File — Description:
  • docs/source/quick-start-guide.md — Added Docker run example, corrected trtllm-serve instructions, and cleaned up old content
  • docs/source/index.rst — Updated toctree entry to point at commands/trtllm-serve/index
  • docs/source/commands/trtllm-serve/trtllm-serve.rst — Removed inline benchmark section and added a note linking to the new bench doc
  • docs/source/commands/trtllm-serve/trtllm-serve-bench.md — New benchmarking guide for Llama 3.1 70B online serving performance
Comments suppressed due to low confidence (9)

docs/source/quick-start-guide.md:37

  • The command name is inconsistent ("trtllm-server" vs "trtllm-serve"); update to use the correct "trtllm-serve" command.
> If you are running `trtllm-server` inside a Docker container, you have two options for sending API requests:

docs/source/quick-start-guide.md:23

  • Empty link text is used. Provide descriptive link text, for example [LLM API docs](llm-api/index) and [LLM API examples](examples/llm_api_examples).
To learn more about the LLM API, check out the [](llm-api/index) and [](examples/llm_api_examples).

docs/source/commands/trtllm-serve/trtllm-serve.rst:192

  • Backtick usage inside a link breaks syntax. Use [Performance Benchmarking with trtllm-serve](<URL>) without nested backticks.
Please refer to `Performance Benchmarking with `trtllm-serve` <https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/commands/trtllm-serve/trtllm-serve-bench.md>` for more details.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:48

  • Replace the placeholder "max_num_tokens" with a concrete numeric example or clearly denote it as a placeholder (e.g., <max_num_tokens>).
    --max_num_tokens max_num_tokens \

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:49

  • Use an explicit integer for --max_seq_len (e.g., 2048) instead of shorthand "2k" to avoid ambiguity.
    --max_seq_len 2k \

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:125

  • Add a space after the period: "CUDA graphs. Default value is 0."
   * cuda_graph_max_batch_size: Maximum batch size for CUDA graphs.Default value is `0`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:127

  • Insert a space after the period and make "Value" lowercase: "enabled. Default value is True."
   * autotuner_enabled: Enable autotuner only when torch compile is enabled.Default Value is `True`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:129

  • Add a space after the colon: "attention_backend: Attention backend to use. Default value is TRTLLM."
   * attention_backend:Attention backend to use. Default value is `TRTLLM`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:103

  • The redirect syntax &< is invalid; likely intended &> output_bench.log to capture both stdout and stderr.
bash -x bench.sh &< output_bench.log
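
The suggested fix can be sanity-checked with a minimal bash snippet (the `combined.log` file name and the echoed strings are illustrative, not from the PR): `&>` sends both stdout and stderr to the file, whereas `&<` is not a valid redirection operator.

```shell
#!/usr/bin/env bash
# Write one line to stdout and one to stderr, capturing both with `&>`
# as the review comment suggests (i.e. `bash -x bench.sh &> output_bench.log`).
{ echo "to stdout"; echo "to stderr" 1>&2; } &> combined.log
cat combined.log   # both lines appear in the file
```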

@kaiyux kaiyux changed the title [TRTLLM-5990]doc:trtllm-server doc improvement. [TRTLLM-5990]doc:trtllm-serve doc improvement. Jun 30, 2025
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 3c1b966 to 2e9a526 Compare August 1, 2025 05:26
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 795c15b to 70070f5 Compare August 4, 2025 09:28
@nv-guomingz nv-guomingz changed the title [TRTLLM-5990]doc:trtllm-serve doc improvement. [TRTLLM-5990][doc]:trtllm-serve doc improvement. Aug 4, 2025
@nv-guomingz nv-guomingz changed the title [TRTLLM-5990][doc]:trtllm-serve doc improvement. [TRTLLM-5990][doc] trtllm-serve doc improvement. Aug 4, 2025
@nv-guomingz nv-guomingz requested a review from LinPoly August 4, 2025 09:37
@nv-guomingz

/bot run --stage-list "A10-Build_Docs"

@tensorrt-cicd

PR_Github #13979 [ run ] triggered by Bot


@LinPoly LinPoly left a comment


LGTM

@tensorrt-cicd

PR_Github #13979 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10531 (Partly Tested) completed with status: 'SUCCESS'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch from 70070f5 to e6d4354 Compare August 5, 2025 04:15
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch from e6d4354 to ed88f62 Compare August 5, 2025 04:22
@nv-guomingz

/bot skip --comment "doc build phase already pass"

@tensorrt-cicd

PR_Github #14080 [ skip ] triggered by Bot

@nv-guomingz nv-guomingz enabled auto-merge (squash) August 5, 2025 04:28
@tensorrt-cicd

PR_Github #14080 [ skip ] completed with state SUCCESS
Skipping testing for commit ed88f62

@nv-guomingz nv-guomingz merged commit db51ab1 into NVIDIA:main Aug 5, 2025
4 checks passed
@nv-guomingz nv-guomingz deleted the user/guomingz/trtllm-serve-eou branch August 5, 2025 05:04
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025
jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025