
Conversation


@nv-guomingz nv-guomingz commented Jun 15, 2025

This PR makes the following changes:

  • Modify quick-start-guide.md by cherry-picking doc: Include NGC release containers in quick-start-guide.md #5334 and adding explicit instructions for launching the Docker container
  • Adjust the trtllm-serve documentation:
    • Remove the benchmarking section (using genai) from the original trtllm-serve.rst
    • Add an end-to-end example that uses trtllm-serve with benchmark_serving.py to benchmark the Llama 3.1 70B model
    • Add a brief introduction to extra_llm_api_options usage.
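
As a rough illustration of the `extra_llm_api_options` mechanism (a sketch, not taken from this PR's actual docs): the flag points `trtllm-serve` at a YAML file whose keys override LLM API arguments. The key names below mirror options discussed in the review comments on this PR; the values and file name are assumptions.

```yaml
# Hypothetical extra_llm_api_options file (illustrative only).
cuda_graph_max_batch_size: 8      # maximum batch size captured as a CUDA graph
autotuner_enabled: true           # autotuner only takes effect with torch compile
attention_backend: TRTLLM         # attention backend to use
```

It would then be passed along the lines of `trtllm-serve <model> --extra_llm_api_options extra_config.yml` (exact CLI shape assumed).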

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for benchmarking the Llama 3.1 70B model using the OpenAI-compatible API.
    • Updated the Quick Start Guide with detailed Docker usage instructions and clarified deployment steps for online serving.
    • Streamlined documentation by removing in-depth Model Definition API instructions and directing users to dedicated serve command documentation.
    • Improved navigation and organization by updating references and linking to new benchmarking resources.

@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 4 times, most recently from 6a7dc39 to ff45cc3 Compare June 17, 2025 07:59
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 12 times, most recently from 82426cc to d3e3ca9 Compare June 30, 2025 02:45
@nv-guomingz nv-guomingz marked this pull request as ready for review June 30, 2025 02:45
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 4efa360 to 9f21f45 Compare June 30, 2025 02:59
@nv-guomingz nv-guomingz changed the title doc:trtllm-server doc improvement. [TRTLLM-5990]doc:trtllm-server doc improvement. Jun 30, 2025
@nv-guomingz nv-guomingz requested a review from kaiyux June 30, 2025 06:24
@kaiyux kaiyux requested a review from Copilot June 30, 2025 06:29

@Copilot Copilot AI left a comment


Pull Request Overview

This PR enhances the TensorRT-LLM documentation by clarifying Quick Start steps, refining the trtllm-serve section, and introducing a dedicated benchmarking guide.

  • Added explicit Docker container launch instructions in the Quick Start Guide
  • Updated trtllm-serve doc to focus on online serving and link to benchmarking
  • Created trtllm-serve-bench.md for an end-to-end performance benchmarking workflow

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File — Description:
  • docs/source/quick-start-guide.md — Added Docker run example, corrected trtllm-serve instructions, and cleaned up old content
  • docs/source/index.rst — Updated toctree entry to point at commands/trtllm-serve/index
  • docs/source/commands/trtllm-serve/trtllm-serve.rst — Removed inline benchmark section and added a note linking to the new bench doc
  • docs/source/commands/trtllm-serve/trtllm-serve-bench.md — New benchmarking guide for Llama 3.1 70B online serving performance
Comments suppressed due to low confidence (9)

docs/source/quick-start-guide.md:37

  • The command name is inconsistent ("trtllm-server" vs "trtllm-serve"); update to use the correct "trtllm-serve" command.
> If you are running `trtllm-server` inside a Docker container, you have two options for sending API requests:

docs/source/quick-start-guide.md:23

  • Empty link text is used. Provide descriptive link text, for example [LLM API docs](llm-api/index) and [LLM API examples](examples/llm_api_examples).
To learn more about the LLM API, check out the [](llm-api/index) and [](examples/llm_api_examples).

docs/source/commands/trtllm-serve/trtllm-serve.rst:192

  • Backtick usage inside a link breaks syntax. Use [Performance Benchmarking with trtllm-serve](<URL>) without nested backticks.
Please refer to `Performance Benchmarking with `trtllm-serve` <https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/commands/trtllm-serve/trtllm-serve-bench.md>` for more details.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:48

  • Replace the placeholder "max_num_tokens" with a concrete numeric example or clearly denote it as a placeholder (e.g., <max_num_tokens>).
    --max_num_tokens max_num_tokens \

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:49

  • Use an explicit integer for --max_seq_len (e.g., 2048) instead of shorthand "2k" to avoid ambiguity.
    --max_seq_len 2k \

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:125

  • Add a space after the period: "CUDA graphs. Default value is 0."
   * cuda_graph_max_batch_size: Maximum batch size for CUDA graphs.Default value is `0`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:127

  • Insert a space after the period and make "Value" lowercase: "enabled. Default value is True."
   * autotuner_enabled: Enable autotuner only when torch compile is enabled.Default Value is `True`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:129

  • Add a space after the colon: "attention_backend: Attention backend to use. Default value is TRTLLM."
   * attention_backend:Attention backend to use. Default value is `TRTLLM`.

docs/source/commands/trtllm-serve/trtllm-serve-bench.md:103

  • The redirect syntax &< is invalid; likely intended &> output_bench.log to capture both stdout and stderr.
bash -x bench.sh &< output_bench.log
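
The suggested fix can be sanity-checked with a minimal bash snippet (the `combined.log` file name and the echoed strings are illustrative, not from the PR): `&>` sends both stdout and stderr to the file, whereas `&<` is not a valid redirection operator.

```shell
#!/usr/bin/env bash
# Write one line to stdout and one to stderr, capturing both with `&>`
# as the review comment suggests (i.e. `bash -x bench.sh &> output_bench.log`).
{ echo "to stdout"; echo "to stderr" 1>&2; } &> combined.log
cat combined.log   # both lines appear in the file
```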

@kaiyux kaiyux changed the title [TRTLLM-5990]doc:trtllm-server doc improvement. [TRTLLM-5990]doc:trtllm-serve doc improvement. Jun 30, 2025
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 3c1b966 to 2e9a526 Compare August 1, 2025 05:26
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch 2 times, most recently from 795c15b to 70070f5 Compare August 4, 2025 09:28
@nv-guomingz nv-guomingz changed the title [TRTLLM-5990]doc:trtllm-serve doc improvement. [TRTLLM-5990][doc]:trtllm-serve doc improvement. Aug 4, 2025
@nv-guomingz nv-guomingz changed the title [TRTLLM-5990][doc]:trtllm-serve doc improvement. [TRTLLM-5990][doc] trtllm-serve doc improvement. Aug 4, 2025
@nv-guomingz nv-guomingz requested a review from LinPoly August 4, 2025 09:37
@nv-guomingz

/bot run --stage-list "A10-Build_Docs"

@tensorrt-cicd

PR_Github #13979 [ run ] triggered by Bot


@LinPoly LinPoly left a comment


LGTM

@tensorrt-cicd

PR_Github #13979 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10531 (Partly Tested) completed with status: 'SUCCESS'

@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch from 70070f5 to e6d4354 Compare August 5, 2025 04:15
@nv-guomingz nv-guomingz force-pushed the user/guomingz/trtllm-serve-eou branch from e6d4354 to ed88f62 Compare August 5, 2025 04:22
@nv-guomingz

/bot skip --comment "doc build phase already pass"

@tensorrt-cicd

PR_Github #14080 [ skip ] triggered by Bot

@nv-guomingz nv-guomingz enabled auto-merge (squash) August 5, 2025 04:28
@tensorrt-cicd

PR_Github #14080 [ skip ] completed with state SUCCESS
Skipping testing for commit ed88f62

@nv-guomingz nv-guomingz merged commit db51ab1 into NVIDIA:main Aug 5, 2025
4 checks passed
@nv-guomingz nv-guomingz deleted the user/guomingz/trtllm-serve-eou branch August 5, 2025 05:04
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025
jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025