[TRTLLM-5990][doc] trtllm-serve doc improvement. #5220
6a7dc39 to ff45cc3
82426cc to d3e3ca9
4efa360 to 9f21f45
Pull Request Overview
This PR enhances the TensorRT-LLM documentation by clarifying Quick Start steps, refining the `trtllm-serve` section, and introducing a dedicated benchmarking guide.
- Added explicit Docker container launch instructions in the Quick Start Guide
- Updated `trtllm-serve` doc to focus on online serving and link to benchmarking
- Created `trtllm-serve-bench.md` for an end-to-end performance benchmarking workflow
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `docs/source/quick-start-guide.md` | Added Docker run example, corrected `trtllm-serve` instructions, and cleaned up old content |
| `docs/source/index.rst` | Updated toctree entry to point at `commands/trtllm-serve/index` |
| `docs/source/commands/trtllm-serve/trtllm-serve.rst` | Removed inline benchmark section and added a note linking to the new bench doc |
| `docs/source/commands/trtllm-serve/trtllm-serve-bench.md` | New benchmarking guide for Llama 3.1 70B online serving performance |
Comments suppressed due to low confidence (9)
docs/source/quick-start-guide.md:37
- The command name is inconsistent ("trtllm-server" vs "trtllm-serve"); update to use the correct "trtllm-serve" command.
> If you are running `trtllm-server` inside a Docker container, you have two options for sending API requests:
docs/source/quick-start-guide.md:23
- Empty link text is used. Provide descriptive link text, for example `[LLM API docs](llm-api/index)` and `[LLM API examples](examples/llm_api_examples)`.
To learn more about the LLM API, check out the [](llm-api/index) and [](examples/llm_api_examples).
docs/source/commands/trtllm-serve/trtllm-serve.rst:192
- Backtick usage inside a link breaks syntax. Use `[Performance Benchmarking with trtllm-serve](<URL>)` without nested backticks.
Please refer to `Performance Benchmarking with `trtllm-serve` <https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/commands/trtllm-serve/trtllm-serve-bench.md>` for more details.
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:48
- Replace the placeholder "max_num_tokens" with a concrete numeric example or clearly denote it as a placeholder (e.g., `<max_num_tokens>`).
--max_num_tokens max_num_tokens \
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:49
- Use an explicit integer for `--max_seq_len` (e.g., `2048`) instead of shorthand "2k" to avoid ambiguity.
--max_seq_len 2k \
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:125
- Add a space after the period: "CUDA graphs. Default value is `0`."
* cuda_graph_max_batch_size: Maximum batch size for CUDA graphs.Default value is `0`.
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:127
- Insert a space after the period and make "Value" lowercase: "enabled. Default value is `True`."
* autotuner_enabled: Enable autotuner only when torch compile is enabled.Default Value is `True`.
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:129
- Add a space after the colon: "attention_backend: Attention backend to use. Default value is `TRTLLM`."
* attention_backend:Attention backend to use. Default value is `TRTLLM`.
docs/source/commands/trtllm-serve/trtllm-serve-bench.md:103
- The redirect syntax `&<` is invalid; likely intended `&> output_bench.log` to capture both stdout and stderr.
bash -x bench.sh &< output_bench.log
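For reference, a minimal sketch of the redirection this comment refers to, assuming the script runs under Bash. The `emit` function is a hypothetical stand-in for `bench.sh` and the temporary log files are illustrative; neither is part of the PR. It shows the Bash `&>` shorthand next to the portable `> file 2>&1` form, which captures the same output.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for bench.sh: one line to stdout, one to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Bash shorthand: send both stdout and stderr to the log file.
bash_log="$(mktemp)"
emit &> "$bash_log"

# Portable POSIX form: redirect stdout first, then point stderr at stdout.
posix_log="$(mktemp)"
emit > "$posix_log" 2>&1

# Both logs should now hold the stdout line and the stderr line.
cat "$bash_log"
```

The `&>` form is a Bash extension; under a strict POSIX shell such as `dash` it is parsed differently, so `> file 2>&1` is the safer choice when the interpreter is not known.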
3c1b966 to 2e9a526
795c15b to 70070f5
/bot run --stage-list "A10-Build_Docs"
PR_Github #13979 [ run ] triggered by Bot
LGTM
PR_Github #13979 [ run ] completed with state |
70070f5 to e6d4354
Signed-off-by: nv-guomingz <[email protected]>
e6d4354 to ed88f62
/bot skip --comment "doc build phase already pass"
PR_Github #14080 [ skip ] triggered by Bot
PR_Github #14080 [ skip ] completed with state |
Signed-off-by: nv-guomingz <[email protected]> Signed-off-by: Lanyu Liao <[email protected]>
Signed-off-by: nv-guomingz <[email protected]>
This PR made the following changes
Summary by CodeRabbit