
Conversation

@dominicshanshan (Collaborator) commented Sep 8, 2025

Summary by CodeRabbit

  • New Features

    • Introduced trtllm-eval CLI for offline accuracy evaluation.
    • Added a public BuildConfig for engine build configuration.
    • Exposed LoRARequest in the LLM API.
  • Documentation

    • Rebranded “TensorRT-LLM” to “TensorRT LLM” and overhauled navigation/installation.
    • Added/expanded guides: AutoDeploy (and advanced), Disaggregated Serving, Attention backends, KV Cache, Long-sequence, Overlap Scheduler, IFB/Scheduler, Parallel Strategies, Quantization, Sampling, Speculative Decoding, Multi‑modality.
    • New model support/feature matrices, deployment recipes, and benchmarking/profiling guides.
    • Updated LoRA docs with DoRA scales in the weights format and expanded usage examples.
    • Refreshed architecture overview and blogs.
  • Known Issues

    • Unresolved merge markers present in the Llama 4 Scout quick‑start doc.

Description

Only cherry-picks #6696, #7549, and #7554 (@nv-guomingz) for the massive documentation change in the release/1.0 branch.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
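
For reference, an illustrative invocation that combines several of the options above (the stage and GPU names are the example values from this help text, not necessarily real pipeline stages):

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"

/bot run --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp"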

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous; skipping tests without careful validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous; reusing results without careful validation can break the top of tree.

coderabbitai bot (Contributor) commented Sep 8, 2025

📝 Walkthrough

Extensive documentation reorganization and branding updates across the docs tree, addition of many new feature/how-to pages, and updates to deployment guides. Code changes introduce a new public BuildConfig dataclass in tensorrt_llm/builder.py and add LoRARequest to tensorrt_llm.llmapi exports.

Changes

Cohort / File(s) Summary
Branding rename (TensorRT‑LLM → TensorRT LLM)
docs/source/**/*.md, docs/source/**/*.rst, docs/source/blogs/*, docs/source/torch/*.md
Consistent renaming in titles, headings, prose, and captions; links and anchors adjusted where applicable. No logic changes.
Docs IA overhaul & index/nav updates
docs/source/index.rst, docs/source/overview.md, docs/source/installation/index.rst, docs/source/installation/containers.md, docs/source/installation/linux.md, docs/source/installation/build-from-source-linux.md, docs/source/reference/support-matrix.md
Reworked ToC, consolidated installation path, added anchors, moved sections into nested toctrees; minor anchor removal in support-matrix.
Architecture & Advanced docs refresh
docs/source/architecture/*.md, docs/source/advanced/*.md
Architecture overview rewritten; multiple pages updated/retargeted links; DoRA LoRA weights shape documented; speculative decoding cross-links adjusted.
New features and deep-dives (PyTorch backend)
docs/source/features/attention.md, .../kvcache.md, .../long-sequence.md, .../sampling.md, .../speculative-decoding.md, .../paged-attention-ifb-scheduler.md, .../parallel-strategy.md, .../quantization.md, .../multi-modality.md, .../feature-combination-matrix.md, .../disagg-serving.md, .../checkpoint-loading.md, .../overlap-scheduler.md
Added comprehensive guides covering attention backends, KV cache system, long-sequence methods, sampling, speculative decoding, scheduler, parallel strategies, quantization, multimodality, disaggregated serving, checkpoint loading, and feature matrix.
AutoDeploy docs (new)
docs/source/features/auto_deploy/auto-deploy.md, .../advanced/*.md, docs/source/examples/dynamo_k8s_example.rst
Introduces AutoDeploy overview, support matrix, advanced workflow/config/logging, benchmarking with trtllm-bench, and example runs; adds Kubernetes example.
KV cache examples (new)
docs/source/examples/kvcacheconfig.md, docs/source/examples/kvcacheretentionconfig.md
New example guides for KvCacheConfig and KvCacheRetentionConfig with usage snippets.
Deployment guides and recipes
docs/source/deployment-guide/index.rst, .../quick-start-recipe-for-deepseek-r1-on-trtllm.md, .../quick-start-recipe-for-llama3.3-70b-on-trtllm.md, .../quick-start-recipe-for-llama4-scout-on-trtllm.md
Adds Model Recipes section; content updates and formatting; one file shows unresolved merge-conflict markers.
Command docs
docs/source/commands/trtllm-bench.rst, docs/source/commands/trtllm-eval.rst, docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md
Branding updates; new trtllm-eval CLI doc with tasks, usage, and Click directive; benchmark guide wording updates.
Blogs updates
docs/source/blogs/*.md, docs/source/blogs/tech_blog/*.md
Branding changes; select content additions (e.g., FP4 MoE notes), ToC reshuffles; link target updates; future work items added in disaggregated serving blog.
Config/Conf minor
docs/source/conf.py
Trailing comma removed in myst_substitutions entry.
Examples link fix
examples/wide_ep/README.md
Updated disaggregated serving troubleshooting link to new docs path.
Quick Start & Torch landing
docs/source/quick-start-guide.md, docs/source/torch.md, docs/source/architecture/checkpoint.md, docs/source/architecture/overview.md
Quick Start restructured, ports exposed in example; checkpoint/overview branding and narrative revamp.
LLM API export change
tensorrt_llm/llmapi/__init__.py
Adds LoRARequest to public exports.
Build pipeline config (new public dataclass)
tensorrt_llm/builder.py
Adds BuildConfig with many fields and helpers (serialization, defaults, KV cache type reconciliation, updates, JSON loading). Intended as structured input for engine building.
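
A minimal illustrative sketch of the two code-level changes above (not taken from the PR diff; the LoRARequest constructor arguments and any BuildConfig fields beyond those confirmed in the review comments below are assumptions):

from tensorrt_llm.builder import BuildConfig   # new public dataclass in this PR
from tensorrt_llm.llmapi import LoRARequest    # newly exported in this PR

# Structured input for engine building; update_from_dict is one of the helpers
# mentioned above.
build_config = BuildConfig(max_seq_len=4096, gather_context_logits=True)
build_config.update_from_dict({"gather_generation_logits": True})

# Assumed constructor: adapter name, integer id, path to the adapter weights.
lora_request = LoRARequest("my-adapter", 1, "/path/to/lora_adapter")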

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

1.0_doc, Documentation


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 38

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (18)
docs/source/features/checkpoint-loading.md (1)

319-327: Typo: “asscoiated” → “associated”.

Fix spelling.

-By setting the model name, the registered mapper will be asscoiated with the specific model.
+By setting the model name, the registered mapper will be associated with the specific model.
docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md (5)

79-98: Low‑latency 1x GPU example uses undefined ${num_gpus} and incorrect TP/EP.

For a 1x GPU block, set TP=1 and EP=1; don’t reference ${num_gpus} here.

Apply:

-    --tp ${num_gpus} \
-    --ep 1 \
+    --tp 1 \
+    --ep 1 \

Optional: show max_batch_size=1 for true minimal latency.


138-146: Max‑throughput block label vs content mismatch.

This block title says “1x B200/GB200/H200” but uses num_gpus=8 and --tp/--ep ${num_gpus}. Rename the summary to “8x …” or set num_gpus=1 and adjust flags.

Apply (if intended to be 8 GPUs):

-<details open> <summary>1x B200/GB200/H200</summary>
+<details open> <summary>8x B200/GB200/H200</summary>

172-191: Section title branding + single‑rank serve example uses TP=8/EP=8 with mpirun -n 1.

  • Use “TensorRT LLM” branding.
  • With -n 1, --tp_size and --ep_size must both be 1. Either increase -n to tp*ep or reduce sizes to 1 for the single‑GPU example.

Apply:

-## Launch the TensorRT-LLM Server
+## Launch the TensorRT LLM Server
@@
-mpirun -n 1 --oversubscribe --allow-run-as-root \
-trtllm-serve  openai/gpt-oss-120b \
+mpirun -n 1 --oversubscribe --allow-run-as-root \
+trtllm-serve openai/gpt-oss-120b \
@@
-  --tp_size 8 \
-  --ep_size 8 \
-  --max_batch_size 640 \
+  --tp_size 1 \
+  --ep_size 1 \
+  --max_batch_size ${max_batch_size} \

If you want to show an 8‑GPU serve example, add a separate details block with mpirun -n 8 and --tp_size/--ep_size that multiply to 8.


268-336: Sanitize the example response (remove internal “analysis” content).

The sample JSON embeds non‑API “analysis” text and very long content. Replace with a short, realistic OpenAI‑compatible response.

Apply:

-```bash
-{ ... very long object with internal analysis ... }
-```
+```json
+{
+  "id": "chatcmpl-123",
+  "object": "chat.completion",
+  "created": 1754358426,
+  "model": "openai/gpt-oss-120b",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "NVIDIA’s inference advantage comes from Tensor Cores, an optimized software stack (TensorRT + Triton), and high-bandwidth interconnects (NVLink/NVSwitch) that deliver low latency and high throughput at scale."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 17,
+    "completion_tokens": 42,
+    "total_tokens": 59
+  }
+}
+```

344-356: Remove outdated, contradictory MoE section (duplicates with conflicting guidance).

This “(H200/H100 Only)” block contradicts earlier guidance (CUTLASS used for throughput; TRITON recommended for H200). It should be deleted or reconciled in one canonical section above.

Apply:

-## (H200/H100 Only) Using OpenAI Triton Kernels for MoE
-...
-  backend: TRITON
-```
+<!-- Removed: duplicate/contradictory MoE section. See "(H200 Only) Using OpenAI Triton Kernels for MoE" above. -->
docs/source/examples/customization.md (2)

7-13: Import LLM in the quantization snippet

LLM is used but not imported in this snippet.

-from tensorrt_llm.llmapi import QuantConfig, QuantAlgo
+from tensorrt_llm.llmapi import LLM, QuantConfig, QuantAlgo

90-96: Inconsistent API: skip_tokenizer_init belongs to LLM(), not generate()

The text says to pass skip_tokenizer_init=True when creating LLM, but the code passes it to generate(). Align the example with the actual API.

-llm = LLM(<llama_model_path>)
-for output in llm.generate([[32, 12]], skip_tokenizer_init=True):
+llm = LLM(<llama_model_path>, skip_tokenizer_init=True)
+for output in llm.generate([[32, 12]]):
     print(output)
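
For reference, the corrected snippet in full (a hedged consolidation of the suggestion above; <llama_model_path> stays a placeholder from the original doc):

from tensorrt_llm.llmapi import LLM

llm = LLM(<llama_model_path>, skip_tokenizer_init=True)  # tokenizer init is skipped here, not in generate()
for output in llm.generate([[32, 12]]):
    print(output)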
docs/source/blogs/tech_blog/blog4_Scaling_Expert_Parallelism_in_TensorRT-LLM.md (1)

230-231: Fix “Placement” typos in headings/body.

There are split words “Placemen t” in headings/body. Replace with “Placement”.

-  * Orchestrate the process (**Update Weights \& Placemen**t component)
+  * Orchestrate the process (**Update Weights & Placement** component)
-For the **Update Weights \& Placemen**t component, we identified two design choices:
+For the **Update Weights & Placement** component, we identified two design choices:

Also applies to: 241-247

docs/source/blogs/Best_perf_practice_on_DeepSeek-R1_in_TensorRT-LLM.md (1)

161-169: Sentence fragment in “Explanation” bullet.

Complete the sentence for clarity.

-- `trtllm-bench`: A CLI benchmarking utility that aims to make it easier for users to reproduce our officially published. See [TensorRT LLM Benchmarking](https://nvidia.github.io/TensorRT-LLM/performance/perf-benchmarking.html) for details.
+- `trtllm-bench`: A CLI benchmarking utility that helps users reproduce our officially published results. See [TensorRT LLM Benchmarking](https://nvidia.github.io/TensorRT-LLM/performance/perf-benchmarking.html) for details.
docs/source/models/adding-new-model.md (1)

199-204: Update example path to reflect actual directory
In docs/source/models/adding-new-model.md (lines 199–204), replace

python examples/pytorch/out_of_tree_example/main.py

with

python examples/llm-api/out_of_tree_example/main.py

and confirm that examples/llm-api/out_of_tree_example/main.py defines a main() entrypoint and runs with the current API.

tensorrt_llm/builder.py (4)

520-552: Fix dataclass types and defaults (bools, Optional).

Current types use int for booleans and non-Optional annotations with None defaults.

-    max_seq_len: int = None
+    max_seq_len: Optional[int] = None
@@
-    kv_cache_type: KVCacheType = None
-    gather_context_logits: int = False
-    gather_generation_logits: int = False
+    kv_cache_type: Optional[KVCacheType] = None
+    gather_context_logits: bool = False
+    gather_generation_logits: bool = False
@@
-    input_timing_cache: str = None
+    input_timing_cache: Optional[str] = None
@@
-    visualize_network: str = None
+    visualize_network: Optional[str] = None

734-743: Make update_from_dict robust: convert enums and merge nested configs.

Avoids breaking types when users pass strings/ints or nested dicts.

-    def update_from_dict(self, config: dict):
-        for name, value in config.items():
-            if not hasattr(self, name):
-                raise AttributeError(
-                    f"{self.__class__} object has no attribute {name}")
-            setattr(self, name, value)
+    def update_from_dict(self, config: dict):
+        for name, value in config.items():
+            if name == "plugin_config" and isinstance(value, dict):
+                self.plugin_config.update_from_dict(value)
+                continue
+            if name == "lora_config" and isinstance(value, dict):
+                self.lora_config.update_from_dict(value)
+                continue
+            if name == "auto_parallel_config" and isinstance(value, dict):
+                self.auto_parallel_config.update_from_dict(value)
+                continue
+            if name == "kv_cache_type":
+                if value is None or isinstance(value, KVCacheType):
+                    self.kv_cache_type = value
+                else:
+                    self.kv_cache_type = KVCacheType.from_string(str(value))
+                continue
+            if name == "speculative_decoding_mode":
+                if isinstance(value, SpeculativeDecodingMode):
+                    self.speculative_decoding_mode = value
+                else:
+                    # accept int or name
+                    try:
+                        self.speculative_decoding_mode = SpeculativeDecodingMode(value)
+                    except Exception:
+                        self.speculative_decoding_mode = SpeculativeDecodingMode[str(value)]
+                continue
+            if not hasattr(self, name):
+                raise AttributeError(f"{self.__class__} object has no attribute {name}")
+            setattr(self, name, value)
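
A hedged usage example of the behavior this suggestion enables (the accepted string spelling for KVCacheType, e.g. "paged", is an assumption):

cfg = BuildConfig()
# Enum-valued fields can now be passed as plain strings or ints; nested
# plugin/lora/auto-parallel configs can be passed as dicts.
cfg.update_from_dict({"kv_cache_type": "paged", "gather_context_logits": True})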

1081-1097: Add missing dtype mapping for managed weights deserialization.

Reading int8 managed weights fails with “Unsupported dtype: I8”.

-            elif dtype == "I32":
+            elif dtype == "I32":
                 dtype = np.int32
+            elif dtype == "I8":
+                dtype = np.int8
             else:
                 raise RuntimeError(f"Unsupported dtype: {dtype}")

1247-1259: Remove outdated quantization restrictions for SM≥100
Blackwell (SM≥100) now supports INT8/INT4 weight-only and SmoothQuant workflows in TensorRT-LLM (added in v0.17). Drop or revise these RuntimeError checks in tensorrt_llm/builder.py (lines 1247–1259) to allow these quant modes.

docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md (2)

233-258: Fix inconsistent sample response and branding.

The prose claims the response begins “New York is a state ...” but the JSON shows unrelated text. This confuses users validating their setup.

Replace the example payload with output that matches the prompt and keep it short:

-Here is an example response, showing that the TRT-LLM server returns “New York is a state located in the northeastern United States. It is bordered by”, completing the input sequence.
+Here is an example response showing the TensorRT LLM server completion for the prompt.
-{"id":"cmpl-...","object":"text_completion","created":1754294810,"model":"deepseek-ai/DeepSeek-R1-0528","choices":[{"index":0,"text":" / by Megan Stine ; illustrated by John Hinderliter.\n\nBook | Gross","token_ids":null,"logprobs":null,"context_logits":null,"finish_reason":"length","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":6,"total_tokens":22,"completion_tokens":16},"prompt_token_ids":null}
+{"id":"cmpl-...","object":"text_completion","created": 1754294810,"model":"deepseek-ai/DeepSeek-R1-0528","choices":[{"index":0,"text":"New York is a state in the northeastern United States.","finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":16,"total_tokens":22}}

263-263: Fix PyTorch docs URL.

The path has “docs” twice.

-https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
+https://pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf
docs/source/deployment-guide/quick-start-recipe-for-llama3.3-70b-on-trtllm.md (1)

33-44: Update to latest published NGC image tag
docs/source/deployment-guide/quick-start-recipe-for-llama3.3-70b-on-trtllm.md (lines 33–44, 46–53): replace

nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc6

with

nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc3

1.1.0rc3 is the current latest published release on NGC (catalog.ngc.nvidia.com, github.com)
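
For reference, pulling the updated image would look like the following (tag as cited above; verify the latest tag on NGC before use):

docker pull nvcr.io/nvidia/tensorrt-llm/release:1.1.0rc3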

@dominicshanshan force-pushed the mi-release-1.0-4 branch 4 times, most recently from b1ba8ae to d22f5df on September 8, 2025 at 13:06
@nv-guomingz (Collaborator) left a comment


Overall LGTM. @dominicshanshan please trigger the weekly release process once those comments are addressed, thanks.

@dominicshanshan force-pushed the mi-release-1.0-4 branch 4 times, most recently from 660c9fc to e736627 on September 9, 2025 at 03:15
@nv-guomingz (Collaborator) commented:

/bot skip --comment "docs change only"

@tensorrt-cicd (Collaborator) commented:

PR_Github #18140 [ skip ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #18140 [ skip ] completed with state SUCCESS
Skipping testing for commit b6d67ad

@nv-guomingz merged commit 7f3f658 into NVIDIA:main on Sep 9, 2025
5 checks passed
nv-guomingz added a commit to nv-guomingz/TensorRT-LLM that referenced this pull request Sep 9, 2025
chzblych pushed a commit that referenced this pull request Sep 9, 2025
gergely-magyar pushed a commit to gergely-magyar/TensorRT-LLM that referenced this pull request Sep 9, 2025
Wong4j pushed a commit to Wong4j/TensorRT-LLM that referenced this pull request Sep 20, 2025

5 participants