fix: container/Dockerfile.trtllm - use pytorch 2.8.0a0+5228986c39.nv25.5 #2579
Conversation
Walkthrough

System-wide Python installation replaces the virtualenv across the build and runtime stages in container/Dockerfile.trtllm. The CUDA toolkit assets copied into the runtime image are expanded. Triton, cuda-python, TensorRT-LLM wheels, internal wheels, tests, and benchmarks are installed via uv pip with --system and --break-system-packages.
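For orientation, a minimal sketch of the system-wide install pattern the walkthrough describes, assuming uv is already on the PATH; the wheel path in the second step is a placeholder, not the one used in the Dockerfile:

```dockerfile
# Sketch of the system-wide pattern: install straight into the image's
# system Python instead of a virtualenv.
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
    uv pip install --system --break-system-packages --requirement /tmp/requirements.txt

# Placeholder wheel path, for illustration only.
RUN uv pip install --system --break-system-packages /tmp/wheels/tensorrt_llm-*.whl
```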
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
container/Dockerfile.trtllm (2)
Lines 418-426: Potential CUDA toolchain/library version skew between build and runtime images.

You're copying cudafe++/ptxas/fatbinary, headers, nvvm, and libcudart.so* from the build stage into the runtime. The build base (${BASE_IMAGE_TAG}=25.05) likely carries a different CUDA than the runtime (${RUNTIME_IMAGE_TAG}=12.9.0). Mixing toolchain and runtime libs can produce subtle compilation/runtime faults (e.g., PTX incompatibilities, mismatched libcudart SONAMEs).

Recommendations (pick one):

- Align CUDA across stages: use a build image that matches 12.9 (or switch the runtime to match the build).
- Don't copy libcudart.so*; rely on the runtime image's CUDA runtime:

  ```dockerfile
  # Remove this copy:
  # COPY --from=build /usr/local/cuda/lib64/libcudart.so* /usr/local/cuda/lib64/
  ```

- Install cuda-toolkit-12-9 directly in the runtime (devel layer or apt) instead of copying binaries piecemeal; see the sketch after this list.
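A rough sketch of that last option, assuming an Ubuntu-based runtime image with NVIDIA's CUDA apt repository already configured in the base image:

```dockerfile
# Sketch: install a matching CUDA toolkit in the runtime stage instead of
# copying individual toolchain binaries from the build stage.
# Assumes the NVIDIA CUDA apt repository is present in the base image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends cuda-toolkit-12-9 \
    && rm -rf /var/lib/apt/lists/*
```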
Lines 351-354: Drop the VIRTUAL_ENV/venv PATH manipulation in the runtime, or actually create and use the venv.

These env vars suggest a venv, but you've moved to system-wide installs and commented out the venv creation. Keep things unambiguous.

Minimal fix (outside the selected range):

```dockerfile
# Remove:
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Keep PATH explicit if needed:
ENV PATH=/opt/hpcx/ompi/bin:/usr/local/bin/etcd/:/usr/local/cuda/nvvm/bin:$PATH
```
🧹 Nitpick comments (3)
container/Dockerfile.trtllm (3)
Lines 410-417: System-level installs: add a uv cache and verify interpreter resolution.

Switching to uv pip --system is fine in containers, but two nits:

- Add a build cache mount to speed up repeated image builds.
- Given VIRTUAL_ENV is still set (lines 351-354), please verify uv picks the system Python, not some unexpected interpreter.

Suggested tweaks:

```diff
 RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
-    uv pip install --system --break-system-packages --requirement /tmp/requirements.txt
+    --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --system --break-system-packages --requirement /tmp/requirements.txt
 RUN --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.txt \
-    uv pip install --system --break-system-packages --requirement /tmp/requirements.txt
+    --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --system --break-system-packages --requirement /tmp/requirements.txt
```

If helpful, I can also switch these to uv pip sync with a constraints file to keep resolution stable across builds; a rough sketch follows below.
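For the constraints-file idea, a hedged sketch; the constraints file path is hypothetical and is not something this PR adds:

```dockerfile
# Sketch: constrain transitive dependencies so repeated builds resolve the
# same versions. ./container/deps/constraints.txt is a hypothetical file.
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
    --mount=type=bind,source=./container/deps/constraints.txt,target=/tmp/constraints.txt \
    --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system --break-system-packages \
        --constraint /tmp/constraints.txt \
        --requirement /tmp/requirements.txt
```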
Line 497: Avoid installing benchmarks into the system site in the runtime image.

This bloats the production image and risks dependency drift. Gate it behind a build arg so CI images can opt in.

```diff
-RUN uv pip install --system --break-system-packages /workspace/benchmarks
+ARG INSTALL_BENCHMARKS=0
+RUN if [ "$INSTALL_BENCHMARKS" = "1" ]; then \
+        uv pip install --system --break-system-packages /workspace/benchmarks; \
+    fi
```
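If that gate is adopted, a CI build could opt in at build time (the arg name is taken from the suggestion above):

```bash
# Opt into installing benchmarks only for CI/test images.
docker build -f container/Dockerfile.trtllm --build-arg INSTALL_BENCHMARKS=1 .
```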
Lines 481-489: Note on index selection and dependency pinning.

Thanks for documenting uv's --extra-index-url precedence. For reproducibility, consider pinning TensorRT-LLM to an exact version and adding a constraints file for transitive deps (e.g., cuda-python, packaging, jinja2) to ensure stable rebuilds.
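A sketch of what pinning might look like; the version string and index URL below are illustrative assumptions, not values taken from this PR:

```dockerfile
# Sketch only: the exact version and the extra index URL are assumptions
# for illustration, not the values this Dockerfile uses.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system --break-system-packages \
        --extra-index-url https://pypi.nvidia.com \
        "tensorrt-llm==1.0.0"
```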
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

- container/Dockerfile.trtllm (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
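Per that learning, the flag rides along on an ordinary uv pip install of vLLM; a minimal sketch, with the version left unpinned for illustration:

```bash
# Per the learning above, --torch-backend=auto is honored during the vLLM
# install and matches the PyTorch distribution to the container's CUDA.
uv pip install --torch-backend=auto vllm
```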
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.
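As described there, exporting the variable before the standard install is enough to switch vLLM to the precompiled wheel; the wheel URL below is a placeholder:

```bash
# Placeholder URL; the real location is set in container/deps/vllm/install_vllm.sh.
export VLLM_PRECOMPILED_WHEEL_LOCATION="https://example.com/vllm-0.0.0-cp312-cp312-linux_x86_64.whl"
# vLLM's build system detects the variable and uses the precompiled wheel
# instead of compiling from source.
uv pip install .
```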
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.trtllm (1)
Lines 405-406: I've initiated inspection of container/Dockerfile.trtllm around the virtualenv and PATH settings. Once the outputs are available, I'll verify whether the review comment's concerns hold and then update the comment accordingly.
/ok to test 4e51c49
…alling requirements.txt
/ok to test 77ace97
/ok to test d3ae1d4
/ok to test 625a490
@dmitry-tokarev-nv Why can't we use uv for running on B200?
Co-authored-by: Misha Chornyi <[email protected]> Signed-off-by: Dmitry Tokarev <[email protected]>
@tanmayv25 It didn't work with torch 2.8.0a0+5228986c39.nv25.5, which comes from the PyTorch container, and uv didn't recognize it. 2.8.0a0+5228986c39.nv25.5 is needed to work on B200.
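For context, the build string shipped by the base container can be checked directly; the image tag below is an assumption based on the 25.05 base referenced in this PR, not a value read from the Dockerfile:

```bash
# Sketch: confirm which PyTorch build the NGC base image ships.
# nvcr.io/nvidia/pytorch:25.05-py3 is an assumed tag.
docker run --rm nvcr.io/nvidia/pytorch:25.05-py3 \
    python -c "import torch; print(torch.__version__)"
# Expected to print a vendor build such as 2.8.0a0+5228986c39.nv25.5
```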
/ok to test 4f2dfef
Comments are non-critical.
…5.5 (#2579) Signed-off-by: Dmitry Tokarev <[email protected]> Co-authored-by: Misha Chornyi <[email protected]>
…5.5 (#2579) Signed-off-by: Dmitry Tokarev <[email protected]> Co-authored-by: Misha Chornyi <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
…5.5 (#2579) Signed-off-by: Dmitry Tokarev <[email protected]> Co-authored-by: Misha Chornyi <[email protected]> Signed-off-by: Krishnan Prashanth <[email protected]>