feat: sglang + gb200 #2223
Walkthrough

This change introduces new documentation and scripts for deploying DeepSeek-R1 with SGLang WideEP on GB200 and H100 hardware, refactors SLURM job orchestration to support both SGLang and Dynamo commands, and parameterizes the Dockerfile for flexible builds. It adds GPU-specific shell scripts, updates worker setup logic, and enhances SLURM templates and documentation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant SLURM Controller
    participant Worker Setup Script
    participant GPU Shell Script (h100.sh/gb200.sh)
    participant SGLang/Dynamo Worker
    User->>SLURM Controller: Submit job with --gpu-type and --use-sglang-commands
    SLURM Controller->>Worker Setup Script: Launch worker_setup.py with args
    Worker Setup Script->>GPU Shell Script: Set env vars, call h100.sh/gb200.sh with mode/command
    GPU Shell Script->>SGLang/Dynamo Worker: Launch Python server with parameters
    SGLang/Dynamo Worker-->>User: Serve model requests
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs
Actionable comments posted: 10
🧹 Nitpick comments (5)
components/backends/sglang/docs/dsr1-wideep-h100.md (1)

134-174: Minor style suggestion for conciseness. The formatting improvements in the benchmarking section enhance readability. However, consider replacing "in order to" with "to" on line 171 for more concise writing:

```diff
-We use ShareGPT in order to leverage the pre-existing EPLB distributions
+We use ShareGPT to leverage the pre-existing EPLB distributions
```

components/backends/sglang/slurm_jobs/scripts/gb200.sh (1)
69-69: Consider refactoring to reduce code duplication. The TODO comment correctly identifies an opportunity for improvement. The dynamo and sglang commands share many identical parameters; consider extracting common arguments into variables.

Example approach:

```bash
# Common arguments
COMMON_ARGS="--served-model-name deepseek-ai/DeepSeek-R1 \
  --model-path /model/ \
  --trust-remote-code \
  --dist-init-addr $HOST_IP:$PORT \
  --nnodes $TOTAL_NODES \
  --node-rank $RANK"
```

components/backends/sglang/docs/dsr1-wideep-gb200.md (1)
172-172: Provide explicit examples for other decode nodes. The instruction to "change `--node-rank` to 1, 2, 3..." could be clearer with explicit command examples or a script snippet. Consider adding a shell loop example or explicit commands for at least the first few nodes to make it clearer for users:

```bash
# For node 1:
# ... (same command with --node-rank 1)
# For node 2:
# ... (same command with --node-rank 2)
# etc.
```

container/Dockerfile.sglang-wideep (1)
36-66: Consider extracting the UCX/NIXL build into a separate script. The conditional build block for UCX and NIXL is quite large and complex. Consider moving this logic to a separate shell script for better maintainability. This would make the Dockerfile cleaner and the build logic easier to test and modify independently.
components/backends/sglang/slurm_jobs/scripts/worker_setup.py (1)

246-276: Add error checking for background process startup. Multiple critical processes are started in the background without verifying they're running successfully. While the learnings indicate this is intentional for debugging, consider adding initial health checks. After starting each background process, consider adding a brief sleep and checking if the process is still running:

```python
time.sleep(2)  # Give process time to start
if nats_process.poll() is not None:
    logging.error(f"nats-server exited with code: {nats_process.returncode}")
    # Continue anyway per the design decision
```
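A self-contained sketch of such a health check (the helper name and the demo processes are hypothetical; worker_setup.py's actual process handles may differ):

```python
import logging
import subprocess
import sys
import time


def started_ok(proc: subprocess.Popen, name: str, grace: float = 2.0) -> bool:
    """Return True if `proc` is still alive `grace` seconds after launch.

    Logs an error on early exit but never kills anything, matching the
    design decision to leave background processes running for debugging.
    """
    time.sleep(grace)
    if proc.poll() is not None:
        logging.error("%s exited early with code %s", name, proc.returncode)
        return False
    return True


# Usage sketch: a process that exits immediately vs. one that keeps running.
dead = subprocess.Popen([sys.executable, "-c", "pass"])
alive = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(10)"])
print(started_ok(dead, "dead-proc"), started_ok(alive, "alive-proc"))
alive.kill()
```

This keeps the "continue anyway" semantics: a failed check only produces a log line, so operators can still attach to the node and debug.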
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- components/backends/sglang/docs/dsr1-wideep-gb200.md (1 hunks)
- components/backends/sglang/docs/dsr1-wideep-h100.md (6 hunks)
- components/backends/sglang/slurm_jobs/README.md (3 hunks)
- components/backends/sglang/slurm_jobs/job_script_template.j2 (4 hunks)
- components/backends/sglang/slurm_jobs/scripts/gb200.sh (1 hunks)
- components/backends/sglang/slurm_jobs/scripts/h100.sh (1 hunks)
- components/backends/sglang/slurm_jobs/scripts/worker_setup.py (7 hunks)
- components/backends/sglang/slurm_jobs/submit_job_script.py (3 hunks)
- container/Dockerfile.sglang-wideep (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:113-116
Timestamp: 2025-07-03T09:44:41.470Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content is acceptable because the config file is public, contains only placeholder replacements (no sensitive data), and provides debugging benefits for users who may want to create configurations based on the logged output.
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
📚 Learning: in examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd)...
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Applied to files:
components/backends/sglang/docs/dsr1-wideep-h100.md
components/backends/sglang/slurm_jobs/scripts/h100.sh
components/backends/sglang/slurm_jobs/README.md
components/backends/sglang/slurm_jobs/scripts/gb200.sh
components/backends/sglang/slurm_jobs/job_script_template.j2
container/Dockerfile.sglang-wideep
components/backends/sglang/slurm_jobs/scripts/worker_setup.py
📚 Learning: in the slurm job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_...
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Applied to files:
components/backends/sglang/docs/dsr1-wideep-h100.md
components/backends/sglang/slurm_jobs/scripts/h100.sh
components/backends/sglang/slurm_jobs/README.md
components/backends/sglang/slurm_jobs/submit_job_script.py
components/backends/sglang/slurm_jobs/scripts/gb200.sh
components/backends/sglang/docs/dsr1-wideep-gb200.md
components/backends/sglang/slurm_jobs/job_script_template.j2
components/backends/sglang/slurm_jobs/scripts/worker_setup.py
📚 Learning: the sglang `async_encode` method does not support streaming options, so collecting all embeddings be...
Learnt from: t-ob
PR: ai-dynamo/dynamo#1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang `async_encode` method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.
Applied to files:
components/backends/sglang/docs/dsr1-wideep-h100.md
📚 Learning: in fault tolerance test configurations, the `resources` section under `serviceargs` specifies resour...
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the `resources` section under `ServiceArgs` specifies resources per individual worker, not total resources for all workers. So `workers: 8` with `gpu: '1'` means 8 workers × 1 GPU each = 8 GPUs total.
Applied to files:
components/backends/sglang/docs/dsr1-wideep-h100.md
components/backends/sglang/slurm_jobs/README.md
components/backends/sglang/slurm_jobs/scripts/worker_setup.py
📚 Learning: in examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content i...
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:113-116
Timestamp: 2025-07-03T09:44:41.470Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, logging the full configuration file content is acceptable because the config file is public, contains only placeholder replacements (no sensitive data), and provides debugging benefits for users who may want to create configurations based on the logged output.
Applied to files:
components/backends/sglang/slurm_jobs/scripts/h100.sh
components/backends/sglang/slurm_jobs/README.md
components/backends/sglang/slurm_jobs/job_script_template.j2
components/backends/sglang/slurm_jobs/scripts/worker_setup.py
📚 Learning: in multi-node setups with head/worker architecture, the head node typically doesn't need environment...
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Applied to files:
components/backends/sglang/slurm_jobs/job_script_template.j2
components/backends/sglang/slurm_jobs/scripts/worker_setup.py
📚 Learning: in dockerfiles, when appending to environment variables that may not exist in the base image, docker...
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
Applied to files:
container/Dockerfile.sglang-wideep
📚 Learning: in components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally d...
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.
Applied to files:
container/Dockerfile.sglang-wideep
🧬 Code Graph Analysis (2)
components/backends/sglang/slurm_jobs/scripts/h100.sh (1)
components/backends/sglang/slurm_jobs/scripts/gb200.sh (1)
print_usage
(6-15)
components/backends/sglang/slurm_jobs/scripts/gb200.sh (1)
components/backends/sglang/slurm_jobs/scripts/h100.sh (1)
print_usage
(6-15)
🪛 LanguageTool
components/backends/sglang/docs/dsr1-wideep-h100.md
[style] ~171-~171: Consider a more concise word here.
Context: ...ill and decode workers. We use ShareGPT in order to leverage the pre-existing EPLB distribu...
(IN_ORDER_TO_PREMIUM)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2223/merge) by ishandhanani.
components/backends/sglang/slurm_jobs/scripts/h100.sh
[error] 1-1: Script has a shebang but is not marked executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/h100.sh' to fix.
components/backends/sglang/slurm_jobs/scripts/gb200.sh
[error] 1-1: Script has a shebang but is not marked executable. Run 'chmod +x components/backends/sglang/slurm_jobs/scripts/gb200.sh' to fix.
container/Dockerfile.sglang-wideep
[error] 15-24: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please ensure no trailing whitespace remains.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (20)
components/backends/sglang/docs/dsr1-wideep-h100.md (2)

60-60: LGTM on the updated ingress command. The change from `dynamo run in=http out=dyn &` to `python3 -m dynamo.frontend --http-port=8000 &` aligns with the modular approach introduced in this PR, where GPU-specific shell scripts now handle command execution.

96-96: Good fix on the step numbering. Correcting the step number from 7 to 6 maintains proper sequential numbering in the instructions.
components/backends/sglang/slurm_jobs/README.md (3)

95-96: LGTM on the new optional arguments documentation. The documentation clearly describes the new `--gpu-type` and `--use-sglang-commands` flags, which align with the broader refactoring to support multiple GPU architectures and command modes.

100-122: Excellent examples demonstrating the new functionality. The examples effectively show how to use the new GPU type selection and command mode options, providing clear guidance for both H100 and GB200 deployments.

136-144: Good enhancement to log monitoring instructions. The addition of commands to tail logs for all prefill or decode workers simultaneously is a valuable operational improvement that aligns with the consolidated job launch approach.
components/backends/sglang/slurm_jobs/submit_job_script.py (3)

89-89: Time limit increase is reasonable for complex deployments. Increasing the default time limit from 1 hour to 4 hours makes sense for large-scale distributed deployments that may require more setup and initialization time.

103-111: Well-designed new command-line arguments. The new arguments provide clear choices and sensible defaults:

- `--gpu-type` with explicit choices and H100 as default
- `--use-sglang-commands` as a boolean flag with clear intent

132-133: Proper integration of new arguments into template variables. The new arguments are correctly added to the template variables dictionary, ensuring they're available for use in the Jinja2 template.
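A minimal sketch of how flags like these feed the template variables (flag names follow the review's description; the actual wiring in submit_job_script.py may differ):

```python
import argparse

# Sketch of the new CLI flags described above; the real submit_job_script.py
# may define additional arguments, help text, and defaults.
parser = argparse.ArgumentParser()
parser.add_argument("--gpu-type", choices=["h100", "gb200"], default="h100")
parser.add_argument("--use-sglang-commands", action="store_true")

args = parser.parse_args(["--gpu-type", "gb200", "--use-sglang-commands"])

# Expose the parsed values to the Jinja2 template as plain dict entries.
template_vars = {
    "gpu_type": args.gpu_type,
    "use_sglang_commands": args.use_sglang_commands,
}
print(template_vars)
```

Using `choices=` makes invalid GPU types fail at parse time rather than surfacing later as a missing `h100.sh`/`gb200.sh` script.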
components/backends/sglang/slurm_jobs/scripts/gb200.sh (6)

5-37: LGTM on argument validation logic. The usage function and argument validation are well-implemented, providing clear error messages and maintaining consistency with the h100.sh script structure.

43-67: Thorough environment variable validation. The script properly validates all required environment variables with clear error messages, which will help with debugging deployment issues.
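The validation pattern praised above can be sketched as a small reusable check (the variable names here are illustrative, not necessarily the exact set gb200.sh validates):

```shell
# Sketch of a required-env-var check; the variable list is illustrative,
# not necessarily the exact set gb200.sh validates.
check_required() {
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "Error: required environment variable $var is not set" >&2
      return 1
    fi
  done
  return 0
}

# Demo: both set -> passes; one missing -> fails with a clear message.
HOST_IP=10.0.0.1 PORT=5000
if check_required HOST_IP PORT; then echo "ok"; fi
check_required HOST_IP PORT UNSET_VAR_FOR_DEMO 2>/dev/null || echo "caught missing var"
```

Centralizing the check keeps the per-variable error messages consistent across both GPU scripts.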
77-122: GB200-specific configuration looks appropriate. The prefill dynamo command uses GB200-optimized parameters including specific memory fractions, chunked prefill sizes, and attention backends suitable for the hardware.

129-172: Consistent sglang prefill configuration. The sglang prefill command maintains parameter consistency with the dynamo version while using the appropriate sglang module entry point.

178-222: Decode worker configuration is well-tuned. The decode worker parameters show appropriate tuning for GB200 hardware with higher max-running-requests and adjusted memory fractions.

228-270: Consistent decode sglang configuration. The sglang decode command maintains consistency with the dynamo version while properly excluding the --served-model-name parameter that's not needed for decode workers.
components/backends/sglang/slurm_jobs/scripts/h100.sh (5)

5-67: LGTM on validation logic consistency. The usage function, argument validation, and environment variable validation are identical to gb200.sh, providing consistent user experience across GPU types.

72-100: H100-specific prefill configuration looks appropriate. The H100 dynamo prefill command uses parameters optimized for H100 hardware, including the deepep-config file reference and appropriate memory fraction settings.

103-129: Consistent sglang prefill implementation. The sglang prefill command properly mirrors the dynamo configuration while using the sglang.launch_server entry point and adding the required --host parameter.

134-159: Well-configured H100 decode worker. The decode worker parameters are appropriate for H100 hardware with suitable memory fractions and CUDA graph batch size settings.

162-185: Consistent decode sglang configuration. The sglang decode command maintains consistency with the dynamo version and properly excludes --served-model-name for decode workers.
components/backends/sglang/slurm_jobs/job_script_template.j2 (1)

42-42: Verify the IP address retrieval logic works across different network configurations. The new approach using `ip route get` combined with `getent ahosts` is more robust than `ifconfig`, but the command is complex and may fail in certain network configurations. Consider adding error handling or fallback mechanisms if the primary method fails. Also, ensure this works correctly in environments where nodes have multiple network interfaces or complex routing rules.

Also applies to: 49-49
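One way to add the suggested fallbacks is a short chain that tries each method in turn (a sketch only; the template's actual command may differ, and the 1.1.1.1 probe address is arbitrary):

```shell
# Sketch of an IP-resolution fallback chain; the exact commands in the
# Jinja2 template may differ, and 1.1.1.1 is just an arbitrary probe address.
get_node_ip() {
  ip=$(ip route get 1.1.1.1 2>/dev/null | awk '{for (i=1;i<NF;i++) if ($i=="src") print $(i+1)}' | head -n1)
  [ -n "$ip" ] || ip=$(getent ahosts "$(hostname)" 2>/dev/null | awk '{print $1; exit}')
  [ -n "$ip" ] || ip=$(hostname -I 2>/dev/null | awk '{print $1}')
  echo "$ip"
}

get_node_ip
```

The routing-table lookup wins when it works (it picks the interface that actually reaches other nodes); the hostname-based fallbacks cover hosts without a default route or with restricted `ip` access.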
Co-authored-by: Ryan McCormick <[email protected]>
Signed-off-by: ishandhanani <[email protected]>
Test matrix
Summary by CodeRabbit

New Features
- GPU-specific launch scripts (`gb200.sh`, `h100.sh`) for streamlined job launching on GB200 and H100 GPUs, supporting both Dynamo and SGLang commands.

Improvements

Bug Fixes

Chores