
@grahamking (Contributor) commented Jul 22, 2025

This works now:

  • cd $PROJECT_ROOT
  • uv pip install .
  • python -m dynamo.sglang.worker --model-path ~/llms/Qwen3-0.6B --skip-tokenizer-init

Also python -m dynamo.sglang.decode_worker

Changes:

  • Top level folder move examples/sglang -> components/backends/sglang
  • Internal move examples/sglang/components -> components/backends/sglang/src/dynamo/sglang/worker and decode_worker.
  • Internal move examples/sglang/utils -> components/backends/sglang/src/dynamo/sglang/utils
  • Create components/backends/sglang/src/dynamo/sglang/worker/__main__.py, which is what makes python -m dynamo.sglang.worker work (see the sketch after this list).
  • Edit pyproject.toml to install this new module.
  • Add components/backends/sglang/requirements.txt
  • Update docs and launch scripts. There is probably more to do here.
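
For readers new to the `python -m` mechanism, here is a minimal sketch of what such a `__main__.py` could look like. It assumes `worker` is an async entry coroutine that takes no arguments (the decorator injects the runtime) and that it is importable from the `dynamo.sglang.worker` package; the import path and details are illustrative, not the actual file.

```python
# Illustrative sketch of a __main__.py entry point (not the actual file).
import asyncio

import uvloop

# Hypothetical import path; the real module layout may differ.
from dynamo.sglang.worker import worker


def main() -> None:
    # Use uvloop as the asyncio event loop implementation, then run the worker.
    uvloop.install()
    asyncio.run(worker())


if __name__ == "__main__":
    main()
```

Running `python -m dynamo.sglang.worker` executes this module, which is why the package must be installed first (for example via `uv pip install .`).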

Summary by CodeRabbit

  • Documentation

    • Updated documentation and README files to clarify descriptions, correct directory paths, and improve formatting.
    • Removed redundant license headers from several documentation and code files.
  • Chores

    • Adjusted script and module paths in launch scripts to reflect new directory structure and improve clarity.
    • Added a requirements file specifying necessary dependencies for the SGLang backend component.
    • Updated build configuration to include the SGLang backend component in package builds.
  • New Features

    • Introduced two main entry points for the SGLang backend to enable optimized event loop usage with uvloop.

@coderabbitai bot (Contributor) commented Jul 22, 2025

Walkthrough

This update refactors the SGLang backend component for Dynamo by revising documentation, updating script and module invocation paths, and aligning directory references with the current project structure. It introduces a new requirements file and an asynchronous main entry script, and removes redundant license headers across several files.

Changes

| Files/Paths | Change Summary |
| --- | --- |
| components/backends/sglang/README.md, docs/*.md | Updated directory references, clarified instructions, removed redundant license headers, and improved formatting. |
| components/backends/sglang/launch/*.sh | Updated script/module invocation paths to reflect new structure and use module-based Python invocation. |
| components/backends/sglang/requirements.txt | Added new requirements file specifying dependencies for SGLang backend. |
| components/backends/sglang/src/dynamo/sglang/__main__.py | Introduced new async entry script for worker using uvloop and asyncio. |
| components/backends/sglang/src/dynamo/sglang/components/decode_worker.py, worker.py | Changed import paths to match new directory structure; removed license header from decode_worker.py. |
| pyproject.toml | Added SGLang backend source directory to wheel build package paths. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Shell Script
    participant Python Module (dynamo.sglang)
    participant Worker

    User->>Shell Script: Run launch script (agg.sh, etc.)
    Shell Script->>Python Module (dynamo.sglang): python3 -m dynamo.sglang
    Python Module (dynamo.sglang)->>Worker: asyncio.run(worker())
    Worker-->>Python Module (dynamo.sglang): Worker runs asynchronously

Estimated code review effort

2 (~15 minutes)

Possibly related PRs

Poem

In SGLang’s warren, paths realigned,
Scripts now hop where modules are defined.
License leaves, requirements hop in,
Async worker starts with a uvloop spin.
A tidy burrow, code flows anew—
With every launch, a rabbit’s joy grew!
🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

🔭 Outside diff range comments (2)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)

28-31: msgspec.json.decode expects bytes, not str.

request is annotated as str, but msgspec.json.decode() only accepts a bytes-like object.
If an upstream endpoint delivers a Python str, this will raise `TypeError: Expected bytes, got str` at runtime.

-    req = msgspec.json.decode(request, type=dict)
+    if isinstance(request, str):
+        request = request.encode()
+    req = msgspec.json.decode(request, type=dict)

Alternatively, change the endpoint contract so it always passes bytes.
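
As a standalone illustration of the defensive pattern suggested above (the function name and payload are placeholders, not the actual endpoint code):

```python
# Minimal sketch: accept either str or bytes before handing the payload to msgspec.
import msgspec


def decode_request(request) -> dict:
    if isinstance(request, str):
        request = request.encode()
    return msgspec.json.decode(request, type=dict)


print(decode_request('{"prompt": "hello"}'))   # {'prompt': 'hello'}
print(decode_request(b'{"prompt": "hello"}'))  # {'prompt': 'hello'}
```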

components/backends/sglang/launch/agg_router.sh (1)

22-29: Trailing backslash after last CLI flag

Line 28 ends with a backslash before the blank line. Some shells treat this as a continued line and will merge with the following empty line, yielding a syntax error.
Drop the trailing \:

-  --skip-tokenizer-init \
+  --skip-tokenizer-init
♻️ Duplicate comments (2)
components/backends/sglang/launch/agg_router.sh (1)

15-16: Same CWD-sensitive hard-coded path issue as in agg.sh

Replicate the defensive path resolution here to avoid brittle executions.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

15-16: CWD-dependent utils path

Same concern as other launch scripts – resolve path dynamically.

🧹 Nitpick comments (8)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)

111-133: Spelling & silent-loop hazard in metrics loop.

  1. Typo: "recieve" → "receive" in the log message.
  2. The broad except Exception: swallows all errors and immediately continues, potentially masking persistent failure conditions and spinning the loop.
-            except Exception:
-                logging.exception("Failed to recieve or publish metrics")
+            except Exception:
+                logging.exception("Failed to receive or publish metrics; exiting metrics loop")
+                break

Consider breaking or backing-off after repeated failures.
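
A minimal sketch of what a bounded retry/backoff loop could look like, assuming hypothetical `receive_metrics`/`publish_metrics` coroutines (the real worker code differs):

```python
# Sketch only: back off on repeated failures and eventually give up.
import asyncio
import logging


async def metrics_loop(receive_metrics, publish_metrics, max_failures: int = 5) -> None:
    failures = 0
    while True:
        try:
            metrics = await receive_metrics()
            await publish_metrics(metrics)
            failures = 0  # reset after a successful round trip
        except Exception:
            failures += 1
            logging.exception("Failed to receive or publish metrics (attempt %d)", failures)
            if failures >= max_failures:
                logging.error("Giving up after %d consecutive failures", failures)
                break
            await asyncio.sleep(min(2**failures, 30))  # capped exponential backoff
```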

pyproject.toml (1)

82-82: Manual package list is brittle.

Hard-coding every sub-package path requires edits for each new backend. Prefer dynamic discovery:

[tool.hatch.build.targets.wheel]
# include everything under components/*/src/
packages = ["deploy/sdk/src/dynamo", "components/**/src/dynamo"]

This reduces maintenance overhead as backends are added or relocated.

components/backends/sglang/src/dynamo/sglang/__main__.py (1)

8-12: Runtime injection assumption—double-check dynamo_worker decorator.

worker() is invoked with no arguments. This relies on @dynamo_worker returning a callable that internally creates a DistributedRuntime. If that invariant ever changes, the entry-point will break.
Recommend adding a brief comment for future maintainers:

# `worker` is decorated with @dynamo_worker and therefore needs no explicit runtime.
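
For readers unfamiliar with the injection pattern being relied on here, a minimal self-contained sketch follows; this is an assumed behaviour based on the description above, not the actual dynamo implementation.

```python
# Illustrative only: a decorator that creates a runtime and injects it as the
# first argument, so callers invoke worker() with no arguments.
import asyncio
import functools


class DistributedRuntime:
    """Stand-in for the real runtime object."""


def dynamo_worker():
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            runtime = DistributedRuntime()  # created internally, then injected
            return await fn(runtime, *args, **kwargs)
        return wrapper
    return decorator


@dynamo_worker()
async def worker(runtime):
    print("runtime injected:", type(runtime).__name__)


asyncio.run(worker())  # no explicit runtime argument needed
```
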
components/backends/sglang/launch/agg.sh (1)

22-29: Module invocation relies on editable/installed package – document the pre-req

Switching from python3 components/worker.py to python3 -m dynamo.sglang is great, but only succeeds if users have already run uv pip install . (or pip install -e .).
Add a short comment above this block or in README so the happy-path doesn’t fail with ModuleNotFoundError.

components/backends/sglang/docs/multinode-examples.md (2)

10-13: Typo: “demonstrate and example” → “demonstrate an example”

Minor wording fix for professionalism.


14-15: Hard-coded relative path – use repo-root-independent snippet

Readers may copy/paste the command from outside the repo root; prefix with ${REPO_ROOT} or clarify that the command must be run from project root to prevent confusion.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

36-37: Decode worker path hard-coded; consider module form for symmetry

The decode worker is invoked via file path while pre-fill uses the module. Expose a dynamo.sglang.decode_worker entry-point to avoid mixing styles and to reuse the uvloop setup.

components/backends/sglang/launch/disagg.sh (1)

22-30: Nit: expose model parameters via variables for easier overrides

Hard-coding model path, page size, TP, etc. forces users to edit the script.
Exporting them to variables (or reading from env) improves re-usability:

MODEL_PATH=${MODEL_PATH:-deepseek-ai/DeepSeek-R1-Distill-Llama-8B}
TP=${TP:-1}

python3 -m dynamo.sglang \
  --model-path "${MODEL_PATH}" \
  --served-model-name "${MODEL_PATH}" \
  --page-size 16 \
  --tp "${TP}" \
  --trust-remote-code \
  --skip-tokenizer-init \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl &
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20c5daf and 7720ad5.

📒 Files selected for processing (13)
  • components/backends/sglang/README.md (5 hunks)
  • components/backends/sglang/docs/dsr1-wideep-h100.md (1 hunks)
  • components/backends/sglang/docs/multinode-examples.md (3 hunks)
  • components/backends/sglang/docs/sgl-http-server.md (0 hunks)
  • components/backends/sglang/launch/agg.sh (1 hunks)
  • components/backends/sglang/launch/agg_router.sh (1 hunks)
  • components/backends/sglang/launch/disagg.sh (2 hunks)
  • components/backends/sglang/launch/disagg_dp_attn.sh (3 hunks)
  • components/backends/sglang/requirements.txt (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/__main__.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/worker.py (1 hunks)
  • pyproject.toml (1 hunks)
🧬 Code Graph Analysis (3)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
  • DisaggPreprocessedRequest (58-64)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
components/backends/sglang/src/dynamo/sglang/__main__.py (1)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
  • worker (301-315)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
💤 Files with no reviewable changes (1)
  • components/backends/sglang/docs/sgl-http-server.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (5)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (2)

18-18: Import path update looks correct.

The new absolute package path under dynamo.sglang aligns with the repo re-org and should resolve cleanly once the package is installed.


47-51: Yielding a single dict violates Dynamo streaming convention.

flush_cache currently yields one dict and returns; Dynamo endpoints normally finish without yielding when returning a single response, or should mark the stream’s end explicitly. Verify that the client correctly handles this single-item stream.

components/backends/sglang/src/dynamo/sglang/components/worker.py (1)

30-32: Import path adjustments LGTM.

Moving the utilities under dynamo.sglang.utils keeps the namespace consistent with the new build layout.

components/backends/sglang/docs/dsr1-wideep-h100.md (1)

50-50: Path update acknowledged.

The documentation now reflects the new directory structure.

components/backends/sglang/docs/multinode-examples.md (1)

24-35: Docs still reference removed flags in new entry-point

Double-check that dynamo.sglang.__main__ accepts all the shown flags (--page-size, --dp-size, etc.). If some flags migrated or were renamed, update the docs to avoid runtime failure.

@coderabbitai bot (Contributor) commented Jul 22, 2025

Walkthrough

This update reorganizes and standardizes the SGLang backend component by correcting file paths, updating script invocation to use Python modules, and adding a new entry point. Documentation and launch scripts are revised to reflect the new structure, and a requirements file is added. No functional code changes are introduced.

Changes

| File(s) | Change Summary |
| --- | --- |
| components/backends/sglang/README.md | Updated documentation for directory structure, paths, and descriptions. |
| components/backends/sglang/docs/dsr1-wideep-h100.md, components/backends/sglang/docs/sgl-http-server.md | Removed license headers; updated directory paths. |
| components/backends/sglang/docs/multinode-examples.md | Corrected script/module invocation and updated helper script paths. |
| components/backends/sglang/launch/agg.sh, components/backends/sglang/launch/agg_router.sh | Updated script paths and switched worker invocation to Python module execution. |
| components/backends/sglang/launch/disagg.sh, components/backends/sglang/launch/disagg_dp_attn.sh | Updated utility and worker script paths to match new structure; switched to module-based invocation. |
| components/backends/sglang/requirements.txt | Added requirements file specifying sglang[all] and uvloop dependencies. |
| components/backends/sglang/src/dynamo/sglang/__main__.py | Added new main entry point script using uvloop and running the async worker. |
| components/backends/sglang/src/dynamo/sglang/components/decode_worker.py | Removed license header; updated import path for utility function. |
| components/backends/sglang/src/dynamo/sglang/components/worker.py | Updated import paths for utility functions. |
| pyproject.toml | Added SGLang backend source directory to the wheel build package list. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Shell/Script
    participant Python Interpreter
    participant dynamo.sglang.__main__
    participant Worker

    User->>Shell/Script: Run launch script (e.g., agg.sh)
    Shell/Script->>Python Interpreter: python3 -m dynamo.sglang
    Python Interpreter->>dynamo.sglang.__main__: Import and execute
    dynamo.sglang.__main__->>Worker: asyncio.run(worker())
    Worker-->>dynamo.sglang.__main__: Worker coroutine runs with uvloop

Estimated code review effort

2 (~15 minutes)

Possibly related PRs

Poem

Hopping through the code with glee,
Paths aligned for all to see.
Scripts now modules, neat and bright,
Requirements listed, structure right.
With uvloop’s speed, we run anew—
A backend fresh, for all the crew!
🐇✨



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

♻️ Duplicate comments (3)
components/backends/sglang/requirements.txt (1)

1-5: Duplicated dependency list – ensure a single source of truth

sglang and uvloop are captured here but not in pyproject.toml. Having two independent lists risks drift.
Prefer declaring runtime dependencies only in pyproject.toml and generating this file (or eliminating it) to avoid mismatches.

components/backends/sglang/launch/agg.sh (2)

14-16: Same relative-path fragility as in agg_router.sh

Replicate the SCRIPT_DIR fix suggested there to make the helper script callable from any location.


22-29: PID management identical to agg_router.sh

Apply the same foreground/background decision and cleanup enhancement here for consistency.

🧹 Nitpick comments (9)
components/backends/sglang/docs/dsr1-wideep-h100.md (1)

50-51: Update CLI examples to use the new module entry-point

The docs now point readers to /components/backends/sglang, but the invocation examples still use python3 components/worker.py.
With __main__.py added, the simpler (and location-agnostic) form works:

python -m dynamo.sglang --disaggregation-mode prefill ...

Updating the snippets keeps the docs in sync with the code.

components/backends/sglang/launch/agg_router.sh (1)

22-29: Background the worker or extend cleanup handling

The worker now runs in the foreground (python3 -m dynamo.sglang \ with no trailing &).
If the intent is to keep it in the foreground, the current trap only cleans up the ingress process ($DYNAMO_PID).
Either:

  1. background the worker and track its PID, or
  2. leave it in the foreground and drop dead-code cleanup for it (less robust).

Option 1 (recommended):

-python3 -m dynamo.sglang \
+python3 -m dynamo.sglang \
   … \
-  --skip-tokenizer-init \
+  --skip-tokenizer-init \
+  &                               # background
+WORKER_PID=$!

and in cleanup() append kill $WORKER_PID / wait $WORKER_PID.

This prevents a stray worker when the script terminates abnormally.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

22-32: Inconsistent entry point between prefill and decode workers

Prefill worker uses the new module entry point (python3 -m dynamo.sglang) while decode worker still runs the file directly:

CUDA_VISIBLE_DEVICES=2,3 python3 src/dynamo/sglang/components/decode_worker.py …

Consider exposing decode_worker via dynamo.sglang.__main__ (e.g., --mode decode) or a dedicated -m dynamo.sglang.decode_worker to keep invocation symmetric and avoid importing private paths from tooling scripts.

components/backends/sglang/docs/multinode-examples.md (2)

14-15: Documented helper-script path relies on repo layout knowledge

./components/backends/sglang/src/dynamo/sglang/utils/gen_env_vars.sh is long and brittle.
Readers typically run commands from the repo root; consider recommending:

bash components/backends/sglang/src/dynamo/sglang/utils/gen_env_vars.sh

or prefix with $(git rev-parse --show-toplevel) for clarity.


24-37: Explain the switch to module execution

You introduce python3 -m dynamo.sglang without noting the rationale.
A short sentence such as “dynamo.sglang is now installable and exposes the worker via its __main__ entry point” will prevent confusion for users scanning the docs.

components/backends/sglang/launch/disagg.sh (2)

22-30: Prefer module invocation for all workers for consistency

The pre-fill worker is now launched via python3 -m dynamo.sglang, but the decode worker below is still invoked by path.
For symmetry and easier packaging you could expose dynamo.sglang.components.decode_worker (or a dedicated dynamo.sglang.decode) as a module entry-point and start it with:

CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang.decode_worker ...

This keeps the launch script uniform and avoids hard-coding the file layout.


34-42: Hard-coded GPU index may cause accidental oversubscription

CUDA_VISIBLE_DEVICES=1 assumes GPU 0 is already occupied by the prefill worker.
In multi-GPU or containerised environments this may not hold and could overrun resources.

Consider parameterising the decode GPU via an environment variable:

: "${DECODE_GPU:=1}"
CUDA_VISIBLE_DEVICES=$DECODE_GPU python3 ...

This makes the script safer to reuse across nodes with different GPU topologies.

components/backends/sglang/README.md (2)

98-102: Nit: clarify placeholder-metrics TODO wording

Minor phrasing: “placeholder engine metrics to keep the Dynamo KV-router happy” rather than “placeholder engine metrics to keep the Dynamo KV-router happy”. Improves readability while users await the referenced PRs.


128-130: Optional: collapse repeated cd instructions

Multiple example blocks repeat the cd $DYNAMO_ROOT/components/backends/sglang step. Adding a short “Prerequisite” note once at the top (or exporting an environment variable) would reduce duplication and the chance of future path drift.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20c5daf and 7720ad5.

📒 Files selected for processing (13)
  • components/backends/sglang/README.md (5 hunks)
  • components/backends/sglang/docs/dsr1-wideep-h100.md (1 hunks)
  • components/backends/sglang/docs/multinode-examples.md (3 hunks)
  • components/backends/sglang/docs/sgl-http-server.md (0 hunks)
  • components/backends/sglang/launch/agg.sh (1 hunks)
  • components/backends/sglang/launch/agg_router.sh (1 hunks)
  • components/backends/sglang/launch/disagg.sh (2 hunks)
  • components/backends/sglang/launch/disagg_dp_attn.sh (3 hunks)
  • components/backends/sglang/requirements.txt (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/__main__.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/worker.py (1 hunks)
  • pyproject.toml (1 hunks)
🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
  • DisaggPreprocessedRequest (58-64)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
💤 Files with no reviewable changes (1)
  • components/backends/sglang/docs/sgl-http-server.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (3)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)

30-31: Import-path switch looks correct

The move to the fully-qualified path (dynamo.sglang.utils.*) matches the new package layout and will work once the directory is included in the wheel (see comment in pyproject.toml).
No further action required.

components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)

18-19: Import-path update is consistent

Same observation as in worker.py; the renamed module path is valid after the restructure.

components/backends/sglang/docs/multinode-examples.md (1)

43-57: Update remaining decode-worker commands for consistency

Later in the document (Node 3 / Node 4) the commands still reference python3 components/decode_worker.py.
With the new layout these paths are stale and will 404. Align them with the updated src/dynamo/sglang/components/decode_worker.py or, better, the future module entry point once added.

@coderabbitai bot (Contributor) commented Jul 22, 2025

Walkthrough

This update reorganizes the SGLang backend by correcting directory paths, updating module import statements, and switching script invocations to use Python module syntax. Documentation and launch scripts are revised to reflect the new structure. A new requirements file and a main module entry point are introduced, and obsolete license headers are removed.

Changes

| File(s) | Change Summary |
| --- | --- |
| components/backends/sglang/README.md, docs/dsr1-wideep-h100.md, docs/multinode-examples.md, docs/sgl-http-server.md | Updated documentation: fixed directory paths, clarified instructions, removed license boilerplate. |
| components/backends/sglang/launch/agg.sh, launch/agg_router.sh, launch/disagg.sh, launch/disagg_dp_attn.sh | Updated script invocations to use new module paths and Python -m syntax; fixed script references. |
| components/backends/sglang/requirements.txt | Added new requirements file specifying sglang[all]>=0.4.9.post2 and uvloop. |
| components/backends/sglang/src/dynamo/sglang/__main__.py | Added new main entry point script to launch worker with uvloop event loop. |
| components/backends/sglang/src/dynamo/sglang/components/decode_worker.py, components/worker.py | Updated import paths for utility modules; removed license header from decode_worker.py. |
| pyproject.toml | Included new package directory for wheel build. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Shell Script
    participant Python (dynamo.sglang)
    participant Worker
    participant uvloop

    User->>Shell Script: Run launch script (e.g., agg.sh)
    Shell Script->>Python (dynamo.sglang): python3 -m dynamo.sglang
    Python (dynamo.sglang)->>uvloop: Install uvloop as event loop
    Python (dynamo.sglang)->>Worker: Run worker() coroutine
    Worker-->>Python (dynamo.sglang): Start backend process

Estimated code review effort

2 (~15 minutes)

Possibly related PRs

Poem

In burrows deep, I nudge each path anew,
Scripts now leap with Python’s -m in view.
Docs are tidied, imports hop in line,
uvloop spins the worker—how divine!
With every change, this backend’s spry—
A happy rabbit coding by. 🐇✨



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

🔭 Outside diff range comments (3)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)

54-57: runtime.shutdown() is invoked inside an async func but not awaited.

If DistributedRuntime.shutdown() ever becomes async the current call will silently ignore the coroutine and the process may exit before cleanup finishes.

-async def graceful_shutdown(runtime):
+async def graceful_shutdown(runtime):
     logging.info("Received shutdown signal, shutting down DistributedRuntime")
-    runtime.shutdown()
+    maybe_coro = runtime.shutdown()
+    if asyncio.iscoroutine(maybe_coro):
+        await maybe_coro
     logging.info("DistributedRuntime shutdown complete")
components/backends/sglang/launch/disagg.sh (1)

22-31: Prefill worker OK – but missing trap cleanup

PREFILL_PID is killed in trap, good. Decode worker PID is not tracked; if the script exits early, it will stay running.

Track and kill it:

PREFILL_PID=$!
...
-CUDA_VISIBLE_DEVICES=1 python3 src/dynamo/sglang/components/decode_worker.py \
+CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang.components.decode_worker \
 ...
- --disaggregation-transfer-backend nixl
+ --disaggregation-transfer-backend nixl &
+DECODE_PID=$!

and in the trap:

kill $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true
wait $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true
components/backends/sglang/launch/disagg_dp_attn.sh (1)

36-47: Decode worker isn’t tracked or cleaned up

CUDA_VISIBLE_DEVICES=2,3 … & starts a background job but its PID is never recorded, and therefore not killed by the cleanup trap. On exit the process will linger.

-CUDA_VISIBLE_DEVICES=2,3 python3 src/dynamo/sglang/components/decode_worker.py \
+CUDA_VISIBLE_DEVICES=2,3 python3 -m dynamo.sglang.components.decode_worker \
@@
-  --port 31000
+  --port 31000 &
+DECODE_PID=$!

and extend the trap:

-kill $DYNAMO_PID $PREFILL_PID 2>/dev/null || true
-wait $DYNAMO_PID $PREFILL_PID 2>/dev/null || true
+kill $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true
+wait $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true
♻️ Duplicate comments (2)
components/backends/sglang/launch/agg.sh (1)

15-15: Same relative-path issue as agg_router.sh

Please apply the fix described in agg_router.sh (use -m dynamo.sglang.utils.clear_namespace).

components/backends/sglang/launch/disagg.sh (1)

15-15: Relative path wrong

Same fix as earlier scripts (-m dynamo.sglang.utils.clear_namespace).

🧹 Nitpick comments (7)
components/backends/sglang/requirements.txt (1)

4-5: Add platform marker / version pin for uvloop.

uvloop only ships wheels for Linux/macOS and often introduces ABI-breaking changes between minor releases.
Consider constraining and guarding the dependency, e.g.

-sglang[all]>=0.4.9.post2
-uvloop
+sglang[all]>=0.4.9.post2
+uvloop>=0.19  ; sys_platform != "win32"

This keeps Windows installs working (they will skip the extra) and prevents accidental upgrades that break the event loop.

components/backends/sglang/docs/dsr1-wideep-h100.md (1)

50-50: Path updated – check remaining docs for the old location.

Good catch here. A quick grep still shows a few occurrences of examples/sglang in other markdown files. Making them consistent will avoid confusing newcomers.

components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)

98-99: Redundant uvloop.install() – safe but unnecessary.

__main__.py already installs uvloop. Re-installing here is idempotent, yet a single central call keeps startup logs cleaner.

If you keep it, add a comment explaining why double-install is intentional.

components/backends/sglang/launch/agg_router.sh (1)

22-29: Consistency: invoke decode/prefill workers the same way

You switched from components/worker.py to python3 -m dynamo.sglang, which is great.
Make sure future scripts (e.g. decode_worker) follow the same pattern, otherwise users have to remember two different call styles.
No action needed in this file, just flagging for cross-script consistency.

components/backends/sglang/launch/agg.sh (1)

22-28: Environment dependency on installed package

Running python3 -m dynamo.sglang requires that dynamo is import-resolvable.
For local, editable checkouts, prepend the project root to PYTHONPATH or switch to:

PYTHONPATH="$(git rev-parse --show-toplevel)/components/backends/sglang/src:$PYTHONPATH" \
python3 -m dynamo.sglang ...

Document this in README to prevent “module not found” surprises.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

5-13: Harden the bash script

Add the usual safety flags to stop on errors and unset vars:

-#!/bin/bash
+#!/usr/bin/env bash
+set -euo pipefail
components/backends/sglang/README.md (1)

98-103: Minor nit – wording

components/backends/sglang/src/dynamo/sglang/components/worker.py no longer exists; the worker lives in .../prefill_worker.py and decode_worker.py. Consider updating the note to avoid confusion.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20c5daf and 7720ad5.

📒 Files selected for processing (13)
  • components/backends/sglang/README.md (5 hunks)
  • components/backends/sglang/docs/dsr1-wideep-h100.md (1 hunks)
  • components/backends/sglang/docs/multinode-examples.md (3 hunks)
  • components/backends/sglang/docs/sgl-http-server.md (0 hunks)
  • components/backends/sglang/launch/agg.sh (1 hunks)
  • components/backends/sglang/launch/agg_router.sh (1 hunks)
  • components/backends/sglang/launch/disagg.sh (2 hunks)
  • components/backends/sglang/launch/disagg_dp_attn.sh (3 hunks)
  • components/backends/sglang/requirements.txt (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/__main__.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/components/worker.py (1 hunks)
  • pyproject.toml (1 hunks)
🧠 Learnings (10)
pyproject.toml (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/sglang/docs/dsr1-wideep-h100.md (2)

Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/README.md (4)

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources section under ServiceArgs specifies resources per individual worker, not total resources for all workers. So workers: 8 with gpu: '1' means 8 workers × 1 GPU each = 8 GPUs total.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/src/dynamo/sglang/components/worker.py (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (3)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/launch/agg_router.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/docs/multinode-examples.md (3)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

components/backends/sglang/launch/disagg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/launch/agg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
  • DisaggPreprocessedRequest (58-64)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
💤 Files with no reviewable changes (1)
  • components/backends/sglang/docs/sgl-http-server.md
🧰 Additional context used
🧠 Learnings (10)
pyproject.toml (1)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/sglang/docs/dsr1-wideep-h100.md (2)

Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/README.md (4)

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources section under ServiceArgs specifies resources per individual worker, not total resources for all workers. So workers: 8 with gpu: '1' means 8 workers × 1 GPU each = 8 GPUs total.

components/backends/sglang/launch/disagg_dp_attn.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/src/dynamo/sglang/components/worker.py (1)

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (3)

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker() decorator in the dynamo codebase returns a wrapper that automatically injects the runtime parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir) decorated with @dynamo_worker() can be called as get_metrics(log_dir) because the decorator wrapper injects the runtime parameter.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/launch/agg_router.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/docs/multinode-examples.md (3)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
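
As a hedged illustration of that split, a worker node's environment might export something like the following; the head node can reach the same services via localhost. HEAD_NODE_IP is a placeholder, and the ports shown are the NATS and etcd defaults.

```bash
# Worker nodes only: point at the head node's externally reachable IP.
HEAD_NODE_IP=10.0.0.1                                # placeholder address
export NATS_SERVER="nats://${HEAD_NODE_IP}:4222"     # NATS default port
export ETCD_ENDPOINTS="http://${HEAD_NODE_IP}:2379"  # etcd default client port
```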

components/backends/sglang/launch/disagg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

components/backends/sglang/launch/agg.sh (1)

Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
  • DisaggPreprocessedRequest (58-64)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
  • parse_sglang_args_inc (24-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (7)
pyproject.toml (1)

82-82: Wheel target grows again – run a sanity build.

Adding yet another src/dynamo path is fine (namespace packages), but the list is getting long and easy to mistype.
Please run a local hatch build -t wheel to confirm no path typo sneaks through and that the new package ends up in the wheel as expected.
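
A quick way to act on this, assuming the repo's hatchling setup (the grep pattern is illustrative):

```bash
# Build only the wheel target and confirm the new sglang package is included.
hatch build -t wheel
unzip -l dist/*.whl | grep "dynamo/sglang"
```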

components/backends/sglang/src/dynamo/sglang/components/worker.py (2)

30-32: Import move LGTM.

The switch to absolute imports under dynamo.sglang.utils improves clarity and avoids the “script vs module” pitfalls seen earlier.
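
For reference, the style being praised looks roughly like this, with module and symbol names taken from the code-graph section above; it resolves the same way whether the file is run as a script or imported from the installed package (assuming the package was installed, e.g. via uv pip install .).

```python
from dynamo.sglang.utils.protocol import DisaggPreprocessedRequest
from dynamo.sglang.utils.sgl_utils import parse_sglang_args_inc
```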


314-315: CLI parsing relies on raw sys.argv – validate downstream.

parse_sglang_args_inc() consumes the global argv list. When this worker is started through python -m dynamo.sglang, __main__.py already peeled off some arguments. Just make sure no mandatory flag is lost; otherwise consider forwarding sys.argv[1:] from __main__ instead of re-reading it here.
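
One hedged way to make the hand-off explicit, assuming parse_sglang_args_inc can be taught to accept an argv list (its real signature and flag set may differ; the stand-in below only parses two flags mentioned in this PR):

```python
# Sketch: let the parser take an explicit argv list so the caller controls
# exactly which arguments the worker sees.
import argparse
import sys


def parse_sglang_args_inc(argv=None):
    # Stand-in for dynamo.sglang.utils.sgl_utils.parse_sglang_args_inc;
    # the real parser builds SGLang's full argument set.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--skip-tokenizer-init", action="store_true")
    args, _unknown = parser.parse_known_args(sys.argv[1:] if argv is None else argv)
    return args


if __name__ == "__main__":
    print(parse_sglang_args_inc(["--model-path", "~/llms/Qwen3-0.6B"]))
```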

components/backends/sglang/src/dynamo/sglang/__main__.py (1)

10-12: Missing CLI forwarding – arguments passed after -m dynamo.sglang are ignored

asyncio.run(worker()) discards sys.argv[1:].
Worker currently expects to parse CLI flags itself (as done in the old components/worker.py).
If that API changed, fine; if not, forward the arguments:

-import asyncio
+import asyncio, sys
...
-    asyncio.run(worker())
+    asyncio.run(worker(sys.argv[1:]))

Ensure the signature of worker() supports this.
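
Put together, a __main__.py along these lines would keep the uvloop setup while forwarding the CLI arguments; the import path and the worker() signature are assumptions and should be adjusted to the actual module layout.

```python
# Hypothetical shape of components/backends/sglang/src/dynamo/sglang/__main__.py
import asyncio
import sys

import uvloop

from dynamo.sglang.components.worker import worker  # import path is an assumption


def main():
    uvloop.install()                   # use uvloop as the asyncio event loop
    asyncio.run(worker(sys.argv[1:]))  # forward CLI flags to the worker


if __name__ == "__main__":
    main()
```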

components/backends/sglang/docs/multinode-examples.md (1)

24-39: Worker invocation updated, but decode workers still use old path further below

Sections for Node 3 & 4 still reference python3 components/decode_worker.py.
Update them to the new module invocation for consistency and to avoid path issues.

Example:

-python3 components/decode_worker.py \
+python3 -m dynamo.sglang.components.decode_worker \

Also applies to: 43-58

components/backends/sglang/launch/disagg_dp_attn.sh (1)

22-33: Good move to python -m dynamo.sglang – LGTM

components/backends/sglang/README.md (1)

170-170: Confirm SLURM docs link

slurm_jobs/README.md path looks right after the move – no action required.

Contributor

@ishandhanani ishandhanani left a comment

Thank you so much Big GK

@grahamking grahamking merged commit d65ce1b into main Jul 22, 2025
13 of 14 checks passed
@grahamking grahamking deleted the gk-sglang-component branch July 22, 2025 16:31