-
Notifications
You must be signed in to change notification settings - Fork 618
chore(sglang): Move examples/sglang to components/backends/sglang #2046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This works now: - `cd $PROJECT_ROOT` - `uv pip install .` - `python -m dynamo.sglang --model-path ~/llms/Qwen3-0.6B --skip-tokenizer-init` Changes: - Top level folder move `examples/sglang` -> `components/backends/sglang` - Internal move `examples/sglang/components` -> `components/backends/sglang/src/dynamo/sglang/components` - Internal move `examples/sglang/utils` -> `components/backends/sglang/src/dynamo/sglang/utils` - Create `components/backends/sglang/src/dynamo/sglang/__main__.py` which is what makes `python -m dynamo.sglang` work. - Edit `pyproject.toml` to install this new module. - Add `components/backends/sglang/requirements.txt` - Update docs and launch scripts. There is probably more to do here.
WalkthroughThis update refactors the SGLang backend component for Dynamo by revising documentation, updating script and module invocation paths, and aligning directory references with the current project structure. It introduces a new requirements file and an asynchronous main entry script, and removes redundant license headers across several files. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Shell Script
participant Python Module (dynamo.sglang)
participant Worker
User->>Shell Script: Run launch script (agg.sh, etc.)
Shell Script->>Python Module (dynamo.sglang): python3 -m dynamo.sglang
Python Module (dynamo.sglang)->>Worker: asyncio.run(worker())
Worker-->>Python Module (dynamo.sglang): Worker runs asynchronously
Estimated code review effort2 (~15 minutes) Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🔭 Outside diff range comments (2)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
28-31
:msgspec.json.decode
expectsbytes
, notstr
.
request
is annotated asstr
, butmsgspec.json.decode()
only accepts a bytes-like object.
If an upstream endpoint delivers a Pythonstr
, this will raiseTypeError: Expected
bytes, got
str`` at runtime.- req = msgspec.json.decode(request, type=dict) + if isinstance(request, str): + request = request.encode() + req = msgspec.json.decode(request, type=dict)Alternatively, change the endpoint contract so it always passes
bytes
.components/backends/sglang/launch/agg_router.sh (1)
22-29
: Trailing backslash after last CLI flagLine 28 ends with a backslash before the blank line. Some shells treat this as a continued line and will merge with the following empty line, yielding a syntax error.
Drop the trailing\
:- --skip-tokenizer-init \ + --skip-tokenizer-init
♻️ Duplicate comments (2)
components/backends/sglang/launch/agg_router.sh (1)
15-16
: Same CWD-sensitive hard-coded path issue as inagg.sh
Replicate the defensive path resolution here to avoid brittle executions.
components/backends/sglang/launch/disagg_dp_attn.sh (1)
15-16
: CWD-dependent utils pathSame concern as other launch scripts – resolve path dynamically.
🧹 Nitpick comments (8)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
111-133
: Spelling & silent-loop hazard in metrics loop.
- Typo:
"recieve"
→"receive"
in the log message.- The broad
except Exception:
swallows all errors and immediately continues, potentially masking persistent failure conditions and spinning the loop.- except Exception: - logging.exception("Failed to recieve or publish metrics") + except Exception: + logging.exception("Failed to receive or publish metrics; exiting metrics loop") + breakConsider breaking or backing-off after repeated failures.
pyproject.toml (1)
82-82
: Manual package list is brittle.Hard-coding every sub-package path requires edits for each new backend. Prefer dynamic discovery:
[tool.hatch.build.targets.wheel] # include everything under components/*/src/ packages = ["deploy/sdk/src/dynamo", "components/**/src/dynamo"]This reduces maintenance overhead as backends are added or relocated.
components/backends/sglang/src/dynamo/sglang/__main__.py (1)
8-12
: Runtime injection assumption—double-checkdynamo_worker
decorator.
worker()
is invoked with no arguments. This relies on@dynamo_worker
returning a callable that internally creates aDistributedRuntime
. If that invariant ever changes, the entry-point will break.
Recommend adding a brief comment for future maintainers:# `worker` is decorated with @dynamo_worker and therefore needs no explicit runtime.
components/backends/sglang/launch/agg.sh (1)
22-29
: Module invocation relies on editable/installed package – document the pre-reqSwitching from
python3 components/worker.py
topython3 -m dynamo.sglang
is great, but only succeeds if users have already runuv pip install .
(orpip install -e .
).
Add a short comment above this block or in README so the happy-path doesn’t fail withModuleNotFoundError
.components/backends/sglang/docs/multinode-examples.md (2)
10-13
: Typo: “demonstrate and example” → “demonstrate an example”Minor wording fix for professionalism.
14-15
: Hard-coded relative path – use repo-root-independent snippetReaders may copy/paste the command from outside the repo root; prefix with
${REPO_ROOT}
or clarify that the command must be run from project root to prevent confusion.components/backends/sglang/launch/disagg_dp_attn.sh (1)
36-37
: Decode worker path hard-coded; consider module form for symmetryThe decode worker is invoked via file path while pre-fill uses the module. Expose a
dynamo.sglang.decode_worker
entry-point to avoid mixing styles and to reuse theuvloop
setup.components/backends/sglang/launch/disagg.sh (1)
22-30
: Nit: expose model parameters via variables for easier overridesHard-coding model path, page size, TP, etc. forces users to edit the script.
Exporting them to variables (or reading from env) improves re-usability:MODEL_PATH=${MODEL_PATH:-deepseek-ai/DeepSeek-R1-Distill-Llama-8B} TP=${TP:-1} python3 -m dynamo.sglang \ --model-path "${MODEL_PATH}" \ --served-model-name "${MODEL_PATH}" \ --page-size 16 \ --tp "${TP}" \ --trust-remote-code \ --skip-tokenizer-init \ --disaggregation-mode prefill \ --disaggregation-transfer-backend nixl &
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
components/backends/sglang/README.md
(5 hunks)components/backends/sglang/docs/dsr1-wideep-h100.md
(1 hunks)components/backends/sglang/docs/multinode-examples.md
(3 hunks)components/backends/sglang/docs/sgl-http-server.md
(0 hunks)components/backends/sglang/launch/agg.sh
(1 hunks)components/backends/sglang/launch/agg_router.sh
(1 hunks)components/backends/sglang/launch/disagg.sh
(2 hunks)components/backends/sglang/launch/disagg_dp_attn.sh
(3 hunks)components/backends/sglang/requirements.txt
(1 hunks)components/backends/sglang/src/dynamo/sglang/__main__.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/decode_worker.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/worker.py
(1 hunks)pyproject.toml
(1 hunks)
🧬 Code Graph Analysis (3)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/__main__.py (1)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
worker
(301-315)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
💤 Files with no reviewable changes (1)
- components/backends/sglang/docs/sgl-http-server.md
🧰 Additional context used
🧬 Code Graph Analysis (3)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/__main__.py (1)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
worker
(301-315)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (5)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (2)
18-18
: Import path update looks correct.The new absolute package path under
dynamo.sglang
aligns with the repo re-org and should resolve cleanly once the package is installed.
47-51
: Yielding a single dict violates Dynamo streaming convention.
flush_cache
currently yields one dict and returns; Dynamo endpoints normally finish without yielding when returning a single response, or should mark the stream’s end explicitly. Verify that the client correctly handles this single-item stream.components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
30-32
: Import path adjustments LGTM.Moving the utilities under
dynamo.sglang.utils
keeps the namespace consistent with the new build layout.components/backends/sglang/docs/dsr1-wideep-h100.md (1)
50-50
: Path update acknowledged.The documentation now reflects the new directory structure.
components/backends/sglang/docs/multinode-examples.md (1)
24-35
: Docs still reference removed flags in new entry-pointDouble-check that
dynamo.sglang.__main__
accepts all the shown flags (--page-size
,--dp-size
, etc.). If some flags migrated or were renamed, update the docs to avoid runtime failure.
WalkthroughThis update reorganizes and standardizes the SGLang backend component by correcting file paths, updating script invocation to use Python modules, and adding a new entry point. Documentation and launch scripts are revised to reflect the new structure, and a requirements file is added. No functional code changes are introduced. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Shell/Script
participant Python Interpreter
participant dynamo.sglang.__main__
participant Worker
User->>Shell/Script: Run launch script (e.g., agg.sh)
Shell/Script->>Python Interpreter: python3 -m dynamo.sglang
Python Interpreter->>dynamo.sglang.__main__: Import and execute
dynamo.sglang.__main__->>Worker: asyncio.run(worker())
Worker-->>dynamo.sglang.__main__: Worker coroutine runs with uvloop
Estimated code review effort2 (~15 minutes) Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
♻️ Duplicate comments (3)
components/backends/sglang/requirements.txt (1)
1-5
: Duplicated dependency list – ensure a single source of truth
sglang
anduvloop
are captured here but not inpyproject.toml
. Having two independent lists risks drift.
Prefer declaring runtime dependencies only inpyproject.toml
and generating this file (or eliminating it) to avoid mismatches.components/backends/sglang/launch/agg.sh (2)
14-16
: Same relative-path fragility as inagg_router.sh
Replicate the
SCRIPT_DIR
fix suggested there to make the helper script callable from any location.
22-29
: PID management identical toagg_router.sh
Apply the same foreground/background decision and cleanup enhancement here for consistency.
🧹 Nitpick comments (9)
components/backends/sglang/docs/dsr1-wideep-h100.md (1)
50-51
: Update CLI examples to use the new module entry-pointThe docs now point readers to
/components/backends/sglang
, but the invocation examples still usepython3 components/worker.py
.
With__main__.py
added, the simpler (and location-agnostic) form works:python -m dynamo.sglang --disaggregation-mode prefill ...Updating the snippets keeps the docs in sync with the code.
components/backends/sglang/launch/agg_router.sh (1)
22-29
: Background the worker or extend cleanup handlingThe worker now runs in the foreground (
python3 -m dynamo.sglang \
with no trailing&
).
If the intent is to keep it in the foreground, the currenttrap
only cleans up the ingress process ($DYNAMO_PID
).
Either:
- background the worker and track its PID, or
- leave it in the foreground and drop dead-code cleanup for it (less robust).
Option 1 (recommended):
-python3 -m dynamo.sglang \ +python3 -m dynamo.sglang \ … \ - --skip-tokenizer-init \ + --skip-tokenizer-init \ + & # background +WORKER_PID=$!and in
cleanup()
appendkill $WORKER_PID
/wait $WORKER_PID
.This prevents a stray worker when the script terminates abnormally.
components/backends/sglang/launch/disagg_dp_attn.sh (1)
22-32
: Inconsistent entry point between prefill and decode workersPrefill worker uses the new module entry point (
python3 -m dynamo.sglang
) while decode worker still runs the file directly:CUDA_VISIBLE_DEVICES=2,3 python3 src/dynamo/sglang/components/decode_worker.py …
Consider exposing
decode_worker
viadynamo.sglang.__main__
(e.g.,--mode decode
) or a dedicated-m dynamo.sglang.decode_worker
to keep invocation symmetric and avoid importing private paths from tooling scripts.components/backends/sglang/docs/multinode-examples.md (2)
14-15
: Documented helper-script path relies on repo layout knowledge
./components/backends/sglang/src/dynamo/sglang/utils/gen_env_vars.sh
is long and brittle.
Readers typically run commands from the repo root; consider recommending:bash components/backends/sglang/src/dynamo/sglang/utils/gen_env_vars.shor prefix with
$(git rev-parse --show-toplevel)
for clarity.
24-37
: Explain the switch to module executionYou introduce
python3 -m dynamo.sglang
without noting the rationale.
A short sentence such as “dynamo.sglang
is now installable and exposes the worker via its__main__
entry point” will prevent confusion for users scanning the docs.components/backends/sglang/launch/disagg.sh (2)
22-30
: Prefer module invocation for all workers for consistencyThe pre-fill worker is now launched via
python3 -m dynamo.sglang
, but the decode worker below is still invoked by path.
For symmetry and easier packaging you could exposedynamo.sglang.components.decode_worker
(or a dedicateddynamo.sglang.decode
) as a module entry-point and start it with:CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang.decode_worker ...This keeps the launch script uniform and avoids hard-coding the file layout.
34-42
: Hard-coded GPU index may cause accidental oversubscription
CUDA_VISIBLE_DEVICES=1
assumes GPU 0 is already occupied by the prefill worker.
In multi-GPU or containerised environments this may not hold and could overrun resources.Consider parameterising the decode GPU via an environment variable:
: "${DECODE_GPU:=1}" CUDA_VISIBLE_DEVICES=$DECODE_GPU python3 ...This makes the script safer to reuse across nodes with different GPU topologies.
components/backends/sglang/README.md (2)
98-102
: Nit: clarify placeholder-metrics TODO wordingMinor phrasing: “placeholder engine metrics to keep the Dynamo KV-router happy” rather than “placeholder engine metrics to keep the Dynamo KV-router happy”. Improves readability while users await the referenced PRs.
128-130
: Optional: collapse repeatedcd
instructionsMultiple example blocks repeat the
cd $DYNAMO_ROOT/components/backends/sglang
step. Adding a short “Prerequisite” note once at the top (or exporting an environment variable) would reduce duplication and the chance of future path drift.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
components/backends/sglang/README.md
(5 hunks)components/backends/sglang/docs/dsr1-wideep-h100.md
(1 hunks)components/backends/sglang/docs/multinode-examples.md
(3 hunks)components/backends/sglang/docs/sgl-http-server.md
(0 hunks)components/backends/sglang/launch/agg.sh
(1 hunks)components/backends/sglang/launch/agg_router.sh
(1 hunks)components/backends/sglang/launch/disagg.sh
(2 hunks)components/backends/sglang/launch/disagg_dp_attn.sh
(3 hunks)components/backends/sglang/requirements.txt
(1 hunks)components/backends/sglang/src/dynamo/sglang/__main__.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/decode_worker.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/worker.py
(1 hunks)pyproject.toml
(1 hunks)
🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
💤 Files with no reviewable changes (1)
- components/backends/sglang/docs/sgl-http-server.md
🧰 Additional context used
🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (3)
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
30-31
: Import-path switch looks correctThe move to the fully-qualified path (
dynamo.sglang.utils.*
) matches the new package layout and will work once the directory is included in the wheel (see comment inpyproject.toml
).
No further action required.components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
18-19
: Import-path update is consistentSame observation as in
worker.py
; the renamed module path is valid after the restructure.components/backends/sglang/docs/multinode-examples.md (1)
43-57
: Update remaining decode-worker commands for consistencyLater in the document (
Node 3
/Node 4
) the commands still referencepython3 components/decode_worker.py
.
With the new layout these paths are stale and will 404. Align them with the updatedsrc/dynamo/sglang/components/decode_worker.py
or, better, the future module entry point once added.
WalkthroughThis update reorganizes the SGLang backend by correcting directory paths, updating module import statements, and switching script invocations to use Python module syntax. Documentation and launch scripts are revised to reflect the new structure. A new requirements file and a main module entry point are introduced, and obsolete license headers are removed. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Shell Script
participant Python (dynamo.sglang)
participant Worker
participant uvloop
User->>Shell Script: Run launch script (e.g., agg.sh)
Shell Script->>Python (dynamo.sglang): python3 -m dynamo.sglang
Python (dynamo.sglang)->>uvloop: Install uvloop as event loop
Python (dynamo.sglang)->>Worker: Run worker() coroutine
Worker-->>Python (dynamo.sglang): Start backend process
Estimated code review effort2 (~15 minutes) Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🔭 Outside diff range comments (3)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
54-57
:runtime.shutdown()
is invoked inside an async func but not awaited.If
DistributedRuntime.shutdown()
ever becomesasync
the current call will silently ignore the coroutine and the process may exit before cleanup finishes.-async def graceful_shutdown(runtime): +async def graceful_shutdown(runtime): logging.info("Received shutdown signal, shutting down DistributedRuntime") - runtime.shutdown() + maybe_coro = runtime.shutdown() + if asyncio.iscoroutine(maybe_coro): + await maybe_coro logging.info("DistributedRuntime shutdown complete")components/backends/sglang/launch/disagg.sh (1)
22-31
: Prefill worker OK – but missing trap cleanup
PREFILL_PID
is killed in trap, good. Decode worker PID is not tracked; if the script exits early, it will stay running.Track and kill it:
PREFILL_PID=$! ... -CUDA_VISIBLE_DEVICES=1 python3 src/dynamo/sglang/components/decode_worker.py \ +CUDA_VISIBLE_DEVICES=1 python3 -m dynamo.sglang.components.decode_worker \ ... - --disaggregation-transfer-backend nixl + --disaggregation-transfer-backend nixl & +DECODE_PID=$!and in the trap:
kill $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true wait $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || truecomponents/backends/sglang/launch/disagg_dp_attn.sh (1)
36-47
: Decode worker isn’t tracked or cleaned up
CUDA_VISIBLE_DEVICES=2,3 … &
starts a background job but its PID is never recorded, and therefore not killed by thecleanup
trap. On exit the process will linger.-CUDA_VISIBLE_DEVICES=2,3 python3 src/dynamo/sglang/components/decode_worker.py \ +CUDA_VISIBLE_DEVICES=2,3 python3 -m dynamo.sglang.components.decode_worker \ @@ - --port 31000 + --port 31000 & +DECODE_PID=$!and extend the trap:
-kill $DYNAMO_PID $PREFILL_PID 2>/dev/null || true -wait $DYNAMO_PID $PREFILL_PID 2>/dev/null || true +kill $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true +wait $DYNAMO_PID $PREFILL_PID $DECODE_PID 2>/dev/null || true
♻️ Duplicate comments (2)
components/backends/sglang/launch/agg.sh (1)
15-15
: Same relative-path issue as agg_router.shPlease apply the fix described in agg_router.sh (use
-m dynamo.sglang.utils.clear_namespace
).components/backends/sglang/launch/disagg.sh (1)
15-15
: Relative path wrongSame fix as earlier scripts (
-m dynamo.sglang.utils.clear_namespace
).
🧹 Nitpick comments (7)
components/backends/sglang/requirements.txt (1)
4-5
: Add platform marker / version pin foruvloop
.
uvloop
only ships wheels for Linux/macOS and often introduces ABI-breaking changes between minor releases.
Consider constraining and guarding the dependency, e.g.-sglang[all]>=0.4.9.post2 -uvloop +sglang[all]>=0.4.9.post2 +uvloop>=0.19 ; sys_platform != "win32"This keeps Windows installs working (they will skip the extra) and prevents accidental upgrades that break the event loop.
components/backends/sglang/docs/dsr1-wideep-h100.md (1)
50-50
: Path updated – check remaining docs for the old location.Good catch here. A quick grep still shows a few occurrences of
examples/sglang
in other markdown files. Making them consistent will avoid confusing newcomers.components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
98-99
: Redundantuvloop.install()
– safe but unnecessary.
__main__.py
already installsuvloop
. Re-installing here is idempotent, yet a single central call keeps startup logs cleaner.If you keep it, add a comment explaining why double-install is intentional.
components/backends/sglang/launch/agg_router.sh (1)
22-29
: Consistency: invoke decode/prefill workers the same wayYou switched from
components/worker.py
topython3 -m dynamo.sglang
, which is great.
Make sure future scripts (e.g.decode_worker
) follow the same pattern, otherwise users have to remember two different call styles.
No action needed in this file, just flagging for cross-script consistency.components/backends/sglang/launch/agg.sh (1)
22-28
: Environment dependency on installed packageRunning
python3 -m dynamo.sglang
requires thatdynamo
is import-resolvable.
For local, editable checkouts, prepend the project root toPYTHONPATH
or switch to:PYTHONPATH="$(git rev-parse --show-toplevel)/components/backends/sglang/src:$PYTHONPATH" \ python3 -m dynamo.sglang ...
Document this in README to prevent “module not found” surprises.
components/backends/sglang/launch/disagg_dp_attn.sh (1)
5-13
: Harden the bash scriptAdd the usual safety flags to stop on errors and unset vars:
-#!/bin/bash +#!/usr/bin/env bash +set -euo pipefailcomponents/backends/sglang/README.md (1)
98-103
: Minor nit – wording
components/backends/sglang/src/dynamo/sglang/components/worker.py
no longer exists; the worker lives in.../prefill_worker.py
anddecode_worker.py
. Consider updating the note to avoid confusion.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
components/backends/sglang/README.md
(5 hunks)components/backends/sglang/docs/dsr1-wideep-h100.md
(1 hunks)components/backends/sglang/docs/multinode-examples.md
(3 hunks)components/backends/sglang/docs/sgl-http-server.md
(0 hunks)components/backends/sglang/launch/agg.sh
(1 hunks)components/backends/sglang/launch/agg_router.sh
(1 hunks)components/backends/sglang/launch/disagg.sh
(2 hunks)components/backends/sglang/launch/disagg_dp_attn.sh
(3 hunks)components/backends/sglang/requirements.txt
(1 hunks)components/backends/sglang/src/dynamo/sglang/__main__.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/decode_worker.py
(1 hunks)components/backends/sglang/src/dynamo/sglang/components/worker.py
(1 hunks)pyproject.toml
(1 hunks)
🧠 Learnings (10)
pyproject.toml (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/sglang/docs/dsr1-wideep-h100.md (2)
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/README.md (4)
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/
directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes
parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources
section under ServiceArgs
specifies resources per individual worker, not total resources for all workers. So workers: 8
with gpu: '1'
means 8 workers × 1 GPU each = 8 GPUs total.
components/backends/sglang/launch/disagg_dp_attn.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker()
decorator in the dynamo codebase returns a wrapper that automatically injects the runtime
parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir)
decorated with @dynamo_worker()
can be called as get_metrics(log_dir)
because the decorator wrapper injects the runtime parameter.
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (3)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker()
decorator in the dynamo codebase returns a wrapper that automatically injects the runtime
parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir)
decorated with @dynamo_worker()
can be called as get_metrics(log_dir)
because the decorator wrapper injects the runtime parameter.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/launch/agg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/docs/multinode-examples.md (3)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes
parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
components/backends/sglang/launch/disagg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/launch/agg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
💤 Files with no reviewable changes (1)
- components/backends/sglang/docs/sgl-http-server.md
🧰 Additional context used
🧠 Learnings (10)
pyproject.toml (1)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
components/backends/sglang/docs/dsr1-wideep-h100.md (2)
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/README.md (4)
Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/
directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes
parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/configs/agg_tp_1_dp_8.yaml:31-38
Timestamp: 2025-07-01T15:33:53.262Z
Learning: In fault tolerance test configurations, the resources
section under ServiceArgs
specifies resources per individual worker, not total resources for all workers. So workers: 8
with gpu: '1'
means 8 workers × 1 GPU each = 8 GPUs total.
components/backends/sglang/launch/disagg_dp_attn.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/src/dynamo/sglang/components/worker.py (1)
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker()
decorator in the dynamo codebase returns a wrapper that automatically injects the runtime
parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir)
decorated with @dynamo_worker()
can be called as get_metrics(log_dir)
because the decorator wrapper injects the runtime parameter.
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (3)
Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: nnshah1
PR: #1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The @dynamo_worker()
decorator in the dynamo codebase returns a wrapper that automatically injects the runtime
parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature async def get_metrics(runtime, log_dir)
decorated with @dynamo_worker()
can be called as get_metrics(log_dir)
because the decorator wrapper injects the runtime parameter.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/launch/agg_router.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/docs/multinode-examples.md (3)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the --total_nodes
parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Learnt from: GuanLuo
PR: #1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
components/backends/sglang/launch/disagg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
components/backends/sglang/launch/agg.sh (1)
Learnt from: fsaady
PR: #1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
🧬 Code Graph Analysis (2)
components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
components/backends/sglang/src/dynamo/sglang/utils/protocol.py (1)
DisaggPreprocessedRequest
(58-64)components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
components/backends/sglang/src/dynamo/sglang/components/decode_worker.py (1)
components/backends/sglang/src/dynamo/sglang/utils/sgl_utils.py (1)
parse_sglang_args_inc
(24-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (7)
pyproject.toml (1)
82-82
: Wheel target grows again – run a sanity build.Adding yet another
src/dynamo
path is fine (namespace packages), but the list is getting long and easy to mistype.
Please run a localhatch build -t wheel
to confirm no path typo sneaks through and that the new package ends up in the wheel as expected.components/backends/sglang/src/dynamo/sglang/components/worker.py (2)
30-32
: Import move LGTM.The switch to absolute imports under
dynamo.sglang.utils
improves clarity and avoids the “script vs module” pitfalls seen earlier.
314-315
: CLI parsing relies on rawsys.argv
– validate downstream.
parse_sglang_args_inc()
consumes the global argv list. When this worker is started throughpython -m dynamo.sglang
,__main__.py
already peeled off some arguments. Just make sure no mandatory flag is lost; otherwise consider forwardingsys.argv[1:]
from__main__
instead of re-reading it here.components/backends/sglang/src/dynamo/sglang/__main__.py (1)
10-12
: Missing CLI forwarding – arguments passed after-m dynamo.sglang
are ignored
asyncio.run(worker())
discardssys.argv[1:]
.
Worker currently expects to parse CLI flags itself (as done in the oldcomponents/worker.py
).
If that API changed, fine; if not, forward the arguments:-import asyncio +import asyncio, sys ... - asyncio.run(worker()) + asyncio.run(worker(sys.argv[1:]))Ensure the signature of
worker()
supports this.components/backends/sglang/docs/multinode-examples.md (1)
24-39
: Worker invocation updated, but decode workers still use old path further belowSections for Node 3 & 4 still reference
python3 components/decode_worker.py
.
Update them to the new module invocation for consistency and to avoid path issues.Example:
-python3 components/decode_worker.py \ +python3 -m dynamo.sglang.components.decode_worker \Also applies to: 43-58
components/backends/sglang/launch/disagg_dp_attn.sh (1)
22-33
: Good move topython -m dynamo.sglang
– LGTMcomponents/backends/sglang/README.md (1)
170-170
: Confirm SLURM docs link
slurm_jobs/README.md
path looks right after the move – no action required.
And fix Code Rabbit feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much Big GK
This works now:
cd $PROJECT_ROOT
uv pip install .
python -m dynamo.sglang.worker --model-path ~/llms/Qwen3-0.6B --skip-tokenizer-init
Also
python -m dynamo.sglang.decode_worker
Changes:
examples/sglang
->components/backends/sglang
examples/sglang/components
->components/backends/sglang/src/dynamo/sglang/worker
anddecode_worker
.examples/sglang/utils
->components/backends/sglang/src/dynamo/sglang/utils
components/backends/sglang/src/dynamo/sglang/worker/__main__.py
which is what makespython -m dynamo.sglang.worker
work.pyproject.toml
to install this new module.components/backends/sglang/requirements.txt
Summary by CodeRabbit
Documentation
Chores
New Features