Skip to content

Conversation

PeaBrane
Copy link
Contributor

@PeaBrane PeaBrane commented Aug 20, 2025

Motivation

The motivation for this is to prepare for router replica warm restarts. KV router needs to be aware of the existence (and healthiness) of other KV routers, so they can properly coordinate dumping their RadixTree states and loading them back.

Changes

Register an instance of KV router under the kv_routers/ prefix upon successful creation of a KvRouter in create_kv_chooser

E2E tests

In pre-merge CI, verify that one Router gets you one key with prefix kv_routers/ and two Router replicas get you two keys with prefix kv_routers/ 😃

Summary by CodeRabbit

  • New Features
    • KV router configurations are now persisted to a cluster store in dynamic mode, improving continuity across restarts and deployments.
    • Creation of new KV routers now validates the availability of the cluster store upfront, providing immediate feedback if unavailable.
    • Added serialization support for KV router configuration, enabling safer persistence and potential future export/import flows.

Copy link
Contributor

coderabbitai bot commented Aug 20, 2025

Walkthrough

Adds dynamic-mode persistence for KV router configs: when creating a KvRouter, ModelManager fetches etcd from the distributed runtime, serializes the router config to JSON, and writes it under a generated key. Fails if etcd is unavailable. Also adds serde derives to KvRouterConfig to enable serialization.

Changes

Cohort / File(s) Summary
ModelManager: dynamic etcd persistence on KV router creation
lib/llm/src/discovery/model_manager.rs
Imports DistributedRuntimeProvider. In create_kv_chooser, acquires etcd client, constructs a unique key, serializes KvRouterConfig (or default) to pretty JSON, writes via kv_create, awaits completion, and errors if etcd is unavailable.
KvRouterConfig serialization support
lib/llm/src/kv_router.rs
Adds serde support by deriving Serialize and Deserialize on KvRouterConfig; no runtime logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant MM as ModelManager
  participant DRT as DistributedRuntimeProvider
  participant ETCD as etcd
  participant KR as KvRouter

  Caller->>MM: create_kv_chooser(model_name, kv_router_config)
  MM->>DRT: drt().etcd_client()
  alt etcd client available
    MM->>MM: generate router_key (model_name + UUID)
    MM->>MM: serialize kv_router_config → JSON
    MM->>ETCD: kv_create(router_key, JSON)
    ETCD-->>MM: ack/result
    MM->>KR: instantiate KvRouter
    MM-->>Caller: return KvRouter
  else etcd unavailable
    MM-->>Caller: error "KV routing requires etcd (dynamic mode)"
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A rabbit taps keys with a rhythmic delight,
Etcd gets the map, saved tidy and tight.
Routers now remember, configs in tow,
JSON footprints where the keywinds blow.
Hop, write, await—then off we go! 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
lib/llm/src/kv_router.rs (1)

77-90: Add #[serde(default)] for forward/backward compatibility when reading from etcd.

When you later read entries written to etcd, missing fields in older configs will deserialize to defaults instead of failing.

Apply:

-#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
+#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
+#[serde(default)]
 pub struct KvRouterConfig {
lib/llm/src/discovery/model_manager.rs (2)

230-242: Use a constant for the kv_routers/ prefix.

Keeps paths consistent and discoverable next to other KV router constants.

Example:

  • Define in kv_router.rs (near other constants):
    pub const KV_ROUTERS_ROOT_PATH: &str = "kv_routers";

  • Use here:

-        let router_key = format!("kv_routers/{}/{}", model_name, uuid::Uuid::new_v4());
+        let router_key = format!("{}/{}/{}", KV_ROUTERS_ROOT_PATH, model_name, uuid::Uuid::new_v4());

230-242: Prefer compact JSON for etcd values.

Pretty-printing increases payload and network/storage overhead with no runtime benefit.

-                serde_json::to_vec_pretty(&kv_router_config.unwrap_or_default())?,
+                serde_json::to_vec(&kv_router_config.unwrap_or_default())?,
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 536280f and 5cd5bd5.

📒 Files selected for processing (2)
  • lib/llm/src/discovery/model_manager.rs (2 hunks)
  • lib/llm/src/kv_router.rs (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/llm/src/discovery/model_manager.rs (3)
lib/bindings/python/rust/lib.rs (1)
  • etcd_client (315-320)
lib/llm/src/http/service/service_v2.rs (1)
  • etcd_client (114-116)
lib/runtime/src/transports/etcd.rs (1)
  • etcd_client (142-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (3)
lib/llm/src/kv_router.rs (1)

19-19: Serde derive feature and serde_json dependency confirmed
Verified that the root Cargo.toml declares

  • serde = { version = "1", features = ["derive"] }
  • serde_json = { version = "1" }

and that lib/llm pulls both in via workspace = true. No further changes required.

lib/llm/src/discovery/model_manager.rs (2)

5-5: Trait import LGTM.

Importing DistributedRuntimeProvider is necessary to call component.drt().


230-242: Lease semantics and UUID dependency verified

  • kv_create’s signature is kv_create(&self, key: &str, value: Vec<u8>, lease_id: Option<i64>), and None unwraps to self.lease_id(), correctly attaching the primary lease as intended.
  • The uuid crate is declared in Cargo.toml (workspace) and available for use.

Copy link

copy-pr-bot bot commented Aug 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@PeaBrane PeaBrane force-pushed the rupei/router-register-etcd branch from b2e4627 to ade9c6b Compare August 20, 2025 08:25
@PeaBrane PeaBrane merged commit ab9c950 into main Aug 21, 2025
11 checks passed
@PeaBrane PeaBrane deleted the rupei/router-register-etcd branch August 21, 2025 16:24
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants