Open
Labels
A-networking-ring (Area: Ring protocol and ops.), E-hard (Experience needed to fix/implement: Hard / a lot), P-medium (Medium priority)
Description
Summary
`PUT` operations with `subscribe:true` still complete before the follow-on subscribe request is guaranteed to succeed. The race shows up as clients receiving `PutResponse` and moving on to `UPDATE` or waiting for notifications long before the network subscription is actually established (or has even started). PR #1767 patched around this with sleeps, but Nacho already pointed out we really need to treat PUT+subscribe as a single committed transaction.
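As a rough illustration of what replacing the sleeps could look like in a test harness, the sketch below blocks on an explicit event with a timeout instead of sleeping for a fixed interval. Everything here is hypothetical: `Event`, `wait_for`, and `spawn_fake_node` are illustrative names, not existing Freenet types, and the "node" is just a thread that emits events after a short delay.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical events a test harness might observe (assumption, not a real type).
#[derive(Debug, PartialEq)]
enum Event {
    SubscriptionActive,
    UpdateNotification(u32),
}

/// Wait until `expected` arrives, skipping unrelated events.
/// `timeout` applies to each individual receive, not to the whole wait.
fn wait_for(
    rx: &Receiver<Event>,
    expected: &Event,
    timeout: Duration,
) -> Result<(), RecvTimeoutError> {
    loop {
        if rx.recv_timeout(timeout)? == *expected {
            return Ok(());
        }
    }
}

/// Stand-in for a node: reports the subscription, then one update.
fn spawn_fake_node() -> Receiver<Event> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Simulated network latency; the test never has to guess this value.
        thread::sleep(Duration::from_millis(20));
        let _ = tx.send(Event::SubscriptionActive);
        let _ = tx.send(Event::UpdateNotification(7));
    });
    rx
}
```

A test written this way passes as soon as the event arrives and fails with a clear timeout or disconnect error otherwise, so it does not depend on a sleep being long enough.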
Requirements
- Treat the auto-subscribe that hangs off a PUT as part of the same logical transaction/operation so that the client only receives `PutResponse` after the subscribe succeeds (or fails deterministically).
- If the subscribe fails, surface that error in the same response path instead of silently continuing, so the client can alert the user or retry.
- Update the RequestRouter tracking so PUT+subscribe are grouped under one transaction ID instead of two loosely-related ops.
- Ensure the solution plays nicely with the recent router and subscription fixes (#1854 "fix: Critical subscription routing fixes (Phases 1-3)" and #1891 "Fix request router deduplication race with PUT operations (issue #1886)") and the proximity cache work (#1853 "feat: proximity-based update forwarding").
- Provide regression tests that would fail without the atomic behaviour (e.g. immediate UPDATE after PUT with `subscribe:true`, and notification assertions instead of sleeps).
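One possible shape for the first three requirements is a small tracker that keeps PUT and its auto-subscribe under a single transaction ID and only produces a client reply once both phases have resolved. This is a minimal sketch under assumptions: `TxId`, `PutTracker`, `Phase`, and `ClientReply` are hypothetical names and do not reflect the actual RequestRouter types, and whether a failed subscribe should also roll back the PUT is deliberately left open.

```rust
use std::collections::HashMap;

/// Hypothetical transaction ID; the real router's ID type will differ.
type TxId = u64;

/// Where a tracked PUT currently stands.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    AwaitingPut,
    AwaitingSubscribe,
}

/// What the client finally sees for the whole transaction (assumed shape).
#[derive(Debug, Clone, PartialEq)]
enum ClientReply {
    PutOk { subscribed: bool },
    /// The subscribe step failed; rollback of the PUT is not modelled here.
    SubscribeFailed(String),
}

struct Pending {
    subscribe: bool,
    phase: Phase,
}

#[derive(Default)]
struct PutTracker {
    pending: HashMap<TxId, Pending>,
}

impl PutTracker {
    /// Register a PUT; `subscribe` mirrors the `subscribe:true` flag.
    fn start_put(&mut self, tx: TxId, subscribe: bool) {
        self.pending.insert(tx, Pending { subscribe, phase: Phase::AwaitingPut });
    }

    /// Called when the PUT itself has been stored.
    /// Returns a reply only if no subscribe is outstanding.
    fn on_put_stored(&mut self, tx: TxId) -> Option<ClientReply> {
        let entry = self.pending.get_mut(&tx)?;
        if entry.phase != Phase::AwaitingPut {
            return None;
        }
        if entry.subscribe {
            // Hold the reply back until the subscribe resolves.
            entry.phase = Phase::AwaitingSubscribe;
            None
        } else {
            self.pending.remove(&tx);
            Some(ClientReply::PutOk { subscribed: false })
        }
    }

    /// Called when the follow-on subscribe resolves, under the same ID.
    fn on_subscribe_result(
        &mut self,
        tx: TxId,
        result: Result<(), String>,
    ) -> Option<ClientReply> {
        match self.pending.get(&tx) {
            Some(p) if p.phase == Phase::AwaitingSubscribe => {}
            _ => return None, // unknown ID or subscribe arrived out of order
        }
        self.pending.remove(&tx);
        Some(match result {
            Ok(()) => ClientReply::PutOk { subscribed: true },
            Err(reason) => ClientReply::SubscribeFailed(reason),
        })
    }
}
```

With this shape, a regression test for "immediate UPDATE after PUT" can assert that no reply exists between `on_put_stored` and `on_subscribe_result`, which is the window where the current race occurs.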
Open Questions
- Should PUT wait synchronously for the subscribe to complete, or should we change the API response to include an explicit subscribe status/result so clients can react without blocking the node’s op loop?
- How do we want to handle partial failures if PUT succeeds but subscribe cannot (e.g. no peers available)? Should we roll back the PUT or surface a structured warning?
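To make the two open questions easier to compare, here is a hedged sketch of a response that carries an explicit subscribe status, with the partial-failure behaviour expressed as a policy. All names (`SubscribeStatus`, `PutOutcome`, `PartialFailurePolicy`, `resolve`) are assumptions for illustration only; nothing here is an existing API, and the sketch assumes the PUT itself has already succeeded.

```rust
/// Hypothetical subscribe status reported alongside a successful PUT.
#[derive(Debug, Clone, PartialEq)]
enum SubscribeStatus {
    NotRequested,
    Active,
    Failed { reason: String },
}

/// Hypothetical response payload: the PUT succeeded, plus how the subscribe went.
#[derive(Debug, Clone, PartialEq)]
struct PutOutcome {
    subscribe: SubscribeStatus,
}

/// The two options raised for partial failures.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PartialFailurePolicy {
    /// Treat a failed subscribe as failure of the whole operation.
    RollBack,
    /// Keep the PUT and surface a structured warning.
    Warn,
}

/// Combine a successful PUT with the subscribe result.
/// `subscribe` is `None` when `subscribe:true` was not set.
fn resolve(
    policy: PartialFailurePolicy,
    subscribe: Option<Result<(), String>>,
) -> Result<PutOutcome, String> {
    match subscribe {
        None => Ok(PutOutcome { subscribe: SubscribeStatus::NotRequested }),
        Some(Ok(())) => Ok(PutOutcome { subscribe: SubscribeStatus::Active }),
        Some(Err(reason)) => match policy {
            // The actual rollback of stored state is out of scope for this sketch.
            PartialFailurePolicy::RollBack => {
                Err(format!("put rolled back: subscribe failed: {reason}"))
            }
            PartialFailurePolicy::Warn => {
                Ok(PutOutcome { subscribe: SubscribeStatus::Failed { reason } })
            }
        },
    }
}
```

Under the `Warn` policy the client can still decide to retry the subscribe on its own, while `RollBack` keeps the all-or-nothing semantics of a single committed transaction.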
References
- #1767 (Fix subscribe:true flag for PUT operations): draft fix that highlights the timing gap.
- Comment from @iduartgomez on the race concern in #1767 (Fix subscribe:true flag for PUT operations).
Status
Triage