-
-
Notifications
You must be signed in to change notification settings - Fork 106
Description
Issue Description
After reviewing the subscription and update operations in the Freenet codebase, I've identified several issues with how subscriptions are established, maintained, and how updates propagate through the network. The most significant concern is the lack of distinction between upstream and downstream connections after subscription establishment, which affects the system's ability to recover when upstream connections drop.
Current Implementation
Subscription Establishment
Subscriptions are established in three main scenarios:
- Explicit subscription requests: When a client explicitly requests to subscribe to a contract using the
Subscribe
operation. - During PUT operations: When a node successfully puts a contract, it can automatically subscribe to it.
- During contract forwarding: When a contract is being forwarded through the network, nodes along the path may subscribe if they should seed the contract.
The subscription process involves:
- Finding a node with the contract
- Adding the requester as a subscriber
- Confirming the subscription with a
ReturnSub
message - Forwarding the confirmation upstream if needed
Update Propagation
Updates propagate through the subscription tree virally as intended. When a node receives an update for a contract, it:
- Updates its local copy of the contract
- Identifies all subscribers to that contract (excluding the sender)
- Broadcasts the update to all those subscribers
This creates a viral propagation pattern where updates flow through the network along subscription paths, as described in the Delta-Sync article.
Key Issues Identified
1. Insufficient Distinction Between Upstream and Downstream Connections
The most significant issue is that the system doesn't maintain a clear distinction between upstream and downstream connections after subscription establishment. While the subscription process initially tracks upstream subscribers during establishment, this distinction is lost afterward:
// During subscription establishment, upstream_subscriber is tracked
if let Some(upstream_subscriber) = upstream_subscriber {
tracing::debug!(
tx = %id,
%key,
upstream_subscriber = %upstream_subscriber.peer,
"Forwarding subscription to upstream subscriber"
);
return_msg = Some(SubscribeMsg::ReturnSub {
id: *id,
key: *key,
sender: target.clone(),
target: upstream_subscriber,
subscribed: true,
});
}
However, when broadcasting updates, all subscribers are treated equally:
// During update broadcasting, all subscribers except the sender are treated equally
pub(crate) fn get_broadcast_targets_update(
&self,
key: &ContractKey,
sender: &PeerId,
) -> Vec<PeerKeyLocation> {
let subscribers = self
.ring
.subscribers_of(key)
.map(|subs| {
subs.value()
.iter()
.filter(|pk| &pk.peer != sender)
.cloned()
.collect::<Vec<_>>()
})
.unwrap_or_default();
subscribers
}
2. No Recovery Mechanism for Upstream Connection Drops
There is no mechanism to find a new upstream connection if the current one drops. When a connection fails during broadcasting, the current implementation drops the connection entirely:
// When broadcasting fails, the connection is dropped entirely
conn_manager.drop_connection(&peer.peer).await?;
But there's no code to find a new upstream connection to ensure downstream peers continue to receive updates. This is particularly problematic because:
- It breaks the subscription tree
- Downstream nodes have no way to reconnect to the tree
- Updates stop propagating to all nodes below the broken connection
3. Aggressive Connection Dropping
When broadcasting updates, if sending to a subscriber fails, the current implementation drops the connection entirely. This might be too aggressive and could lead to network instability, especially in environments with occasional network hiccups.
4. Subscription Tree Maintenance
There doesn't appear to be a mechanism to rebalance the subscription tree if nodes leave or fail. This could lead to inefficient propagation paths over time.
5. Potential Race Conditions
The intermittent issue where users cannot join rooms about 2/3 of the time could be related to race conditions in the subscription process. When a node tries to subscribe to a contract, it might fail if:
- The contract doesn't exist yet (handled with retries)
- The maximum number of subscribers is reached
- The upstream connection drops during subscription establishment
The lack of proper recovery mechanisms when upstream connections drop could explain why this issue is intermittent - it depends on the network state and timing of connection drops.
Recommendations
-
Maintain Upstream/Downstream Distinction: Store and maintain the distinction between upstream and downstream connections throughout the lifecycle of a subscription.
-
Implement Recovery Mechanisms: When an upstream connection drops, nodes should automatically attempt to find a new upstream connection to maintain the feed to downstream peers.
-
More Nuanced Connection Handling: Consider a more nuanced approach to handling failed broadcasts, such as retrying or only dropping the subscription rather than the entire connection.
-
Subscription Tree Rebalancing: Implement a mechanism to periodically rebalance the subscription tree for efficiency.
-
Explicit Unsubscribe Operation: Implement an explicit unsubscribe operation to allow nodes to gracefully terminate subscriptions.
-
Improved Error Handling: Add better error handling and recovery mechanisms for subscription failures, particularly focusing on race conditions during subscription establishment.
Related Files
The main files involved in this issue are:
crates/core/src/operations/subscribe.rs
: Handles subscription establishment and managementcrates/core/src/operations/update.rs
: Handles update propagation through the subscription treecrates/core/src/ring/mod.rs
: Manages the ring topology and connection managementcrates/core/src/ring/seeding.rs
: Manages subscribers for contracts
Next Steps
- Modify the subscription system to maintain the distinction between upstream and downstream connections
- Implement recovery mechanisms for when upstream connections drop
- Improve error handling and connection management during update propagation
- Add explicit unsubscribe operations and subscription tree maintenance
Link to Devin run
https://app.devin.ai/sessions/4840cb5b668b4743a22e61798fb77523
Requested by: Ian Clarke ([email protected])
Metadata
Metadata
Assignees
Labels
Type
Projects
Status