Skip to content

Subscription and Update Operations Review in Freenet #1588

@sanity

Description

@sanity

Issue Description

After reviewing the subscription and update operations in the Freenet codebase, I've identified several issues with how subscriptions are established, maintained, and how updates propagate through the network. The most significant concern is the lack of distinction between upstream and downstream connections after subscription establishment, which affects the system's ability to recover when upstream connections drop.

Current Implementation

Subscription Establishment

Subscriptions are established in three main scenarios:

  1. Explicit subscription requests: When a client explicitly requests to subscribe to a contract using the Subscribe operation.
  2. During PUT operations: When a node successfully puts a contract, it can automatically subscribe to it.
  3. During contract forwarding: When a contract is being forwarded through the network, nodes along the path may subscribe if they should seed the contract.

The subscription process involves:

  • Finding a node with the contract
  • Adding the requester as a subscriber
  • Confirming the subscription with a ReturnSub message
  • Forwarding the confirmation upstream if needed

Update Propagation

Updates propagate through the subscription tree virally as intended. When a node receives an update for a contract, it:

  1. Updates its local copy of the contract
  2. Identifies all subscribers to that contract (excluding the sender)
  3. Broadcasts the update to all those subscribers

This creates a viral propagation pattern where updates flow through the network along subscription paths, as described in the Delta-Sync article.

Key Issues Identified

1. Insufficient Distinction Between Upstream and Downstream Connections

The most significant issue is that the system doesn't maintain a clear distinction between upstream and downstream connections after subscription establishment. While the subscription process initially tracks upstream subscribers during establishment, this distinction is lost afterward:

// During subscription establishment, upstream_subscriber is tracked
if let Some(upstream_subscriber) = upstream_subscriber {
    tracing::debug!(
        tx = %id,
        %key,
        upstream_subscriber = %upstream_subscriber.peer,
        "Forwarding subscription to upstream subscriber"
    );
    return_msg = Some(SubscribeMsg::ReturnSub {
        id: *id,
        key: *key,
        sender: target.clone(),
        target: upstream_subscriber,
        subscribed: true,
    });
}

However, when broadcasting updates, all subscribers are treated equally:

// During update broadcasting, all subscribers except the sender are treated equally
pub(crate) fn get_broadcast_targets_update(
    &self,
    key: &ContractKey,
    sender: &PeerId,
) -> Vec<PeerKeyLocation> {
    let subscribers = self
        .ring
        .subscribers_of(key)
        .map(|subs| {
            subs.value()
                .iter()
                .filter(|pk| &pk.peer != sender)
                .cloned()
                .collect::<Vec<_>>()
        })
        .unwrap_or_default();

    subscribers
}

2. No Recovery Mechanism for Upstream Connection Drops

There is no mechanism to find a new upstream connection if the current one drops. When a connection fails during broadcasting, the current implementation drops the connection entirely:

// When broadcasting fails, the connection is dropped entirely
conn_manager.drop_connection(&peer.peer).await?;

But there's no code to find a new upstream connection to ensure downstream peers continue to receive updates. This is particularly problematic because:

  1. It breaks the subscription tree
  2. Downstream nodes have no way to reconnect to the tree
  3. Updates stop propagating to all nodes below the broken connection

3. Aggressive Connection Dropping

When broadcasting updates, if sending to a subscriber fails, the current implementation drops the connection entirely. This might be too aggressive and could lead to network instability, especially in environments with occasional network hiccups.

4. Subscription Tree Maintenance

There doesn't appear to be a mechanism to rebalance the subscription tree if nodes leave or fail. This could lead to inefficient propagation paths over time.

5. Potential Race Conditions

The intermittent issue where users cannot join rooms about 2/3 of the time could be related to race conditions in the subscription process. When a node tries to subscribe to a contract, it might fail if:

  • The contract doesn't exist yet (handled with retries)
  • The maximum number of subscribers is reached
  • The upstream connection drops during subscription establishment

The lack of proper recovery mechanisms when upstream connections drop could explain why this issue is intermittent - it depends on the network state and timing of connection drops.

Recommendations

  1. Maintain Upstream/Downstream Distinction: Store and maintain the distinction between upstream and downstream connections throughout the lifecycle of a subscription.

  2. Implement Recovery Mechanisms: When an upstream connection drops, nodes should automatically attempt to find a new upstream connection to maintain the feed to downstream peers.

  3. More Nuanced Connection Handling: Consider a more nuanced approach to handling failed broadcasts, such as retrying or only dropping the subscription rather than the entire connection.

  4. Subscription Tree Rebalancing: Implement a mechanism to periodically rebalance the subscription tree for efficiency.

  5. Explicit Unsubscribe Operation: Implement an explicit unsubscribe operation to allow nodes to gracefully terminate subscriptions.

  6. Improved Error Handling: Add better error handling and recovery mechanisms for subscription failures, particularly focusing on race conditions during subscription establishment.

Related Files

The main files involved in this issue are:

  • crates/core/src/operations/subscribe.rs: Handles subscription establishment and management
  • crates/core/src/operations/update.rs: Handles update propagation through the subscription tree
  • crates/core/src/ring/mod.rs: Manages the ring topology and connection management
  • crates/core/src/ring/seeding.rs: Manages subscribers for contracts

Next Steps

  1. Modify the subscription system to maintain the distinction between upstream and downstream connections
  2. Implement recovery mechanisms for when upstream connections drop
  3. Improve error handling and connection management during update propagation
  4. Add explicit unsubscribe operations and subscription tree maintenance

Link to Devin run

https://app.devin.ai/sessions/4840cb5b668b4743a22e61798fb77523

Requested by: Ian Clarke ([email protected])

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    Status

    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions