Skip to content

Conversation

Arqu
Copy link
Collaborator

@Arqu Arqu commented Oct 3, 2025

Description

So rough description of what was happening:

  • the errors got swallowed but nuked the loop so it failed silently - fixed that
  • the 0rtt tries connecting on the new port/addr but still prefers the old path so needs to wait out for the path invalidation to do it's job before it continues to work

Being slow enough made it sometimes pass. Locally now tests pass ~instantly vs 15 seconds or so after the fix to gracefully handle errors.

I'm unsure if this is safe at all, like handling path updates from other angles or partial updates etc.
If it turns out too risky, we can drop the path invalidation changes and just keep the graceful error handling. Tests will pass.

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:

@Arqu Arqu self-assigned this Oct 3, 2025
@Arqu Arqu added the fix Fixes a bug label Oct 3, 2025
@Arqu Arqu added this to iroh Oct 3, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Oct 3, 2025
Copy link

github-actions bot commented Oct 3, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3496/docs/iroh/

Last updated: 2025-10-07T06:43:29Z

@Arqu Arqu requested review from flub and dignifiedquire October 3, 2025 09:09
@Arqu Arqu moved this from 🏗 In progress to 👀 In review in iroh Oct 3, 2025
Copy link

github-actions bot commented Oct 3, 2025

Netsim report & logs for this PR have been generated and is available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 43bc650

// Invalidate paths that are not in the new address set.
// If the remote node sends us new addresses, old ones are likely stale (e.g., after restart).
// This forces immediate re-evaluation and prevents preferring "known bad" outdated paths
// over "unknown potentially good" new paths.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Frando @flub can you look at this please, I believe the two of you looked at this before

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. On a cursory look I think we get NodeAddr for other reasons, e.g. when a user dials with a NodeAddr. And that could be totally out of date and other info could be fresher.

In the multipath branch I currently send the initial packets to all known remotes. So that would mean whatever the working path, it should find it. I think it might need some kind of expiry of paths at some point, but wasn't going to do that immediately.

I've honestly been avoiding thinking too much about how to robustly solve these kind of issues in the current version, because it just slows down multipath work... but if others agree there's an improvement here that's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also unsure, I'll have to dig in a bit more to find out if this could have unintended side effects (ie falsely invalidating good paths).

@dignifiedquire dignifiedquire added this to the v0.93.0 milestone Oct 6, 2025
@Arqu
Copy link
Collaborator Author

Arqu commented Oct 6, 2025

So what's our path forward here, should I drop the path update and just make sure we survive errors in the loop? That's a "good enough" fix for the flaky tests for now. Just not a proper fix for the underlying issue.

@dignifiedquire
Copy link
Contributor

So what's our path forward here, should I drop the path update and just make sure we survive errors in the loop?

Yeah lets do that, and file an issue, with the details that you discovered. And then we reexamine this once multipath has landed

@Arqu
Copy link
Collaborator Author

Arqu commented Oct 7, 2025

Reverted the path invalidation logic and followed up with an issue #3504

@Arqu Arqu requested a review from dignifiedquire October 7, 2025 06:45
@Arqu Arqu enabled auto-merge October 7, 2025 10:55
@Arqu Arqu added this pull request to the merge queue Oct 7, 2025
Merged via the queue into main with commit 9e61af5 Oct 7, 2025
77 of 79 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in iroh Oct 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Fixes a bug
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants