fix: 0rtt flakes #3496
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3496/docs/iroh/ Last updated: 2025-10-07T06:43:29Z
// Invalidate paths that are not in the new address set.
// If the remote node sends us new addresses, old ones are likely stale (e.g., after restart).
// This forces immediate re-evaluation and prevents preferring "known bad" outdated paths
// over "unknown potentially good" new paths.
I'm not sure. On a cursory look I think we get a NodeAddr for other reasons, e.g. when a user dials with a NodeAddr. And that could be totally out of date and other info could be fresher.
In the multipath branch I currently send the initial packets to all known remotes, so whatever the working path is, it should find it. I think it might need some kind of expiry of paths at some point, but I wasn't going to do that immediately.
I've honestly been avoiding thinking too much about how to robustly solve these kinds of issues in the current version, because it just slows down the multipath work... but if others agree there's an improvement here, that's fine.
I'm also unsure; I'll have to dig in a bit more to find out whether this could have unintended side effects (i.e., falsely invalidating good paths).
So what's our path forward here: should I drop the path update and just make sure we survive errors in the loop? That's a "good enough" fix for the flaky tests for now, just not a proper fix for the underlying issue.
Yeah, let's do that and file an issue with the details that you discovered. Then we re-examine this once multipath has landed.
Reverted the path invalidation logic and followed up with issue #3504.
Description
A rough description of what was happening:
Being slow enough made the tests sometimes pass. Locally the tests now pass ~instantly, versus 15 seconds or so, after the fix to gracefully handle errors (sketched below).
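A minimal sketch of the "gracefully handle errors" shape, assuming the fix keeps the event loop alive on per-event errors instead of aborting; the function and types here are illustrative stand-ins, not the actual iroh code:

```rust
/// Illustrative only: drive a stream of events, logging and skipping
/// per-event errors instead of propagating them and killing the loop.
fn drive_events<I>(events: I)
where
    I: IntoIterator<Item = Result<String, String>>,
{
    for event in events {
        match event {
            Ok(ev) => {
                // Handle the event as before.
                println!("handling {ev}");
            }
            Err(err) => {
                // Previously an error here would end the loop (and the test
                // only passed if timing happened to hide it); now we log and
                // continue so a transient failure doesn't stall progress.
                eprintln!("ignoring transient error: {err}");
            }
        }
    }
}
```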
I'm unsure whether this is safe at all, e.g. handling path updates from other angles, partial updates, etc.
If it turns out too risky, we can drop the path invalidation changes and just keep the graceful error handling. Tests will pass.
Breaking Changes
Notes & open questions
Change checklist
quic-rpc
iroh-gossip
iroh-blobs
dumbpipe
sendme