Skip to content

Conversation

colinlyguo
Copy link
Contributor

@colinlyguo colinlyguo commented Jan 9, 2023

Purpose or design rationale of this PR

Symptom:

  1. Coordinator offline.
  2. Roller submits and fails.
  3. Coordinator runs again.
  4. Roller connects again.
  5. Proof session timeouts (because the task is popped and discarded, even if the submission fails).
  6. The coordinator won't assign a new task to the roller.

Analysis:
In the above scenario, the coordinator does not run freeTaskIDForRoller since the roller discards the task itself and never submits the proof. Then the coordinator would always treat the roller as not idle (won't assign a new task to the roller).

Fix:
Run freeTaskIDForRoller when proof session timeouts.

Does this PR involve a new deployment, and involve a new git tag & docker image tag? If so, has tag in common/version.go been updated?

Yes, and updated.

Is this PR a breaking change? If so, have it been attached a breaking-change label?

No


@colinlyguo colinlyguo changed the title fix: reset roller state when session timeout fix: reset roller state when proof session timeout Jan 9, 2023
@0xmountaintop
Copy link
Contributor

let also bump the version number

Thegaram
Thegaram previously approved these changes Jan 9, 2023
@colinlyguo
Copy link
Contributor Author

The reset may cause the coordinator to assign tasks to slow provers, thus the tasks assigned to these slow provers may be delayed.

@colinlyguo colinlyguo closed this Jan 10, 2023
@0xmountaintop 0xmountaintop deleted the fix/reset-roller-state-when-session-timeout branch January 11, 2023 05:27
@colinlyguo colinlyguo restored the fix/reset-roller-state-when-session-timeout branch January 18, 2023 10:05
@colinlyguo colinlyguo reopened this Jan 18, 2023
@colinlyguo colinlyguo changed the title fix: reset roller state when proof session timeout fix(coordinator): reset roller state when proof session timeout Jan 18, 2023
@colinlyguo
Copy link
Contributor Author

TODO: this is a temporary fix, will replace coordinator reset roller state when timeout with roller reports healthy status.

@colinlyguo colinlyguo force-pushed the fix/reset-roller-state-when-session-timeout branch from 534591d to 372a1fd Compare January 18, 2023 11:36
@ChuhanJin
Copy link
Contributor

46% (+0.21%) vs master 46%

@ChuhanJin
Copy link
Contributor

46% (+0.21%) vs master 46%

@colinlyguo colinlyguo requested a review from Thegaram January 18, 2023 14:03
@ChuhanJin
Copy link
Contributor

46% (-0.06%) vs master 46%

@colinlyguo colinlyguo force-pushed the fix/reset-roller-state-when-session-timeout branch from 4780a4d to dbb841a Compare January 19, 2023 06:01
@ChuhanJin
Copy link
Contributor

46% (-0.06%) vs master 46%

@colinlyguo colinlyguo requested a review from mask-pp January 19, 2023 06:25
0xmountaintop
0xmountaintop previously approved these changes Jan 19, 2023
mask-pp
mask-pp previously approved these changes Jan 19, 2023
@colinlyguo colinlyguo dismissed stale reviews from mask-pp and 0xmountaintop via 938bafa January 19, 2023 08:10
@ChuhanJin
Copy link
Contributor

46% (0.0%) vs master 46%

0xmountaintop
0xmountaintop previously approved these changes Jan 19, 2023
@colinlyguo colinlyguo requested a review from mask-pp January 19, 2023 12:49
@ChuhanJin
Copy link
Contributor

46% (+0.03%) vs master 46%

@ChuhanJin
Copy link
Contributor

46% (+0.03%) vs master 46%

@scroll-dev scroll-dev merged commit 18fd7f5 into staging Jan 19, 2023
@scroll-dev scroll-dev deleted the fix/reset-roller-state-when-session-timeout branch January 19, 2023 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants