Skip to content

store: Retry registerQueue on failure; start testing UpdateMachine.load #561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 12, 2024

Conversation

gnprice
Copy link
Member

@gnprice gnprice commented Mar 12, 2024

Fixes #556.

In addition to the test this adds, I wanted to manually test it
end-to-end, to help be sure this covered the scenario where this
retry is known to be needed in the wild:

  • The app is offline for a while, perhaps because the device
    is asleep.
  • The app comes online, tries polling, and finds the event queue
    has expired, so it attempts a re-register.
  • Before that completes (which after all takes several seconds
    if the realm is a large one), the app goes offline again.
  • That request's response therefore never reaches the app,
    and so when it eventually comes back online it needs to retry.

Step 1 is annoying to carry out literally, because it means
waiting 10 minutes for the event queue to expire. To work around
that, I sabotaged the getEvents binding function to use a wrong
queue_id value:

'queue_id': RawParameter('wrong' + queueId),

so that the server would always respond with a BAD_EVENT_QUEUE_ID
error, just the same as if the queue had expired.

Then to take the app offline and online again, I just turned
airplane mode on and off on my device. Because I used a
physical device connected to my computer over USB, that caused
no interference to my watching the logs on the console.

In my manual testing, the retries worked perfectly: no matter
how many times I turned airplane mode on and off, or with what
timing, the app always returned to getting a fresh queue and
polling it for events.

Copy link
Collaborator

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM; please merge at will after seeing a small comment on a comment.

Comment on lines 167 to 170
// TODO UpdateMachine.debugPauseLoop is too late to prevent first poll attempt;
// because of the polling retry, that doesn't fail the test, but
// it clobbers the recorded registerQueue request so we can't check it.
// checkLastRequest();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't yet understand the "doesn't fail the test" part. If the polling retry were removed, would the test fail, and if so, why?

I notice that the test does fail if I uncomment this line, and I think that makes sense; it's explained in your comment by "it clobbers […]".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a revised version; does this successfully explain it?

      // TODO UpdateMachine.debugPauseLoop is too late to prevent first poll attempt;
      //    the polling retry catches the resulting NetworkException from lack of
      //    `connection.prepare`, so that doesn't fail the test, but it does
      //    clobber the recorded registerQueue request so we can't check it.
      // checkLastRequest();

Specifically if you bypass the retry like so:

           case Server5xxException() || NetworkException():
+            rethrow;

then this test does fail. (As do the two tests specifically for that retry.)

@gnprice gnprice force-pushed the pr-retry-register branch from f4dc515 to f8c626d Compare March 12, 2024 18:28
gnprice added 5 commits March 12, 2024 11:28
This avoids spewing the message to the console when running tests,
once we add a test that exercises this code.
This will be useful for testing [UpdateMachine.load].
In addition to the test this adds, I wanted to manually test it
end-to-end, to help be sure this covered the scenario where this
retry is known to be needed in the wild:
 * The app is offline for a while, perhaps because the device
   is asleep.
 * The app comes online, tries polling, and finds the event queue
   has expired, so it attempts a re-register.
 * Before that completes (which after all takes several seconds
   if the realm is a large one), the app goes offline again.
 * That request's response therefore never reaches the app,
   and so when it eventually comes back online it needs to retry.

Step 1 is annoying to carry out literally, because it means
waiting 10 minutes for the event queue to expire.  To work around
that, I sabotaged the getEvents binding function to use a wrong
`queue_id` value:

    'queue_id': RawParameter('wrong' + queueId),

so that the server would always respond with a BAD_EVENT_QUEUE_ID
error, just the same as if the queue had expired.

Then to take the app offline and online again, I just turned
airplane mode on and off on my device.  Because I used a
physical device connected to my computer over USB, that caused
no interference to my watching the logs on the console.

In my manual testing, the retries worked perfectly: no matter
how many times I turned airplane mode on and off, or with what
timing, the app always returned to getting a fresh queue and
polling it for events.

Fixes: zulip#556
@gnprice gnprice force-pushed the pr-retry-register branch from f8c626d to b19154a Compare March 12, 2024 18:28
@gnprice gnprice merged commit b19154a into zulip:main Mar 12, 2024
@gnprice
Copy link
Member Author

gnprice commented Mar 12, 2024

Thanks for the review! Merged, with that change to the comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry registerQueue
2 participants