Slow sometimes to replace event queue after expiry #809
This morning I observed what might be the same issue. Specifically:
One new piece of information is that the loading indicator from #852 hasn't appeared. So that should help narrow down what's happening. Another is that I went promptly over to my computer and looked at logcat. Here's what logcat showed, filtered to the com.zulip.flutter package:
So those first two entries are from when I was still asleep, before I observed the issue, and the last one is from shortly after I observed it. A similar "visibilityChanged" line appears again if I swipe the app into the background. In other words, there's nothing relevant in logcat, at least nothing logged by our package. Sometimes logcat has relevant entries logged by system packages, but I didn't capture those: by the time I thought to broaden the logcat filter a couple of minutes later, the log buffer had been overrun by its normal background noise.

It's likely that if this build were logging all the things we log in debug mode, there'd be something interesting in it. So one possible next step would be to have the release build on my device start logging more. This requires a bit of care, because I don't want a normal release build (the ones we publish for users) to log all the things that our debug builds log with …

I've also made the logcat buffer on my device a lot bigger. It turns out that, very sensibly, the default buffer size is small because most users will never read it, but with …
So that'll make it easier to catch any relevant lines that other packages may be emitting, as well as any that we add (or, if I happen to spot the issue while using a debug build, the log lines we already emit in that case).
Rereading [UpdateMachine.poll], I think this most likely means that we threw an exception from within … That represents a bug, but we can and should recover from it: we should show the user an error dialog (for the beta), and then replace the store the same way we do when the event queue expires (as described by #563).
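The recovery proposed above might look roughly like the following. This is a Python sketch for illustration only, not the app's Dart code: `poll`, `get_events`, `handle_event`, `show_error_dialog`, and `replace_store` are hypothetical stand-ins. The deliberately broad `except` is the point of the sketch: an unforeseen exception, including one thrown while handling an event, ends in an error report plus a fresh store rather than a silently dead poll loop.

```python
import asyncio
from typing import Awaitable, Callable, List


async def poll(
    get_events: Callable[[int], Awaitable[List[dict]]],
    handle_event: Callable[[dict], None],
    show_error_dialog: Callable[[Exception], None],
    replace_store: Callable[[], None],
    max_rounds: int,
) -> str:
    """Hypothetical poll loop: on any unexpected exception, report it and
    replace the store, as we would after an event-queue expiry."""
    last_event_id = -1
    for _ in range(max_rounds):
        try:
            events = await get_events(last_event_id)
            for event in events:
                handle_event(event)
                last_event_id = event["id"]
        except Exception as e:  # deliberately broad: this catches our own bugs too
            show_error_dialog(e)  # e.g. an error dialog, in beta builds
            replace_store()       # start over with a fresh queue and fresh data
            return "replaced"
    return "stopped"  # bounded only so that this sketch terminates
```

A real version would presumably run until the store is disposed, and would want some backoff so that a bug which recurs on every reload can't turn into a tight reload loop.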
… Nope, false alarm. The message that was in the notification doesn't appear after a fresh reload either, nor on web; presumably it was just deleted. Still, this is something we should do, in case we do get an exception in …
And now my logcat buffer is bigger, which will help in any future debugging on my device.
Well, sadly this didn't stick: the buffer size is back to 256 KiB. There's a setting at Developer settings > Logger buffer sizes, which I've now set to "8M per log buffer", the largest option. These Stack Overflow answers (both on the same question) were helpful: … After changing that, … but … So I also ran …
Let's continue the logcat thread in chat:
I don't recall having seen this a second time yet, since my original report in July. (I had a second report above in August but then found at #809 (comment) that it was unrelated.)
Anyway. Having not seen this happen again, I'll push it out to the post-launch milestone, for the same reasons as I just described for that other similar-flavored issue: #979 (comment)
Well, having said that just yesterday: This morning @chrisbobbe encountered what seems like potentially the same issue, as described at #514 (comment) . Then just a few minutes ago, I also saw those same symptoms! In my case:
Edit: Also, the version I was running had a previous revision of #868 in it, which would show the same notices as the later-merged version of #868, plus a toast when replacing a stale event queue. I don't think I got any toasts when I went back to the app.
In the case Chris saw:
OTOH I guess one difference from my original report above is that on that occasion:
That suggests these could be separate issues. Then again, as I said there, that belated recovery is so puzzling that I'm not sure it wasn't some kind of error in my observation. Anyway, I'll bump the priority of the issue that tracks adding a timeout here, because that may effectively resolve this issue for us.
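A timeout of the kind that issue tracks could be sketched as follows, again in Python rather than the app's Dart. The working assumption (not something established in this thread) is that a healthy long-poll request always completes within some bound, so a request outstanding much longer than that is treated as stuck, abandoned, and replaced with a fresh one. `request`, `timeout_s`, `max_attempts`, and both function names are hypothetical; `asyncio.wait_for` cancels the stuck request when the timeout elapses.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


async def get_events_with_timeout(
    request: Callable[[], Awaitable[List[dict]]], timeout_s: float
) -> Optional[List[dict]]:
    """Hypothetical: return the events, or None if the request didn't
    finish within timeout_s."""
    try:
        return await asyncio.wait_for(request(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # wait_for has already cancelled the stuck request


async def poll_with_timeout(
    request: Callable[[], Awaitable[List[dict]]],
    timeout_s: float,
    max_attempts: int,
) -> Tuple[int, List[dict]]:
    """Hypothetical: replace a hung get-events request with a fresh one,
    up to max_attempts times. Returns (attempt number, events)."""
    for attempt in range(1, max_attempts + 1):
        events = await get_events_with_timeout(request, timeout_s)
        if events is not None:
            return attempt, events
    raise TimeoutError(f"no response after {max_attempts} attempts")
```

Retrying with a fresh request, rather than waiting on the old one indefinitely, is what would let the app recover promptly on its own if a request ever hangs; whether a hung request is actually the cause here is still unknown.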
When the app resumes after having been in the background for a while, or after the device was offline, its event queue on the server may have expired. (The server does this after 10 minutes of not getting any get-events requests for that queue.) Normally this is fine because we just make a new one: that was #185 / #466.
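The expiry-and-replace mechanism described above can be modeled with a toy server. This Python sketch is an illustration under assumptions, not the app's code or the server's: `FakeServer`, `QueueExpired`, and `poll_once` are hypothetical, and the only detail taken from this issue is the 10-minute expiry. (The real Zulip API reports an expired queue with the error code `BAD_EVENT_QUEUE_ID`.)

```python
from dataclasses import dataclass, field
from typing import Dict, List

QUEUE_EXPIRY_S = 10 * 60  # per this issue: 10 minutes without a get-events request


class QueueExpired(Exception):
    """Hypothetical stand-in for the server's "bad event queue ID" error."""


@dataclass
class FakeServer:
    """Hypothetical toy server with a simulated clock, in seconds."""
    now: float = 0.0
    next_id: int = 0
    last_polled: Dict[str, float] = field(default_factory=dict)

    def register(self) -> str:
        queue_id = f"q{self.next_id}"
        self.next_id += 1
        self.last_polled[queue_id] = self.now
        return queue_id

    def get_events(self, queue_id: str) -> List[dict]:
        last = self.last_polled.get(queue_id)
        if last is None or self.now - last > QUEUE_EXPIRY_S:
            self.last_polled.pop(queue_id, None)  # the server discards the queue
            raise QueueExpired(queue_id)
        self.last_polled[queue_id] = self.now
        return []


def poll_once(server: FakeServer, queue_id: str) -> str:
    """Poll once; if the queue has expired, register a new one.

    In the app, registering a new queue also means reloading the data it
    covers, i.e. replacing the store."""
    try:
        server.get_events(queue_id)
        return queue_id
    except QueueExpired:
        return server.register()
```

This issue is about the cases where, in the real app, that recovery path evidently doesn't happen promptly, so stale data stays on screen.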
That mechanism does work; I regularly see it working. But at least some of the time, it doesn't. I observed it failing to work this evening; details below.
I think the next step for debugging this issue is: keep an eye out for the symptom, and if you see it happen then try to get promptly to a computer — like within a few minutes — to take a look at the log. It's likely there will be something telltale there, even on a release build; if one of us happens to catch it on a debug build, there will almost certainly be good information there.
Related issues
When we do lack a queue, we should flag that to the user, so that we don't misleadingly present stale data as current. Even if we had no bugs of this kind, that'd be important for the case where a queue is lacking because the network connection is weak; but it's especially important in the presence of a bug like this one.
Show "Connecting…" banner (or equivalent) when server data is stale #465
Issue for showing more feedback, which could help us track down this issue (and other as-yet-unknown issues like it):
Show detailed poll-failure feedback, in beta #555
Issue that would mitigate symptoms of some potential bugs of this kind, possibly including this one (depending what the cause turns out to be):
Comprehensively retry poll/register even on unforeseen errors #563
Detailed report
Specifically, earlier this evening I opened up the app after it was in the background, and for a long time it didn't recover a live event queue. I replied to a message, and the message didn't appear in the existing message list, but did appear when I navigated to another message list that should contain it — which is the telltale symptom of a lack of an event queue. (Given the absence of #465.)
I was at home, on wifi, and had no trouble loading web pages while sitting at the same spot before and after this.
I left the app open for a while, with the device unlocked on the table, because I was curious how long the situation would persist. It lasted for at least 20 minutes: I kept looking down at the screen occasionally to check, and it was still at the "Combined feed" message list and still didn't have the new message.
On the other hand it didn't last forever! A while after that, I looked and the new message had indeed appeared. I'm about 97% sure I hadn't done anything that would cause it to appear short of a restored event queue: didn't navigate out of the screen and back in, and didn't kill the app and restart. I have some doubt only because it seems strange that an issue would cause the app to take over 20 minutes to do this, and yet not just prevent it completely from succeeding.