-
-
Notifications
You must be signed in to change notification settings - Fork 67
Request timeout #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request timeout #29
Conversation
@TysonAndre what do you think? |
I'm not sure if you've seen this, but https://github.com/parse-community/node-apn/pull/27/files#diff-7e3bd8e7059a21bfd255649cbfc84b33 added various integration tests for the client properly handling some of the edge cases. You may want to copy those to make sure that the code works as intended on the supported node versions. |
Looking at it again,
The node library seems fairly low level and seems to be leaving applications to detect dead connections and reconnect. The golang library seems more mature (This is a completely different language, but looking at other languages may help in thinking up robust solutions). In it, it internally maintains a connection pool. In order to detect dead connections, it sends a ping frame and checks if any ping response(pong) was sent within 15 seconds of that ping. https://github.com/golang/go/blob/f8d80977b784fd4879963e61dc9fca1fc9bf2193/src/net/http/h2_bundle.go#L7202-L7214
(In my experience, https://github.com/uniqush/uniqush-push/ (https://github.com/uniqush/uniqush-push/blob/master/srv/apns/http_api/processor.go) has been relatively stable and these issues have been new when working on a project using node-apn, but that may just be a consequence of other factors, such as golang's http2 library vs node's. It could also be some other factor like the OS or the system load being lower) |
I’m currently using @parse-community/node-apn version 3.1.0. I came here because I also encountered an issue with the Promise not resolving after a timeout. I enabled debug logging and got the following logs when I was having this problem
When looking into this pull request, it's not really the root-cause of the issue, but I did notice a few problems.
There also didn't seem to be any recovery process from any errors that could occur during the health check.
|
Hi guys. I believe that there is a lot to be improved here and the idea of managing a pool of connections would be the best. But, at least for my case, this PR looks correctly addressing the issue with stuck pushes. Time to time the pushes were stuck and my understanding now is that it happens because if for some reason Apple do not respond to the request, the request is never closed and the promise never resolved. I tested this new PR and the problem stopped to happen. I've sent about 1M push notifications. I also enabled the debug logs while sending the pushes and I've noticed that some of them timed out but the process resumed smoothly for the next requests. I'm only afraid to start addressing a lot of different issues (most of them regarding the legacy code) in this PR and we turn out taking a lot of time to solve the main issue. So my suggestion is: we list the showstopper items we should fix before merging this PR, fix them, and address the other ones in future PRs. What do you think we really should fix for merging this PR? Also please feel free to either commit suggested changes in this PR or open new PRs with other issues. It would also be good if few of you could test this branch in your environment as well. |
@dplewis @funkenstrahlen would you mind to take a look here as well? I wanted to have it merged for the next version of parse server. |
The test case should have caught the |
I've included the tests. I've just removed the one that tries to simulate a loaded server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
Would prefer weakening the test of an overloaded server (e.g. only check that all promises resolve) over removing the test due to issues (in the tested code?) but not a blocker since this is a large PR
In this case, the test was simulating a case in which apns servers would be delaying more and more over time but continuing to send successful responses. I am not sure if it can happen in the real world and the current PR is not addressing this issue. If something delays more than the timeout threshold it simply cancels the request and do not try to predict if it is the server slowing down. I'd go as is for now and try to improve later. In fact, today I've notice that one of my batches had a lot of timeouts in sequence and eventually resumed. The good news is that the process didn't stuck and the server continued sending pushes normally but in fact for some reason apns servers have intermittent bad performance in some times of the day for approximately the last 30 days and that's probably the reason why we started having problems. But I'm not sure which would be the best approach to overcome this scenario. Maybe recycle the connection after X timeouts? Maybe retry the timeout requests after some time? But i'm not sure if only wait more is the right way to solve that. I believe those requests are never responded and that's why the process was stuck. |
Just to give you feedback. We have been using this PR in production since last Thursday with success. The hanging problem seems to be resolved. Thanks for your work. |
It is good to know! Guys, if no other concern arises, I am ready to mode forward in merging this PR. Please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than the event listener leak when the environment variable DEBUG=apn
is set
The latest 5 commits seem reasonable; I haven't tested them but I don't expect any issues |
…eady resolved due to an error
LGTM for the latest changes, logs seem reasonable for the test output with DEBUG=apn for common use cases |
Nice. Thank you so much for all your help! And let's continue addressing the other ideas you've raised. Feel free to open new PRs as well. |
No description provided.