NodeHttp2Handler does not handle session failures #1525
I believe that a patch to fix this would look something along these lines. Note that I haven't tested or even run this. Anything involving HTTP/2 would probably require multiple unit tests: node's HTTP/2 implementation has some edge cases in older node versions that can lead to crashes (possibly related to closing a session while it is already in the process of closing).
```diff
--- a/packages/node-http-handler/src/node-http2-handler.ts
+++ b/packages/node-http-handler/src/node-http2-handler.ts
@@ -107,14 +107,31 @@ export class NodeHttp2Handler implements HttpHandler {
 
     const newSession = connect(authority);
     connectionPool.set(authority, newSession);
+    const destroySessionCb = () => {
+      this.destroySession(authority, newSession);
+    };
+    newSession.on('close', destroySessionCb); // probably redundant?
+    newSession.on('goaway', destroySessionCb);
+    newSession.on('error', destroySessionCb);
+    newSession.on('frameError', destroySessionCb);
 
     const sessionTimeout = this.sessionTimeout;
     if (sessionTimeout) {
-      newSession.setTimeout(sessionTimeout, () => {
-        newSession.close();
-        connectionPool.delete(authority);
-      });
+      newSession.setTimeout(sessionTimeout, destroySessionCb);
     }
     return newSession;
   }
+
+  /**
+   * Destroy a session and remove it from the http2 pool.
+   *
+   * This check ensures that the session is only closed once
+   * and that an event on one session does not close a different session.
+   */
+  private destroySession(authority: string, session: ClientHttp2Session): void {
+    if (this.connectionPool.get(authority) === session) {
+      this.connectionPool.delete(authority);
+      session.close();
+    }
+  }
 }
```
I'm moderately familiar with node's HTTP/2 implementation. I'd proposed a fix for a similar bug in https://github.com/parse-community/node-apn/ (an HTTP/2 client for APNs, the Apple Push Notification Service) in parse-community/node-apn#27, to tolerate spurious HTTP/2 connection errors; the maintainers went with a different approach inspired by that code.
I'm not sure which approach you planned to take, but feel free to use this patch, or base code off the unit tests I wrote for https://github.com/parse-community/node-apn/pull/27/files#diff-ecc2f3b14c967f02fd074ac2d0e0b876e683c84f2755d251bbaa74a813a82130 (e.g. createAndStartMockServer, createAndStartMockLowLevelServer), though I assume the AWS SDK already has something similar. I'd also been involved in setting up tests for https://github.com/parse-community/node-apn/blob/master/test/client.js
Also, what's the expected impact? I'd been assuming that a spurious aws server error or networking error may lead to prompt errors (and the setTimeout not being reached), meaning that a failed connection would result in all subsequent requests to aws for that
I think it should be done earlier, when the first error is encountered: the 'close' event is only emitted once the session has finished being destroyed, which would only happen after all requests on that HTTP/2 connection had completed (https://nodejs.org/api/http2.html#http2_event_close).
The linked client also checks the
fix: detect errors on the NodeHttp2Handler, immediately destroy connections on unexpected errors, and reconnect.
Prior to this PR, if the server sent the client a GOAWAY frame, the session would not be removed from the connection pool and requests would fail indefinitely.
This tries to keep streams (requests) on the Http2Session (TCP connection) from being stuck in an open state waiting for a graceful close when there are unexpected protocol or connection errors during the request, on the assumption that HTTP/2 errors are rare. (If a server, load balancer, or network is misbehaving, close() might get stuck waiting for requests to finish, especially if requests and sessions don't have timeouts?)
I'm only slightly familiar with HTTP/2 client implementations from working on clients for the Apple Push Notification Service.
- In those, the client could rely on session and request timeouts existing, so close() would finish. In aws-sdk-js-v3, timeouts are optional.
- I've seen some strange race conditions prior to node 12 in different client implementations when closing and/or destroying sessions.
Fixes aws#1525
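The approach in that PR description (evict and destroy the session on the first failure event, instead of waiting for 'close') could be sketched in TypeScript roughly as follows. This is a minimal illustration, not the code that was merged: destroyOnFailure and its evict callback are hypothetical names, and the session type is narrowed to the few ClientHttp2Session members the sketch uses.

```typescript
import { EventEmitter } from "node:events";

// The subset of ClientHttp2Session (node:http2) that this sketch relies on.
interface DestroyableSession extends EventEmitter {
  readonly destroyed: boolean;
  destroy(): void;
}

// Hypothetical helper (not an aws-sdk-js-v3 API): on the first sign of trouble,
// evict the session from the pool right away and destroy() it. Unlike close(),
// destroy() does not wait for in-flight streams to finish, so requests on a
// broken connection fail promptly and the next request can get a fresh session.
function destroyOnFailure(session: DestroyableSession, evict: () => void): void {
  const fail = (): void => {
    evict();
    if (!session.destroyed) {
      session.destroy();
    }
  };
  session.once("goaway", fail);
  session.once("frameError", fail);
  // Use on() for 'error': an EventEmitter with no 'error' listener throws the
  // emitted error, which could crash the process if a second error arrives.
  session.on("error", fail);
}
```

Because goaway and error can both fire for the same session, evict should be idempotent; in a pool it would typically delete the map entry only if the entry still points at this session, as destroySession does in the patch earlier in the thread.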
Is the error below from …? It is happening after the bot session timeout, even though I am on the putSession call!
Any simple reply would be appreciated.
That seems likely to be related. The intent of the PR was to fix issues like that when HTTP/2 is used (by reconnecting after sessions are destroyed), so that I could safely use clients such as DynamoDB (and various others) with HTTP/2. The version you linked doesn't include the PR yet, and the issue will likely be fixed by it: 3.14.0 was published on April 30th, and the PR was merged 5 days later, on May 5th.
I'm an external contributor to AWS, not a maintainer, so I can't publish anything, but releases seem to typically be on Fridays?
I'd want it to be properly tested; however, the AWS team normally tests changes rather than rushing to merge them.
Closing as the fix was released in v3.15.0. Please reopen or comment if the issue still exists.
@trivikr: I am still able to reproduce this issue. And it is
Demo git project to reproduce this:
I'm not sure if this is actually the issue (I'm not familiar with client-lex-runtime-v2 and don't work for aws/amazon), but I suspect it may be related to the load balancer idle timeout.
@hari-007, in case it's an issue that only happens on older Node versions, what Node major and minor version (node --version) are you using in production?
This may have been overly broad. The implementation was based on another one linked earlier (https://github.com/hisco/http2-client/blob/12a9e6fa6701a46e92e9b7adf395ce646b2a26b4/lib/request.js#L306-L320). I was assuming the http2 client wouldn't register its own listeners, but I now think that assumption is a mistake (https://nodejs.org/api/events.html): if it registers its own listeners before or after aws-sdk registers one, or during shutdown to suppress thrown exceptions, then removeAllListeners would cause issues.
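One way to avoid that problem, sketched here under the assumption that the handler only needs to undo its own subscriptions, is to keep references to the listeners it adds and remove exactly those with off(), instead of calling removeAllListeners. attachSessionHandlers is a hypothetical helper, not an aws-sdk-js-v3 API:

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: subscribe to session lifecycle events and return a
// function that removes exactly these listeners, leaving any listeners that
// node's http2 internals or other code registered on the same emitter intact.
function attachSessionHandlers(
  session: EventEmitter,
  onFailure: (event: string) => void,
): () => void {
  const events = ["close", "goaway", "error", "frameError"];
  const registered: Array<[string, () => void]> = [];
  for (const event of events) {
    const handler = (): void => onFailure(event);
    session.on(event, handler);
    registered.push([event, handler]);
  }
  return () => {
    // off() removes only the specific function reference we registered.
    for (const [event, handler] of registered) {
      session.off(event, handler);
    }
  };
}
```

One caveat: if the detach function removes the last 'error' listener while the session can still emit errors, the next error will be thrown, so it is safest to detach only after the session has been destroyed.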
@hari-007, is that screenshot the output from the console.log you added? Do subsequent requests to (someone with more time could run this themselves and figure it out). It's likely that you only find out a connection has been reset once you actually try to send data and the remote TCP server sends a TCP RST in response.
A possible mitigation to keep connections alive would be to call
A project I've contributed to used this approach for HTTP/2 connections to a different service for push notifications: https://github.com/parse-community/node-apn/blob/c0d2bfb714fd2c428cf84654229d69783ba0a9c6/lib/client.js#L35-L45. However, I'm not sure whether the lack of ping is deliberate on the part of AWS (to make resource leaks less likely if an application doesn't reuse agents). It would be convenient to have an option to enable ping with a configurable timeout when enabling a NodeHttp2Handler (and to clearInterval when destroying the handler).
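A keep-alive option along those lines might look like the sketch below. It assumes a hypothetical interval setting (no such NodeHttp2Handler option exists in this thread); pingOnce and startKeepAlive are illustrative names, and the session type is narrowed to the ClientHttp2Session members used (ping, closed, destroyed, destroy).

```typescript
import { Buffer } from "node:buffer";
import { clearInterval, setInterval } from "node:timers";

// The subset of ClientHttp2Session (node:http2) that this sketch relies on.
interface PingableSession {
  readonly closed: boolean;
  readonly destroyed: boolean;
  ping(callback: (err: Error | null, duration: number, payload: Buffer) => void): boolean;
  destroy(): void;
}

// Send a single PING frame. If it fails, destroy the session so that the pool
// reconnects instead of reusing a dead connection.
function pingOnce(session: PingableSession): void {
  // ping() throws on a destroyed session, and a closing session is going away anyway.
  if (session.closed || session.destroyed) {
    return;
  }
  session.ping((err) => {
    if (err && !session.destroyed) {
      session.destroy();
    }
  });
}

// Hypothetical keep-alive option (not an existing NodeHttp2Handler setting):
// ping every intervalMs and return a stop function that clears the interval,
// to be called when the handler or session is destroyed.
function startKeepAlive(session: PingableSession, intervalMs: number): () => void {
  const timer = setInterval(() => {
    if (session.closed || session.destroyed) {
      clearInterval(timer); // nothing left to keep alive
      return;
    }
    pingOnce(session);
  }, intervalMs);
  // Don't let the keep-alive timer by itself keep the Node.js process running.
  timer.unref();
  return () => clearInterval(timer);
}
```

The returned stop function plays the role of the clearInterval call mentioned above; unref() means an idle keep-alive never prevents the process from exiting.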
@TysonAndre: Much appreciated, thanks for your quick response. I had only mentioned you to thank you for the reply, but you took charge of the issue and provided a detailed analysis again. Thanks for this.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
Describe the bug
NodeHttp2Handler pools its Http2Sessions per authority. An Http2Session can be closed for a variety of reasons, including those outside of the control of the client. (See the docs for the goaway event, which can be triggered by the server at any time.) Whenever a session is closed, it is invalid for further use, but NodeHttp2Handler only invalidates its cache for sessions it has closed itself. This is insufficient: if the server triggers node to close the session for any reason, the NodeHttp2Handler is invalid for that authority forever.
SDK version number
@aws-sdk/[email protected]
Is the issue in the browser/Node.js/ReactNative?
node
Details of the browser/Node.js/ReactNative version
To Reproduce (observed behavior)
The repro steps and context for this issue are the same as in #1524.
Expected behavior
NodeHttp2Handler should invalidate its cache on the close event for any Http2Session. Compare to another popular client: https://github.com/hisco/http2-client/blob/12a9e6fa6701a46e92e9b7adf395ce646b2a26b4/lib/request.js#L299-L303
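The expected behavior can be illustrated with a minimal pool that invalidates its entry on the 'close' event of any session, whoever initiated it. SessionPool is a hypothetical class for illustration, not the NodeHttp2Handler implementation; in real code connect would be http2.connect, and ClientHttp2Session satisfies the EventEmitter bound.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical minimal pool keyed by authority, for illustration only.
class SessionPool<Session extends EventEmitter> {
  private readonly sessions = new Map<string, Session>();
  private readonly connect: (authority: string) => Session;

  constructor(connect: (authority: string) => Session) {
    this.connect = connect;
  }

  get(authority: string): Session {
    const existing = this.sessions.get(authority);
    if (existing !== undefined) {
      return existing;
    }
    const session = this.connect(authority);
    this.sessions.set(authority, session);
    // Invalidate on 'close' no matter who initiated it: a server GOAWAY,
    // a network error, or a local close()/destroy().
    session.once("close", () => this.evict(authority, session));
    return session;
  }

  // Remove session only if it is still the pooled one for authority,
  // so a stale session closing late cannot evict its replacement.
  evict(authority: string, session: Session): void {
    if (this.sessions.get(authority) === session) {
      this.sessions.delete(authority);
    }
  }
}
```

The identity check in evict mirrors the one in the patch proposed in the thread: a late 'close' from an old session must not remove a newer session for the same authority.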