Skip to content

Stream stops reading from kernel halfway through in http.IncomingMessage #7910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dstibrany opened this issue Jul 28, 2016 · 6 comments
Closed
Labels
http Issues or PRs related to the http subsystem.

Comments

@dstibrany
Copy link

We've been investigating a memory leak issue where we have sockets that remain stuck in CLOSE_WAIT. The sockets are being using to pull blob files (~ 250KB - 1MB in size) from Amazon S3. The code we use to pull the data is using the Knox library [https://github.com/Automattic/knox], but it's really just a wrapper around http.ClientRequest.
The code is straightforward and essentially boils down to:

var req = https.request(options);
req.end();
req.on('response', function (read_stream) {
  read_stream.on('data', function (chunk) {
    // buffer data
  }
  read_stream.on('end', function () {
    // call our main callback with buffered data
  }

I cannot reproduce this issue locally, but in production I'll have sockets stuck in this state even 10 minutes after a restart. Also, what I thought was just a memory leak appears to result in us silently not completing client request for the s3 data.

When turning on debug logging of the net.js module and cross referencing it against a tcpdump I was able to find that handle.readStop() is being called [https://github.com/nodejs/node/blob/v0.10.29-release/lib/net.js#L533] during the data transfer. After this, we never end up reading from that socket any further. The amount of data left in the kernel's Recv-Q for that socket (via netstat) is equal to the remainder shown in the tcpdump output. That is, the amount of data that node processed (calculated via https://github.com/nodejs/node/blob/v0.10.29-release/lib/net.js#L504) plus the remainder in the kernel equals the total sent from the remote host.

  • Any idea why readStop is being called, but readStart is not, given that there is still data to read?
  • Is it worth trying to switch from reading the stream via data events to using the readable event with the read() method (streams2); would that even make any difference?
  • Any thoughts on how to further debug this? I've tried to recreate by lowering the stream's highWaterMark to get readStop() to fire, but even if readStop gets called in these tests, the stream always ends up resuming. I've tried hitting S3 with low and high load but still cannot reproduce. Any suggestions?

Thanks,
Dave

@addaleax addaleax added the http Issues or PRs related to the http subsystem. label Jul 28, 2016
@addaleax
Copy link
Member

v0.10 is pretty close to its end of life, can you verify that this is still a problem with Node v4.x or v6.x?

@dstibrany
Copy link
Author

I can't - upgrading isn't really an option presently and unfortunately, I cannot reproduce locally :/

@bnoordhuis
Copy link
Member

Is upgrading to v0.10.46 an option for you? v0.10.36 is already a year and a half old.

We've been investigating a memory leak issue where we have sockets that remain stuck in CLOSE_WAIT.

What does 'remain stuck' mean? It's not unusual for sockets to stay in CLOSE_WAIT for a few minutes. Do you see sockets in TIME_WAIT state?

@dstibrany
Copy link
Author

Upgrading to v0.10.46 is an option for us.

I see no sockets in TIME_WAIT. Regarding CLOSE_WAIT, just now for example, by cross referencing netstat with application logs, I see a socket that's been in CLOSE_WAIT for over 3 hours.

In tcpdump, I'll see a FIN being sent from the remote host, but we never send our own FIN. We also just stop reading from the buffer; I see readStop() being called and after that, we never read any more data.

Any ideas where to look next? I absolutely cannot reproduce locally, but this happens in production all the time. My next idea is to add some custom logging to net.js.

@dstibrany
Copy link
Author

I'm finally able to reproduce locally. This may just actually be an application level issue, not a node issue; I"ll update when I find out more.

@dstibrany
Copy link
Author

Ended up being an application level issue. There was a stream that we were not consuming, so everything makes sense now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

No branches or pull requests

3 participants