Skip to content

net: timed out socket blocks process termination #795

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
silverwind opened this issue Feb 11, 2015 · 20 comments
Closed

net: timed out socket blocks process termination #795

silverwind opened this issue Feb 11, 2015 · 20 comments
Labels
net Issues and PRs related to the net subsystem.

Comments

@silverwind
Copy link
Contributor

It seems that when a unref()'ed socket runs into a timeout, the program keeps running for 30 seconds after the event, even when calling process.exit. I reproduced it with below example, on OS X with WiFi (my only network connection) disabled:

var net = require("net");
var socket = new net.Socket();
socket.unref();
socket.setTimeout(1000, function() {
    console.log("timeout");
    process.exit(); // does not work 
});
socket.on("error", function (err) {
    console.log(err);
    process.exit(); // does work
});
socket.connect(80, "does.not.exist");

With Wifi enabled, the socket runs the error callback and the process exits immediately as expected. I also attempted to release the socket with socket.destroy() and socket.end() but that didn't have any effect.

I found these related commits:

nodejs/node-v0.x-archive@f46ad01
nodejs/node-v0.x-archive@a846d93

The issue exists in the 0.10 branch too.

@silverwind
Copy link
Contributor Author

Here's some debug:

WiFi on:

$ time NODE_DEBUG=net,timer iojs try
TIMER 57082: unrefList initialized
TIMER 57082: unrefTimer initialized
TIMER 57082: unrefList empty
TIMER 57082: unrefTimer scheduled
NET 57082: pipe false undefined
TIMER 57082: unrefList empty
TIMER 57082: unrefTimer scheduled
NET 57082: connect: find host does.not.exist
NET 57082: connect: dns options [object Object]
error
NODE_DEBUG=net,timer iojs try  0.14s user 0.02s system 64% cpu 0.252 total

WiFi off:

$ time NODE_DEBUG=net,timer iojs try
TIMER 57038: unrefList initialized
TIMER 57038: unrefTimer initialized
TIMER 57038: unrefList empty
TIMER 57038: unrefTimer scheduled
NET 57038: pipe false undefined
TIMER 57038: unrefList empty
TIMER 57038: unrefTimer scheduled
NET 57038: connect: find host does.net.exist
NET 57038: connect: dns options [object Object]
TIMER 57038: unrefTimer fired
TIMER 57038: unreftimer firing timeout
NET 57038: _onTimeout
timeout
NET 57038: destroy undefined
NET 57038: destroy
TIMER 57038: unenroll
NET 57038: close
NET 57038: close handle
NET 57038: onSocketFinish
NET 57038: oSF: ended, destroy { objectMode: false,
  highWaterMark: 16384,
  buffer: [],
  length: 0,
  pipes: null,
  pipesCount: 0,
  flowing: null,
  ended: false,
  endEmitted: false,
  reading: false,
  sync: true,
  needReadable: false,
  emittedReadable: false,
  readableListening: false,
  defaultEncoding: 'utf8',
  ranOut: false,
  awaitDrain: 0,
  readingMore: false,
  decoder: null,
  encoding: null }
NET 57038: destroy undefined
NET 57038: destroy
NET 57038: already destroyed, fire error callbacks
[30 seconds nothing]
NODE_DEBUG=net,timer iojs try  0.17s user 0.03s system 0% cpu 30.210 total

edit: added net debug

@silverwind
Copy link
Contributor Author

I think there are actually two issues here:

  1. Shouldn't process.exit() be an unconditional process termination?
  2. There's a bug somewhere in net or timer that blocks the termination.

@silverwind silverwind changed the title net: socket.unref() not working on timeout net: timed out socket blocks process termination Feb 11, 2015
@trevnorris trevnorris added the net Issues and PRs related to the net subsystem. label Feb 11, 2015
@sam-github
Copy link
Contributor

I couldn't reproduce, when I turn my wifi (only net connection) off on OS X, I hit the error path with ENOTFOUND. I'm not saying its not a bug, but its not trivially reproducible.

@silverwind
Copy link
Contributor Author

@sam-github I think what you're seeing is the local dns cache on your mac (which is probably disabled on my machine). Try this between runs: http://support.apple.com/en-us/HT202516

@sam-github
Copy link
Contributor

Sorry, didn't help.

@silverwind
Copy link
Contributor Author

Give it one more try with "github.com" instead of "does.not.exist". After one run, disable WiFi and try once more.

@sam-github
Copy link
Contributor

sam@sambook:~/w/core/io.js (v1.x % u=) % ./iojs io-795.js
timeout

<turn off wifi>
sam@sambook:~/w/core/io.js (v1.x % u=) % ./iojs io-795.js
error { [Error: getaddrinfo ENOTFOUND github.com github.com:80]
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'github.com',
  host: 'github.com',
  port: 80 }
sam@sambook:~/w/core/io.js (v1.x % u=) % sudo discoveryutil mdnsflushcache                      
Password:
sam@sambook:~/w/core/io.js (v1.x % u=) % ./iojs io-795.js                 
error { [Error: getaddrinfo ENOTFOUND github.com github.com:80]
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'github.com',
  host: 'github.com',
  port: 80 }

@silverwind
Copy link
Contributor Author

Hrm, must be some special condition on my machine that I don't run into any error with WiFi down. I also tried connecting to an IP, but that gave me a quick ENETUNREACH.

@bnoordhuis
Copy link
Member

Can it be that this is nodejs/node-v0.x-archive#7077 or nodejs/node-v0.x-archive#7149?

@silverwind
Copy link
Contributor Author

nodejs/node-v0.x-archive#7149 could be it, given that it errors correctly on an IP but not an hostname. Your mentioned 5-30 sec glibc timeout could be what I'm seeing.

Anything that speaks against making process.exit kill the event loop with it?

@sam-github
Copy link
Contributor

@silverwind
Copy link
Contributor Author

Are you sure you are calling it?

Yes, I get the exit event too:

var socket = new require("net").Socket();
process.on("exit", console.log.bind(console, "exit"));
socket.unref();
socket.setTimeout(1000, function() {
    console.log("timeout");
    process.exit();
})
socket.connect(80, "does.not.exist");

which prints

timeout
exit 0

and hangs 29 seconds afterwards.

Maybe the libuv call happens in another thread which keeps the main threat from exiting? (You see, I have no idea how this works)

@bnoordhuis
Copy link
Member

That's correct. Libuv shuts down the thread pool gracefully on exit. In your test case however, one of the threads is still locked up in the libc getaddrinfo() call.

That's intentional, by the way. Force-killing a thread may lead to unpredictable behavior.

@silverwind
Copy link
Contributor Author

That's intentional

Makes sense, but I still think there should be a way to force termination if I don't care about pending operations. How about something like process.exit(code, {force: true})? I'd gladly have a shot at doing a PR if we are in agreement that this is needed.

@bnoordhuis
Copy link
Member

What I mean with 'unpredictable' is that force-killing threads is a great way to crash the process or corrupt data.

@silverwind
Copy link
Contributor Author

Are there less dangerous ways to signal all libuv (or or threads) to stop whatever they are doing? Does libuv support something like SIGTERM?

@bnoordhuis
Copy link
Member

Not really. Sending signals wouldn't really help, not in general, because whatever operation is ongoing might not be interruptible.

POSIX threads have the concept of cancellable operations (see man pthread_cancel) that could help in this particular case but is fraught with dangers in the general case. After calling pthread_cancel(), the thread is killed when it reaches a cancellation point, like a call to pthread_testcancel() or some libc functions.

It requires that you carefully write your application code around the fact that execution may cease at almost any point. The thread pool is a general purpose pool and may execute user code that is not written with that assumption in mind.

It would require splitting the thread pool in two, one for user tasks, the other for libuv internal tasks (at least, conceptually; in practice, it would probably be a per-thread status flag.) It's not a bad idea and not just because of this issue but it's a lot of work.

Also, it won't help on Windows. To the best of my knowledge, Windows only has TerminateThread() and that's like sending a SIGKILL to a thread.

@silverwind
Copy link
Contributor Author

Alright, thanks for the detailed explanation.

Maybe the route of adding unref support for dns requests in libuv (nodejs/node-v0.x-archive#7149) proves simpler to implement. Though, this is far out of my league so hopefully someone finds time to add it.

@silverwind
Copy link
Contributor Author

closing this issue here in favor of libuv/libuv#203, because both feasible solutions depend on changes in libuv.

@silverwind
Copy link
Contributor Author

@bnoordhuis maybe you can open a second libuv issue for the pthread_cancel thing, as I'm having a hard time describing it correctly :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Issues and PRs related to the net subsystem.
Projects
None yet
Development

No branches or pull requests

4 participants