Delays while making requests over a slow network #797
Comments
This is a weird one. Does it reproduce on other systems or just the Pi?
@reconbot Agreed, it is very unusual. I took some time today to get Debian 8.4.0 set up on a 64-bit ASUS desktop, and was able to reproduce the issue on that machine. (We have also reproduced across multiple Raspberry Pis.) So it appears that it is not isolated to the Pi hardware.
I had a great suggestion that this is due to the event queue being backed up. The network events might be queued behind serialport events and vice versa.
Thanks to @stevekinney for this idea!
@reconbot Thanks for your and @stevekinney's help! Unfortunately the issue still reproduces on both the Raspberry Pi and the desktop machine with that change. I would be happy to perform any debugging tasks you can suggest.
@bradfol I'll see if I can get more attention for this issue; we're not doing anything that should be affected by network traffic.
I'm very curious about this. =)
@reconbot Thanks for the help. This is an important issue for us, so please let me know if there is anything I can do to provide additional debug information. @soldair It is an Atmel microcontroller (very similar to an Arduino) running an echo program. It is isolated to just the USB connection (it does not have any other connectivity).
@bradfol As a workaround, can you do network operations and then serialport operations, i.e. not at the same time?
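A minimal sketch of that workaround, assuming an already-open `port` object and a placeholder URL; each serial write is only issued after the previous HTTP response has finished:

```js
// Hypothetical illustration of "not at the same time": one HTTP request,
// then one serial write, then repeat. The URL and payload are placeholders.
var http = require('http');

function step(port) {
  http.get('http://example.com/status', function (res) {
    res.resume(); // drain the response body
    res.on('end', function () {
      port.write('ping\n', function (err) {
        if (err) return console.error('write error:', err);
        step(port); // start the next request only after the write completes
      });
    });
  }).on('error', function (err) {
    console.error('request error:', err);
    setTimeout(function () { step(port); }, 1000); // back off and retry
  });
}
```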
I should have time to test later today. I'm sure this affects serial bridging on Pinoccios too, and I'd like to see them still work =)
Maybe try something like increasing the libuv threadpool size (the UV_THREADPOOL_SIZE environment variable). Of course ... networking is (mostly) done non-blocking in the main thread in Node, so it's not clear that this is even related; it's mainly file I/O that's done with the worker threads, but there are some other minor things. Also @bradfol, can you see how long the individual writes take to complete?
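A sketch of that experiment, assuming an echo device on /dev/ttyACM0 and the serialport 3.x-era API (the module shape and option casing changed across major versions, so the code hedges on both); run it with the threadpool size set in the environment, e.g. `UV_THREADPOOL_SIZE=16 node measure-writes.js` (file name is hypothetical):

```js
// measure-writes.js: time each write callback so that blockage in the libuv
// threadpool shows up as rising numbers while network requests are in flight.
var serialport = require('serialport');
// Older releases export { SerialPort: ... }, newer ones export the constructor.
var SerialPort = serialport.SerialPort || serialport;

var port = new SerialPort('/dev/ttyACM0', { baudrate: 9600 }); // 3.x spelling; newer versions use baudRate

port.on('open', function () {
  setInterval(function () {
    var start = Date.now();
    port.write('ping\n', function (err) {
      if (err) return console.error('write error:', err);
      console.log('write callback after', Date.now() - start, 'ms');
    });
  }, 100);
});

port.on('data', function (data) {
  console.log('echo:', data.toString());
});
```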
@rvagg Is running our own threadpool something that, as a library, I can do or provide options for?
@rvagg Thank you for the suggestions. I'm traveling for the next two weeks, but I will definitely test this out as soon as I get back to the office.
@reconbot Yes, libuv provides you with all the primitives you need to manage a custom threadpool. I've never done it before, but it should be relatively straightforward to duplicate the way that Node core does it. It'll add maintenance, testing, and debugging burden of course, so you'd want to be sure you need it first!
@rvagg @reconbot I have tested out the above suggestion to specify a different thread pool size via UV_THREADPOOL_SIZE. What I have observed is that the thread pool size directly corresponds to the number of data events that come back before delays start occurring. That is, if the thread pool size is 16, exactly 16 data events arrive before the delays begin. Other than this initial number of processed events, changing the thread pool size does not have an impact on the delays that are experienced.
Well, that's very interesting and probably helps narrow down where the problem is. It might be safe to guess that something here is locking up a worker thread per event. How can this be reproduced minimally by someone else? I don't do much hardware hacking, so I'm not sure what would need to be connected, if anything, to run this. And what does "high latency connection" mean? Are we talking seconds or milliseconds? And what is the correlation between latency and the problem? Do blockages resolve at the same rate as the latency?
@rvagg An Arduino Uno with our testing echo sketch would be enough for the hardware. (Happy to mail you one if you need.) As for the network... idk. @bradfol your test case grabs a file over HTTP while writing to the port; you wait for the response, start another HTTP session, and repeat. I assume even at 9600 baud we're writing faster than the HTTP file takes to download. After 1,000 writes (roughly a second), you'll have 1,000 HTTP requests in flight. That is, unless you have a delay on the Arduino side or something?
@rvagg @reconbot We first experienced the problem intermittently as we deployed the Raspberry Pis on different networks we did not control. Now we use a low-quality router that reliably creates these network conditions for us, so we can isolate the problem.

Notably, there are two important control cases where everything runs fine: a fast network, and no network connection at all. On a fast network (reliable < 5 ms pings), many requests are created and do not interrupt serial port communication. The request responses come back whenever the actual network request finishes. The serial port communication is definitely far faster than the network, but the network requests do not block the serial port communication, as expected for async operations. When there is no network, the requests fail nearly instantaneously, and again, there is no slowdown of the serial port requests.
@bradfol how many outstanding network requests do you think are in flight during these periods of high latency?
@reconbot It seems to be limited to the thread pool size.
That is the limit. In some applications you might only have 1 in flight at a time. What is your app doing?
@reconbot Are you saying that Node is intended to be limited to the thread pool size worth of network requests?
@reconbot @rvagg I wanted to provide some more context around this issue, to hopefully better demonstrate the symptoms using a video.

Here's the control case video, where things are working normally: https://youtu.be/fB0J4ePF04g

And here's a video showing reproduction of the issue: https://youtu.be/EJG-YSo3Oac

Hopefully this is helpful to show the extent of the delays.
I have seen the same problem. It's not node-serialport: you can replace the serialport calls with fs.write and the same delay will happen. The root cause is that the network requests (actually the DNS part), serialport, and fs all use the same thread pool (default size 4). On a fast network, network requests finish quickly, so the pool drains and lets you use serialport. With no network, network requests error out immediately, and the pool also drains. On a slow/unreliable network, network requests (which use dns.lookup) can take a while to return anything, so the pool is saturated and serialport calls are delayed until there's some room in the pool. I have changed my application to only do one network request at a time. It's very cumbersome, but it works.
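A self-contained sketch of that explanation, needing no serial device: it queues more dns.lookup calls than the default four threadpool slots and then times an fs.stat, which has to wait for a free slot if the lookups hang (the hostname is a placeholder):

```js
// Demonstrates the shared libuv threadpool: dns.lookup and fs.stat both run
// on it, so slow DNS answers delay unrelated file-system calls.
var dns = require('dns');
var fs = require('fs');

// Queue more lookups than the default UV_THREADPOOL_SIZE of 4.
for (var i = 0; i < 8; i++) {
  dns.lookup('example.com', function () {});
}

var start = Date.now();
fs.stat(__filename, function (err) {
  // Fast or failing DNS: a few milliseconds. DNS that hangs: this waits
  // until a threadpool slot frees up.
  console.log('fs.stat completed after', Date.now() - start, 'ms', err ? err.message : '');
});
```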
Hi all, I work with @bradfol and I've spent the past few days working on a way to reproduce this issue without our router. Thanks to @napalm272's tip, I was able to use filesystem calls instead of serial port calls to reproduce the issue without a serial device. I've created a Vagrant setup that spins up a Debian virtual machine with a simulated high-latency network, so we don't depend on a particular router to reproduce the issue. If you'd like to see the slowdown in action, try taking the VM for a spin.

Notably, latency on HTTP (port 80) is not sufficient to reproduce the issue. Though I haven't yet isolated which ports need to be delayed to cause the issue, @napalm272's DNS suggestion is particularly intriguing. I'd like to hear more about why DNS is blocking while normal HTTP delays are not. The fact that the issue appears outside of node-serialport suggests the problem is lower down, in Node or libuv. I'm going to continue looking through the Node and libuv source to see where these calls end up.
To clarify my previous comment, the Vagrant setup should be a totally self-contained way to reproduce this issue, with no additional hardware. After additional testing, I can confirm both that our "slow" router is somehow interfering with DNS responses, and that simulating latency on only the DNS port (53) reproduces the issue. I have updated the Vagrant setup to add latency only to port 53. On our router, I have tested the speed of DNS resolution directly.
The lookups time out after about 15 seconds; the simulated latency on port 53 of the Vagrant machine is 10 seconds, to give a good feel for the slowdown. I should note that I'm just using a standard Linux traffic-shaping tool to add the latency, not our router.

In addition, I have observed that making HTTP requests by IP address rather than by hostname in our Node scripts eliminates the slowdown, by bypassing the DNS resolution step. I have added a script that requests by IP address to the Vagrant setup.

As an additional data point, note that we have not been able to reproduce this issue on Mac OS X. When we connect a Mac to our "slow" router, we see the same DNS timeouts; however, the Node scripts manage to make thousands of network requests before the first network responses start coming in. As I mentioned previously, this is the behavior we would expect from a totally async system, and it highlights the Linux implementation of the DNS lookup as a likely factor.
@ottobonn Take a look at https://nodejs.org/docs/latest-v0.12.x/api/dns.html; towards the bottom there's a section describing dns.lookup. (This behavior is the same in the newest version of Node.)
Some other discussion here: nodejs/node-v0.x-archive#2868

When you make the DNS port have a 10-second latency, each HTTP request is going to take 10 seconds before it returns, so you will use up the pool very quickly. When you use an IP address, it bypasses dns.lookup, so you don't have the issue.
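A sketch of that bypass: dns.resolve4 uses c-ares over the network instead of the threadpool, so resolving once and then requesting by IP keeps a slow resolver from tying up worker threads (hostname and path are placeholders):

```js
// Resolve with c-ares (no threadpool), then make the HTTP request by IP,
// passing the original hostname in the Host header for virtual hosting.
var dns = require('dns');
var http = require('http');

dns.resolve4('example.com', function (err, addresses) {
  if (err) return console.error('resolve failed:', err);
  http.get({
    host: addresses[0],
    path: '/',
    headers: { Host: 'example.com' }
  }, function (res) {
    console.log('status:', res.statusCode);
    res.resume();
  }).on('error', function (e) {
    console.error('request error:', e);
  });
});
```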
I'm going to close this, as it's more of a Node.js issue than a serialport issue. We might be able to improve the situation, but we don't need this issue for that.
Operating System and Hardware: Raspberry Pi 2 B running Jessie
NodeJS version: 4.2.6
Serialport version: 3.1.1

EDIT: Also reproduced on ASUS 64-bit desktop running Debian 8.4.0
NodeJS version: 4.4.4
Serialport version: 3.1.1
Problem description
Simultaneous network requests alongside serial port writes can cause large delays to the serial port communication. When the Raspberry Pi is on a high-latency connection, the serial port communication can become significantly delayed, usually with multiple seconds between writes or data events. All of the code runs async, so it is unclear why there would be a delay.
Steps and/or code to reproduce the issue
Here is a small test case to reproduce the issue. Running this on a Raspberry Pi connected to a high latency network will cause significant delays. (There is an Arduino on the other side of the serial port connection that sends back responses.)
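A minimal sketch of a test case with this shape (not the original snippet), assuming an echo device on /dev/ttyACM0, a placeholder URL, and the serialport 3.x-era API: the port is written to continuously as echoes arrive, while HTTP requests run back-to-back.

```js
// Sketch only: write to the echoing device as fast as replies come back,
// while repeatedly fetching a file over HTTP. Port path and URL are placeholders.
var serialport = require('serialport');
var SerialPort = serialport.SerialPort || serialport; // module shape varies by major version
var http = require('http');

var port = new SerialPort('/dev/ttyACM0', { baudrate: 9600 }); // newer versions spell it baudRate

port.on('open', function () {
  port.on('data', function (data) {
    console.log('echo:', data.toString());
    port.write('ping\n'); // write again as soon as the echo arrives
  });
  port.write('ping\n');
});

function fetchLoop() {
  http.get('http://example.com/file.txt', function (res) {
    res.resume();
    res.on('end', fetchLoop); // start the next request once this one finishes
  }).on('error', function () {
    setTimeout(fetchLoop, 1000);
  });
}
fetchLoop();
```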
We were experiencing this issue as part of a larger system, but have isolated it down to node-serialport writes happening while there are network requests being made. Simply unplugging the network connection (Ethernet cable or WiFi adapter; we have tested both) will cause the serial port communications to speed up significantly (dozens of reads/writes per second).

Further reference
I have run across a couple of other examples of people experiencing similar issues.
Any suggestions on how to further isolate or debug this issue would be greatly appreciated.