-
Notifications
You must be signed in to change notification settings - Fork 781
Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm just now getting familiar with this code but I suspect this could be due to the following hard-coded limitation in signal_sigchld.c where MAX_CHILDREN is 50. If true, it seems like this value could safely be larger.
|
@NoMoreFood you're right. Ultimately, all these child process handles will be fed in a WaitForMultipleObjects call (as part of wait_for_any_event()) and that Win32 API call has a limitation of 64 handles. I had to pick a number less than this to accommodate other events (that of async accept and connect calls). Going beyond these limits will result in multi threaded POSIX compat layer that will make it more complex and less resilient. Related to #191 Would love to hear if you have any suggestions to improve this. |
@NoMoreFood @manojampalam
Is there a fix/workaround for this? This seems to be a huge miss since an
SSH server is expected to server numerous concurrent connections.
…On Tue, 6 Mar 2018 at 11:37 PM Manoj Ampalam ***@***.***> wrote:
@NoMoreFood <https://github.com/nomorefood> you're right.
Ultimately all these child process handles will be fed in a
WaitForMultipleObjects call (as part of wait_for_any_event()) and that
Win32 API call has a limitation of 64 handles. I had to pick a number less
than this to accommodate other events (that of async accept and connect
calls).
Related to #191 <#191>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1096 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AjZY6EHvgPg7mVoZpfpMphnGylsCjxcAks5tb3H_gaJpZM4Se_Fs>
.
|
@KaizerT Not without some additional programming. There are ways to work around the limitations that @manojampalam is describing, but it's not an "easy" change. I'm new to this project and have already submitted two pull requests for other issues. Depending on how those are received, maybe I'll take on this one next. |
@NoMoreFood |
I've set up a linux box for this instead to resolve this issue |
@manojampalam I couldn't really think of a great way to come through that limitation other than a pretty significant overhaul of how process handling is being done. However, I was able to create a [almost] drop-in replacement for WaitForMultipleObjects() that handles an arbitrary number of objects. Tested on my computer to 5,000 threads with no issues. About 175 lines long. Interested? |
@NoMoreFood, absolutely, please share. |
@manojampalam See attached. I can wrap this into a pull request of course. Just need to replace the normal defines in the calling code that are used to identify the array offset with the *_ENHANCED versions. Let me know what you think. -- Attachment Removed See Commit Referenced Below -- |
It'd be great if you guys can come up with a release with @NoMoreFood 's fix in the near future. I'm doing a lot of finessing to make this work with the current set up I have. |
@NoMoreFood , your implementation looks promising. I believe it would be a good next step to create a stand alone project and unit test it thoroughly (you may borrow OpenSSH unit test framework, so we can integrate it into OpenSSH at a later point seamlessly). We specifically want the synchronization logic to be tested thoroughly. This logic is very critical since any ensuing E2E deadlocks will be pretty much impossible to debug. The specific requirements for wait_for_multipleobjects_enhanced, from wait_for_any_event point of view:
Thanks for your support and contributions. |
@manojampalam Sounds good. I know there are some areas that could be problematic, but hopefully aren't relevant in our case and I will do my best to verify that. For example, you could have a race condition where multiple signals are signally simultaneously and, in that case, only the first signal will be returned to the caller. I think this is really only a problem for objects that automatically reset after being signaled. I'll do additional testing in a standalone project and let you know how it goes. |
@manojampalam Actually now that I look at the documentation better, that simultaneous issue may actually exist with the WaitForMultipleObjects() anyhow... but regardless, I'll look into it. How does one use your "OpenSSH unit test framework"? |
Right, we only deal with manual reset events, so you should be fine there. For unittest, take a look at any of the test projects. Its implemented in test_helper.c Perhaps it will be easy if you do this work in your OpenSSH-portable fork. Make a copy of any of those test projects for your purpose and you should be good to go. |
@manojampalam. I updated the code to deal with APC completions, made it so it deals with thread cleanups more gracefully, optimized synchronization logic to be more clear, adjusted style to match that that of the rest of the project, and added logic so that it will just defer to the native function if less than 64 objects for efficiency. All my tests have been integrated thus far and been very successful. I was going to add a file to 'unittest-win32compat' for unit testing the code; does that seem reasonable? Also, do you have any existing tests for massively spawning copies to test scalability? I didn't see anything in there. |
See commit NoMoreFood/openssh-portable@817e977. |
@manojampalam Updated based on your comments. NoMoreFood/openssh-portable@ba70118. |
Thanks. I'll take a look. Please post a PR, so we can keep track of further conversation. |
@manojampalam Not foreseeing the three potential changes I've submitted thus far (group detection, symbolic link support, and this change), I believe my pull request will reflect all three changes since I made the first two without branches. I can cancel and resubmit my other pulls from three different branches if you think that makes the most sense. Any advice would be appreciated. |
Yes, it will be better for tracking if we deal with each feature/fix in a separate branch/PR/commit. |
@KaizerT Just wanted to let you know my pull request is finally in hopefully we seen this in a upcoming binary release. |
@NoMoreFood, can you comment the overarching goal of your code? E.g. in other projects, limitations in |
@R030t1 I'm sure there are a variety of ways to address those limitations. I used a variant of one idea what was suggested on MSDN WaitForMultipleObjects() page. I believe it's adequate for what this project needed. My main "goals" were to submit something that was reasonably low overhead, reliable, addressed the issue noted in this thread, and was relatively non-intrusive to the existing code such that the project maintainers would be comfortable accepting it since I'm relatively new to this project. The "optimized" wait-any version I ended up submitting and that was accepted is here: signal_wait.c. I think it is pretty well commented overall and simple enough that it doesn't need much too head-scratching to figure out what's going on. If you think that's something that's very unclear or could drastically improve maintainability, I'm certainly open to suggestions. |
PowerShell/Win32-OpenSSH#1096 PowerShell/Win32-OpenSSH#191 - Updated wait_for_multiple_objects_enhanced() to handle a no-event request while alterable. - Simplified wait_for_any_event() to by taking advantage of no-event alterable request in wait_for_multiple_objects_enhanced(). - Updated wait_for_any_event() to use MAX_CHILDREN limit instead of MAXIMUM_WAIT_OBJECTS limit. - Removed unnecessary ZeroMemory() call. - Created distinct definition MAXIMUM_WAIT_OBJECTS_ENHANCED and modified functions to use it. - Upped w32_select() event limit. - Modified wait_for_multiple_objects_enhanced() to allow for 0 millisecond wait.
@KaizerT Unless you want to wait for the official release, you can probably close this since the code changes are in. |
Please answer the following
"OpenSSH for Windows" version
v1.0.0.0
Server OperatingSystem
Windows Server 2012 R2
Client OperatingSystem
Windows 7 Enterprise
What is failing
any connection after 50 open connections
Expected output
Support >100 connections
Actual output
stops at 50, exactly
I actually have an open question at server fault for this https://serverfault.com/questions/900137/windows-open-ssh-server-cannot-support-more-than-50-concurrent-ssh-sessions
Anyways, here are the details. Any help will be greatly appreciated
I'm trying to set up a Windows OpenSSH Server (v1.0.0.0-Beta) and I've hit a problem.
I have a program that uses plink to create ssh connections. This can range from 1-200 connections and I'm pretty sure that should be safe. However, when I've made 50, EXACTLY 50 ssh connections, the ssh server starts rejecting the connections and looking at the logs, it shows that the posix_spawn failed and it can no longer instantiate child processes to handle the connections
I've tried this link(https://stackoverflow.com/questions/17472389/how-to-increase-the-maximum-number-of-child-processes-that-can-be-spawned-by-a-w) and it didn't do anything for it. I'd really appreciate your help. Thank you!
Here is the error log when the 51st connection is made
1656 2018-03-05 16:08:05.610 debug3: fd 5 is not O_NONBLOCK
1656 2018-03-05 16:08:05.610 debug3: spawning "C:\OpenSSH\sshd.exe" "-R"
1656 2018-03-05 16:08:05.610 error: posix_spawn failed
1656 2018-03-05 16:08:05.610 debug3: send_rexec_state: entering fd = 8 config len 357
1656 2018-03-05 16:08:05.610 debug3: ssh_msg_send: type 0
1656 2018-03-05 16:08:05.610 debug3: send_rexec_state: done
1656 2018-03-05 16:08:05.610 debug3: ReadFileEx() ERROR:109, io:00000010C2BF98B0
Here is my config
Subsystem sftp sftp-server.exe
LogLevel Debug3
AllowAgentForwarding yes
AllowTcpForwarding yes
GatewayPorts yes
TCPKeepAlive yes
MaxStartups 1000
PasswordAuthentication yes
PermitTTY yes
PidFile C:/ProgramData/ssh/sshd.pid
MaxSessions 1000
The text was updated successfully, but these errors were encountered: