Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096

KaizerT · 2018-03-06T16:13:10Z

Please answer the following

"OpenSSH for Windows" version
v1.0.0.0

Server OperatingSystem
Windows Server 2012 R2

Client OperatingSystem
Windows 7 Enterprise

What is failing
any connection after 50 open connections

Expected output
Support >100 connections

Actual output
stops at 50, exactly

I actually have an open question at server fault for this https://serverfault.com/questions/900137/windows-open-ssh-server-cannot-support-more-than-50-concurrent-ssh-sessions

Anyways, here are the details. Any help will be greatly appreciated

I'm trying to set up a Windows OpenSSH Server (v1.0.0.0-Beta) and I've hit a problem.

I have a program that uses plink to create ssh connections. This can range from 1-200 connections and I'm pretty sure that should be safe. However, when I've made 50, EXACTLY 50 ssh connections, the ssh server starts rejecting the connections and looking at the logs, it shows that the posix_spawn failed and it can no longer instantiate child processes to handle the connections

I've tried this link(https://stackoverflow.com/questions/17472389/how-to-increase-the-maximum-number-of-child-processes-that-can-be-spawned-by-a-w) and it didn't do anything for it. I'd really appreciate your help. Thank you!

Here is the error log when the 51st connection is made

1656 2018-03-05 16:08:05.610 debug3: fd 5 is not O_NONBLOCK
1656 2018-03-05 16:08:05.610 debug3: spawning "C:\OpenSSH\sshd.exe" "-R"
1656 2018-03-05 16:08:05.610 error: posix_spawn failed
1656 2018-03-05 16:08:05.610 debug3: send_rexec_state: entering fd = 8 config len 357
1656 2018-03-05 16:08:05.610 debug3: ssh_msg_send: type 0
1656 2018-03-05 16:08:05.610 debug3: send_rexec_state: done
1656 2018-03-05 16:08:05.610 debug3: ReadFileEx() ERROR:109, io:00000010C2BF98B0

Here is my config

Subsystem sftp sftp-server.exe
LogLevel Debug3
AllowAgentForwarding yes
AllowTcpForwarding yes
GatewayPorts yes
TCPKeepAlive yes
MaxStartups 1000
PasswordAuthentication yes
PermitTTY yes
PidFile C:/ProgramData/ssh/sshd.pid
MaxSessions 1000

NoMoreFood · 2018-03-07T04:41:02Z

I'm just now getting familiar with this code but I suspect this could be due to the following hard-coded limitation in signal_sigchld.c where MAX_CHILDREN is 50. If true, it seems like this value could safely be larger.

int
register_child(HANDLE child, DWORD pid)
{
	DWORD first_zombie_index;

	debug4("Register child %p pid %d, %d zombies of %d", child, pid,
		children.num_zombies, children.num_children);
	if (children.num_children == MAX_CHILDREN) {
		errno = ENOMEM;
		return -1;
	}

manojampalam · 2018-03-07T05:36:52Z

@NoMoreFood you're right.

Ultimately, all these child process handles will be fed in a WaitForMultipleObjects call (as part of wait_for_any_event()) and that Win32 API call has a limitation of 64 handles. I had to pick a number less than this to accommodate other events (that of async accept and connect calls). Going beyond these limits will result in multi threaded POSIX compat layer that will make it more complex and less resilient.

Related to #191

Would love to hear if you have any suggestions to improve this.

KaizerT · 2018-03-07T05:40:22Z

@NoMoreFood @manojampalam Is there a fix/workaround for this? This seems to be a huge miss since an SSH server is expected to server numerous concurrent connections.

…

On Tue, 6 Mar 2018 at 11:37 PM Manoj Ampalam ***@***.***> wrote: @NoMoreFood <https://github.com/nomorefood> you're right. Ultimately all these child process handles will be fed in a WaitForMultipleObjects call (as part of wait_for_any_event()) and that Win32 API call has a limitation of 64 handles. I had to pick a number less than this to accommodate other events (that of async accept and connect calls). Related to #191 <#191> — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#1096 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AjZY6EHvgPg7mVoZpfpMphnGylsCjxcAks5tb3H_gaJpZM4Se_Fs> .

NoMoreFood · 2018-03-07T21:32:58Z

@KaizerT Not without some additional programming. There are ways to work around the limitations that @manojampalam is describing, but it's not an "easy" change. I'm new to this project and have already submitted two pull requests for other issues. Depending on how those are received, maybe I'll take on this one next.

KaizerT · 2018-03-07T21:35:34Z

@NoMoreFood
I undertand that it's not that simple of a fix. Just saying that it might have been nice to have been resolved earlier. Thank you both @manojampalam for investigating.

KaizerT · 2018-03-07T21:36:00Z

I've set up a linux box for this instead to resolve this issue

NoMoreFood · 2018-03-08T04:59:51Z

@manojampalam I couldn't really think of a great way to come through that limitation other than a pretty significant overhaul of how process handling is being done. However, I was able to create a [almost] drop-in replacement for WaitForMultipleObjects() that handles an arbitrary number of objects. Tested on my computer to 5,000 threads with no issues. About 175 lines long. Interested?

manojampalam · 2018-03-08T05:24:30Z

@NoMoreFood, absolutely, please share.

NoMoreFood · 2018-03-08T06:09:44Z

@manojampalam See attached. I can wrap this into a pull request of course. Just need to replace the normal defines in the calling code that are used to identify the array offset with the *_ENHANCED versions. Let me know what you think.

-- Attachment Removed See Commit Referenced Below --

KaizerT · 2018-03-08T16:30:00Z

@NoMoreFood @manojampalam

It'd be great if you guys can come up with a release with @NoMoreFood 's fix in the near future. I'm doing a lot of finessing to make this work with the current set up I have.

manojampalam · 2018-03-08T17:57:52Z

@NoMoreFood , your implementation looks promising. I believe it would be a good next step to create a stand alone project and unit test it thoroughly (you may borrow OpenSSH unit test framework, so we can integrate it into OpenSSH at a later point seamlessly). We specifically want the synchronization logic to be tested thoroughly. This logic is very critical since any ensuing E2E deadlocks will be pretty much impossible to debug.

The specific requirements for wait_for_multipleobjects_enhanced, from wait_for_any_event point of view:

Ability to handle unbounded number of handles (or at least a sufficiently large number)
Know the event index that woke up the wait call (SIGCHLD relies on this)
Support APC completions (async IO reads and writes and other signals rely on this)
Support timeout

Thanks for your support and contributions.

NoMoreFood · 2018-03-08T18:06:00Z

@manojampalam Sounds good. I know there are some areas that could be problematic, but hopefully aren't relevant in our case and I will do my best to verify that. For example, you could have a race condition where multiple signals are signally simultaneously and, in that case, only the first signal will be returned to the caller. I think this is really only a problem for objects that automatically reset after being signaled. I'll do additional testing in a standalone project and let you know how it goes.

NoMoreFood · 2018-03-08T18:17:35Z

@manojampalam Actually now that I look at the documentation better, that simultaneous issue may actually exist with the WaitForMultipleObjects() anyhow... but regardless, I'll look into it. How does one use your "OpenSSH unit test framework"?

manojampalam · 2018-03-08T18:29:49Z

Right, we only deal with manual reset events, so you should be fine there.

For unittest, take a look at any of the test projects. Its implemented in test_helper.c Perhaps it will be easy if you do this work in your OpenSSH-portable fork. Make a copy of any of those test projects for your purpose and you should be good to go.

NoMoreFood · 2018-03-10T18:32:08Z

@manojampalam. I updated the code to deal with APC completions, made it so it deals with thread cleanups more gracefully, optimized synchronization logic to be more clear, adjusted style to match that that of the rest of the project, and added logic so that it will just defer to the native function if less than 64 objects for efficiency. All my tests have been integrated thus far and been very successful. I was going to add a file to 'unittest-win32compat' for unit testing the code; does that seem reasonable?

Also, do you have any existing tests for massively spawning copies to test scalability? I didn't see anything in there.

NoMoreFood · 2018-03-11T18:51:10Z

See commit NoMoreFood/openssh-portable@817e977.

NoMoreFood · 2018-03-12T23:36:26Z

@manojampalam Updated based on your comments. NoMoreFood/openssh-portable@ba70118.

manojampalam · 2018-03-13T00:34:51Z

Thanks. I'll take a look. Please post a PR, so we can keep track of further conversation.

NoMoreFood · 2018-03-13T00:43:23Z

@manojampalam Not foreseeing the three potential changes I've submitted thus far (group detection, symbolic link support, and this change), I believe my pull request will reflect all three changes since I made the first two without branches. I can cancel and resubmit my other pulls from three different branches if you think that makes the most sense. Any advice would be appreciated.

manojampalam · 2018-03-13T00:45:30Z

Yes, it will be better for tracking if we deal with each feature/fix in a separate branch/PR/commit.

NoMoreFood · 2018-03-15T02:43:28Z

@KaizerT Just wanted to let you know my pull request is finally in hopefully we seen this in a upcoming binary release.

ghost · 2018-03-18T15:44:57Z

@NoMoreFood, can you comment the overarching goal of your code? E.g. in other projects, limitations in WaitForMultipleObjects are handled by a red black tree which objects are sorted in to. Your solution may not be the same, but it would be helpful if any datastructures used were labelled.

NoMoreFood · 2018-03-18T23:10:55Z

@R030t1 I'm sure there are a variety of ways to address those limitations. I used a variant of one idea what was suggested on MSDN WaitForMultipleObjects() page. I believe it's adequate for what this project needed. My main "goals" were to submit something that was reasonably low overhead, reliable, addressed the issue noted in this thread, and was relatively non-intrusive to the existing code such that the project maintainers would be comfortable accepting it since I'm relatively new to this project. The "optimized" wait-any version I ended up submitting and that was accepted is here: signal_wait.c. I think it is pretty well commented overall and simple enough that it doesn't need much too head-scratching to figure out what's going on. If you think that's something that's very unclear or could drastically improve maintainability, I'm certainly open to suggestions.

PowerShell/Win32-OpenSSH#1096 PowerShell/Win32-OpenSSH#191 - Updated wait_for_multiple_objects_enhanced() to handle a no-event request while alterable. - Simplified wait_for_any_event() to by taking advantage of no-event alterable request in wait_for_multiple_objects_enhanced(). - Updated wait_for_any_event() to use MAX_CHILDREN limit instead of MAXIMUM_WAIT_OBJECTS limit. - Removed unnecessary ZeroMemory() call. - Created distinct definition MAXIMUM_WAIT_OBJECTS_ENHANCED and modified functions to use it. - Upped w32_select() event limit. - Modified wait_for_multiple_objects_enhanced() to allow for 0 millisecond wait.

NoMoreFood · 2018-03-22T22:29:13Z

@KaizerT Unless you want to wait for the official release, you can probably close this since the code changes are in.

manojampalam added the Issue-Discussion-Deprecated/use GitHub discussions label Mar 7, 2018

manojampalam added Issue-Enhancement Feature request Area-Core and removed Issue-Discussion-Deprecated/use GitHub discussions labels Mar 20, 2018

manojampalam added this to the vNext milestone Mar 20, 2018

manojampalam modified the milestones: vNext, 7.6.1.0p1-Beta Mar 30, 2018

bingbing8 closed this as completed Mar 31, 2018

golvellius1985 mentioned this issue Mar 24, 2023

Windows Open SSH Server cannot support more than 512 concurrent ssh sessions (posix_spawn failing) #2045

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096

Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096

KaizerT commented Mar 6, 2018 •

edited

Loading

NoMoreFood commented Mar 7, 2018

manojampalam commented Mar 7, 2018 •

edited

Loading

KaizerT commented Mar 7, 2018 via email •

edited

Loading

NoMoreFood commented Mar 7, 2018

KaizerT commented Mar 7, 2018

KaizerT commented Mar 7, 2018

NoMoreFood commented Mar 8, 2018

manojampalam commented Mar 8, 2018

NoMoreFood commented Mar 8, 2018 •

edited

Loading

KaizerT commented Mar 8, 2018

manojampalam commented Mar 8, 2018 •

edited

Loading

NoMoreFood commented Mar 8, 2018

NoMoreFood commented Mar 8, 2018

manojampalam commented Mar 8, 2018

NoMoreFood commented Mar 10, 2018

NoMoreFood commented Mar 11, 2018

NoMoreFood commented Mar 12, 2018

manojampalam commented Mar 13, 2018

NoMoreFood commented Mar 13, 2018

manojampalam commented Mar 13, 2018

NoMoreFood commented Mar 15, 2018

ghost commented Mar 18, 2018 •

edited by ghost

Loading

NoMoreFood commented Mar 18, 2018

NoMoreFood commented Mar 22, 2018

Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096

Windows Open SSH Server cannot support more than 50 concurrent ssh sessions (posix_spawn failing) #1096

Comments

KaizerT commented Mar 6, 2018 • edited Loading

NoMoreFood commented Mar 7, 2018

manojampalam commented Mar 7, 2018 • edited Loading

KaizerT commented Mar 7, 2018 via email • edited Loading

NoMoreFood commented Mar 7, 2018

KaizerT commented Mar 7, 2018

KaizerT commented Mar 7, 2018

NoMoreFood commented Mar 8, 2018

manojampalam commented Mar 8, 2018

NoMoreFood commented Mar 8, 2018 • edited Loading

KaizerT commented Mar 8, 2018

manojampalam commented Mar 8, 2018 • edited Loading

NoMoreFood commented Mar 8, 2018

NoMoreFood commented Mar 8, 2018

manojampalam commented Mar 8, 2018

NoMoreFood commented Mar 10, 2018

NoMoreFood commented Mar 11, 2018

NoMoreFood commented Mar 12, 2018

manojampalam commented Mar 13, 2018

NoMoreFood commented Mar 13, 2018

manojampalam commented Mar 13, 2018

NoMoreFood commented Mar 15, 2018

ghost commented Mar 18, 2018 • edited by ghost Loading

NoMoreFood commented Mar 18, 2018

NoMoreFood commented Mar 22, 2018

KaizerT commented Mar 6, 2018 •

edited

Loading

manojampalam commented Mar 7, 2018 •

edited

Loading

KaizerT commented Mar 7, 2018 via email •

edited

Loading

NoMoreFood commented Mar 8, 2018 •

edited

Loading

manojampalam commented Mar 8, 2018 •

edited

Loading

ghost commented Mar 18, 2018 •

edited by ghost

Loading