OpenSSH for Windows often hangs if no data sent over the connection #1338

Closed
EricSL opened this issue Feb 7, 2019 · 15 comments · Fixed by PowerShell/openssh-portable#374

@EricSL

EricSL commented Feb 7, 2019

Troubleshooting steps
https://github.com/PowerShell/Win32-OpenSSH/wiki/Troubleshooting-Steps

Terminal issue? please go through wiki
https://github.com/PowerShell/Win32-OpenSSH/wiki/TTY-PTY-support-in-Windows-OpenSSH

Please answer the following

"OpenSSH for Windows" version
7.7p1, 7.9p1, much less common on 7.6p1 but I've seen it on that version too.

Server OperatingSystem
Debian GNU/Linux 9.4 (stretch)

Client OperatingSystem
Windows 10

What is failing

Possible duplicate of #1334. Also filed at https://bugzilla.mindrot.org/show_bug.cgi?id=2964

This was reproduced on both 7.7p1 and 7.9p1, while ssh-ing from a Windows machine to a Linux machine. I have also reproduced it on 7.6p1 but it happens much more rarely. You may have to run a command several times in a row to get it to fail.

For a simple repro I'm using the bash commands echo "" (should output a single newline) and echo -n "" (should output nothing and immediately exit successfully; I've also used /bin/true which is equivalent.)

ssh user@host -- echo ""

Reliably returns right away as expected.

ssh user@host -tt -- echo ""

Reliably returns right away as expected.

ssh user@host -- echo -n ""

Seems to hang, but if you press a key it returns.

ssh user@host -tt -- echo -n ""

Sometimes it works, sometimes it hangs. When it hangs it will be unresponsive to input, including ^C. You need to kill it in task manager. A wireshark trace shows that the server sent the TCP FIN packet, but the client is still holding open the connection.

ssh user@host -tt -v -- echo -n ""

Turning on verbose output seems to make it work reliably.

I also have reproduced with the command sleep .001. Changing to .01 makes it reproduce less frequently, and 1 second works reliably. So it seems to be a race condition involving a very short connection with no data sent.

Not sure to what extent network latency affects this but my ping time is 11ms so you may need something similarly distant.
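
Because the hang is intermittent, a small harness that repeats the command and counts timeouts can make the failure rate measurable. The following is a minimal sketch, not part of the original report: the function name count_hangs, the run count, and the timeout value are all illustrative assumptions. Standard input is deliberately left attached to the console, since redirecting it may change the behavior being tested.

```python
import subprocess

def count_hangs(cmd, runs=20, timeout_s=10.0):
    """Run cmd `runs` times and return how many runs exceeded timeout_s."""
    hangs = 0
    for _ in range(runs):
        try:
            # stdin is inherited on purpose; only the output is discarded.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # process does not linger between iterations.
            hangs += 1
    return hangs
```

A call such as count_hangs(["ssh", "user@host", "-tt", "--", "/bin/true"]) would then report how many of the 20 attempts failed to return within the timeout.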

Expected output

All of these commands should return right away.

Actual output

Some of these commands hang, either waiting for input (if -tt is not specified) or ssh becomes unresponsive (if -tt is specified).

@NoMoreFood

Unable to reproduce this on a connection with ~35ms latency.

@EricSL
Author

EricSL commented Feb 12, 2019

ssh.dmp.zip

I've attached a dump of a deadlocked OpenSSH 7.9.

@NoMoreFood

On the surface, it appears that it might be waiting for the asynchronous write of "Connection to 172.217.225.62 closed." to stderr to finish. Can you also post the output when "-vvv" is set (assuming that passing it doesn't fix the issue 100% of the time)?

@NoMoreFood

@EricSL If you're willing to help debug (and presuming that verbose output can't be produced), I'd like to provide you with a private version that has some additional debug variables to trace where things could be going wrong.

@EricSL
Author

EricSL commented Feb 13, 2019

"-vvv" reliably fixes the problem.

Send me a patch and I'll make a custom build.

@NoMoreFood

@EricSL I was finally able to reproduce this on a low-latency connection, but the problem is quite mysterious. It almost appears that after one part of the code calls TerminateThread(), no other new threads can be spawned. I'm still probing since I'm not familiar with this part of the code, but hopefully I will have an update in a few days.

NoMoreFood added a commit to NoMoreFood/openssh-portable that referenced this issue Feb 16, 2019
- Replaced TerminateThread() call with an interrupt routine to gracefully call _endthreadex(0).
- Resolves PowerShell/Win32-OpenSSH#1338.
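
The commit note above describes replacing a forced thread kill with a request that lets the thread end itself. The sketch below is only a language-neutral analogy in Python, not the Win32 code from the patch: Python has no equivalent of TerminateThread(), so only the cooperative side is shown, and the names worker, stop_requested, and run_and_stop are illustrative assumptions.

```python
import threading

def worker(stop_requested, log):
    # Event.wait doubles as an interruptible sleep: it returns True as soon
    # as the stop request arrives, which ends the loop.
    while not stop_requested.wait(timeout=0.01):
        log.append("tick")
    # This line runs because the thread exits on its own terms.
    log.append("cleanup done")

def run_and_stop():
    stop_requested = threading.Event()
    log = []
    t = threading.Thread(target=worker, args=(stop_requested, log))
    t.start()
    stop_requested.set()  # ask the thread to exit instead of killing it
    t.join(timeout=5)
    return t.is_alive(), log[-1]
```

A thread that is killed from outside never reaches its cleanup step and may die while holding a lock; a thread that is asked to stop gets to finish its own exit path.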
@NoMoreFood

If interested, you can test using these binaries: https://github.com/NoMoreFood/openssh-portable/releases/tag/v7.9-merge-1

@EricSL
Author

EricSL commented Mar 1, 2019

@NoMoreFood Your patch fixes the issue for me, thanks!

@pakona

pakona commented Mar 6, 2019

@NoMoreFood
Will this patch make it to the next release? Seems like a critical issue to me.
Any idea of when to expect the next release?
Thanks!

@NoMoreFood

@pakona Unfortunately, I cannot speak for the release timeline, nor for whether this patch will ultimately be accepted. I get the sense the maintainers are preoccupied with other tasks at the moment, so I can only hope there is a surge of attention on this fork at some point.

@pakona

pakona commented Mar 13, 2019

@bingbing8 any chance we could get some idea of when this patch could be accepted?
It would also be very helpful to have a rough idea of the future release cadence.

@EricSL
Author

EricSL commented Apr 8, 2019

This problem is reproducing for me again, even with the patch. What I'm seeing now:

Main thread:

 	[External Code]	
>	ssh.exe!syncio_close(w32_io * pio) Line 273	C
 	ssh.exe!fileio_close(w32_io * pio) Line 973	C
 	ssh.exe!w32_close(int fd) Line 642	C
 	ssh.exe!channel_close_fd(ssh * ssh, int * fdp) Line 433	C
 	ssh.exe!chan_shutdown_read(ssh * ssh, Channel * c) Line 420	C
 	ssh.exe!chan_rcvd_oclose(ssh * ssh, Channel * c) Line 293	C
 	ssh.exe!channel_input_oclose(int type, unsigned int seq, ssh * ssh) Line 3126	C
 	ssh.exe!ssh_dispatch_run(ssh * ssh, int mode, volatile int * done) Line 114	C
 	ssh.exe!client_loop(ssh * ssh, int) Line 1330	C
 	ssh.exe!main(int ac, char * * av) Line 1528	C
 	ssh.exe!wmain(int argc, wchar_t * * wargv) Line 61	C
 	[External Code]	

It appears to be waiting for WaitForSingleObject to return, in your patched code.

-		pio	0x0000029a263c9610 {read_overlapped={Internal=0 InternalHigh=0 Offset=0 ...} write_overlapped={Internal=...} ...}	w32_io *
+		read_overlapped	{Internal=0 InternalHigh=0 Offset=0 ...}	_OVERLAPPED
+		write_overlapped	{Internal=0 InternalHigh=0 Offset=0 ...}	_OVERLAPPED
+		read_details	{buf=0x0000029a263cc7a0 "€£<&š\x2" buf_size=2048 remaining=0 ...}	<unnamed-tag>
+		write_details	{buf=0x0000000000000000 <NULL> buf_size=0 remaining=0 ...}	<unnamed-tag>
		table_index	4	int
		type	NONSOCK_SYNC_FD (3)	w32_io_type
		fd_flags	1	unsigned long
		fd_status_flags	0	unsigned long
		sock	540	unsigned __int64
		handle	0x000000000000021c	void *
+		sync_read_status	{to_transfer=0 transferred=0 error=0 }	<unnamed-tag>
+		sync_write_status	{to_transfer=0 transferred=0 error=0 }	<unnamed-tag>
+		internal	{state=SOCK_INITIALIZED (0) context=0x0000000000000000 }	<unnamed-tag>

Worker Thread:

 	[External Code]	
>	ssh.exe!ReadThread(void * lpParameter) Line 102	C
 	ssh.exe!thread_start<unsigned int (__cdecl*)(void * __ptr64)>(void * const parameter) Line 115	C++
 	[External Code]	

It is waiting for ReadFile to return, same pio, nBytesReturned = 0

@EricSL
Author

EricSL commented Apr 8, 2019

Okay, apparently the problem with the patched syncio_close() is that I am not running on Win7 and in_raw_mode is 0 so it never calls CancelSynchronousIo. If I change the condition to just if(FILETYPE(pio) == FILE_TYPE_CHAR) then it doesn't hang. Is this a reasonable change to your patch?
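
The hang pattern here, one thread waiting on another that is itself blocked in a synchronous read with no data coming, can be illustrated outside of Windows. The following is a rough analogy in Python, not the ssh.exe code: it uses an OS pipe instead of a console handle, closing the write end stands in for CancelSynchronousIo, and the name demo_blocked_reader is an illustrative assumption.

```python
import os
import threading

def demo_blocked_reader():
    read_fd, write_fd = os.pipe()
    result = {}

    def read_thread():
        # Blocks until data arrives or the pending read is unblocked.
        result["data"] = os.read(read_fd, 2048)

    t = threading.Thread(target=read_thread, daemon=True)
    t.start()

    # Waiting without unblocking the read: the wait cannot succeed.
    t.join(timeout=0.2)
    stuck_before = t.is_alive()

    # Unblock the pending read first (closing the writer yields EOF),
    # and only then wait for the thread to finish.
    os.close(write_fd)
    t.join(timeout=5)
    stuck_after = t.is_alive()

    os.close(read_fd)
    return stuck_before, stuck_after, result.get("data")
```

The first wait times out because nothing ever releases the read; the second succeeds because the read was released before waiting. A close routine that skips the cancel step for some handle types would stay stuck in the first state indefinitely.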

@NoMoreFood

I'd have to defer to @manojampalam since I didn't investigate the Windows 7 / raw mode circumstances that led to this conditional in the first place.

EricSL pushed a commit to EricSL/openssh-portable that referenced this issue Apr 25, 2019
- Replaced TerminateThread() call with an interrupt routine to gracefully call _endthreadex(0).
- Resolves PowerShell/Win32-OpenSSH#1338.
manojampalam pushed a commit to PowerShell/openssh-portable that referenced this issue May 21, 2019
- Replaced TerminateThread() call with an interrupt routine to gracefully call _endthreadex(0).
- Resolves PowerShell/Win32-OpenSSH#1338.
@manojampalam manojampalam modified the milestones: vNext, v8.0.0.0p1-Beta Jun 23, 2019