
runtime: infinite recursion on windows triggered by morestack #21382

Closed
@kjk

Description

What version of Go are you using (go version)?

go version go1.9rc2 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\kjk\src\go
set GORACE=
set GOROOT=C:\Go
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config

What did you do?

This is a continuation of #20975, using the same repro program (https://github.com/kjk/go20975) built in 64-bit mode.

What did you expect to see?

No infinite recursion.

What did you see instead?

This time I used https://github.com/kjk/cv2pdb to convert dwarf to pdb so that I can get symbols in windbg.

I ran the repro program under windbg.

The crash is:

 # RetAddr           : Args to Child                                                           : Call Site
00 00000000`0043cc0b : 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b : go20975!runtime.morestack+0x10 [C:\Go\src\runtime\asm_amd64.s @ 377] 
01 00000000`00451a56 : 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
02 00000000`0043cc0b : 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
03 00000000`00451a56 : 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
04 00000000`0043cc0b : 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
05 00000000`00451a56 : 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
06 00000000`0043cc0b : 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
07 00000000`00451a56 : 00000000`0043cc0b 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
08 00000000`0043cc0b : 00000000`00451a56 00000000`0043cc0b 00000000`00451a56 00000000`0045104a : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
09 00000000`00451a56 : 00000000`0043cc0b 00000000`00451a56 00000000`0045104a 00000000`004519ee : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
0a 00000000`0043cc0b : 00000000`00451a56 00000000`0045104a 00000000`004519ee 00000000`004304f0 : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
0b 00000000`00451a56 : 00000000`0045104a 00000000`004519ee 00000000`004304f0 00000000`00b9fef0 : go20975!runtime.sigpanic+0x18b [C:\Go\src\runtime\signal_windows.go @ 152] 
0c 00000000`0045104a : 00000000`004519ee 00000000`004304f0 00000000`00b9fef0 00000000`00000000 : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
0d 00000000`004519ee : 00000000`004304f0 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 : go20975!runtime.exitsyscallfast.func1+0xaa [C:\Go\src\runtime\proc.go @ 2717] 
0e 00000000`004304f0 : 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 00000000`00455804 : go20975!runtime.systemstack+0x7e [C:\Go\src\runtime\asm_amd64.s @ 347] 
0f 00000000`00b9fef0 : 00000000`00000000 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 : go20975!runtime.mstart [C:\Go\src\runtime\proc.go @ 1125] 
10 00000000`00000000 : 00007ffa`59b6e618 00000000`00455804 00000000`006307d8 00000000`00b90e00 : 0xb9fef0

The start of runtime·morestack in asm_amd64.s:

TEXT runtime·morestack(SB),NOSPLIT,$0-0
	// Cannot grow scheduler stack (m->g0).
	get_tls(CX)
	MOVQ	g(CX), BX
	MOVQ	g_m(BX), BX
	MOVQ	m_g0(BX), SI
	CMPQ	g(CX), SI
	JNE	3(PC)
	CALL	runtime·badmorestackg0(SB)
	INT	$3

INT $3 is executed, which triggers runtime.sigpanic. I assume sigpanic does a stack check and calls morestack, which executes INT $3 again. This loops indefinitely until the process eventually crashes.
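The cycle described above can be modeled with a toy Go program. This is a hypothetical simulation, not runtime code: frameSize and the stack addresses are made-up values, and each loop iteration stands for one morestack -> INT $3 -> sigpanic round. The point it illustrates is that nothing in the cycle stops it except running out of stack.

```go
package main

import "fmt"

// frameSize is a made-up number of bytes consumed by one
// morestack -> INT $3 -> sigpanic round (hypothetical).
const frameSize = 64

// simStack is a hypothetical model of a thread stack that grows down.
type simStack struct {
	sp, lo uintptr // current stack pointer and lowest usable address
}

// roundsUntilOverflow counts how many rounds run before the simulated
// stack is exhausted. Nothing else in the model breaks the cycle, which
// mirrors the trace: the same two frames repeat until the OS kills the
// process with a stack overflow.
func roundsUntilOverflow(s simStack) int {
	rounds := 0
	for s.sp >= s.lo+frameSize {
		s.sp -= frameSize // push the sigpanic and morestack frames
		rounds++
	}
	return rounds
}

func main() {
	fmt.Println(roundsUntilOverflow(simStack{sp: 0x1000, lo: 0})) // 64
}
```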

Activity

kjk commented on Aug 10, 2017

Author

So I'm stepping through the assembly and there's more fishy stuff.

After int 3 we end up in:

00000000`00451a56 03488b          add     ecx,dword ptr [rax-75h] ds:ffffffff`ffffffa2=????????
00000000`00451a59 7350            jae     go20975!runtime.morestack+0x7b (00000000`00451aab)
00000000`00451a5b 4839b100000000  cmp     qword ptr [rcx],rsi

However, at that point rax is 0x17, so trying to dereference [rax-75h] throws an exception:
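The faulting address windbg prints can be reproduced with 64-bit wraparound arithmetic: with rax = 0x17, the operand [rax-75h] resolves to 0xffffffffffffffa2, matching the ds:ffffffff`ffffffa2 shown in the listing. That address is in the kernel half of the address space, so a user-mode read of it faults. A minimal Go sketch (effectiveAddr is a hypothetical helper):

```go
package main

import "fmt"

// effectiveAddr computes base+disp the way the CPU forms the address of
// an operand like [rax-75h]: unsigned 64-bit addition with wraparound.
func effectiveAddr(base uint64, disp int64) uint64 {
	return base + uint64(disp) // a negative disp wraps modulo 2^64
}

func main() {
	fmt.Printf("%#x\n", effectiveAddr(0x17, -0x75)) // 0xffffffffffffffa2
}
```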

0:000> t
(1830.205c): Access violation - code c0000005 (first chance)

That doesn't make sense to me unless this is a trick to just trigger an exception.

Here's what gets executed, according to windbg, when single-stepping from int 3 to calling morestack again:

go20975!runtime.morestack+0x25:
00000000`00451a55 cd03            int     3
0:000> p
WARNING: This break is not a step/trace completion.
The last command has been cleared to prevent
accidental continuation of this unrelated event.
Check the event, location and thread before resuming.
(1830.205c): Break instruction exception - code 80000003 (first chance)
go20975!runtime.morestack+0x26:
00000000`00451a56 03488b          add     ecx,dword ptr [rax-75h] ds:ffffffff`ffffffa2=????????
0:000> t
(1830.205c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
go20975!runtime.morestack+0x26:
00000000`00451a56 03488b          add     ecx,dword ptr [rax-75h] ds:ffffffff`ffffffa2=????????
0:000> t
go20975!runtime.sigpanic+0x9:
00000000`0043ca89 488b8900000000  mov     rcx,qword ptr [rcx] ds:00000000`0077b438={go20975!runtimeg0 (00000000`0077b0a0)}
0:000> t
go20975!runtime.morestack_noctxt:
00000000`00451ad0 31d2            xor     edx,edx
0:000> t
go20975!runtime.morestack_noctxt+0x2:
00000000`00451ad2 e959ffffff      jmp     go20975!runtime.morestack (00000000`00451a30)
0:000> t
go20975!runtime.morestack:
00000000`00451a30 65488b0c2528000000 mov   rcx,qword ptr gs:[28h] gs:00000000`00000028=????????????????

I don't get how executing:

go20975!runtime.sigpanic+0x9:
00000000`0043ca89 488b8900000000  mov     rcx,qword ptr [rcx] ds:00000000`0077b438={go20975!runtimeg0 (00000000`0077b0a0)}

ends up going to go20975!runtime.morestack_noctxt (00000000`00451ad0).

mvdan commented on Aug 10, 2017

Member

Just to clarify, is this when building the program, or when running it?

Does this happen with 1.8?

CC @aclements

mvdan commented on Aug 10, 2017

Member

Also, if this was an infinite recursion, wouldn't you end up with a panic or crash of some sort? I don't know what windbg is, so perhaps there's something I'm missing.

kjk commented on Aug 10, 2017

Author

Eventually the process will go away due to a stack overflow exception. In this scenario the runtime is incapable of handling it and generating a proper panic.

alexbrainman commented on Aug 11, 2017

Member

In this scenario runtime is incapable of handling it and generating a proper panic.

I would not expect Go to generate proper panic after executing INT $3. I will let Austin decide if something needs to be done here.

Alex

aclements commented on Sep 5, 2017

Member

I'd like to understand how we wound up in morestack without any remaining system stack space in the first place. Once we hit the INT $3, it would be nice to fail more gracefully, but things are toast anyway.

Is there any way MSHTML could be calling back into Go code while deep in the stack?

If not, and I'm grasping at straws here, but my guess is that the C "syscall" code (which runs on the system stack) is running out of stack space, which invokes a Windows exception handler registered by the runtime, which also attempts to run on the system stack and fails when it sees there's no stack left. @alexbrainman, I know very little about how Windows exception handlers work; does this seem like a plausible explanation?

(Notably, on UNIX platforms, the signal handler runs on yet another stack that's only for signal handling, so even if we run out of space on the system stack, we have a little more backup room in which to fail gracefully.)

aclements commented on Sep 5, 2017

Member

Actually, this is sort of interesting, though I'm not sure what to make of it:

0c 00000000`0045104a : 00000000`004519ee 00000000`004304f0 00000000`00b9fef0 00000000`00000000 : go20975!runtime.morestack+0x26 [C:\Go\src\runtime\asm_amd64.s @ 382] 
0d 00000000`004519ee : 00000000`004304f0 00000000`00b9fef0 00000000`00000000 00007ffa`59b6e618 : go20975!runtime.exitsyscallfast.func1+0xaa [C:\Go\src\runtime\proc.go @ 2717] 

exitsyscallfast.func1 is specifically the closure that does throw("exitsyscall: syscall frame is no longer valid"). This indicates that we tried to return from the system call (though why that would be, I'm not sure), but the stack got unwound or the SP just changed completely. Then, we tried to switch to the system stack to report this, but it was full, leading to a cascade of other problems.

@kjk, can you put a breakpoint in exitsyscall at the first call to systemstack (inside the if getcallersp(unsafe.Pointer(&dummy)) > g.syscallsp block) and see what the call stack there is?

kjk commented on Sep 5, 2017

Author

@aclements Please also read the comments on #20975, from the linked comment onward, as this is the same issue and there is more detail there.

To summarize my guesses at this point:

It's not caused by running out of stack space.

morestack is called unconditionally (i.e. regardless of how much stack is left) by the closure passed to systemstack in exitsyscallfast.func1.

When morestack is called there's plenty of stack but it detects that it's being called on scheduler stack (g.m.g0 == g) which shouldn't happen because systemstack is supposed to ensure that it's, well, system stack. There seems to be a missed case in that logic.

When morestack detects this invariant being violated, it executes int 3 to trigger the debugger and make debugging easy.

It seems to be 64-bit only, so I assume the bug is in one of the arch-specific runtime assembly routines.

mshtml per se doesn't call Go but there are plenty of C->Go->C transitions because of how Windows message processing works.

Each window has a callback (called wndproc) responsible for handling messages for that window. In Windows every control (a button, listview, browser view, etc.) is a window.

To add custom handling of messages we need to provide our own wndproc callback, which must be called via C->Go trampoline. When that callback is not interested in the message, we need to call the original wndproc for that message, which is Go->C transition.

So every Windows GUI program has a high rate of C->Go and Go->C transitions, especially those using the https://github.com/lxn/walk/ library, as it hooks wndproc for all windows it creates.

This also makes debugging with breakpoints impossible. I've spent several hours setting breakpoints at various points and stepping through the code but the same code works correctly the first 1000 times and then fails.

To summarize my beliefs:

  • not caused by running out of stack
  • caused by systemstack failing to switch to system stack before calling its closure and remaining on scheduler stack
  • 64-bit only
  • not deterministic but correlated to high rate of C->Go and Go->C transitions
  • most likely a bug in runtime.systemstack in asm_amd64.s

The most promising approach would be to instrument systemstack with the same check that morestack does, but performed when exiting systemstack, to catch the bad condition (remaining on the scheduler stack) earlier.
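The proposed instrumentation can be sketched as a toy Go model. Everything here is hypothetical: the m and g structs are simplified stand-ins for the runtime's types, the package variable current stands in for the TLS slot holding the running g, and systemstackChecked is an illustrative name. Switching back via m.curg loosely mirrors what the amd64 systemstack does on return; the extra check on exit is the part being proposed.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the runtime's m (OS thread) and g (goroutine).
type m struct {
	g0   *g // goroutine that owns the system stack
	curg *g // user goroutine to switch back to
}

type g struct{ m *m }

// current models the TLS slot that holds the running g.
var current *g

// systemstackChecked switches to g0, runs fn, switches back, and then
// verifies that it really left g0, catching the bad state on exit
// instead of later in morestack.
func systemstackChecked(fn func()) error {
	gp := current
	mp := gp.m
	if gp == mp.g0 {
		fn() // already on the system stack: no switch needed
		return nil
	}
	current = mp.g0
	fn()
	current = mp.curg // switch back using the thread's saved user g
	if current == mp.g0 {
		return errors.New("systemstack: still on g0 after return")
	}
	return nil
}

func main() {
	mp := &m{}
	mp.g0 = &g{m: mp}
	user := &g{m: mp}
	mp.curg = user
	current = user

	fmt.Println(systemstackChecked(func() {})) // <nil>
	// Simulate state corruption inside fn: the check fires on exit.
	fmt.Println(systemstackChecked(func() { mp.curg = mp.g0 }))
}
```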

aclements commented on Sep 5, 2017

Member

@kjk, systemstack is extremely well-trodden code. Obviously it's not impossible that it contains a bug, but that's way down on my list of candidates.

morestack is called unconditionally (i.e. regardless of how much stack is left) by the closure passed to systemstack in exitsyscallfast.func1.

Why do you say that? It never makes sense to call morestack unconditionally, and, looking at the disassembly of exitsyscallfast.func1, it clearly does check the stack bound before calling morestack, as it's supposed to.

When morestack is called there's plenty of stack but it detects that it's being called on scheduler stack (g.m.g0 == g) which shouldn't happen because systemstack is supposed to ensure that it's, well, system stack.

This isn't quite right. There is no separate "scheduler stack", there's just the user stack and the system stack (and the signal stack on UNIX). If g.m.g0 == g, then you're on the system stack. So, systemstack is supposed to put you on the system stack, at which point g.m.g0 == g, and any call to morestack should panic.

What makes you say there's plenty of stack when it calls morestack? I didn't see evidence for that here or on the other issue (I may have just missed it; there are a lot of posts).

To add custom handling of messages we need to provide our own wndproc callback, which must be called via C->Go trampoline.

Can you point me to where your code is doing this? Normally this would go through the cgo callback paths, but since your application isn't using cgo, I'm curious how this is being done.

Given the C->Go callbacks, this is all precisely the behavior I would expect if C code were using up the system stack and then calling back into Go code.

(From my earlier post:)

exitsyscallfast.func1 is specifically the closure that does throw("exitsyscall: syscall frame is no longer valid").

Oops, I'd missed the fast in there, so I was looking at the wrong closure. Unfortunately, I would expect exitsyscallfast.func1 to be called quite frequently in normal operation, so setting a breakpoint there isn't useful. (But it does mean the SP probably isn't getting totally trashed like I thought.)

kjk commented on Sep 5, 2017

Author

Like I said, those are guesses, you're more likely to be right than me.

There is no separate "scheduler stack"

I'm just parroting back terminology used by the code e.g.

// Cannot grow scheduler stack (m->g0).

What makes you say there's plenty of stack when it calls morestack?

I've tried the repro with a ridiculously large (16 MB) stack and got the same thing.

In the debugger, I printed the callstack and it was relatively short from main().

Either way, this particular issue is due to morestack detecting an internal inconsistency and then, instead of failing with a somewhat controlled panic, eventually triggering a Windows exception that silently kills the process.

Given the C->Go callbacks, this is all precisely the behavior I would expect if C code were using up the system stack and then calling back into Go code.

It's also consistent with being confused about which stack the code is on.

If the code is confused about which stack it is on, then we might be on a thread with plenty of stack, but the "needs to grow stack" check is done against the wrong stack, wrongly detects a need to grow the stack, and calls morestack, which detects it's on the wrong stack and executes int 3.
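That failure mode can be illustrated with a simplified model of the split-stack prologue check (on amd64, a small-frame function compares SP against g.stackguard0 and calls morestack if SP is lower or equal). The types, addresses, and guard size below are hypothetical: the same SP passes the check against the thread's actual bounds but fails it against another stack's bounds.

```go
package main

import "fmt"

// stackGuard is a hypothetical guard size, for illustration only.
const stackGuard = 928

// Simplified stand-ins for the runtime's stack bounds and g.
type stack struct{ lo, hi uintptr }

type g struct {
	stack       stack
	stackguard0 uintptr
}

func newG(lo, hi uintptr) *g {
	return &g{stack: stack{lo: lo, hi: hi}, stackguard0: lo + stackGuard}
}

// needsMorestack models the small-frame prologue: call morestack when
// SP <= stackguard0 of the g the code believes it is running on.
func needsMorestack(believed *g, sp uintptr) bool {
	return sp <= believed.stackguard0
}

func main() {
	actual := newG(0x100000, 0x200000) // stack the thread really uses
	other := newG(0x300000, 0x308000)  // bounds of some other stack
	sp := uintptr(0x1f0000)            // plenty of room on the actual stack

	fmt.Println(needsMorestack(actual, sp)) // false
	fmt.Println(needsMorestack(other, sp))  // true: spurious morestack
}
```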

Can you point me to where your code is doing this?

On Windows, syscall.Syscall does a Go->C call and syscall.NewCallback creates a C->Go callback.

This is all done in the lxn/walk library:

Windows GUI code is roughly this (https://github.com/lxn/walk/blob/2d327b4a1aba7cda2a365bc566fd60ea6bd4c8bf/form.go#L365):

  • there's an infinite loop calling the Windows OS functions GetMessage()/DispatchMessage() (until it gets a message indicating the app has been closed)
  • DispatchMessage() is a win32 OS function, so calling it is a Go -> C transition. It determines which window is the target of the message and calls its wndproc. In our case that causes a C -> Go transition, as wndproc is a trampoline to Go code
  • often the Go callback doesn't process the message and calls the original C wndproc, which is a Go -> C transition
  • then they all unwind and go back to the Go code that repeats GetMessage()/DispatchMessage() until the user closes the window, triggering the exit

It's unavoidable to get Go -> C -> Go -> C in Windows GUI programs. Using cgo is not necessary for that.
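The transition pattern described above can be sketched without any real Win32 calls. In this hypothetical simulation, run plays the GetMessage()/DispatchMessage() loop, goWndProc plays the subclassed Go wndproc reached through a C->Go trampoline, and origWndProc plays the original C wndproc. It only counts boundary crossings; the message codes are the Win32 values of WM_PAINT, WM_KEYDOWN, and WM_QUIT.

```go
package main

import "fmt"

// Win32 message codes used in the example.
const (
	wmPaint   = 0x000F
	wmKeyDown = 0x0100
	wmQuit    = 0x0012
)

type msg struct{ code uint32 }

// crossings counts language-boundary transitions.
type crossings struct{ goToC, cToGo int }

// run simulates a GUI message loop. handled lists the messages the Go
// wndproc processes itself; all others are forwarded to the original
// C wndproc.
func run(queue []msg, handled map[uint32]bool) crossings {
	var c crossings
	origWndProc := func(m msg) {} // original C wndproc (no-op here)
	goWndProc := func(m msg) {
		c.cToGo++ // DispatchMessage calls back into Go via the trampoline
		if !handled[m.code] {
			c.goToC++ // forward to the original wndproc (CallWindowProc)
			origWndProc(m)
		}
	}
	for _, m := range queue {
		c.goToC++ // GetMessage
		if m.code == wmQuit {
			break // GetMessage reports WM_QUIT: leave the loop
		}
		c.goToC++ // DispatchMessage
		goWndProc(m)
	}
	return c
}

func main() {
	queue := []msg{{wmPaint}, {wmKeyDown}, {wmQuit}}
	c := run(queue, map[uint32]bool{wmKeyDown: true})
	fmt.Printf("Go->C: %d, C->Go: %d\n", c.goToC, c.cToGo) // Go->C: 6, C->Go: 2
}
```

Even this three-message run crosses the boundary eight times, which is why a real GUI app exercises these paths constantly.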

alexbrainman commented on Sep 6, 2017

Member

is running out of stack space, which invokes a Windows exception handler registered by the runtime, which also attempts to run on the system stack and fails when it sees there's no stack left.

The Windows exception handler calls runtime.sigtramp. runtime.sigtramp runs on the scheduler stack. Also note that _StackSystem is used to make sure we always have enough room to run the exception handler.
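_StackSystem is an OS-specific number of extra bytes added to the stack guard because Windows has no separate signal stack. As far as we know, runtime/stack.go of this era defines it as 512*PtrSize on Windows, with the guard being a base amount plus _StackSystem. The sketch below does that arithmetic; the 880-byte base used in main is an assumption about this era's runtime (the base is version-dependent and larger under the race detector), so treat the totals as illustrative.

```go
package main

import "fmt"

// stackSystem returns the extra guard bytes reserved on Windows for
// exception handling, assuming the 512*PtrSize formula.
func stackSystem(goos string, ptrSize uintptr) uintptr {
	if goos == "windows" {
		return 512 * ptrSize
	}
	return 0 // other OSes are omitted from this sketch
}

// stackGuard adds the OS-specific room to a version-dependent base.
func stackGuard(base uintptr, goos string, ptrSize uintptr) uintptr {
	return base + stackSystem(goos, ptrSize)
}

func main() {
	const base = 880 // assumed base guard; check your runtime version
	fmt.Println(stackSystem("windows", 8), stackGuard(base, "windows", 8)) // 4096 4976
	fmt.Println(stackSystem("windows", 4), stackGuard(base, "windows", 4)) // 2048 2928
}
```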

Normally this would go through the cgo callback paths, but since your application isn't using cgo, I'm curious how this is being done.

If you are interested to see simple Windows GUI app, you can download d8b239ff60a62c3f50f7eb5994221b50ba055cf2 commit (initial commit) of https://github.com/alexbrainman/gowingui

Alex

Changed the title from "Runtime infinite recursion on windows triggered by morestack" to "runtime: infinite recursion on windows triggered by morestack" on Jan 9, 2018.
Added the NeedsInvestigation label on Jan 9, 2018.
Added this to the Go1.11 milestone on Mar 30, 2018.

(18 intervening items not shown)

kjk commented on Jul 4, 2018

Author

BTW: the same problem happens on 386.

kjk commented on Jul 4, 2018

Author

Another observation: memoryBasicInformation.baseAddress and memoryBasicInformation.allocationBase are shifted by 0x1000 from the TEB StackBase and StackLimit (i.e. StackLimit is 0xa1000 and allocationBase is 0xa0000).

g0.stack.lo is allocationBase + 0x2000, which explains the 0x1000 (0x2000 - 0x1000) difference between g0.stack.lo and TEB.StackBase.
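The numbers in that comment can be checked with a few lines of arithmetic. A minimal sketch, using the values quoted above (allocationBase = 0xa0000, StackLimit = 0xa1000) and the minit() formula base := mbi.allocationBase + 8*1024; g0StackLo is a hypothetical helper name:

```go
package main

import "fmt"

// g0StackLo mirrors the minit() computation: the lowest usable address of
// the g0 stack is the allocation base plus some slack.
func g0StackLo(allocationBase, slack uintptr) uintptr {
	return allocationBase + slack
}

func main() {
	const (
		allocationBase = 0xa0000 // from VirtualQuery
		stackLimit     = 0xa1000 // from the TEB
	)
	lo := g0StackLo(allocationBase, 8*1024)
	fmt.Printf("%#x %#x\n", lo, lo-stackLimit) // 0xa2000 0x1000
}
```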

TEB is https://en.wikipedia.org/wiki/Win32_Thread_Information_Block

But I'm not sure if that's relevant. I patched minit() with:

mbi.allocationBase += 0x1000
mbi.baseAddress += 0x1000

to make them match and that didn't change anything.

kjk commented on Jul 4, 2018

Author

Changing the slack value in minit() from 8*1024 to 16*1024 (base := mbi.allocationBase + 8*1024) fixes the stack overflow.

runtime.morestack gets called, prints the "fatal: morestack on g0" message, and executes int 3, which invokes the exception handler.

However, things then get recursive, i.e. code in the exception handler calls runtime.morestack, and so on.

I've added the //go:nosplit annotations from @aclements's PR, then added more to functions called within the exception handler (e.g. traceback(), gettraceback(), findfunc()), and then I hit the limit on those:

 runtime.test
runtime.systemstack: nosplit stack overflow
        748     assumed on entry to runtime.traceback (nosplit)

kjk commented on Jul 4, 2018

Author

And here's a fix:

--- a/src/runtime/os_windows.go
+++ b/src/runtime/os_windows.go
@@ -698,10 +698,12 @@ func minit() {
                print("runtime: VirtualQuery failed; errno=", getlasterror(), "\n")
                throw("VirtualQuery for stack base failed")
        }
+
        // Add 8K of slop for calling C functions that don't have
        // stack checks. We shouldn't be anywhere near this bound
        // anyway.
-       base := mbi.allocationBase + 8*1024
+       base := mbi.allocationBase + 16*1024
+
        // Sanity check the stack bounds.
        g0 := getg()
        if base > g0.stack.hi || g0.stack.hi-base > 64<<20 {
diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index f82014eb92..288324be30 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -436,6 +436,15 @@ func badmorestackg0() {
        write(2, sp.str, int32(sp.len))
 }
 
+var exceptionhandlerMsg = "exceptionhandler\n"
+
+//go:nosplit
+//go:nowritebarrierrec
+func exceptionhandlerPrint() {
+       sp := stringStructOf(&exceptionhandlerMsg)
+       write(2, sp.str, int32(sp.len))
+}
+
 var badmorestackgsignalMsg = "fatal: morestack on gsignal\n"
 
 //go:nosplit
diff --git a/src/runtime/signal_windows.go b/src/runtime/signal_windows.go
index 500b02880d..8101931ffe 100644
--- a/src/runtime/signal_windows.go
+++ b/src/runtime/signal_windows.go
@@ -71,7 +71,14 @@ func isgoexception(info *exceptionrecord, r *context) bool {
 // Called by sigtramp from Windows VEH handler.
 // Return value signals whether the exception has been handled (EXCEPTION_CONTINUE_EXECUTION)
 // or should be made available to other handlers in the chain (EXCEPTION_CONTINUE_SEARCH).
+//go:nosplit
 func exceptionhandler(info *exceptionrecord, r *context, gp *g) int32 {
+       exceptionhandlerPrint()
+
+       g := getg()
+       g.stack.lo -= 16 * 1024
+       g.stackguard0 -= 16 * 1024
+
        if !isgoexception(info, r) {
                return _EXCEPTION_CONTINUE_SEARCH
        }

Instead of trying to remove implicit calls to morestack by adding //go:nosplit I just made it believe that everything is ok by using the slack we've added in minit().
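The effect of the exceptionhandler change can be modeled in a few lines. This is a hypothetical toy, not the patch itself: the types and guard size are simplified stand-ins. Lowering g0's recorded stack.lo and stackguard0 by the 16K of slack reserved in minit() makes a prologue check that previously demanded morestack pass. In this model it also shows why the trick needs care: after borrowing, stack.lo sits exactly at allocationBase, so no reserve is left.

```go
package main

import "fmt"

// Hypothetical guard size and simplified runtime types, for illustration.
const stackGuard = 928

type stack struct{ lo, hi uintptr }

type g struct {
	stack       stack
	stackguard0 uintptr
}

// needsMorestack models the small-frame prologue check.
func needsMorestack(gp *g, sp uintptr) bool { return sp <= gp.stackguard0 }

// borrowSlack lowers the recorded bounds by n bytes, as the patch does
// with 16*1024, so code running in the slack region passes its checks.
func borrowSlack(gp *g, n uintptr) {
	gp.stack.lo -= n
	gp.stackguard0 -= n
}

func main() {
	const allocationBase = 0xa0000
	lo := uintptr(allocationBase + 16*1024) // minit() with 16K of slack
	g0 := &g{stack: stack{lo: lo, hi: 0x1a0000}, stackguard0: lo + stackGuard}
	sp := lo + 0x100 // near the bottom, as in the failing handler

	fmt.Println(needsMorestack(g0, sp)) // true
	borrowSlack(g0, 16*1024)
	fmt.Println(needsMorestack(g0, sp)) // false
	fmt.Printf("%#x\n", g0.stack.lo)    // 0xa0000
}
```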

With this change I get the proper clean exit with callstacks printed:

PS C:\Users\kjk\src\go-dev\src> .\runtime.test.exe "-test.run" TestG0StackOverflow
fatal: morestack on g0
exceptionhandler
fatal error: unexpected signal during runtime execution
[signal 0x80000003 code=0x0 addr=0x0 pc=0x4557b1]

runtime stack:
runtime.throw(0x6684ed, 0x2a)
        C:/Users/kjk/src/go-dev/src/runtime/panic.go:608 +0x54 fp=0xa4cf8 sp=0xa4ce4 pc=0x42a714
runtime.sigpanic()
        C:/Users/kjk/src/go-dev/src/runtime/signal_windows.go:173 +0x14d fp=0xa4d0c sp=0xa4cf8 pc=0x43cd4d
runtime.abort()
        C:/Users/kjk/src/go-dev/src/runtime/asm_386.s:866 +0x1 fp=0xa4d10 sp=0xa4d0c pc=0x4557b1
runtime.morestack()
        C:/Users/kjk/src/go-dev/src/runtime/asm_386.s:442 +0x24 fp=0xa4d14 sp=0xa4d10 pc=0x454334

aclements commented on Jul 5, 2018

Member

Thanks for the great debugging @kjk! You've definitely found the root of the problem: the initial INT3 traps fine and we detect that there's a problem, but since the stack bounds aren't quite right we wind up walking off the edge of the actual stack and the subsequent failures are a different exception, which we don't handle so carefully.

It seems like we should perhaps use the TIB instead of VirtualQuery to get the stack bounds (I'm not sure why I didn't come across that when originally figuring out how to get the stack bounds), and do something to make room for handling the exception (like the slack you added in exceptionhandler).

To answer some of your other questions, which you may have already found the answers to:

Another theory: there is one-off error in the code that maps system stack (which I cannot find) because this happens reliably when accessing what looks like the lowest page of the stack.

Unlike goroutine stacks, this stack is allocated by Windows itself when we create the thread.

I assume the part with PAGE_GUARD is there by default. At some point we detect we need to commit more stack and we call mmap() to extend stack.

The PAGE_GUARD is definitely a problem. Apparently the VirtualQuery call we use to find the bounds of the stack considers that to be part of the mapping, even though we can't actually use that memory. That's what causes the runtime to set up the wrong stack bounds.

There's nothing in the Go runtime that commits more stack or in any way extends a system stack. The OS commits more stack memory as we touch it, but that's transparent to Go.

Instead of trying to remove implicit calls to morestack by adding //go:nosplit I just made it believe that everything is ok by using the slack we've added in minit().

That's not a bad idea, though it needs to be done a little more carefully. :)

I've been trying to figure out what stack the vectored exception handler runs on when it's handling a stack overflow exception without much luck. The closest I've come is https://stackoverflow.com/questions/1897301/vectored-exception-handling-during-stackoverflowexception, but that could mean the OS reserves some small dedicated stack for this purpose, or that it lets the stack grow into the PAGE_GUARD region for this purpose. Either way, it's probably better if we just completely avoid overrunning the stack.

aclements commented on Jul 5, 2018

Member

It seems like we should perhaps use the TIB instead of VirtualQuery to get the stack bounds

Sigh. Apparently the StackLimit field in the TIB gives the limit of the committed stack, not the reserved stack, so that's not useful. There's a later field with the "Address of memory allocated for stack" but that returns the same base address as VirtualQuery.

Apparently the VirtualQuery call we use to find the bounds of the stack considers that to be part of the mapping, even though we can't actually use that memory.

Based on https://docs.microsoft.com/en-us/windows/desktop/Memory/creating-guard-pages, this is a bit different from the guard pages I'm used to. Apparently Windows will let you use that memory, but only after the process has handled a STATUS_GUARD_PAGE_VIOLATION exception.

gopherbot commented on Jul 6, 2018

Contributor

Change https://golang.org/cl/122515 mentions this issue: runtime: fix abort handling on Windows

gopherbot commented on Jul 6, 2018

Contributor

Change https://golang.org/cl/122516 mentions this issue: runtime: account for guard zone in Windows stack size

Conversation locked and limited to collaborators on Jul 7, 2019.

Metadata

Assignees: No one assigned
Labels: FrozenDueToAge, NeedsInvestigation, OS-Windows
Participants: @kjk, @aclements, @ianlancetaylor, @mvdan, @gopherbot