Description
I have Windows 7 computer - windows/amd64.
I am building Go from source using go1.4 as bootstrap.
I am using commit 56d6b87
I am running make.bat command.
What did you expect to see?
I expected make.bat command finish successfully.
What did you see instead?
make.bat never finishes. It hangs, for example, like this:
c:\Users\alexb\dev\go\src>make
Building Go cmd/dist using c:\users\alexb\dev\\go1.4
Building Go toolchain1 using c:\users\alexb\dev\\go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
I used process explorer https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer to examine process tree, and that is what I see:
I also used WinDbg to attach to go_bootstrap.exe (pid 9788) and print stacks of all its threads. And that is what I see:
0:019> !uniqstack
Processing 20 threads, please wait
. 0 Id: 263c.23d8 Suspend: 1 Teb: 000007ff`fffde000 Unfrozen
Start: go_bootstrap+0x648e0 (00000000`004648e0)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`0022fae8 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`0022faf0 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`0022fb90 00000000`00b37be0 go_bootstrap+0x6494e
00000000`0022fb98 00000000`004270d3 go_bootstrap+0x737be0
00000000`0022fba0 00000000`00000000 go_bootstrap+0x270d3
. 1 Id: 263c.314c Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
Start: tmmon64+0x8adac (00000000`7478adac)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`0302fcb8 000007fe`fcc71430 ntdll!ZwWaitForMultipleObjects+0xa
00000000`0302fcc0 00000000`76df06c0 KERNELBASE!GetCurrentProcess+0x40
00000000`0302fdc0 00000000`747aeed4 kernel32!WaitForMultipleObjects+0xb0
00000000`0302fe50 00000000`7478ad07 tmmon64+0xaeed4
00000000`0302ff00 00000000`7478aeae tmmon64+0x8ad07
00000000`0302ff30 00000000`76df59cd tmmon64+0x8aeae
00000000`0302ff60 00000000`76f2a561 kernel32!BaseThreadInitThunk+0xd
00000000`0302ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21
. 2 Id: 263c.2d5c Suspend: 1 Teb: 000007ff`fffda000 Unfrozen
Start: ntdll!RtlDestroyHandleTable+0x270 (00000000`76f1f6f0)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`0322fcc8 00000000`76f1ed15 ntdll!ZwWaitForWorkViaWorkerFactory+0xa
00000000`0322fcd0 00000000`76df59cd ntdll!RtlValidateHeap+0x155
00000000`0322ff60 00000000`76f2a561 kernel32!BaseThreadInitThunk+0xd
00000000`0322ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21
. 3 Id: 263c.e74 Suspend: 1 Teb: 000007ff`fffd6000 Unfrozen
Start: ntdll!TpIsTimerSet+0x8b0 (00000000`76f1a280)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`0342fcb8 00000000`76f1a3c7 ntdll!ZwWaitForMultipleObjects+0xa
00000000`0342fcc0 00000000`76df59cd ntdll!TpIsTimerSet+0x9f7
00000000`0342ff60 00000000`76f2a561 kernel32!BaseThreadInitThunk+0xd
00000000`0342ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21
. 4 Id: 263c.196c Suspend: 1 Teb: 000007ff`fffd4000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`28e5eb68 00000000`76f48f58 ntdll!ZwWaitForSingleObject+0xa
00000000`28e5eb70 00000000`76f48e54 ntdll!RtlDeNormalizeProcessParams+0x5a8
00000000`28e5ec20 000007fe`fa3b7f0b ntdll!RtlDeNormalizeProcessParams+0x4a4
00000000`28e5ec50 000007fe`fa3b8504 TmUmEvt64+0x17f0b
00000000`28e5eeb0 000007fe`fa3b8c96 TmUmEvt64+0x18504
00000000`28e5ef10 000007fe`fa4565ca TmUmEvt64+0x18c96
00000000`28e5efa0 000007fe`fa455f8e TmUmEvt64+0xb65ca
00000000`28e5f000 000007fe`fa410686 TmUmEvt64+0xb5f8e
00000000`28e5f150 000007fe`fa439730 TmUmEvt64+0x70686
00000000`28e5f260 00000000`7472f146 TmUmEvt64+0x99730
00000000`28e5f4a0 00000000`747e2d7d tmmon64+0x2f146
00000000`28e5f580 00000000`747e29f4 tmmon64+0xe2d7d
00000000`28e5f640 00000000`74733748 tmmon64+0xe29f4
00000000`28e5f6b0 000007fe`fcc77c3f tmmon64+0x33748
00000000`28e5f780 00000000`0046494e KERNELBASE!ResumeThread+0xf
00000000`28e5f7b0 ffffffff`ffffffff go_bootstrap+0x6494e
00000000`28e5f7b8 00000000`00000001 0xffffffff`ffffffff
00000000`28e5f7c0 ffffffff`ffffffff 0x1
00000000`28e5f7c8 00000000`28e5f928 0xffffffff`ffffffff
00000000`28e5f7d0 00000000`00000000 0x28e5f928
. 5 Id: 263c.3204 Suspend: 1 Teb: 000007ff`fffae000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2905fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2905fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2905fbb0 000000c0`0002c980 go_bootstrap+0x6494e
00000000`2905fbb8 00000000`00000000 0xc0`0002c980
. 6 Id: 263c.32f0 Suspend: 1 Teb: 000007ff`fffac000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2925fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2925fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2925fbb0 000000c0`0002cd00 go_bootstrap+0x6494e
00000000`2925fbb8 00000000`00000164 0xc0`0002cd00
00000000`2925fbc0 7fffffff`00000000 0x164
00000000`2925fbc8 00000000`00000160 0x7fffffff`00000000
00000000`2925fbd0 000000c0`004c6300 0x160
00000000`2925fbd8 00000000`2925fcc0 0xc0`004c6300
00000000`2925fbe0 00000000`0043daa2 0x2925fcc0
00000000`2925fbe8 00000000`00b36bf8 go_bootstrap+0x3daa2
00000000`2925fbf0 00000000`00000000 go_bootstrap+0x736bf8
. 7 Id: 263c.29d8 Suspend: 1 Teb: 000007ff`fffaa000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2945fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2945fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2945fbb0 000000c0`00080280 go_bootstrap+0x6494e
00000000`2945fbb8 00000000`00000000 0xc0`00080280
. 8 Id: 263c.104c Suspend: 1 Teb: 000007ff`fffa8000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`29a8fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`29a8fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`29a8fbb0 000000c0`00206280 go_bootstrap+0x6494e
00000000`29a8fbb8 00000000`00000000 0xc0`00206280
. 9 Id: 263c.31b4 Suspend: 1 Teb: 000007ff`fffa6000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`29ccfb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`29ccfb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`29ccfbb0 000000c0`00207080 go_bootstrap+0x6494e
00000000`29ccfbb8 00000000`00000000 0xc0`00207080
. 10 Id: 263c.2b14 Suspend: 1 Teb: 000007ff`fffa4000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`29eceb68 00000000`76f48f58 ntdll!ZwWaitForSingleObject+0xa
00000000`29eceb70 00000000`76f48e54 ntdll!RtlDeNormalizeProcessParams+0x5a8
00000000`29ecec20 000007fe`fa3b7f0b ntdll!RtlDeNormalizeProcessParams+0x4a4
00000000`29ecec50 000007fe`fa3b8504 TmUmEvt64+0x17f0b
00000000`29eceeb0 000007fe`fa3b8c96 TmUmEvt64+0x18504
00000000`29ecef10 000007fe`fa4565ca TmUmEvt64+0x18c96
00000000`29ecefa0 000007fe`fa455f8e TmUmEvt64+0xb65ca
00000000`29ecf000 000007fe`fa410686 TmUmEvt64+0xb5f8e
00000000`29ecf150 000007fe`fa439730 TmUmEvt64+0x70686
00000000`29ecf260 00000000`7472f146 TmUmEvt64+0x99730
00000000`29ecf4a0 00000000`747e2d7d tmmon64+0x2f146
00000000`29ecf580 00000000`747e29f4 tmmon64+0xe2d7d
00000000`29ecf640 00000000`74733748 tmmon64+0xe29f4
00000000`29ecf6b0 000007fe`fcc77c3f tmmon64+0x33748
00000000`29ecf780 00000000`0046494e KERNELBASE!ResumeThread+0xf
00000000`29ecf7b0 ffffffff`ffffffff go_bootstrap+0x6494e
00000000`29ecf7b8 00000000`00000000 0xffffffff`ffffffff
. 11 Id: 263c.30b4 Suspend: 1 Teb: 000007ff`fffa2000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2a10fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2a10fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2a10fbb0 000000c0`00195400 go_bootstrap+0x6494e
00000000`2a10fbb8 00000000`00000009 0xc0`00195400
00000000`2a10fbc0 00000000`00000000 0x9
. 12 Id: 263c.30fc Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2a30f248 00000000`76f48f58 ntdll!ZwWaitForSingleObject+0xa
00000000`2a30f250 00000000`76f48e54 ntdll!RtlDeNormalizeProcessParams+0x5a8
00000000`2a30f300 000007fe`fa3b7f0b ntdll!RtlDeNormalizeProcessParams+0x4a4
00000000`2a30f330 000007fe`fa3b8504 TmUmEvt64+0x17f0b
00000000`2a30f590 000007fe`fa3b8c96 TmUmEvt64+0x18504
00000000`2a30f5f0 000007fe`fa4565ca TmUmEvt64+0x18c96
00000000`2a30f680 000007fe`fa455f8e TmUmEvt64+0xb65ca
00000000`2a30f6e0 000007fe`fa410686 TmUmEvt64+0xb5f8e
00000000`2a30f830 000007fe`fa4363b0 TmUmEvt64+0x70686
00000000`2a30f940 00000000`7472f146 TmUmEvt64+0x963b0
00000000`2a30fb80 00000000`747e2d7d tmmon64+0x2f146
00000000`2a30fc60 00000000`747e29f4 tmmon64+0xe2d7d
00000000`2a30fd20 00000000`74732869 tmmon64+0xe29f4
00000000`2a30fd90 00000000`0046494e tmmon64+0x32869
00000000`2a30fe60 000000c0`00306100 go_bootstrap+0x6494e
00000000`2a30fe68 000000c0`0016b840 0xc0`00306100
00000000`2a30fe70 00000000`2a30fe60 0xc0`0016b840
00000000`2a30fe78 00000000`00000000 0x2a30fe60
. 13 Id: 263c.13a0 Suspend: 2 Teb: 000007ff`fff9e000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2a50ec50 000007fe`fa3b8504 TmUmEvt64+0x17fa0
00000000`2a50eeb0 000007fe`fa3b8c96 TmUmEvt64+0x18504
00000000`2a50ef10 000007fe`fa4565ca TmUmEvt64+0x18c96
00000000`2a50efa0 000007fe`fa455f8e TmUmEvt64+0xb65ca
00000000`2a50f000 000007fe`fa410686 TmUmEvt64+0xb5f8e
00000000`2a50f150 000007fe`fa439730 TmUmEvt64+0x70686
00000000`2a50f260 00000000`7472f146 TmUmEvt64+0x99730
00000000`2a50f4a0 00000000`747e2d7d tmmon64+0x2f146
00000000`2a50f580 00000000`747e29f4 tmmon64+0xe2d7d
00000000`2a50f640 00000000`74733748 tmmon64+0xe29f4
00000000`2a50f6b0 000007fe`fcc77c3f tmmon64+0x33748
00000000`2a50f780 00000000`0046494e KERNELBASE!ResumeThread+0xf
00000000`2a50f7b0 ffffffff`ffffffff go_bootstrap+0x6494e
00000000`2a50f7b8 00000000`00000000 0xffffffff`ffffffff
. 15 Id: 263c.1c34 Suspend: 1 Teb: 000007ff`fff9a000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2a90faf8 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2a90fb00 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2a90fba0 000000c0`003b9400 go_bootstrap+0x6494e
00000000`2a90fba8 ffffffff`fff85ee0 0xc0`003b9400
00000000`2a90fbb0 00000000`00000000 0xffffffff`fff85ee0
. 16 Id: 263c.2530 Suspend: 1 Teb: 000007ff`fff98000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2ab0fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2ab0fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2ab0fbb0 000000c0`003b9780 go_bootstrap+0x6494e
00000000`2ab0fbb8 00000000`00000009 0xc0`003b9780
00000000`2ab0fbc0 00000000`00000000 0x9
. 17 Id: 263c.3114 Suspend: 1 Teb: 000007ff`fff96000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2ad0fb08 000007fe`fcc710ac ntdll!ZwWaitForSingleObject+0xa
00000000`2ad0fb10 00000000`0046494e KERNELBASE!WaitForSingleObjectEx+0x9c
00000000`2ad0fbb0 000000c0`00508d00 go_bootstrap+0x6494e
00000000`2ad0fbb8 00000000`00000000 0xc0`00508d00
. 18 Id: 263c.3344 Suspend: 3 Teb: 000007ff`fff94000 Unfrozen
Start: go_bootstrap+0x64d00 (00000000`00464d00)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2af0fb08 000007fe`fcc71430 ntdll!ZwWaitForMultipleObjects+0xa
00000000`2af0fb10 00000000`76df06c0 KERNELBASE!GetCurrentProcess+0x40
00000000`2af0fc10 00000000`0046494e kernel32!WaitForMultipleObjects+0xb0
00000000`2af0fca0 000007ff`fff94000 go_bootstrap+0x6494e
00000000`2af0fca8 00000000`00000000 0x7ff`fff94000
. 19 Id: 263c.3128 Suspend: 1 Teb: 000007ff`fff92000 Unfrozen
Start: ntdll!DbgUiRemoteBreakin (00000000`76ff2dd0)
Priority: 0 Priority class: 32 Affinity: f
Child-SP RetAddr Call Site
00000000`2b10ff28 00000000`76ff2e08 ntdll!DbgBreakPoint
00000000`2b10ff30 00000000`76df59cd ntdll!DbgUiRemoteBreakin+0x38
00000000`2b10ff60 00000000`76f2a561 kernel32!BaseThreadInitThunk+0xd
00000000`2b10ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x21
Total threads: 20
Duplicate callstacks: 1 (windbg thread #s follow):
14
I was able to use Delve to examine this bug once (see #35775 (comment)), but not anymore. Delve just fails to attach now.
I can reproduce this pretty reliably on this particular computer - make.bat never completes. Sometimes it hangs in go_bootstrap.exe and sometimes in compile.exe. Sometimes there are more than single hung compile.exe.
I can make problem go away, if I change source code to have runtime.preemptMSupported set to false.
/cc @aclements
Alex
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Activity
aclements commentedon Jan 10, 2020
Thanks for the detailed report.
Most of the threads look uninteresting, except, I think, these two:
These are all stuck in
ResumeThread
, which I didn't know threads could get stuck in.Thread 13 is also interesting because the "suspend count" is 2, suggesting that some other thread has suspended it and is failing to resume it. This may also be why threads 4 and 10 are stopped in obviously blocking operations, while thread 13 is stopped at a seemingly random point.
Do you know what "tmmon64" and "TmUmEvt64" are?
Maybe we just need to hold the suspendLock for longer (though I don't have a theory for why this would be). What happens if you move the
unlock(&suspendLock)
inpreemptM
to the very bottom of the function?alexbrainman commentedon Jan 10, 2020
This is not my computer, so I cannot properly poke at it. But I suspect this computer has some standard anrivirus software installed. And these kinds of software often would install their code to intercept real Win32 API calls.
I won't see her on the weekend. But I will try it Monday or Tuesday. Also please show diff of your suggested change, because I don't trust myself with reading your mind.
Thank you.
Alex
alexbrainman commentedon Jan 13, 2020
I changed the code, like this
And I managed to run make.bat once to successful completion. But then it hung as before, when I run make.bat second time. This time, it is compile.exe that hung. Here is the stack trace from windbg
Alex
alexbrainman commentedon Apr 19, 2020
I just tried to verify this issue. And it is still broken on af9ab6b. It gets stuck here
This what process tree looks like when stuck
And this is what windbg says about compile.exe process with pid of 11956:
Alex
aclements commentedon Apr 20, 2020
Thanks for the two other dumps.
I did some more searching around and I'm almost positive tmmon64 and TmUmEvt64 are related to Trend Micro anti-virus, which agrees with what you said in #36492 (comment).
Unfortunately, I think its syscall interception is introducing a lock cycle that's leading to a deadlock.
In #36492 (comment), threads 6 and 8 have suspend counts > 1. Threads 0 and 6 (again) are in ResumeThread. So, thread 6 must have suspended thread 8 for preemption, and then when it was trying to resume thread 8, thread 0 suspended thread 6 for preemption. Where this gets interesting is that all three of these threads are in Windows memory allocation functions via tmmon64/TmUmEvt64. My guess is there's a cycle between threads 0 and 6: TmUmEvt64 on thread 6 locked the Windows heap inside ResumeThread, and was then suspended with that lock held. When thread 0 then tried to resume it with ResumeThread, TmUmEvt64 again tried to lock the Windows heap, but it can't get that lock, so it's stuck.
#36492 (comment) shows similar evidence: thread 6 is suspended in RtlUnlockHeap (via TmUmEvt64) and thread 4 is in GetThreadContext -> TmUmEvt64 -> RtlAllocateHeap, indicating that it has thread 6 suspended and is in a lock cycle on the Windows heap lock. Even completely serializing thread suspend/resume by moving the unlock doesn't help enough because TmUmEvt64 can wind up in Windows heap functions through other means.
So, ultimately, this is probably a bug in Trend Micro, but only because we're doing something really unusual with suspending our own threads, which, sadly, probably makes this our problem. The downsides of using SuspendThread keep piling up, but I have no idea what to replace it with. :(
aclements commentedon Apr 20, 2020
Fascinating. It seems .NET uses SuspendThread for driving threads to GC safe-points. It seems they ran into similar problems with Windows heap locks in general, though not specifically related to system call interceptors.
I don't see anything in threadsuspend.cpp itself that's obviously different from what we do, but there's a huge comment about OS resources and SuspendThread that indicates they're carefully synchronizing every transition into and out of managed code (presumably this includes every "system call") so they don't even attempt to suspend a thread that isn't in managed code. If I've followed the twisty passages correctly, this winds up at DisablePreemptiveGC and EnablePreemptiveGC and ultimately RareDisablePreemptiveGC and RareEnablePreemptiveGC, which look like they can block transitions into and out of managed code depending on GC preemption state.
alexbrainman commentedon Apr 23, 2020
You are, probably, correct. I use this PC at work. Our admin run whatever software they like on it.
I agree. I run a lot of different programs on that computer, and none of it hangs.
This makes Go build tools impossible to use on this computer. I suspect the same can be said about programs built with Go.
The PC is still running Windows 7 - which is rare this days. Hopefully this bug is uncommon.
Personally I don't see any benefits from preemption. I don't run any code that requires preemption on Windows. I would just disable preempt code on Windows. You will also avoid Delve problems on Windows.
It is quite possible. But I expect to see restrictions like that mentioned on Windows API descriptions. I am not aware of any such thing.
Adding @zx2c4 in case he has some bright ideas.
Alex
10 remaining items