-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Finally think I have gotten to the root cause of the raft of failures seen in x86 jitstress=2 gcstress=0xC zapdisable=1 runs discussed over in dotnet/coreclr#17330. The case I have been looking at is baseservices\threading\waithandle\waitall\waitallex3 -- the other waitall tests should fail for similar reasons.
The key method here is System.Threading.WaitHandle:WaitAll(ref,int,bool):bool
. This method is normally prejitted but with zapdisable=1 it gets jitted during this run. Jitstress kicks in and via STRESS_BB_PROFILE ends up altering the block layout. So we end up not moving BBJ_THROW blocks to the end of the method.
That leaves us with this sequence of generated code:
07150225 8b4304 mov eax,dword ptr [ebx+4] << -- ESI dead here per gc info
07150228 0580000000 add eax,80h
0715022d 3bc6 cmp eax,esi
0715022f 7ec3 jle System_Private_CoreLib!System.Threading.WaitHandle.WaitAll(System.Threading.WaitHandle[], Int32, Boolean)+0x174 (071501f4)
;; jit knows this call does not return
07150231 ff151c1e4205 call dword ptr ds:[5421E1Ch] (System.Threading.WaitHandle.ThrowAbandonedMutexException(), mdToken: 06002692)
;; join
07150237 8945f0 mov dword ptr [ebp-10h],eax <<< ESI live here
0715023a 8b548608 mov edx,dword ptr [esi+eax*4+8]
This method also has fully interruptible GC reporting. The jit knows ThrowAbandonedMutexException
does not return and reflects that in its liveness calculations. So ESI is not live at the call but becomes live right after the call.
During the run at the point of failure, this method has just called ThrowAbandonedMutexException
GC stress kicks off a GC in the prestub. The VM walks the stack for GC and reports that the EIP for this frame is 07150237. When enumerating GC roots, it reports ESI as live. But when in the callee, ESI is not in fact live and contains the scalar value 0x80. So the reporting triggers an assert for an invalid objref.
The logic in EECodeManager::EnumGcRefs
knows that for stack locations liveness can change across a call, but it (implicitly?) assumes that for callee-saves registers that liveness before/after the call should be the same. So when reporting it looks at the after call liveness state.
Possible fixes here include:
- modify
EnumGcRefs
to make the samecurOffs
adjustment for non-active fully interruptible callee save registers that it does for stack locations, to use their "in call" liveness state. - modify the jit to pad after normal user calls that it knows do not return so that the before and after liveness state of callee saves registers is the same. I have this prototyped and suspect it will be pretty surgical.
There are still a few loose ends to chase down:
- why we didn't notice this long ago, as the does not return logic has been in the jit a while
- why this only impacts x86
- whether or not this situation can arise without jitstress