Memory dump of AccessViolationException on gc_heap::mark_object_simple and heap corruption #65694

@arekpalinski

While running our test suite we got a crash ("The process was terminated due to an internal error in the .NET Runtime at IP 00007FFDC50AB5FC (00007FFDC5010000) with exit code 80131506."):

Faulting application name: dotnet.exe, version: 6.0.222.6406, time stamp: 0x61e1d8df
Faulting module name: coreclr.dll, version: 6.0.222.6406, time stamp: 0x61e1d09e
Exception code: 0xc0000005
Fault offset: 0x000000000009b5fc
Faulting process id: 0x22b4
Faulting application start time: 0x01d823e188848226
Faulting application path: C:\Program Files\dotnet\dotnet.exe
Faulting module path: C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.2\coreclr.dll
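
As a quick sanity check, the fault offset reported above should equal the faulting IP minus the module base. The sketch below assumes the value in parentheses in the crash message (00007FFDC5010000) is the load address of coreclr.dll; every number is copied from the report, nothing is measured.

```csharp
using System;

// Values copied from the crash message and the fault details above.
const ulong faultingIp = 0x00007FFDC50AB5FC;   // "at IP ..."
const ulong moduleBase = 0x00007FFDC5010000;   // value in parentheses, assumed to be the coreclr.dll base
const ulong reportedOffset = 0x9B5FC;          // "Fault offset"

// Offset of the faulting instruction inside the module.
ulong computedOffset = faultingIp - moduleBase;

Console.WriteLine($"computed fault offset: 0x{computedOffset:x}");
Console.WriteLine($"matches reported offset: {computedOffset == reportedOffset}");
```

The two values agree, so the faulting instruction is inside coreclr.dll itself, which is consistent with the stack trace ending in gc_heap::mark_object_simple.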

We have automatic memory dump creation configured, which produced the following dump:

https://drive.google.com/file/d/19S1k74Foe9V6A03hRwIuebE42GVQUirI/view?usp=sharing

In our project (github.com/ravendb/ravendb) we use unmanaged memory directly, so the corruption may well be caused by our own code.

Here is the analysis done so far in WinDbg.

  1. Based on !analyze -v, the crashing stack trace is:
EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffdc50ab5fc (coreclr!WKS::gc_heap::mark_object_simple+0x000000000000011c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000001
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 0000022da7b73000
Attempt to read from address 0000022da7b73000
coreclr!WKS::gc_heap::mark_object_simple+0x11c
coreclr!WKS::GCHeap::Promote+0x74
coreclr!GcEnumObject+0x76
coreclr!GcInfoDecoder::EnumerateLiveSlots+0x792
coreclr!EECodeManager::EnumGcRefs+0xe9
coreclr!GcStackCrawlCallBack+0x12f
coreclr!Thread::StackWalkFramesEx+0xee
coreclr!Thread::StackWalkFrames+0xae
coreclr!ScanStackRoots+0x7a
coreclr!GCToEEInterface::GcScanRoots+0x9f
coreclr!WKS::gc_heap::mark_phase+0x291
coreclr!WKS::gc_heap::gc1+0x98
coreclr!WKS::gc_heap::garbage_collect+0x1ad
coreclr!WKS::GCHeap::GarbageCollectGeneration+0x14f
coreclr!WKS::gc_heap::trigger_gc_for_alloc+0x2b
coreclr!WKS::gc_heap::try_allocate_more_space+0x5c141
coreclr!WKS::gc_heap::allocate_more_space+0x31
coreclr!WKS::GCHeap::Alloc+0x84
coreclr!JIT_NewArr1+0x4bd
0x00007ffd`778e62c6
0x00007ffd`6f3e3770
0x00007ffd`741c1efb
...
0x00007ffd`67d765da
0x00007ffd`67deeff2
coreclr!CallDescrWorkerInternal+0x83
coreclr!DispatchCallSimple+0x80
coreclr!ThreadNative::KickOffThread_Worker+0x63
coreclr!ManagedThreadBase_DispatchMiddle+0x85
coreclr!ManagedThreadBase_DispatchOuter+0xae
coreclr!ThreadNative::KickOffThread+0x79
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
  1. The heap is corrupted:
0:340> !verifyheap
object 0000022da400fff8: bad member 0000022D04C05821 at 0000022DA4010000
Last good object: 0000022DA400FFE0.
  1. The last good object is:
0:340> !do 0000022DA400FFE0
Name:        Sparrow.Utils.TimeoutManager+<>c__DisplayClass6_0
MethodTable: 00007ffd67a1a6e0
EEClass:     00007ffd67a24988
Tracked Type: false
Size:        24(0x18) bytes
File:        c:\Jenkins\workspace\PR_Tests\s\test\SlowTests\bin\Release\net6.0\Sparrow.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffd679b44a0  40002eb        8 ...Private.CoreLib]]  0 instance 0000022da4010160 onCancel

The object has an onCancel member, so it is most likely the closure generated for the following code in TimeoutManager.cs:

var onCancel = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
using (token.Register(tcs => onCancel.TrySetCanceled(), onCancel))
{
}

https://github.com/ravendb/ravendb/blob/193624d559fe2e6525cc383de362c83d19aacffd/src/Sparrow/Utils/TimeoutManager.cs#L139

  1. Bad object is:
0:340> !do 0000022da400fff8
Name:        System.Action`1[[System.Object, System.Private.CoreLib]]
MethodTable: 00007ffd66a69428
EEClass:     00007ffd658f6788
Tracked Type: false
Size:        64(0x40) bytes
File:        C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.2\System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007ffd655a5678  40001ec        8        System.Object  0 instance 0000022d04c05821 _target
00007ffd655a5678  40001ed       10        System.Object  0 instance 0000000000000000 _methodBase
00007ffd65654228  40001ee       18        System.IntPtr  1 instance 00007FFD67569160 _methodPtr
00007ffd65654228  40001ef       20        System.IntPtr  1 instance 0000000000000000 _methodPtrAux
00007ffd655a5678  4000273       28        System.Object  0 instance 0000000000000000 _invocationList
00007ffd65654228  4000274       30        System.IntPtr  1 instance 0000000000000000 _invocationCount

It is System.Action`1[[System.Object, System.Private.CoreLib]], so my suspicion is that it is the action tcs => onCancel.TrySetCanceled().

Trying to dump its _target results in:

!DumpObj /d 0000022d04c05821
<Note: this object has an invalid CLASS field>
Invalid object

That address matches the !verifyheap output (bad member 0000022D04C05821 at 0000022DA4010000), so we know that the corrupted member is _target.
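
The address relationships in the dumps above can be checked with a few lines of arithmetic. This is only a sketch using numbers copied from the !verifyheap and !do output. The 8-byte alignment rule for object references on x64 is stated here as an assumption about the runtime, not something read from the dump.

```csharp
using System;

// Addresses and sizes copied from the !verifyheap and !do output above.
const ulong lastGoodObject = 0x0000022DA400FFE0; // TimeoutManager closure object
const ulong lastGoodSize   = 0x18;               // Size: 24(0x18) bytes
const ulong badObject      = 0x0000022DA400FFF8; // the Action<object> delegate
const ulong targetOffset   = 0x8;                // Offset column for _target
const ulong badMemberSlot  = 0x0000022DA4010000; // "bad member ... at" address
const ulong targetValue    = 0x0000022D04C05821; // value stored in _target

// The delegate starts exactly where the last good object ends.
bool adjacent = lastGoodObject + lastGoodSize == badObject;

// The slot flagged by !verifyheap is the delegate's _target field.
bool slotIsTarget = badObject + targetOffset == badMemberSlot;

// Assumption: valid object references on x64 are 8-byte aligned.
bool targetAligned = targetValue % 8 == 0;

Console.WriteLine($"delegate directly follows closure: {adjacent}");
Console.WriteLine($"flagged slot is _target:           {slotIsTarget}");
Console.WriteLine($"_target value is 8-byte aligned:   {targetAligned}");
```

The first two checks hold and the third does not: the delegate sits immediately after the closure object, the flagged slot is its _target field, and the value stored there ends in ...821, so it does not look like a managed object pointer at all.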

  1. In our code the lambda uses the captured onCancel variable directly (tcs => onCancel.TrySetCanceled()) instead of the state argument passed to the callback (tcs => ((TaskCompletionSource<object>)tcs).TrySetCanceled()), but effectively both do the same thing. Could this cause any GC problems and lead to corruption like the one above?
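
To make the two variants concrete, here is a minimal, self-contained sketch of both registration styles. The helper names RegisterWithCapture and RegisterWithState are hypothetical and exist only for this illustration; the real code lives in TimeoutManager.cs and disposes the registration with a using block. The sketch only demonstrates that both forms cancel the TaskCompletionSource in the same way. It makes no claim about the cause of the heap corruption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Variant 1 (as in TimeoutManager.cs): the lambda captures onCancel, so the
// compiler emits a closure object and the delegate's _target points at it.
static Task<object> RegisterWithCapture(CancellationToken token)
{
    var onCancel = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
    token.Register(tcs => onCancel.TrySetCanceled(), onCancel);
    return onCancel.Task;
}

// Variant 2: the lambda only uses the state argument that Register passes in,
// so no local variable is captured.
static Task<object> RegisterWithState(CancellationToken token)
{
    var onCancel = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
    token.Register(tcs => ((TaskCompletionSource<object>)tcs).TrySetCanceled(), onCancel);
    return onCancel.Task;
}

using var cts = new CancellationTokenSource();
Task<object> captured = RegisterWithCapture(cts.Token);
Task<object> viaState = RegisterWithState(cts.Token);

// Cancel() invokes the registered callbacks before it returns.
cts.Cancel();

Console.WriteLine($"capture variant canceled: {captured.IsCanceled}");
Console.WriteLine($"state variant canceled:   {viaState.IsCanceled}");
```

Both tasks end up canceled. The capturing variant costs one extra ordinary managed allocation, the closure object seen in the dump as <>c__DisplayClass6_0. Both variants are plain managed code, with no unmanaged memory access involved.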

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
