Skip to content

Generic dictionary lookup race condition on expansion due to JIT codegen issue #101131

@davidwrighton

Description

@davidwrighton

Description

The situation is that we are in a method compiled for shared generics, which has several specific conditions.

  1. The dictionary layout has a minimum layout size of 4 slots, but an actual layout size of 8. (The exact number is irrelevant, but this is the particular case we have)
  2. There are multiple threads accessing the dictionary of the same type at roughly the same time.
    a. Thread A is attempting to access slot 4
    b. Thread B is attempting to access slot 5
    c. Thread C is attempting to access slot 4
  3. These threads run in the following interleaving
    a. Thread A attempts to access slot 4, determines the value which should be placed in the slot, and gathers the address of the slot which should be updated
    b. Thread B attempts to access slot 5, determines the value which should be placed in the slot, determines that the accessing the slot requires expanding the generic dictionary, expands it, and copies the current state of the slots into the newly expanded dictionary. Notably, since thread A has not written to slot 4, that slot remains as NULL. However, the expanded dictionary is not yet written into the MethodDesc
    c. Thread A fills in slot 4 in the old dictionary with the new value.
    d. Thread C reads the value in slot 4, and determines that it is not NULL, and so commits to the fast path of generic dictionary lookup.
    e. Thread B writes the newly expanded dictionary into the MethodDesc
    f. Thread C then reads the value in slot 4, and since the expanded dictionary does not have a non-NULL value in slot 4, the rest of execution fails unexpectedly.

The issue is caused by incorrect codegen by the JIT, at least in Tier0 mode.

The failing logic is here

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
00007ffb`b9e1b4e0 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4e4 488b4910        mov     rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4e8 4883792800      cmp     qword ptr [rcx+28h],0
00007ffb`b9e1b4ed 7415            je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
00007ffb`b9e1b4ef 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4f3 488b4910        mov     rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4f7 488b4928        mov     rcx,qword ptr [rcx+28h]
00007ffb`b9e1b4fb 48898d10ffffff  mov     qword ptr [rbp-0F0h],rcx
00007ffb`b9e1b502 eb1a            jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
00007ffb`b9e1b504 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffb`b9e1b508 48bab0cef0b9fb7f0000 mov rdx,7FFBB9F0CEB0h
00007ffb`b9e1b512 e8b9b8665e      call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
00007ffb`b9e1b517 48898510ffffff  mov     qword ptr [rbp-0F0h],rax
00007ffb`b9e1b51e 488b8d10ffffff  mov     rcx,qword ptr [rbp-0F0h]

The problem is that the dictionary returned by
00007ffb`b9e1b4e4 488b4910 mov rcx,qword ptr [rcx+10h]
Can be updated by another thread, which can cause the actual dictionary entry to be resolved without having a proper NULL check.

There are 2 possibilities for valid code here.
Either we could generate code that looks like…

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov     rcx,qword ptr [rbp+10h]
mov     rcx,qword ptr [rcx+10h]
cmp     qword ptr [rcx+28h],0
je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov     rcx,qword ptr [rcx+28h]
mov     qword ptr [rbp-0F0h],rcx
jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov     rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov     qword ptr [rbp-0F0h],rax
mov     rcx,qword ptr [rbp-0F0h]

Where we only read the dictionary pointer once.

OR

/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov     rcx,qword ptr [rbp+10h]
mov     rcx,qword ptr [rcx+10h]
mov     rcx,qword ptr [rcx+28h]
cmp     rcx,0
je      Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov     qword ptr [rbp-0F0h],rcx
jmp     Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov     rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call    coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov     qword ptr [rbp-0F0h],rax
mov     rcx,qword ptr [rbp-0F0h]

Where we read the dictionary entry once.

Reproduction Steps

Run code which uses many threads, on very subtly different code paths. This particular failure happens in Roslyn, but requires the machine running the code to be running under a tremendous amount of stress.

Expected behavior

Generic dictionary should be able to expand without triggering a crash.

Actual behavior

Under extremely rare conditions it will fail.

Regression?

No, this failing behavior has been in place since 2020.

Known Workarounds

No response

Configuration

No response

Other information

No response

Metadata

Metadata

Assignees

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions