-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Description
The situation is that we are in a method compiled for shared generics, which has several specific conditions.
- The dictionary layout has a minimum layout size of 4 slots, but an actual layout size of 8. (The exact number is irrelevant, but this is the particular case we have)
- There are multiple threads accessing the dictionary of the same type at roughly the same time.
a. Thread A is attempting to access slot 4
b. Thread B is attempting to access slot 5
c. Thread C is attempting to access slot 4 - These threads run in the following interleaving
a. Thread A attempts to access slot 4, determines the value which should be placed in the slot, and gathers the address of the slot which should be updated
b. Thread B attempts to access slot 5, determines the value which should be placed in the slot, determines that the accessing the slot requires expanding the generic dictionary, expands it, and copies the current state of the slots into the newly expanded dictionary. Notably, since thread A has not written to slot 4, that slot remains as NULL. However, the expanded dictionary is not yet written into the MethodDesc
c. Thread A fills in slot 4 in the old dictionary with the new value.
d. Thread C reads the value in slot 4, and determines that it is not NULL, and so commits to the fast path of generic dictionary lookup.
e. Thread B writes the newly expanded dictionary into the MethodDesc
f. Thread C then reads the value in slot 4, and since the expanded dictionary does not have a non-NULL value in slot 4, the rest of execution fails unexpectedly.
The issue is caused by incorrect codegen by the JIT, at least in Tier0 mode.
The failing logic is here
/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
00007ffb`b9e1b4e0 488b4d10 mov rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4e4 488b4910 mov rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4e8 4883792800 cmp qword ptr [rcx+28h],0
00007ffb`b9e1b4ed 7415 je Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
00007ffb`b9e1b4ef 488b4d10 mov rcx,qword ptr [rbp+10h]
00007ffb`b9e1b4f3 488b4910 mov rcx,qword ptr [rcx+10h]
00007ffb`b9e1b4f7 488b4928 mov rcx,qword ptr [rcx+28h]
00007ffb`b9e1b4fb 48898d10ffffff mov qword ptr [rbp-0F0h],rcx
00007ffb`b9e1b502 eb1a jmp Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
00007ffb`b9e1b504 488b4d10 mov rcx,qword ptr [rbp+10h]
00007ffb`b9e1b508 48bab0cef0b9fb7f0000 mov rdx,7FFBB9F0CEB0h
00007ffb`b9e1b512 e8b9b8665e call coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
00007ffb`b9e1b517 48898510ffffff mov qword ptr [rbp-0F0h],rax
00007ffb`b9e1b51e 488b8d10ffffff mov rcx,qword ptr [rbp-0F0h]
The problem is that the dictionary returned by
00007ffb`b9e1b4e4 488b4910 mov rcx,qword ptr [rcx+10h]
Can be updated by another thread, which can cause the actual dictionary entry to be resolved without having a proper NULL check.
There are 2 possibilities for valid code here.
Either we could generate code that looks like…
/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+10h]
cmp qword ptr [rcx+28h],0
je Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov rcx,qword ptr [rcx+28h]
mov qword ptr [rbp-0F0h],rcx
jmp Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov qword ptr [rbp-0F0h],rax
mov rcx,qword ptr [rbp-0F0h]
Where we only read the dictionary pointer once.
OR
/_/src/Compilers/Core/Portable/Collections/ImmutableArrayExtensions.cs @ 575:
mov rcx,qword ptr [rbp+10h]
mov rcx,qword ptr [rcx+10h]
mov rcx,qword ptr [rcx+28h]
cmp rcx,0
je Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x514 (00007ffb`b9e1b504)
mov qword ptr [rbp-0F0h],rcx
jmp Microsoft_CodeAnalysis!Microsoft.CodeAnalysis.ImmutableArrayExtensions.ToDictionary[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]](System.Collections.Immutable.ImmutableArray`1<System.__Canon>, System.Func`2<System.__Canon,System.__Canon>, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)+0x52e (00007ffb`b9e1b51e)
mov rcx,qword ptr [rbp+10h]
mov rdx,7FFBB9F0CEB0h
call coreclr!JIT_GenericHandleMethod (00007ffc`18486dd0)
mov qword ptr [rbp-0F0h],rax
mov rcx,qword ptr [rbp-0F0h]
Where we read the dictionary entry once.
Reproduction Steps
Run code which uses many threads, on very subtly different code paths. This particular failure happens in Roslyn, but requires the machine running the code to be running under a tremendous amount of stress.
Expected behavior
Generic dictionary should be able to expand without triggering a crash.
Actual behavior
Under extremely rare conditions it will fail.
Regression?
No, this failing behavior has been in place since 2020.
Known Workarounds
No response
Configuration
No response
Other information
No response