
Undoing boxing doesn't work with C# 7 pattern matching #10195

@svick

Short version: I was reading @stephentoub's article Performance Improvements in .NET Core 2.1. I noticed that his example of avoiding boxing allocations, made possible by dotnet/coreclr#14698, uses the is operator followed by a cast, whereas in C# 7 the same code could be simplified using pattern matching. So I wondered whether the C# 7 version also results in equally efficient code. It turns out it doesn't, and I think this should be improved.

More details:

Consider this code:

using System.Runtime.CompilerServices;

class Program
{
    static void Main()
    {
        Cast(new Dog());
        Pattern(new Dog());
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Cast<T>(T thing)
    {
        if (thing is IAnimal)
            ((IAnimal)thing).MakeSound();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Pattern<T>(T thing)
    {
        if (thing is IAnimal animal)
            animal.MakeSound();
    }    
}

struct Dog : IAnimal
{
    public void Bark() { }
    void IAnimal.MakeSound() => Bark();
}

interface IAnimal
{
    void MakeSound();
}

The IL for the relevant methods is:

.method private hidebysig static void  Cast<T>(!!T thing) cil managed noinlining
{
  // Code size       30 (0x1e)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  box        !!T
  IL_0006:  isinst     IAnimal
  IL_000b:  brfalse.s  IL_001d
  IL_000d:  ldarg.0
  IL_000e:  box        !!T
  IL_0013:  castclass  IAnimal
  IL_0018:  callvirt   instance void IAnimal::MakeSound()
  IL_001d:  ret
}

.method private hidebysig static void  Pattern<T>(!!T thing) cil managed noinlining
{
  // Code size       22 (0x16)
  .maxstack  2
  .locals init (class IAnimal V_0)
  IL_0000:  ldarg.0
  IL_0001:  box        !!T
  IL_0006:  isinst     IAnimal
  IL_000b:  dup
  IL_000c:  stloc.0
  IL_000d:  brfalse.s  IL_0015
  IL_000f:  ldloc.0
  IL_0010:  callvirt   instance void IAnimal::MakeSound()
  IL_0015:  ret
}

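Notice that in Pattern, the result of box + isinst (the boxed object) is saved to a local variable typed as the interface and reused for the call, whereas Cast boxes thing a second time and applies castclass.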
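To make the difference between the two IL listings concrete, here is a rough, hand-written C# approximation of what each method does. It is a sketch, not the exact compiler output: the (object)thing conversions stand in for the box !!T instructions, and the names CastLowered, PatternLowered, Lowered and the static Sounds counter on Dog are hypothetical, added only so the sketch can be run and checked.

```csharp
using System;
using System.Runtime.CompilerServices;

interface IAnimal
{
    void MakeSound();
}

struct Dog : IAnimal
{
    // Hypothetical counter, added only so the sketch has observable behavior.
    public static int Sounds;

    public void Bark() => Sounds++;
    void IAnimal.MakeSound() => Bark();
}

static class Lowered
{
    // Approximates the IL of Cast: box + isinst for the type test,
    // then a second, independent box + castclass for the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CastLowered<T>(T thing)
    {
        if ((object)thing is IAnimal)
            ((IAnimal)(object)thing).MakeSound();
    }

    // Approximates the IL of Pattern: a single box + isinst whose result
    // is stored in an IAnimal local and reused for the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void PatternLowered<T>(T thing)
    {
        IAnimal animal = (object)thing as IAnimal;
        if (animal != null)
            animal.MakeSound();
    }

    static void Main()
    {
        CastLowered(new Dog());
        PatternLowered(new Dog());
        Console.WriteLine(Dog.Sounds); // prints 2
    }
}
```

In the Pattern shape the boxed reference flows through a local into the call, which is the shape the JIT listing below does not optimize.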

The disassembly from .NET Core 2.1.0-preview2-26406-04 (win10-x64) is:

; Assembly listing for method Program:Cast(struct)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  2,  2   )  struct ( 8) [rsp+0x08]   do-not-enreg[XS] addr-exposed
;* V01 tmp0         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd exact
;* V02 tmp1         [V02    ] (  0,  0   )  struct ( 8) zero-ref    do-not-enreg[SF] class-hnd exact
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]  
;
; Lcl frame size = 0

G_M19994_IG01:
       48894C2408           mov      qword ptr [rsp+08H], rcx

G_M19994_IG02:
       C3                   ret      

; Total bytes of code 6, prolog size 0 for method Program:Cast(struct)
; ============================================================
; Assembly listing for method Program:Pattern(struct)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  4,  4   )  struct ( 8) [rsp+0x30]   do-not-enreg[XSF] addr-exposed
;  V01 loc0         [V01,T02] (  3,  2   )     ref  ->  rax         class-hnd exact
;  V02 tmp0         [V02,T00] (  4,  8   )     ref  ->  rax         class-hnd exact
;  V03 tmp1         [V03,T01] (  2,  4   )     ref  ->  rax         class-hnd exact
;  V04 OutArgs      [V04    ] (  1,  1   )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M22101_IG01:
       4883EC28             sub      rsp, 40
       48894C2430           mov      qword ptr [rsp+30H], rcx

G_M22101_IG02:
       48B9005F64B2F87F0000 mov      rcx, 0x7FF8B2645F00
       E8A86B0F5F           call     CORINFO_HELP_NEWSFAST
       480FBE4C2430         movsx    rcx, byte  ptr [rsp+30H]
       884808               mov      byte  ptr [rax+8], cl
       488BC8               mov      rcx, rax
       E897FBFFFF           call     Dog:IAnimal.MakeSound():this
       90                   nop      

G_M22101_IG03:
       4883C428             add      rsp, 40
       C3                   ret      

; Total bytes of code 47, prolog size 4 for method Program:Pattern(struct)
; ============================================================

Notice that for Cast, almost all of the code, including the boxing allocation, is optimized away (the remaining mov, which spills the argument to the stack, seems unnecessary, but that's not relevant here). For Pattern, however, all of the code is still there, including an allocation (the call to CORINFO_HELP_NEWSFAST) and a non-inlined call to Dog:IAnimal.MakeSound.

The two versions of the code do the same thing, so I think they should have comparable performance, especially since the pattern matching version is more readable and, I suspect, is also going to be more common in new code.

How hard would it be to make this optimization work even in the pattern matching version?

If it would be too hard to perform this optimization in the JIT, is there a reasonable way for the C# compiler to emit IL that would be optimized?
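As one hedged illustration of the compiler-side question, the sketch below shows a hypothetical source-level lowering of if (thing is IAnimal animal) animal.MakeSound(); for a generic T that keeps the box + isinst type test separate from a second box + castclass that produces the value, mirroring the IL of Cast. PatternAsCast and the Sounds counter are made-up names for illustration only, and this shape has not been checked against the JIT, so whether the extra local still allows the box to be removed is an assumption, not a result.

```csharp
using System;
using System.Runtime.CompilerServices;

interface IAnimal
{
    void MakeSound();
}

struct Dog : IAnimal
{
    // Hypothetical counter, added only so the sketch has observable behavior.
    public static int Sounds;

    public void Bark() => Sounds++;
    void IAnimal.MakeSound() => Bark();
}

static class Program
{
    // Hypothetical lowering of `if (thing is IAnimal animal) animal.MakeSound();`
    // that keeps the type test and the conversion separate, as in Cast.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void PatternAsCast<T>(T thing)
    {
        if (thing is IAnimal)                // box + isinst + brfalse
        {
            IAnimal animal = (IAnimal)thing; // second box + castclass
            animal.MakeSound();
        }
    }

    static void Main()
    {
        PatternAsCast(new Dog());
        PatternAsCast(42);             // int does not implement IAnimal, so no call
        Console.WriteLine(Dog.Sounds); // prints 1
    }
}
```

Because thing is a by-value parameter, both boxes hold the same value, so for this example the observable behavior matches the pattern matching version.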

cc (?): @AndyAyersMS, @benaadams, @justinvp

category:cq
theme:importer
skill-level:expert
cost:medium

Labels: Priority:2, area-CodeGen-coreclr, optimization