Skip to content

Optimization: left ?? right can be optimized to left.GetValueOfDefault() when appropriate #22800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MrJul opened this issue Oct 21, 2017 · 12 comments
Labels
Area-Compilers Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented
Milestone

Comments

@MrJul
Copy link
Contributor

MrJul commented Oct 21, 2017

While writing some code the other day, I wondered if, for a variable x of type int?, whether x ?? 0 and x.GetValueOrDefault() are really compiled to the same IL (I first assumed that they are). To my surprise, they aren't!

GetValueOrDefault is faster because it's implemented as a single unconditional field load. You can't really get much simpler than that, and it's a great candidate for inlining.

While it's an implementation detail, Roslyn already relies on it: in some cases where it's known that a nullable expression has a value, the compiler generates nullable.GetValueOrDefault() to access the value rather than using nullable.Value.

The change seemed simple enough: lower any left ?? right to left.GetValueOrDefault() when left is T? and right is the default value of T. I've wanted to poke around the Roslyn codebase for a while, so here was my chance :)

Here's the commit implementing this optimization along with a test: https://github.com/MrJul/roslyn/commit/ed21c4d8b025ec203a474e814c91ec701c6ac59f

Is it worth it? While it's a micro optimization, it's still an optimization (BenchmarkDotNet shows a ~60% improvement with RyuJIT x64 and .NET 4.7 on my machine - we're talking about sub-nanoseconds times here), and a pretty straightforward one in my opinion.

I didn't handle the case left ?? right when left is a T? and right is a non-default T constant since GetValueOrDefault(T) (with a parameter) has branches equivalent to what the compiler generates so the only benefit is a very slight reduction in IL size (which might still help with inlining). However, I can add it if needed.

If the team is interested in this change, I'll open a PR.

@jcouv jcouv added this to the Unknown milestone Nov 5, 2017
@sharwell
Copy link
Contributor

sharwell commented Jun 7, 2018

This came up in a Gitter discussion today about the three ways to tell if a bool? has the value true:

  1. value == true
  2. value is true
  3. value ?? false

Currently 1 and 3 are similar, and the work in features/recursive-patterns makes 2 similar to the existing ones. The change proposed here would give a clear advantage to 3. That's not necessarily bad, but I also wouldn't mind seeing all three cases handled as part of the work here.

If the team is interested in this change

@gafter @jaredpar would need to weigh in here, but the benchmark numbers I saw today were compelling.

@alrz
Copy link
Member

alrz commented Jun 7, 2018

I think codegen for lifted comparison operators should be unified somehow, there's no reason for ?? == and is to generate different codes

@jaredpar
Copy link
Member

jaredpar commented Jun 7, 2018

the benchmark numbers I saw today were compelling.

Perhaps we could all see them? 😄

@jnm2
Copy link
Contributor

jnm2 commented Jun 7, 2018

@jaredpar Sam is referring to performance tables tables starting at https://gitter.im/dotnet/csharplang?at=5b192e4535e25f399757c1e0.

@jaredpar
Copy link
Member

jaredpar commented Jun 7, 2018

@jnm2 where is the code for the benchmarks? That is more important than the resulting tables.

@jnm2
Copy link
Contributor

jnm2 commented Jun 7, 2018

@jaredpar The source code is near each table of results. Sam just asked for a new one, so here it all is:

Code (project targets net472 and netcoreapp2.1):

    [ClrJob, CoreJob]
    public class Benchmarks
    {
        private static readonly Random Random = new Random();
        private static readonly bool?[] Values = { null, false, true };

        private static bool? GetValue() => Values[Random.Next(0, 3)];

        [Benchmark]
        public bool Coalesce() => GetValue() ?? false;

        [Benchmark]
        public bool Compare() => GetValue() == true;

        [Benchmark]
        public bool GetValueOrDefault() => GetValue().GetValueOrDefault();

        [Benchmark]
        public void Noop() => GetValue();
    }
}

Running on i7-7700K with Windows 10.0.17134.48:

            Method |  Job | Runtime |      Mean |     Error |    StdDev |
------------------ |----- |-------- |----------:|----------:|----------:|
          Coalesce |  Clr |     Clr | 13.066 ns | 0.0279 ns | 0.0261 ns |
           Compare |  Clr |     Clr | 13.141 ns | 0.0495 ns | 0.0463 ns |
 GetValueOrDefault |  Clr |     Clr | 10.327 ns | 0.2313 ns | 0.2841 ns |
              Noop |  Clr |     Clr |  9.464 ns | 0.1349 ns | 0.1196 ns |
          Coalesce | Core |    Core | 14.71  ns | 0.1071 ns | 0.0949 ns |
           Compare | Core |    Core | 14.52  ns | 0.1573 ns | 0.1471 ns |
 GetValueOrDefault | Core |    Core | 11.84  ns | 0.0457 ns | 0.0381 ns |
              Noop | Core |    Core | 10.87  ns | 0.3366 ns | 0.3741 ns |

@jnm2
Copy link
Contributor

jnm2 commented Jun 7, 2018

Ideally BenchmarkDotNet would let you inject different arguments during the same run. The only way to get the Random.Next and lookup overhead out is either subtraction after the fact, or test each input in a separate run and see branch prediction play out.

@jnm2
Copy link
Contributor

jnm2 commented Jun 7, 2018

Subtracting the no-op mean from the other rows' means and adding the no-op error to the other row's errors:

|            Method | Runtime |     Mean |     Error |
|------------------ |-------- |---------:|----------:|
|          Coalesce |     Clr | 3.602 ns | 0.1628 ns |
|           Compare |     Clr | 3.677 ns | 0.1844 ns |
| GetValueOrDefault |     Clr | 0.863 ns | 0.3662 ns |
|          Coalesce |    Core | 3.84  ns | 0.4437 ns |
|           Compare |    Core | 3.65  ns | 0.4939 ns |
| GetValueOrDefault |    Core | 0.97  ns | 0.3823 ns |

This seems to indicate that GetValueOrDefault brings a 3.96× speed increase on .NET Core 2.1 and a 4.17× increase on .NET Framework 4.7.2.

@Suchiman
Copy link
Contributor

Suchiman commented Jun 7, 2018

Here's a comparison without random:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Attributes.Jobs;
using BenchmarkDotNet.Running;

namespace Benchmarks
{
    public static class Program
    {
        public static void Main()
        {
            BenchmarkRunner.Run<Benchmarks>();
        }
    }

    [ClrJob, CoreJob]
    public class Benchmarks
    {
        [Params(null, false, true)]
        public bool? Value;

        [Benchmark]
        public bool Coalesce() => Value ?? false;

        [Benchmark]
        public bool Compare() => Value == true;

        [Benchmark]
        public bool PatternOld() => Value is true; // code as being generated by C# 7.3

        [Benchmark]
        public bool PatternNew() // code as being generated by C# 8 recursive patterns branch
        {
            var x = Value;
            if (x.HasValue)
            {
                return x.GetValueOrDefault();
            }
            return false;
        }

        [Benchmark]
        public bool GetOrDefault() => Value.GetValueOrDefault();
    }
}
BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i5-4670K CPU 3.40GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=2.1.300
  [Host] : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT
  Clr    : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
  Core   : .NET Core 2.1.0 (CoreCLR 4.6.26515.07, CoreFX 4.6.26515.06), 64bit RyuJIT

Method Job Runtime Value Mean Error StdDev
Coalesce Clr Clr ? 0.2336 ns 0.0040 ns 0.0037 ns
Compare Clr Clr ? 0.2254 ns 0.0038 ns 0.0034 ns
PatternOld Clr Clr ? 23.0132 ns 0.0646 ns 0.0605 ns
PatternNew Clr Clr ? 0.2038 ns 0.0016 ns 0.0015 ns
GetOrDefault Clr Clr ? 0.0040 ns 0.0029 ns 0.0027 ns
Coalesce Core Core ? 0.2547 ns 0.0020 ns 0.0018 ns
Compare Core Core ? 0.2292 ns 0.0014 ns 0.0013 ns
PatternOld Core Core ? 22.2258 ns 0.0647 ns 0.0605 ns
PatternNew Core Core ? 0.2694 ns 0.0015 ns 0.0014 ns
GetOrDefault Core Core ? 0.0000 ns 0.0000 ns 0.0000 ns
Coalesce Clr Clr False 0.1438 ns 0.0044 ns 0.0041 ns
Compare Clr Clr False 0.2281 ns 0.0044 ns 0.0041 ns
PatternOld Clr Clr False 63.9141 ns 0.2468 ns 0.2309 ns
PatternNew Clr Clr False 0.2059 ns 0.0031 ns 0.0029 ns
GetOrDefault Clr Clr False 0.0513 ns 0.0016 ns 0.0015 ns
Coalesce Core Core False 0.2725 ns 0.0034 ns 0.0032 ns
Compare Core Core False 0.2270 ns 0.0026 ns 0.0025 ns
PatternOld Core Core False 58.1550 ns 0.1855 ns 0.1644 ns
PatternNew Core Core False 0.2574 ns 0.0030 ns 0.0028 ns
GetOrDefault Core Core False 0.0000 ns 0.0000 ns 0.0000 ns
Coalesce Clr Clr True 0.2421 ns 0.0048 ns 0.0045 ns
Compare Clr Clr True 0.2271 ns 0.0060 ns 0.0053 ns
PatternOld Clr Clr True 64.0497 ns 0.2420 ns 0.2264 ns
PatternNew Clr Clr True 0.2069 ns 0.0022 ns 0.0021 ns
GetOrDefault Clr Clr True 0.0026 ns 0.0034 ns 0.0031 ns
Coalesce Core Core True 0.2716 ns 0.0023 ns 0.0021 ns
Compare Core Core True 0.2276 ns 0.0013 ns 0.0011 ns
PatternOld Core Core True 55.3565 ns 0.2089 ns 0.1954 ns
PatternNew Core Core True 0.2553 ns 0.0032 ns 0.0029 ns
GetOrDefault Core Core True 0.0023 ns 0.0049 ns 0.0046 ns

@MrJul
Copy link
Contributor Author

MrJul commented Jun 7, 2018

@sharwell

That's not necessarily bad, but I also wouldn't mind seeing all three cases handled as part of the work here.

I can open a first PR for what's this issue about (lower T? ?? default(T) to T?.GetDefaultValue()).

I think optimizing bool? comparisons is a bit different, as there are several things we could do (for example !x == false can also be lowered to x.GetValueOrDefault(), same thing with is). I'd be interested in making those modifications.

@jcouv jcouv modified the milestones: Unknown, 15.8 Jun 25, 2018
@jcouv jcouv added the 4 - In Review A fix for the issue is submitted for review. label Jun 25, 2018
@jcouv jcouv modified the milestones: 15.8, 16.0 Jun 25, 2018
@jcouv jcouv added Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented and removed 4 - In Review A fix for the issue is submitted for review. labels Jun 27, 2018
@jcouv
Copy link
Member

jcouv commented Jun 27, 2018

The fix was merged into features/compiler for 16.0.

@xtqqczze
Copy link

I didn't handle the case left ?? right when left is a T? and right is a non-default T constant since GetValueOrDefault(T) (with a parameter) has branches equivalent to what the compiler generates so the only benefit is a very slight reduction in IL size (which might still help with inlining). However, I can add it if needed.

I suggested something like this in #56007, as the current implementation generates an unnecessary temporary for value types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented
Projects
None yet
Development

No branches or pull requests

8 participants