Skip to content

Should JIT recognize and hoist NumberFormatInfo.CurrentInfo calls when parsing numbers? #11444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
adamsitnik opened this issue Nov 11, 2018 · 4 comments
Labels

Comments

@adamsitnik
Copy link
Member

I am currently profiling ML.NET to search for some perf bottlenecks and places where we could do a better job.

One of the patterns I have found is following: parsing numbers in a loop causes a call to NumberFormatInfo.CurrentInfo per loop iteration if the FormatInfo is not provided in explicit way.

image

I have done manual hoisting in ML.NET and I saved 8% for huge file reads (gigabytes of data) (more info here dotnet/machinelearning#1599)

My proposal: JIT could recognize such pattern and hoist the result of NumberFormatInfo.CurrentInfo.

Some benchmarks results:

Method Mean Error StdDev Ratio
NotHoisted 39.45 us 0.1046 us 0.0874 us 1.00
Hoisted 37.37 us 0.1850 us 0.1730 us 0.95

Benchmark code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Globalization;
using System.Threading;

namespace ParsingFloats
{
    class Program
    {
        static void Main(string[] args) => BenchmarkRunner.Run<ParsingFloats>();
    }

    // [BenchmarkDotNet.Diagnostics.Windows.Configs.EtwProfiler(performExtraBenchmarksRun: false)] install BenchmarkDotNet.Diagnostics.Windows and uncomment to get a profile
    public class ParsingFloats
    {
        // the input uses . as separator which is not true for all cultures
        [GlobalSetup] public void Setup() => Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

        [Benchmark(Baseline = true)]
        public float NotHoisted()
        {
            float result = 0;

            foreach (var toParse in numbers)
                if (float.TryParse(toParse, out float value))
                    result += value;

            return result;
        }

        [Benchmark]
        public float Hoisted()
        {
            float result = 0;

            NumberFormatInfo info = NumberFormatInfo.CurrentInfo;
            foreach (var toParse in numbers)
                if (float.TryParse(toParse, NumberStyles.Float | NumberStyles.AllowThousands, info, out float value))
                    result += value;

            return result;
        }

        private readonly string[] numbers = new[] // this is real input line from wiki.en.vec file
        {
            "-0.023167", "-0.0042483", "-0.10572", "0.042783", "-0.14316", "-0.078954", "0.078187", "-0.19454", "0.022303", "0.31207",
            "0.057462", "-0.11589", "0.096633", "-0.093229", "-0.034229", "-0.14652", "-0.11094", "-0.11102", "0.067728", "0.10023",
            "-0.067413", "0.23761", "-0.13105", "-0.0083979", "-0.10593", "0.24526", "0.065903", "-0.2374", "-0.10758", "0.0057082",
            "-0.081413", "0.26264", "-0.052461", "0.20306", "0.05062", "-0.18866", "-0.11494", "-0.25752", "0.046799", "-0.050525",
            "0.06265", "0.15433", "-0.056289", "-0.048437", "-0.099688", "-0.035332", "-0.091647", "-0.081151", "-0.0010844", "-0.08414",
            "-0.13026", "0.01498", "-0.086276", "-0.053041", "-0.10644", "-0.042314", "0.086469", "0.22614", "-0.16078", "0.18845", "0.053098",
            "-0.21475", "0.16699", "-0.14442", "-0.1593", "0.0062456", "-0.07663", "-0.091568", "-0.28984", "0.027078", "0.021275", "0.023939",
            "0.14903", "-0.33062", "-0.097811", "-0.033814", "0.070587", "0.023294", "0.065382", "0.18716", "-0.13444", "0.14431", "-0.0268",
            "-0.022903", "0.097554", "-0.032909", "-0.027827", "-0.068771", "0.17053", "-0.05946", "0.020424", "-0.077589", "0.1216", "-0.077437",
            "0.10665", "0.051087", "0.0076379", "-0.064936", "0.09031", "0.059447", "0.0048881", "0.078309", "-0.012163", "0.062155", "-0.072664",
            "0.17857", "-0.22874", "0.066397", "-0.039295", "-0.027717", "0.061571", "0.072824", "-0.092512", "-0.087984", "-0.12753", "-0.0018705",
            "0.18689", "0.0051173", "-0.0013532", "0.043246", "0.10867", "-0.12209", "-0.0091676", "0.23938", "-0.059501", "-0.0010456", "0.086584",
            "0.020238", "0.21686", "0.16495", "0.037256", "0.12343", "0.17706", "0.075777", "0.031022", "-0.12948", "0.030936", "0.096897", "-0.10793",
            "0.12644", "-0.056489", "0.082232", "0.20679", "0.11679", "0.13965", "0.26362", "0.037603", "-0.003105", "-0.089501", "-0.0076969", "-0.11654",
            "-0.28567", "0.046616", "-0.0082062", "0.15621", "-0.14641", "0.064561", "-0.1133", "0.27129", "0.14532", "-0.021773", "0.23305", "-0.1617",
            "0.15705", "0.13845", "0.022417", "-0.10982", "-0.049431", "0.076855", "-0.0453", "-0.19029", "0.011183", "-0.010393", "0.0016916", "0.089407",
            "-0.051022", "-0.086066", "0.083933", "-0.0081962", "-0.0077321", "0.033991", "-0.20092", "0.03328", "0.062224", "0.016121", "0.27143", "-0.19754",
            "-0.15222", "-0.015345", "-0.063907", "-0.098597", "-0.20162", "0.14004", "-1.1533e-05", "-0.18928", "0.12253", "-0.0070378", "0.0864", "-0.30255",
            "-0.03908", "0.045517", "-0.16449", "-0.23548", "0.052781", "0.13847", "-0.20022", "-0.015974", "0.027137", "0.18287", "-0.02389", "0.22072",
            "-0.04271", "-0.075939", "-0.087386", "-0.049337", "0.047824", "-0.059078", "-0.15181", "-0.21229", "-0.054944", "-0.011453", "-0.11996", "-0.15307",
            "-0.054828", "-0.053217", "-0.048546", "0.028856", "-0.094537", "0.27144", "0.054638", "0.059727", "0.061772", "0.009259", "-0.12032", "-0.16646",
            "-0.029087", "0.0028752", "-0.16076", "-0.1371", "-0.18988", "0.022857", "0.18455", "-0.018236", "-0.0060562", "0.14302", "0.032535", "0.14333",
            "-0.030871", "-0.15218", "0.092813", "0.066358", "0.018316", "-0.24143", "0.0054391", "-0.064479", "-0.08596", "0.030446", "0.082157", "0.026093",
            "0.058985", "0.0051085", "0.089127", "-0.018164", "-0.077821", "0.0034232", "0.13892", "0.046106", "-0.05417", "0.0084399", "-0.15362", "-0.14735",
            "0.065191", "-0.022883", "-0.14498", "-0.16917", "-0.19215", "0.10611", "0.001678", "-0.16331", "-0.07307", "0.11576", "0.083567", "-0.060317",
            "-0.064714", "0.15305", "-0.11949", "0.16684", "0.14109", "0.046036", "-0.060393", "0.046595", "-0.11558", "0.044184", "-0.023124", "0.02586",
            "-0.11653", "0.010936", "0.089398", "-0.0159", "0.14866"
        };
    }
}

cc @danmosemsft @tannergooding @AndyAyersMS @CarolEidt @davidwrighton

@mikedn
Copy link
Contributor

mikedn commented Nov 11, 2018

My proposal: JIT could recognize such pattern and hoist the result of NumberFormatInfo.CurrentInfo.

It's obvious to us humans that the parsing code won't change the current culture but this isn't something that the JIT can figure out without interprocedural analysis. In general, calls cannot be hoisted, with the exception of a few special helper calls that the JIT knows that they're "pure".

@jkotas
Copy link
Member

jkotas commented Nov 12, 2018

It's obvious to us humans that the parsing code won't change the current culture

Other threads can change the current culture though. The current culture is a very complex algorithm with many fallbacks.

I think we can make this better by:

  • Streamlining the CurrentCulture getter
  • Caching NumberFormatInfo per thread to avoid unnecessary indirections
  • Ideally, the JIT should be able to hoist the pointer to thread local block that contains the cached NumberFormatInfo. It is an optimization that the JIT does already. The code just needs to be structured properly to take advantage of it.

Also, you may want to consider using the newer Utf8 parsers for any high-performance parsing since you probably got the input as Utf8 anyway and do not care about any of the current sensitive parsing.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Jun 17, 2020
Copy link
Contributor

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

@dotnet-policy-service dotnet-policy-service bot added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Dec 17, 2024
Copy link
Contributor

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@dotnet-policy-service dotnet-policy-service bot removed this from the Future milestone Dec 31, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jan 31, 2025
@dotnet-policy-service dotnet-policy-service bot removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Jan 31, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants