-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Convert ValueMapper to use 'in' parameters #1475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
/// <summary> | ||
/// Delegate type to map/convert among three values, for example, one input with two | ||
/// outputs, or two inputs with one output. | ||
/// </summary> | ||
public delegate void ValueMapper<TVal1, TVal2, TVal3>(ref TVal1 val1, ref TVal2 val2, ref TVal3 val3); | ||
public delegate void ValueMapper<TVal1, TVal2, TVal3>(in TVal1 val1, ref TVal2 val2, ref TVal3 val3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inspected all the usages of this delegate, and most times it is used by the below interface IValueMapperDist
, which takes 1 input and returns 2 outputs.
The only case where this was used for 2 inputs and 1 output was in one place in matrix factorization code (the two inputs were column and row I believe).
I didn't feel it was necessary to split this delegate. If someone does feel it is necessary, I think a separate issue should be logged and addressed separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -9,13 +9,13 @@ namespace Microsoft.ML.Runtime.Data | |||
/// <summary> | |||
/// Delegate type to map/convert a value. | |||
/// </summary> | |||
public delegate void ValueMapper<TSrc, TDst>(ref TSrc src, ref TDst dst); | |||
public delegate void ValueMapper<TSrc, TDst>(in TSrc src, ref TDst dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, is TSrc
expected to always be a readonly struct
? in
causes hidden copies when calling an instance member for non-readonly structs and can cause unexpected perf-regressions in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a practical matter in most applications they will be some sort of VBuffer
, with some isolated instances here and there as you've observed. Yet these other types from what I see tend to also be readonly structs. (E.g., you observed elsewhere TimeSpan
, and here is Int32
) I just ran a pretty simple benchmark like so (using BenchmarkDotNet, my new favorite project)...
private static int Foo(in byte a, in byte b) => a + b;
private static int Bar(byte a, byte b) => a + b;
[Benchmark]
public void Foo() { for (int i = 0; i < 1000; ++i) Foo(2, 5); }
[Benchmark]
public void Bar() { for (int i = 0; i < 1000; ++i) Bar(2, 5); }
At least on my computer this gives:
Method | Mean | Error | StdDev |
---|---|---|---|
Foo | 239.6 ns | 2.8371 ns | 2.6538 ns |
Bar | 238.2 ns | 0.8780 ns | 0.7332 ns |
So it seems like it is at least not harmful for things like int
, while still giving some benefit for things like VBuffer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my limited knowledge of the area, I would guess that >95%
of the time ValueMapper will be used for primitives, reference types, and readonly struct
.
The most common case is to use these types:
using BL = Boolean;
using DT = DateTime;
using DZ = DateTimeOffset;
using R4 = Single;
using R8 = Double;
using I1 = SByte;
using I2 = Int16;
using I4 = Int32;
using I8 = Int64;
using SB = StringBuilder;
using TX = ReadOnlyMemory<char>;
using TS = TimeSpan;
using U1 = Byte;
using U2 = UInt16;
using U4 = UInt32;
using U8 = UInt64;
using UG = UInt128;
Along with VBuffer
, which just got made readonly
in #1454.
Skimming this list, I found that UInt128
was not marked readonly
. So I'm opening #1496 to make that (and other obvious readonly structs) marked as readonly
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any concern with the primitive types not being readonly structs
for full framework and/or netstandard right now?
{ | ||
TextSaverUtils.MapText(src.Span, ref sb, Sep); | ||
} | ||
|
||
protected void MapTimeSpan(ref TimeSpan src, ref StringBuilder sb) | ||
protected void MapTimeSpan(in TimeSpan src, ref StringBuilder sb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note: some of these don't seem worthwhile to pass by in
, since they are often smaller than a register in size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we see though we benefit from having a uniform signature no matter what the input type is, even if not all types will benefit, since we can then use the same high level code for all types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have to be changed to in
because these methods are used as ValueMapper
delegates.
machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextSaver.cs
Lines 100 to 114 in a8bc268
else if (type.IsTimeSpan) | |
{ | |
ValueMapper<TimeSpan, StringBuilder> c = MapTimeSpan; | |
Conv = (ValueMapper<T, StringBuilder>)(Delegate)c; | |
} | |
else if (type.IsDateTime) | |
{ | |
ValueMapper<DateTime, StringBuilder> c = MapDateTime; | |
Conv = (ValueMapper<T, StringBuilder>)(Delegate)c; | |
} | |
else if (type.IsDateTimeZone) | |
{ | |
ValueMapper<DateTimeOffset, StringBuilder> c = MapDateTimeZone; | |
Conv = (ValueMapper<T, StringBuilder>)(Delegate)c; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eerhardt looks great!
ValueMapper takes in a source value, and maps it to a destination value. The source value that is passed "in" is current passed by
ref
, which isn't correct because we don't want ValueMapper implementations to modify this value.Change ValueMapper (both versions) to use
in
parameters instead.This is a follow-up to #1454 requested in the PR feedback.
Working towards #608