-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Conversion of NAIndicatorTransform to estimator with related pigstensions #1217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
using System.Text; | ||
using Microsoft.ML.Core.Data; | ||
using Microsoft.ML.Runtime; | ||
using Microsoft.ML.Runtime.CommandLine; | ||
using Microsoft.ML.Runtime.Data; | ||
using Microsoft.ML.Runtime.Data.Conversion; | ||
using Microsoft.ML.Runtime.EntryPoints; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you Ctrl+R+G they will get re-ordered differently. #Resolved
|
||
namespace Microsoft.ML.Runtime.Data | ||
{ | ||
/// <include file='doc.xml' path='doc/members/member[@name="NAIndicator"]'/> | ||
public sealed class NAIndicatorTransform : OneToOneTransformBase | ||
public sealed class NAIndicatorTransform : OneToOneTransformerBase | ||
{ | ||
public sealed class Column : OneToOneColumn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
as we go through this, if you have time to add a one line comment on the public members, a they will appear in the documentation. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
private const string RegistrationName = "NaIndicator"; | ||
private const string RegistrationName = nameof(NAIndicatorTransform); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nameof(NAIndicatorTransform); [](start = 48, length = 29)
The usage of nameof here is great! Shall we get rid of the LoadName and just substitute with this? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public NAIndicatorTransform(IHostEnvironment env, Arguments args, IDataView input) | ||
: base(env, RegistrationName, Contracts.CheckRef(args, nameof(args)).Column, | ||
input, TestType) | ||
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
can it be internal? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAIndicatorTransform.FriendlyName, NAIndicatorTransform.LoadName)] | ||
|
||
[assembly: LoadableClass(typeof(IRowMapper), typeof(NAIndicatorTransform), null, typeof(SignatureLoadRowMapper), | ||
NAIndicatorTransform.FriendlyName, NAIndicatorTransform.LoadName)] | ||
|
||
namespace Microsoft.ML.Runtime.Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime.Data [](start = 23, length = 12)
Transforms instead of Runtime.Data #Closed
/// <summary> | ||
/// Factory method for SignatureDataTransform. | ||
/// </summary> | ||
public static NAIndicatorTransform Create(IHostEnvironment env, ModelLoadContext ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make all of them (factory methods) private if you can.
also I still prefer to have // comment instead of summary. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I thought they were going to be public that's why I added the summary, but I'll change to comment then.
In reply to: 224554120 [](ancestors = 224554120)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
// Output is normalized. | ||
bldr.AddPrimitive(MetadataUtils.Kinds.IsNormalized, BoolType.Instance, true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this metadata been properly propagated in estimator/ mapper #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
[Fact] | ||
public void NAIndicatorStatic() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAIndicatorStatic [](start = 20, length = 17)
Please move next to other static tests #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decided to group static tests together and other transformer tests together
In reply to: 224861996 [](ancestors = 224861996)
@@ -68,116 +78,196 @@ private static VersionInfo GetVersionInfo() | |||
internal const string FriendlyName = "NA Indicator Transform"; | |||
internal const string ShortName = "NAInd"; | |||
|
|||
private static string TestType(ColumnType type) | |||
internal static string TestType(ColumnType type) | |||
{ | |||
// Item type must have an NA value. We'll get the predicate again later when we're ready to use it. | |||
Delegate del; | |||
if (Conversions.Instance.TryGetIsNAPredicate(type.ItemType, out del)) | |||
return null; | |||
return string.Format("Type '{0}' is not supported by {1} since it doesn't have an NA value", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format [](start = 26, length = 6)
use $ syntax #Closed
type, LoadName); | ||
} | ||
|
||
public class ColumnInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c class [](start = 13, length = 7)
sealed
. Also, why do you need it? The base class is doing fine with just a (string input, string output)
value tuple #Closed
} | ||
} | ||
return (inputTypes, outputTypes); | ||
} | ||
|
||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary [](start = 13, length = 7)
a) this summary should change, because it is not for the public facing API.
Also the param
comments are bad.
b) this constructor should not be public at all. The only way to train a trainable transformer should be via estimator's Fit
call, not directly via some constructor #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I take it even further. This transformer should not be trainable at all. It should not memorize the column types, only names.
All of the column type stuff should belong to Mapper
.
In reply to: 224863446 [](ancestors = 224863446)
private NAIndicatorTransform(IHost host, ModelLoadContext ctx, IDataView input) | ||
: base(host, ctx, input, TestType) | ||
/// <summary> | ||
/// Factory method for SignatureDataTransform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factory method for SignatureDataTransform. [](start = 11, length = 43)
This is not true. This is a factory method for SignatureLoadModel
#Closed
/// <summary> | ||
/// Factory method for SignatureDataTransform. | ||
/// </summary> | ||
public static IDataTransform Create(IHostEnvironment env, IDataView input, params ColumnInfo[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create [](start = 37, length = 6)
It is not a factory method for anything, remove it altogether #Closed
var metadata = new List<SchemaShape.Column>(); | ||
if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.SlotNames, out var slotMeta)) | ||
metadata.Add(slotMeta); | ||
if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.IsNormalized, out var normalized)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.IsNormalized, out var normalized)) [](start = 15, length = 86)
I think previously it got added unconditionally, and here you check source column #Closed
InputSchema.TryGetColumnIndex(_infos[iinfo].Input, out int colIndex); | ||
Host.Assert(colIndex >= 0); | ||
var colMetaInfo = new ColumnMetadataInfo(_infos[iinfo].Output); | ||
var meta = RowColumnUtils.GetMetadataAsRow(InputSchema, colIndex, x => x == MetadataUtils.Kinds.SlotNames || x == MetadataUtils.Kinds.IsNormalized); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|| x == MetadataUtils.Kinds.IsNormalized [](start = 126, length = 40)
it should be unconditional. #Closed
} | ||
|
||
[Fact] | ||
public void TestOldSavingAndLoading() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add small metadata test? #Closed
: this(env, new Arguments() { Column = new[] { new Column() { Source = source ?? name, Name = name } } }, input) | ||
/// <param name="env">The environment to use.</param> | ||
/// <param name="columns">The names of the input columns of the transformation and the corresponding names for the output columns.</param> | ||
internal NAIndicatorTransform(IHostEnvironment env, params (string input, string output)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
now it's safe to make this ctor public :) #Closed
|
||
// *** Binary format *** | ||
// <base> | ||
private static (string input, string output)[] GetColumnPairs(Column[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetColumnPairs [](start = 55, length = 14)
this entire method can be replaced with
columns.Select(c=>(c.Source ?? c.Name, c.Name)).ToArray()
#Closed
Output = output; | ||
InputType = inType; | ||
OutputType = outType; | ||
InputIsNA = GetIsNADelegate(InputType); ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; [](start = 59, length = 2)
remove #Closed
|
||
// The output column types, parallel to Infos. | ||
private readonly ColumnType[] _types; | ||
internal (string input, string output)[] GetColumnPairs() => ColumnPairs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal [](start = 8, length = 8)
The pattern is :
public IReadOnlyList<(string input, string output)> Columns => ColumnPairs.AsReadOnly()
#Resolved
@@ -84,7 +84,7 @@ public void KMeansEstimator() | |||
public void TestEstimatorHogwildSGD() | |||
{ | |||
(IEstimator<ITransformer> pipe, IDataView dataView) = GetBinaryClassificationPipeline(); | |||
pipe.Append(new StochasticGradientDescentClassificationTrainer(Env, "Features", "Label")); | |||
pipe = pipe.Append(new StochasticGradientDescentClassificationTrainer(Env, "Features", "Label")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipe [](start = 19, length = 4)
nice catch :) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
var dataView = ComponentCreation.CreateDataView(Env, data); | ||
var args = new NAIndicatorTransform.Arguments(); | ||
args.Column = new NAIndicatorTransform.Column[] { new NAIndicatorTransform.Column() { Name = "NAA", Source = "A" } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why you keep using arguments class? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ongoing work on converting the transformers to estimators (#754). This PR converts the NA Indicator transform to estimator and adds the relative pigsty extensions.