Skip to content

Conversion of NAIndicatorTransform to estimator with related pigstensions #1217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 20, 2018

Conversation

artidoro
Copy link
Contributor

Ongoing work on converting the transformers to estimators (#754). This PR converts the NA Indicator transform to estimator and adds the relative pigsty extensions.

@artidoro artidoro requested review from dotnet-bot, Ivanidzo4ka, singlis and sfilipi and removed request for dotnet-bot October 10, 2018 21:48
@artidoro artidoro added the API Issues pertaining the friendly API label Oct 10, 2018
@Zruty0 Zruty0 mentioned this pull request Oct 10, 2018
using System.Text;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.CommandLine;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Runtime.Data.Conversion;
using Microsoft.ML.Runtime.EntryPoints;
Copy link
Member

@sfilipi sfilipi Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you Ctrl+R+G they will get re-ordered differently. #Resolved


namespace Microsoft.ML.Runtime.Data
{
/// <include file='doc.xml' path='doc/members/member[@name="NAIndicator"]'/>
public sealed class NAIndicatorTransform : OneToOneTransformBase
public sealed class NAIndicatorTransform : OneToOneTransformerBase
{
public sealed class Column : OneToOneColumn
Copy link
Member

@sfilipi sfilipi Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

as we go through this, if you have time to add a one line comment on the public members, a they will appear in the documentation. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will do that!


In reply to: 224315913 [](ancestors = 224315913)

}

private const string RegistrationName = "NaIndicator";
private const string RegistrationName = nameof(NAIndicatorTransform);
Copy link
Member

@sfilipi sfilipi Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nameof(NAIndicatorTransform); [](start = 48, length = 29)

The usage of nameof here is great! Shall we get rid of the LoadName and just substitute with this? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!


In reply to: 224316082 [](ancestors = 224316082)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree because of the above comment


In reply to: 224647220 [](ancestors = 224647220,224316082)

public NAIndicatorTransform(IHostEnvironment env, Arguments args, IDataView input)
: base(env, RegistrationName, Contracts.CheckRef(args, nameof(args)).Column,
input, TestType)
public static IDataTransform Create(IHostEnvironment env, Arguments args, IDataView input)
Copy link
Member

@sfilipi sfilipi Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

can it be internal? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even private!


In reply to: 224316542 [](ancestors = 224316542)

NAIndicatorTransform.FriendlyName, NAIndicatorTransform.LoadName)]

[assembly: LoadableClass(typeof(IRowMapper), typeof(NAIndicatorTransform), null, typeof(SignatureLoadRowMapper),
NAIndicatorTransform.FriendlyName, NAIndicatorTransform.LoadName)]

namespace Microsoft.ML.Runtime.Data
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime.Data [](start = 23, length = 12)

Transforms instead of Runtime.Data #Closed

/// <summary>
/// Factory method for SignatureDataTransform.
/// </summary>
public static NAIndicatorTransform Create(IHostEnvironment env, ModelLoadContext ctx)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make all of them (factory methods) private if you can.
also I still prefer to have // comment instead of summary. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I thought they were going to be public that's why I added the summary, but I'll change to comment then.


In reply to: 224554120 [](ancestors = 224554120)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a constructor?


In reply to: 224647551 [](ancestors = 224647551,224554120)

{
// Output is normalized.
bldr.AddPrimitive(MetadataUtils.Kinds.IsNormalized, BoolType.Instance, true);
}
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this metadata been properly propagated in estimator/ mapper #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. You need to have IsNormalized metadata


In reply to: 224554861 [](ancestors = 224554861)

}

[Fact]
public void NAIndicatorStatic()
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAIndicatorStatic [](start = 20, length = 17)

Please move next to other static tests #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to group static tests together and other transformer tests together


In reply to: 224861996 [](ancestors = 224861996)

@Zruty0
Copy link
Contributor

Zruty0 commented Oct 12, 2018

            // REVIEW: temporary name

remove #Closed


Refers to: src/Microsoft.ML.Transforms/NAIndicatorTransform.cs:67 in 76a893a. [](commit_id = 76a893a, deletion_comment = False)

@@ -68,116 +78,196 @@ private static VersionInfo GetVersionInfo()
internal const string FriendlyName = "NA Indicator Transform";
internal const string ShortName = "NAInd";

private static string TestType(ColumnType type)
internal static string TestType(ColumnType type)
{
// Item type must have an NA value. We'll get the predicate again later when we're ready to use it.
Delegate del;
if (Conversions.Instance.TryGetIsNAPredicate(type.ItemType, out del))
return null;
return string.Format("Type '{0}' is not supported by {1} since it doesn't have an NA value",
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format [](start = 26, length = 6)

use $ syntax #Closed

type, LoadName);
}

public class ColumnInfo
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c class [](start = 13, length = 7)

sealed. Also, why do you need it? The base class is doing fine with just a (string input, string output) value tuple #Closed

}
}
return (inputTypes, outputTypes);
}

/// <summary>
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary [](start = 13, length = 7)

a) this summary should change, because it is not for the public facing API.
Also the param comments are bad.

b) this constructor should not be public at all. The only way to train a trainable transformer should be via estimator's Fit call, not directly via some constructor #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I take it even further. This transformer should not be trainable at all. It should not memorize the column types, only names.
All of the column type stuff should belong to Mapper.


In reply to: 224863446 [](ancestors = 224863446)

private NAIndicatorTransform(IHost host, ModelLoadContext ctx, IDataView input)
: base(host, ctx, input, TestType)
/// <summary>
/// Factory method for SignatureDataTransform.
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factory method for SignatureDataTransform. [](start = 11, length = 43)

This is not true. This is a factory method for SignatureLoadModel #Closed

/// <summary>
/// Factory method for SignatureDataTransform.
/// </summary>
public static IDataTransform Create(IHostEnvironment env, IDataView input, params ColumnInfo[] columns)
Copy link
Contributor

@Zruty0 Zruty0 Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create [](start = 37, length = 6)

It is not a factory method for anything, remove it altogether #Closed

var metadata = new List<SchemaShape.Column>();
if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.SlotNames, out var slotMeta))
metadata.Add(slotMeta);
if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.IsNormalized, out var normalized))
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (col.Metadata.TryFindColumn(MetadataUtils.Kinds.IsNormalized, out var normalized)) [](start = 15, length = 86)

I think previously it got added unconditionally, and here you check source column #Closed

InputSchema.TryGetColumnIndex(_infos[iinfo].Input, out int colIndex);
Host.Assert(colIndex >= 0);
var colMetaInfo = new ColumnMetadataInfo(_infos[iinfo].Output);
var meta = RowColumnUtils.GetMetadataAsRow(InputSchema, colIndex, x => x == MetadataUtils.Kinds.SlotNames || x == MetadataUtils.Kinds.IsNormalized);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| x == MetadataUtils.Kinds.IsNormalized [](start = 126, length = 40)

it should be unconditional. #Closed

}

[Fact]
public void TestOldSavingAndLoading()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add small metadata test? #Closed

: this(env, new Arguments() { Column = new[] { new Column() { Source = source ?? name, Name = name } } }, input)
/// <param name="env">The environment to use.</param>
/// <param name="columns">The names of the input columns of the transformation and the corresponding names for the output columns.</param>
internal NAIndicatorTransform(IHostEnvironment env, params (string input, string output)[] columns)
Copy link
Contributor

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal [](start = 8, length = 8)

now it's safe to make this ctor public :) #Closed


// *** Binary format ***
// <base>
private static (string input, string output)[] GetColumnPairs(Column[] columns)
Copy link
Contributor

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetColumnPairs [](start = 55, length = 14)

this entire method can be replaced with
columns.Select(c=>(c.Source ?? c.Name, c.Name)).ToArray() #Closed

Output = output;
InputType = inType;
OutputType = outType;
InputIsNA = GetIsNADelegate(InputType); ;
Copy link
Contributor

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

; [](start = 59, length = 2)

remove #Closed


// The output column types, parallel to Infos.
private readonly ColumnType[] _types;
internal (string input, string output)[] GetColumnPairs() => ColumnPairs;
Copy link
Contributor

@Zruty0 Zruty0 Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal [](start = 8, length = 8)

The pattern is :
public IReadOnlyList<(string input, string output)> Columns => ColumnPairs.AsReadOnly() #Resolved

@@ -84,7 +84,7 @@ public void KMeansEstimator()
public void TestEstimatorHogwildSGD()
{
(IEstimator<ITransformer> pipe, IDataView dataView) = GetBinaryClassificationPipeline();
pipe.Append(new StochasticGradientDescentClassificationTrainer(Env, "Features", "Label"));
pipe = pipe.Append(new StochasticGradientDescentClassificationTrainer(Env, "Features", "Label"));
Copy link
Contributor

@Zruty0 Zruty0 Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipe [](start = 19, length = 4)

nice catch :) #Resolved

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@artidoro artidoro closed this Oct 18, 2018
@artidoro artidoro reopened this Oct 18, 2018
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:


var dataView = ComponentCreation.CreateDataView(Env, data);
var args = new NAIndicatorTransform.Arguments();
args.Column = new NAIndicatorTransform.Column[] { new NAIndicatorTransform.Column() { Name = "NAA", Source = "A" } };
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you keep using arguments class? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will change this!


In reply to: 226800773 [](ancestors = 226800773)

@artidoro artidoro merged commit 06179b8 into dotnet:master Oct 20, 2018
@yaeldekel yaeldekel mentioned this pull request Oct 20, 2018
@artidoro artidoro deleted the naindicator branch January 5, 2019 00:01
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants