-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Role mapped improvements #496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d618134
to
158b653
Compare
@@ -312,7 +312,6 @@ public static bool HasSlotNames(this ISchema schema, int col, int vectorSize) | |||
public static void GetSlotNames(RoleMappedSchema schema, RoleMappedSchema.ColumnRole role, int vectorSize, ref VBuffer<DvText> slotNames) | |||
{ | |||
Contracts.CheckValueOrNull(schema); | |||
Contracts.CheckValue(role.Value, nameof(role)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contracts.CheckValue(role.Value, nameof(role)); [](start = 12, length = 47)
you removing it because ColumnRole has only one constuctor in which we invoke Contracts.CheckNonEmpty, so Value always gonna have Value, right? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. In the issue #445 when I said "Also a fair amount of code exists in the class to detect conditions that cannot possibly happen." this is partially what I am referring to. Indeed I think it's the only real reason to have a separate struct instead of just using string
directly, so it's a little weird that we spent so much effort checking and rechecking. #Closed
/// <param name="schema">The schema over which roles are defined</param> | ||
/// <param name="roles">The column role to column name mappings</param> | ||
/// <param name="opt">Whether to consider the column names specified "optional" or not. If <c>false</c> then any non-empty | ||
/// values for the column names that does not appear in <paramref name="schema"/> will result iin an exception being thrown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iin [](start = 102, length = 3)
iin #Closed
public static RoleMappedSchema Create(ISchema schema) | ||
/// <param name="schema">The schema over which roles are defined</param> | ||
/// <param name="opt">Whether to consider the column names specified "optional" or not. If <c>false</c> then any non-empty | ||
/// values for the column names that does not appear in <paramref name="schema"/> will result iin an exception being thrown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iin [](start = 102, length = 3)
iin #Closed
@@ -363,45 +380,82 @@ public ColumnInfo GetUniqueColumn(ColumnRole role) | |||
} | |||
|
|||
/// <summary> | |||
/// Creates a RoleMappedSchema from the given schema with no column role assignments. | |||
/// Constructor from the given schema and role/column-name pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
role/column-name [](start = 50, length = 16)
I can read it as role | column-name pairs or
(role or column) | name pair.
Which probably show my lack of English, but still can it be rephrased? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm OK. I guess I could omit names since columns are always identified by name, that might make clear I'm not talking about the strings used to construct the roles in the first place. #Closed
if (role.Value != null && _map.TryGetValue(role.Value, out list)) | ||
return list; | ||
return null; | ||
return _map.TryGetValue(role.Value, out var list) ? list : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return [](start = 12, length = 6)
it look strange to see all methods above get converted from return to => and this one still use return. #Closed
@@ -4,6 +4,7 @@ | |||
|
|||
using System; | |||
using System.Collections.Generic; | |||
using System.Linq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using System.Linq [](start = 0, length = 17)
My VS highlight gray using System and using System.Linq for me and suggest to remove them as unused. #Closed
@@ -36,7 +37,7 @@ public static ColumnInfo CreateFromName(ISchema schema, string name, string desc | |||
{ | |||
ColumnInfo colInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ColumnInfo colInfo; [](start = 12, length = 19)
can be moved into method call
out ColumnInfo colInfo #Closed
@@ -98,68 +112,99 @@ public sealed class RoleMappedSchema | |||
private const string NameString = "Name"; | |||
private const string FeatureContributionsString = "FeatureContributions"; | |||
|
|||
/// <summary> | |||
/// Instances of this are the keys of a <see cref="RoleMappedSchema"/>. This class also some important commonly used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some [](start = 95, length = 6)
this phrasing looks weird for me, but what do I know #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <summary> | ||
/// Role for sample weights. Commonly used to point to a number to make trainers give more weight | ||
/// to a particular example. | ||
/// </summary> | ||
public static ColumnRole Weight => WeightString; | ||
public static ColumnRole Name => NameString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static ColumnRole Name => NameString; [](start = 12, length = 44)
considering you make reference to this column (see line 205) it's probably make sense to add description for this as well. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... also I think FeatureContributions should be done away with, I'm not sure why it is here. Maybe someone thought once upon a time that for a column role to be used, it has to be here.
In reply to: 200438589 [](ancestors = 200438589)
// REVIEW: Does this really belong here? | ||
/// <summary> | ||
/// Role for feature contributions. Useful for specific diagnostic functionality. | ||
/// </summary> | ||
public static ColumnRole FeatureContributions => FeatureContributionsString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FeatureContributions [](start = 37, length = 20)
I don't really feel like this actually belongs here... certainly in ML.NET there are no usages, and the usages elsewhere are heavily localized so putting a custom role in the basic class is sort of odd... it would be better to have the score-roles.
@@ -98,68 +108,109 @@ public sealed class RoleMappedSchema | |||
private const string NameString = "Name"; | |||
private const string FeatureContributionsString = "FeatureContributions"; | |||
|
|||
/// <summary> | |||
/// Instances of this are the keys of a <see cref="RoleMappedSchema"/>. This class also some holds important |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some holds [](start = 96, length = 10)
are you sure it's correct order? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var schema = TrainUtils.CreateRoleMappedSchemaOpt(input.Schema, featureColName, groupColName, customColumns); | ||
ISchemaBoundMapper mapper; | ||
var sc = GetScorerComponentAndMapper(predictor, scorer, schema, env, out mapper); | ||
var schema = new RoleMappedSchema(input.Schema, label: null, feature: featureColName, group: groupColName, opt: true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you lost customColumns here.
at least we used to pass them. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -379,8 +377,9 @@ public Dense(int count, string[] names) | |||
if (size > 0) | |||
Array.Copy(names, _names, size); | |||
|
|||
_schema = RoleMappedSchema.Create(new FeatureNameCollectionSchema(this), | |||
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Feature, RoleMappedSchema.ColumnRole.Feature.Value)); | |||
// REVIEW: This seems wrong. The default feature column name is "Features" yet the role is named "Feature". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// REVIEW: This seems wrong. The default feature column name is "Features" yet the role is named "Feature". [](start = 16, length = 107)
Should we move this review to RoleMappedSchema file? #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not really waht I mean. I think it's fine to have the role be "feature", but the column name be "features." What I meant was, this very specific usage of it in this file.
In reply to: 200457828 [](ancestors = 200457828)
@@ -127,11 +127,11 @@ protected MamlEvaluatorBase(ArgumentsBase args, IHostEnvironment env, string sco | |||
|
|||
// Get the label column information. | |||
string lab = EvaluateUtils.GetColName(LabelCol, schema.Label, DefaultColumnNames.Label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string lab [](start = 12, length = 10)
can we change it to var label?
it just looks odd to have lab and then few lines later weight. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind if I go the other way? I like to be explicit with my types (in this case not very obvious) unless it's something longer, or the type is immediately obvious to a reader. (Except int
... don't get people that aren't explicit about int
.)
In reply to: 200458796 [](ancestors = 200458796)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can probably see from my comment, I tend to prefer var, but i don't have problem with explicit types, it was more about shortening label to lab, which for some reason bugs me enormously.
In reply to: 200460563 [](ancestors = 200460563,200458796)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -185,7 +185,7 @@ public void Train(List<FeatureSubsetModel<IPredictorProducing<TOutput>>> models, | |||
bldr.AddColumn("Features", NumberType.Float, features); | |||
|
|||
var view = bldr.GetDataView(); | |||
var rmd = RoleMappedData.Create(view, ColumnRole.Label.Bind("Label"), ColumnRole.Feature.Bind("Features")); | |||
var rmd = new RoleMappedData(view, "Label", "Features"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Label", "Features" [](start = 51, length = 19)
can we replace it with DefaultColumnNames.Label, DefaultColumnNames.Features
same for line 184 and line 185 #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_dataForEvaluator = RoleMappedData.Create(builder.GetDataView(), RoleMappedSchema.ColumnRole.Label.Bind("Label"), | ||
RoleMappedSchema.CreatePair(MetadataUtils.Const.ScoreValueKind.Score, "Score")); | ||
_dataForEvaluator = new RoleMappedData(builder.GetDataView(), opt: false, RoleMappedSchema.ColumnRole.Label.Bind("Label"), | ||
new RoleMappedSchema.ColumnRole(MetadataUtils.Const.ScoreValueKind.Score).Bind("Score")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultColumnNames.Label, DefaultColumnNames.Score ? same for few lines above #Closed
@@ -902,7 +902,7 @@ public bool CanWeightComputationBeAvoided(Float instanceDistanceToBestOldCluster | |||
arrDv.AddColumn("Features", PrimitiveType.FromKind(DataKind.R4), clusters); | |||
arrDv.AddColumn("Weights", PrimitiveType.FromKind(DataKind.R4), totalWeights); | |||
var subDataViewCursorFactory = new FeatureFloatVectorCursor.Factory( | |||
TrainUtils.CreateExamples(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features); | |||
new RoleMappedData(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Features [](start = 71, length = 8)
DefaultColumnNames? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. And I see here instead of DefaultColumnNames.Weight
which is "Weight"
we instead used "Weights"
, which is not great. (Just inconsistent, probably doesn't change things.)
In reply to: 200460833 [](ancestors = 200460833)
var rmd = RoleMappedData.Create(splitOutput.TrainData[0], | ||
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Feature, "Features"), | ||
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Label, "Label")); | ||
var rmd = new RoleMappedData(splitOutput.TrainData[0], "Label", "Features"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Features" [](start = 76, length = 10)
DefaultColumnNames? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little less excited about this for unit tests.
In reply to: 200461236 [](ancestors = 200461236)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's totally up to you, and probably outside of scope for this PR :)
In reply to: 200461719 [](ancestors = 200461719,200461236)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind that. I think I ought to clean up the actual production code to conform to best practices as you suggest. The tests though -- the goal of those is a little different. In the actual code some obscurity to be assured of consistency is good, but I'm not sure obscurity works for tests.
In reply to: 200462334 [](ancestors = 200462334,200461719,200461236)
…re, other improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var subDataViewCursorFactory = new FeatureFloatVectorCursor.Factory( | ||
TrainUtils.CreateExamples(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features); | ||
new RoleMappedData(arrDv.GetDataView(), null, DefaultColumnNames.Features,weight: DefaultColumnNames.Weight), CursOpt.Weight | CursOpt.Features); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting: space before weight
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks @TomFinley for bring these conveniences around core API.
@dotnet-bot test Windows_NT Debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* RoleMappedSchema/Data change to use constructors * Nuke all create methods, pointless "no-roles" constructor. * Nuke TrainUtils.CreateExamples/CreateExamplesOpt * Opportunistically improve code quality and reporting of Kmeans++
Fixes #445.
RoleMappedSchema
/RoleMappedData
by actual constructors, since that's how objects are generally created.Utils
classes.Create
andCreateOpt
idiom that required that we declare every creation method twice in favor of a simplerbool
parameter on the constructors.