Skip to content

Role mapped improvements #496

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 5, 2018
Merged

Conversation

TomFinley
Copy link
Contributor

Fixes #445.

  • Generally, favors creating RoleMappedSchema/RoleMappedData by actual constructors, since that's how objects are generally created.
  • Concentrated the actual globally useful "conveniences" inside the classes themselves, rather than in completely undiscoverable Utils classes.
  • Got rid of the Create and CreateOpt idiom that required that we declare every creation method twice in favor of a simpler bool parameter on the constructors.
  • Added documentation.

@@ -312,7 +312,6 @@ public static bool HasSlotNames(this ISchema schema, int col, int vectorSize)
public static void GetSlotNames(RoleMappedSchema schema, RoleMappedSchema.ColumnRole role, int vectorSize, ref VBuffer<DvText> slotNames)
{
Contracts.CheckValueOrNull(schema);
Contracts.CheckValue(role.Value, nameof(role));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts.CheckValue(role.Value, nameof(role)); [](start = 12, length = 47)

you removing it because ColumnRole has only one constuctor in which we invoke Contracts.CheckNonEmpty, so Value always gonna have Value, right? #Closed

Copy link
Contributor Author

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In the issue #445 when I said "Also a fair amount of code exists in the class to detect conditions that cannot possibly happen." this is partially what I am referring to. Indeed I think it's the only real reason to have a separate struct instead of just using string directly, so it's a little weird that we spent so much effort checking and rechecking. #Closed

/// <param name="schema">The schema over which roles are defined</param>
/// <param name="roles">The column role to column name mappings</param>
/// <param name="opt">Whether to consider the column names specified "optional" or not. If <c>false</c> then any non-empty
/// values for the column names that does not appear in <paramref name="schema"/> will result iin an exception being thrown,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iin [](start = 102, length = 3)

iin #Closed

public static RoleMappedSchema Create(ISchema schema)
/// <param name="schema">The schema over which roles are defined</param>
/// <param name="opt">Whether to consider the column names specified "optional" or not. If <c>false</c> then any non-empty
/// values for the column names that does not appear in <paramref name="schema"/> will result iin an exception being thrown,
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iin [](start = 102, length = 3)

iin #Closed

@@ -363,45 +380,82 @@ public ColumnInfo GetUniqueColumn(ColumnRole role)
}

/// <summary>
/// Creates a RoleMappedSchema from the given schema with no column role assignments.
/// Constructor from the given schema and role/column-name pairs.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

role/column-name [](start = 50, length = 16)

I can read it as role | column-name pairs or
(role or column) | name pair.
Which probably show my lack of English, but still can it be rephrased? #Closed

Copy link
Contributor Author

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm OK. I guess I could omit names since columns are always identified by name, that might make clear I'm not talking about the strings used to construct the roles in the first place. #Closed

if (role.Value != null && _map.TryGetValue(role.Value, out list))
return list;
return null;
return _map.TryGetValue(role.Value, out var list) ? list : null;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return [](start = 12, length = 6)

it look strange to see all methods above get converted from return to => and this one still use return. #Closed

@@ -4,6 +4,7 @@

using System;
using System.Collections.Generic;
using System.Linq;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using System.Linq [](start = 0, length = 17)

My VS highlight gray using System and using System.Linq for me and suggest to remove them as unused. #Closed

@@ -36,7 +37,7 @@ public static ColumnInfo CreateFromName(ISchema schema, string name, string desc
{
ColumnInfo colInfo;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnInfo colInfo; [](start = 12, length = 19)

can be moved into method call
out ColumnInfo colInfo #Closed

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jul 5, 2018

        int index;

can be moved to method call.
(I wouldn't write this comment generally, but you done this thing in all other places in this file, so they look like outliers.) #Closed


Refers to: src/Microsoft.ML.Core/Data/RoleMappedSchema.cs:55 in 158b653. [](commit_id = 158b653, deletion_comment = False)

@@ -98,68 +112,99 @@ public sealed class RoleMappedSchema
private const string NameString = "Name";
private const string FeatureContributionsString = "FeatureContributions";

/// <summary>
/// Instances of this are the keys of a <see cref="RoleMappedSchema"/>. This class also some important commonly used
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some [](start = 95, length = 6)

this phrasing looks weird for me, but what do I know #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know lots Ivan!


In reply to: 200436663 [](ancestors = 200436663)

/// <summary>
/// Role for sample weights. Commonly used to point to a number to make trainers give more weight
/// to a particular example.
/// </summary>
public static ColumnRole Weight => WeightString;
public static ColumnRole Name => NameString;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static ColumnRole Name => NameString; [](start = 12, length = 44)

considering you make reference to this column (see line 205) it's probably make sense to add description for this as well. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... also I think FeatureContributions should be done away with, I'm not sure why it is here. Maybe someone thought once upon a time that for a column role to be used, it has to be here.


In reply to: 200438589 [](ancestors = 200438589)

// REVIEW: Does this really belong here?
/// <summary>
/// Role for feature contributions. Useful for specific diagnostic functionality.
/// </summary>
public static ColumnRole FeatureContributions => FeatureContributionsString;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FeatureContributions [](start = 37, length = 20)

I don't really feel like this actually belongs here... certainly in ML.NET there are no usages, and the usages elsewhere are heavily localized so putting a custom role in the basic class is sort of odd... it would be better to have the score-roles.

@@ -98,68 +108,109 @@ public sealed class RoleMappedSchema
private const string NameString = "Name";
private const string FeatureContributionsString = "FeatureContributions";

/// <summary>
/// Instances of this are the keys of a <see cref="RoleMappedSchema"/>. This class also some holds important
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some holds [](start = 96, length = 10)

are you sure it's correct order? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um no. Sorry. :) Me speak english pretty one day.


In reply to: 200456054 [](ancestors = 200456054)

var schema = TrainUtils.CreateRoleMappedSchemaOpt(input.Schema, featureColName, groupColName, customColumns);
ISchemaBoundMapper mapper;
var sc = GetScorerComponentAndMapper(predictor, scorer, schema, env, out mapper);
var schema = new RoleMappedSchema(input.Schema, label: null, feature: featureColName, group: groupColName, opt: true);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you lost customColumns here.
at least we used to pass them. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops


In reply to: 200457252 [](ancestors = 200457252)

@@ -379,8 +377,9 @@ public Dense(int count, string[] names)
if (size > 0)
Array.Copy(names, _names, size);

_schema = RoleMappedSchema.Create(new FeatureNameCollectionSchema(this),
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Feature, RoleMappedSchema.ColumnRole.Feature.Value));
// REVIEW: This seems wrong. The default feature column name is "Features" yet the role is named "Feature".
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// REVIEW: This seems wrong. The default feature column name is "Features" yet the role is named "Feature". [](start = 16, length = 107)

Should we move this review to RoleMappedSchema file? #ByDesign

Copy link
Contributor Author

@TomFinley TomFinley Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not really waht I mean. I think it's fine to have the role be "feature", but the column name be "features." What I meant was, this very specific usage of it in this file.


In reply to: 200457828 [](ancestors = 200457828)

@@ -127,11 +127,11 @@ protected MamlEvaluatorBase(ArgumentsBase args, IHostEnvironment env, string sco

// Get the label column information.
string lab = EvaluateUtils.GetColName(LabelCol, schema.Label, DefaultColumnNames.Label);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string lab [](start = 12, length = 10)

can we change it to var label?
it just looks odd to have lab and then few lines later weight. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind if I go the other way? I like to be explicit with my types (in this case not very obvious) unless it's something longer, or the type is immediately obvious to a reader. (Except int... don't get people that aren't explicit about int.)


In reply to: 200458796 [](ancestors = 200458796)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can probably see from my comment, I tend to prefer var, but i don't have problem with explicit types, it was more about shortening label to lab, which for some reason bugs me enormously.


In reply to: 200460563 [](ancestors = 200460563,200458796)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see...


In reply to: 200462127 [](ancestors = 200462127,200460563,200458796)

@@ -185,7 +185,7 @@ public void Train(List<FeatureSubsetModel<IPredictorProducing<TOutput>>> models,
bldr.AddColumn("Features", NumberType.Float, features);

var view = bldr.GetDataView();
var rmd = RoleMappedData.Create(view, ColumnRole.Label.Bind("Label"), ColumnRole.Feature.Bind("Features"));
var rmd = new RoleMappedData(view, "Label", "Features");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Label", "Features" [](start = 51, length = 19)

can we replace it with DefaultColumnNames.Label, DefaultColumnNames.Features
same for line 184 and line 185 #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good one. Also SmacSweeper...


In reply to: 200460321 [](ancestors = 200460321)

_dataForEvaluator = RoleMappedData.Create(builder.GetDataView(), RoleMappedSchema.ColumnRole.Label.Bind("Label"),
RoleMappedSchema.CreatePair(MetadataUtils.Const.ScoreValueKind.Score, "Score"));
_dataForEvaluator = new RoleMappedData(builder.GetDataView(), opt: false, RoleMappedSchema.ColumnRole.Label.Bind("Label"),
new RoleMappedSchema.ColumnRole(MetadataUtils.Const.ScoreValueKind.Score).Bind("Score"));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DefaultColumnNames.Label, DefaultColumnNames.Score ? same for few lines above #Closed

@@ -902,7 +902,7 @@ public bool CanWeightComputationBeAvoided(Float instanceDistanceToBestOldCluster
arrDv.AddColumn("Features", PrimitiveType.FromKind(DataKind.R4), clusters);
arrDv.AddColumn("Weights", PrimitiveType.FromKind(DataKind.R4), totalWeights);
var subDataViewCursorFactory = new FeatureFloatVectorCursor.Factory(
TrainUtils.CreateExamples(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features);
new RoleMappedData(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features [](start = 71, length = 8)

DefaultColumnNames? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. And I see here instead of DefaultColumnNames.Weight which is "Weight" we instead used "Weights", which is not great. (Just inconsistent, probably doesn't change things.)


In reply to: 200460833 [](ancestors = 200460833)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Jul 5, 2018

        dvBuilder.AddColumn("Features", NumberType.Float, features);

DefaultColumnNames? #Closed


Refers to: src/Microsoft.ML.Sweeper/Algorithms/SmacSweeper.cs:130 in b405fe0. [](commit_id = b405fe0, deletion_comment = False)

var rmd = RoleMappedData.Create(splitOutput.TrainData[0],
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Feature, "Features"),
RoleMappedSchema.CreatePair(RoleMappedSchema.ColumnRole.Label, "Label"));
var rmd = new RoleMappedData(splitOutput.TrainData[0], "Label", "Features");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Features" [](start = 76, length = 10)

DefaultColumnNames? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little less excited about this for unit tests.


In reply to: 200461236 [](ancestors = 200461236)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's totally up to you, and probably outside of scope for this PR :)


In reply to: 200461719 [](ancestors = 200461719,200461236)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind that. I think I ought to clean up the actual production code to conform to best practices as you suggest. The tests though -- the goal of those is a little different. In the actual code some obscurity to be assured of consistency is good, but I'm not sure obscurity works for tests.


In reply to: 200462334 [](ancestors = 200462334,200461719,200461236)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

var subDataViewCursorFactory = new FeatureFloatVectorCursor.Factory(
TrainUtils.CreateExamples(arrDv.GetDataView(), null, "Features", weight: "Weights"), CursOpt.Weight | CursOpt.Features);
new RoleMappedData(arrDv.GetDataView(), null, DefaultColumnNames.Features,weight: DefaultColumnNames.Weight), CursOpt.Weight | CursOpt.Features);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: space before weight parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @zeahmed !


In reply to: 200496962 [](ancestors = 200496962)

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @TomFinley for bring these conveniences around core API.

@TomFinley
Copy link
Contributor Author

@dotnet-bot test Windows_NT Debug

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley TomFinley merged commit f85e722 into dotnet:master Jul 5, 2018
@TomFinley TomFinley deleted the tfinley/RoleMap branch July 5, 2018 23:42
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
* RoleMappedSchema/Data change to use constructors
* Nuke all create methods, pointless "no-roles" constructor.
* Nuke TrainUtils.CreateExamples/CreateExamplesOpt
* Opportunistically improve code quality and reporting of Kmeans++
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants