Skip to content

Added Feature Contribution Calculation Transform #1677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Nov 28, 2018
Merged

Added Feature Contribution Calculation Transform #1677

merged 24 commits into from
Nov 28, 2018

Conversation

ganik
Copy link
Member

@ganik ganik commented Nov 20, 2018

fixes #1644

/// The What The Feature scorer is superset of a generic scorer.
/// It outputs score columns from Generic Scorer plus for given features provides vector of corresponding feature contributions.
/// </summary>
public sealed class WhatTheFeatureScorerTransform
Copy link
Member Author

@ganik ganik Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WhatTheFeatureScorerTransform [](start = 24, length = 29)

Need different name, pls suggest #Resolved

Copy link
Contributor

@rogancarr rogancarr Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with FeatureImportanceCalculationTransform for now. #Resolved

Copy link
Contributor

@rogancarr rogancarr Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last naming change: Please name this with the prefix FeatureContribution rather than FeatureImportance. This will make it less ambiguous what it's calculating. #Resolved

// REVIEW: the scorer currently ignores the 'suffix' argument from the base class. It should respect it.
}

public static IDataScorerTransform Create(IHostEnvironment env, Arguments args, IDataView data, ISchemaBoundMapper mapper, RoleMappedSchema trainSchema)
Copy link
Member Author

@ganik ganik Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataScorerTransform [](start = 22, length = 20)

Will add samples & docs #Resolved

@rogancarr
Copy link
Contributor

rogancarr commented Nov 20, 2018

@ganik Here is an example usage from @Zruty0

var trainer = ml….FastTree()
var trees = trainer.Fit(data);
var WtfModel = new WtfTransformer(trees.Model, trees.FeatureColumn, otherWtfScorerArgs);

@Zruty0 What should the high-level API look like, starting from the mlContext object? #Resolved

@rogancarr
Copy link
Contributor

rogancarr commented Nov 20, 2018

@ganik Here is an example usage from @Zruty0

var trainer = ml….FastTree()
var trees = trainer.Fit(data);
var WtfModel = new WtfTransformer(trees.Model, trees.FeatureColumn, otherWtfScorerArgs);

@Zruty0 What should the high-level API look like, starting from the mlContext object?

@ganik I had an offline conversation with @Zruty0 about this. We don't need to put the WtfScorer into the Catalog. We can use direct instantiation. If we change our minds, the high-level APIs will come out post 0.8. #Resolved

@ganik ganik changed the title WIP: Adding WhatTheFeature Scorer Added Feature Importance Calculation Transform Nov 26, 2018
using Microsoft.ML.Runtime.Model;
using Microsoft.ML.Runtime.Numeric;

[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureImportanceCalculationTransform), typeof(FeatureImportanceCalculationTransform.Arguments),
Copy link
Contributor

@rogancarr rogancarr Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all references to "WTF" #Resolved

/// The Feature Importance scorer is superset of a generic scorer.
/// It outputs score columns from Generic Scorer plus for given features provides vector of corresponding feature contributions.
/// </summary>
public sealed class FeatureImportanceCalculationTransform
Copy link
Contributor

@rogancarr rogancarr Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FeatureImportanceCalculationTransform => FeatureContributionCalculationTransform #Resolved

Bottom = 10,
Top = 10
};
var output = FeatureImportanceCalculationTransform.Create(Env, args, data, model.Model, model.FeatureColumn);
Copy link
Contributor

@TomFinley TomFinley Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FeatureImportanceCalculationTransform.Create [](start = 25, length = 44)

Hi @ganik thanks for working on this. Please see #581 and #754, and read.

Central to the idea, as we see in #581, is that we are making the things that trains a transform, is a transform, and is the transformed data, three separate things (IEstimator, ITransformer, IDataView) rather than having them conflated in one big huge thing as we have done previously (e.g., the IDataLoader and IDataTransform interfaces).

We have had, by now, some dozens of issues and PRs meant for the work we are doing to do the conversion from IDataTransform based scheme, to the IEstimator and ITransformer based scheme, as I think you are aware since you have done some of this conversion yourself. (E.g., your work in #1254 where you successfully converted some of the old timeseries transformers over to estimator land.)

So let's review this idiom here. This idiom where you've created the transform that is also simultaneously the data is an old concept that was used in the internal tool, but we do not want people to be exposed to because it is incredibly confusing unless you're somehow expecting it. So: if you want to ship this, we should not expose this to users yet, until we structure it properly.

We might consider staging the work so that we internally ship this scorer, but not announce it in any way. #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally agreed. The right way to go is to implement estimator API. This is scheduled for 0.9. This particular change is a staged change that doesn't need to get announced.


In reply to: 236428108 [](ancestors = 236428108)

Copy link
Contributor

@rogancarr rogancarr Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great feedback and a good plan. Getting this in 0.8 with the current design would let us address issues like #1675 without needing to wait for the refactoring to be finished. #Resolved

@rogancarr
Copy link
Contributor

rogancarr commented Nov 26, 2018

I've added a sample showing how this can be used. #Resolved

}
}

private sealed class WtfSchema : ISchema
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wtf [](start = 29, length = 3)

I would suggest to search "Wtf" in this code, and replace it with something else. #Resolved

Copy link
Contributor

@rogancarr rogancarr Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest FeatureContribution. #Resolved

@ganik ganik changed the title Added Feature Importance Calculation Transform Added Feature Contribution Calculation Transform Nov 27, 2018
{
// REVIEW: checking this a bit too late.
Host.Check(_whatTheFeature != null, "Predictor does not implement IWhatTheFeatureValueMapper");
return _whatTheFeature.GetWhatTheFeatureMapper<TSrc, TDst>(top, bottom, normalize);
Host.Check(_featureContribution != null, "Predictor does not implement IFeatureContributionMapper");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IFeatureContributionMapper [](start = 83, length = 26)

nameof maybe? #Resolved

[Argument(ArgumentType.AtMostOnce, HelpText = "Number of bottom contributions", SortOrder = 2)]
public int Bottom = 10;

[Argument(ArgumentType.AtMostOnce, HelpText = "Whether or not output of Features' contribution should be normalized", ShortName = "norm", SortOrder = 3)]
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

' [](start = 92, length = 1)

that apos is inconsistent between norm and stringify #Resolved

using Microsoft.ML.Runtime.Numeric;

[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureContributionCalculationTransform), typeof(FeatureContributionCalculationTransform.Arguments),
typeof(SignatureDataScorer), "Feature Importance Scorer", "wtf", "FeatureImportanceCalculationScorer", MetadataUtils.Const.ScoreColumnKind.FeatureContribution)]
Copy link
Contributor

@TomFinley TomFinley Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"wtf" [](start = 62, length = 5)

I could be wrong, but I seem to recall that we would change the user facing names of these things. I think we ought to have SignatureLoadModel remain the same for backwards compatibility. #Resolved

Copy link
Contributor

@rogancarr rogancarr Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Feature Importance Scorer", "wtf", "FeatureImportanceCalculationScorer"=>"Feature Contribution Transform", "FeatureContributionCalculationTransform"` #Resolved

using Microsoft.ML.Runtime.Numeric;

[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureContributionCalculationTransform), typeof(FeatureContributionCalculationTransform.Arguments),
typeof(SignatureDataScorer), "Feature Importance Scorer", "wtf", "FeatureImportanceCalculationScorer", MetadataUtils.Const.ScoreColumnKind.FeatureContribution)]
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wtf [](start = 63, length = 3)

I think @[email protected] is fine to get rid of "wtf" from command line and replace with something else like "FI" or "FIS" #Resolved

{
// Apparently, loader signature is limited in length to 24 characters.
internal const string MapperLoaderSignature = "WTFBindable";
private const string LoaderSignature = "WTFScorer";
Copy link
Contributor

@TomFinley TomFinley Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private const string LoaderSignature = "WTFScorer"; [](start = 8, length = 51)

So, what is this for? #Resolved

namespace Microsoft.ML.Runtime.Data
{
/// <summary>
/// The Feature Importance scorer is superset of a generic scorer.
Copy link
Member

@wschin wschin Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this doc string is not long enough. External users who reads this file might not know generic scorer and other concepts in ML.NET. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rogancarr could you take care of this? thx


In reply to: 236859712 [](ancestors = 236859712)

Copy link
Contributor

@rogancarr rogancarr Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a basic description to the transform. As per @shauheen 's, advice, let's focus on the code now and finalize the doc strings after code complete. #Resolved

sb.Remove(sb.Length - 2, 2);
}

dst = new ReadOnlyMemory<char>(sb.ToString().ToCharArray());
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new ReadOnlyMemory(sb.ToString().ToCharArray()); [](start = 30, length = 54)

sb.ToString.AsMemory() #Resolved

_env.Assert(count > index || count == 0 && slotNames.Length > index);
var slotName = slotNames.GetItemOrDefault(index);
return slotName.IsEmpty
? new ReadOnlyMemory<char>($"f{index}".ToCharArray())
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new ReadOnlyMemory($"f{index}".ToCharArray()) [](start = 22, length = 51)

$"f{index}".AsMemory() #Resolved

{
_env.Assert(sb.Length >= 2);
sb.Remove(sb.Length - 2, 2);
}
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) why we doing this?
b) wouldn't it be easier to have if (sb.Length>=2) condition? #WontFix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not the same, assert here, so if its bigger than 0 then it should be bigger or equal than 2. Dont want to change this logic since this is how it is originally in internal repo


In reply to: 236893534 [](ancestors = 236893534)

map(in features, ref contributions);
var indices = new Span<int>();
var values = new Span<float>();
(contributions.IsDense ? Utils.GetIdentityPermutation(contributions.Length).AsSpan() : contributions.GetIndices()).CopyTo(indices);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils.GetIdentityPermutation(contributions.Length).AsSpan() [](start = 49, length = 59)

what is going on here? #Resolved

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ganik

if (args.Top <= 0 || args.Top > MaxTopBottom)
throw env.Except($"Number of top contribution must be in range (0,{MaxTopBottom}]");
if (args.Bottom <= 0 || args.Bottom > MaxTopBottom)
throw env.Except($"Number of bottom contribution must be in range (0,{MaxTopBottom}]");
Copy link
Member

@singlis singlis Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, I would move this into a helper function. ValidateTopBottomArguments? #WontFix


public readonly IFeatureContributionMapper Predictor;
public readonly ISchemaBindableMapper GenericMapper;
public readonly bool Stringify;
Copy link
Member

@singlis singlis Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit - I would group all the properties together (above VersionInfo). #Resolved


Func<IRow, int, VBuffer<ReadOnlyMemory<char>>, ValueGetter<ReadOnlyMemory<char>>> del = GetTextValueGetter<int>;
var meth = del.GetMethodInfo().GetGenericMethodDefinition().MakeGenericMethod(typeSrc.RawType);
return (Delegate)meth.Invoke(this, new object[] { input, colSrc, slotNames });
Copy link
Member

@singlis singlis Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can use Utils.MarshalInvoke here since there is only one generic argument that is needed in the function. #WontFix

/// ]]>
/// </format>
/// </example>
public sealed class FeatureContributionCalculationTransform
Copy link
Member

@singlis singlis Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im guessing there will need to be an estimator too at some point. What would that be called? Does that affect the name of the transform at all. Note for Estimators have the format of ItemEstimator (ColumnCopyingEstimator). #ByDesign

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there will be, name yet to be determined. Dont think name will be affected, also should be ok to change transform name if needed


In reply to: 236903151 [](ancestors = 236903151)

Copy link
Member

@singlis singlis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rogancarr rogancarr merged commit 200eef7 into dotnet:master Nov 28, 2018
{
featureGetter(ref features);
map(in features, ref contributions);
var indices = new Span<int>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is pretty wrong. You are making an empty span, and then trying to copy things into it (which certainly will throw an exception if the thing you are trying to copy has any elements).

Are we testing this code? I don't see how it could be working.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the tests cover this part of the code?

@ganik ganik deleted the ganik/wf branch December 20, 2018 04:23
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WhatTheFeature Scorer is Missing in ML.NET
7 participants