-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Added Feature Contribution Calculation Transform #1677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// The What The Feature scorer is superset of a generic scorer. | ||
/// It outputs score columns from Generic Scorer plus for given features provides vector of corresponding feature contributions. | ||
/// </summary> | ||
public sealed class WhatTheFeatureScorerTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WhatTheFeatureScorerTransform [](start = 24, length = 29)
Need different name, pls suggest #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with FeatureImportanceCalculationTransform
for now. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last naming change: Please name this with the prefix FeatureContribution
rather than FeatureImportance
. This will make it less ambiguous what it's calculating. #Resolved
// REVIEW: the scorer currently ignores the 'suffix' argument from the base class. It should respect it. | ||
} | ||
|
||
public static IDataScorerTransform Create(IHostEnvironment env, Arguments args, IDataView data, ISchemaBoundMapper mapper, RoleMappedSchema trainSchema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDataScorerTransform [](start = 22, length = 20)
Will add samples & docs #Resolved
@ganik I had an offline conversation with @Zruty0 about this. We don't need to put the |
using Microsoft.ML.Runtime.Model; | ||
using Microsoft.ML.Runtime.Numeric; | ||
|
||
[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureImportanceCalculationTransform), typeof(FeatureImportanceCalculationTransform.Arguments), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all references to "WTF" #Resolved
/// The Feature Importance scorer is superset of a generic scorer. | ||
/// It outputs score columns from Generic Scorer plus for given features provides vector of corresponding feature contributions. | ||
/// </summary> | ||
public sealed class FeatureImportanceCalculationTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FeatureImportanceCalculationTransform
=> FeatureContributionCalculationTransform
#Resolved
Bottom = 10, | ||
Top = 10 | ||
}; | ||
var output = FeatureImportanceCalculationTransform.Create(Env, args, data, model.Model, model.FeatureColumn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FeatureImportanceCalculationTransform.Create [](start = 25, length = 44)
Hi @ganik thanks for working on this. Please see #581 and #754, and read.
Central to the idea, as we see in #581, is that we are making the things that trains a transform, is a transform, and is the transformed data, three separate things (IEstimator
, ITransformer
, IDataView
) rather than having them conflated in one big huge thing as we have done previously (e.g., the IDataLoader
and IDataTransform
interfaces).
We have had, by now, some dozens of issues and PRs meant for the work we are doing to do the conversion from IDataTransform
based scheme, to the IEstimator
and ITransformer
based scheme, as I think you are aware since you have done some of this conversion yourself. (E.g., your work in #1254 where you successfully converted some of the old timeseries transformers over to estimator land.)
So let's review this idiom here. This idiom where you've created the transform that is also simultaneously the data is an old concept that was used in the internal tool, but we do not want people to be exposed to because it is incredibly confusing unless you're somehow expecting it. So: if you want to ship this, we should not expose this to users yet, until we structure it properly.
We might consider staging the work so that we internally ship this scorer, but not announce it in any way. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totally agreed. The right way to go is to implement estimator API. This is scheduled for 0.9. This particular change is a staged change that doesn't need to get announced.
In reply to: 236428108 [](ancestors = 236428108)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great feedback and a good plan. Getting this in 0.8 with the current design would let us address issues like #1675 without needing to wait for the refactoring to be finished. #Resolved
I've added a sample showing how this can be used. #Resolved |
} | ||
} | ||
|
||
private sealed class WtfSchema : ISchema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wtf [](start = 29, length = 3)
I would suggest to search "Wtf" in this code, and replace it with something else. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest FeatureContribution
. #Resolved
{ | ||
// REVIEW: checking this a bit too late. | ||
Host.Check(_whatTheFeature != null, "Predictor does not implement IWhatTheFeatureValueMapper"); | ||
return _whatTheFeature.GetWhatTheFeatureMapper<TSrc, TDst>(top, bottom, normalize); | ||
Host.Check(_featureContribution != null, "Predictor does not implement IFeatureContributionMapper"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IFeatureContributionMapper [](start = 83, length = 26)
nameof maybe? #Resolved
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of bottom contributions", SortOrder = 2)] | ||
public int Bottom = 10; | ||
|
||
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether or not output of Features' contribution should be normalized", ShortName = "norm", SortOrder = 3)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
' [](start = 92, length = 1)
that apos is inconsistent between norm and stringify #Resolved
using Microsoft.ML.Runtime.Numeric; | ||
|
||
[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureContributionCalculationTransform), typeof(FeatureContributionCalculationTransform.Arguments), | ||
typeof(SignatureDataScorer), "Feature Importance Scorer", "wtf", "FeatureImportanceCalculationScorer", MetadataUtils.Const.ScoreColumnKind.FeatureContribution)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"wtf" [](start = 62, length = 5)
I could be wrong, but I seem to recall that we would change the user facing names of these things. I think we ought to have SignatureLoadModel
remain the same for backwards compatibility. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Feature Importance Scorer"
, "wtf", "FeatureImportanceCalculationScorer"=>
"Feature Contribution Transform", "FeatureContributionCalculationTransform"` #Resolved
using Microsoft.ML.Runtime.Numeric; | ||
|
||
[assembly: LoadableClass(typeof(IDataScorerTransform), typeof(FeatureContributionCalculationTransform), typeof(FeatureContributionCalculationTransform.Arguments), | ||
typeof(SignatureDataScorer), "Feature Importance Scorer", "wtf", "FeatureImportanceCalculationScorer", MetadataUtils.Const.ScoreColumnKind.FeatureContribution)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf [](start = 63, length = 3)
I think @[email protected] is fine to get rid of "wtf" from command line and replace with something else like "FI" or "FIS" #Resolved
{ | ||
// Apparently, loader signature is limited in length to 24 characters. | ||
internal const string MapperLoaderSignature = "WTFBindable"; | ||
private const string LoaderSignature = "WTFScorer"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private const string LoaderSignature = "WTFScorer"; [](start = 8, length = 51)
So, what is this for? #Resolved
namespace Microsoft.ML.Runtime.Data | ||
{ | ||
/// <summary> | ||
/// The Feature Importance scorer is superset of a generic scorer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this doc string is not long enough. External users who reads this file might not know generic scorer and other concepts in ML.NET. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a basic description to the transform. As per @shauheen 's, advice, let's focus on the code now and finalize the doc strings after code complete. #Resolved
sb.Remove(sb.Length - 2, 2); | ||
} | ||
|
||
dst = new ReadOnlyMemory<char>(sb.ToString().ToCharArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ReadOnlyMemory(sb.ToString().ToCharArray()); [](start = 30, length = 54)
sb.ToString.AsMemory() #Resolved
_env.Assert(count > index || count == 0 && slotNames.Length > index); | ||
var slotName = slotNames.GetItemOrDefault(index); | ||
return slotName.IsEmpty | ||
? new ReadOnlyMemory<char>($"f{index}".ToCharArray()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new ReadOnlyMemory($"f{index}".ToCharArray()) [](start = 22, length = 51)
$"f{index}".AsMemory() #Resolved
{ | ||
_env.Assert(sb.Length >= 2); | ||
sb.Remove(sb.Length - 2, 2); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) why we doing this?
b) wouldn't it be easier to have if (sb.Length>=2)
condition? #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not the same, assert here, so if its bigger than 0 then it should be bigger or equal than 2. Dont want to change this logic since this is how it is originally in internal repo
In reply to: 236893534 [](ancestors = 236893534)
map(in features, ref contributions); | ||
var indices = new Span<int>(); | ||
var values = new Span<float>(); | ||
(contributions.IsDense ? Utils.GetIdentityPermutation(contributions.Length).AsSpan() : contributions.GetIndices()).CopyTo(indices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Utils.GetIdentityPermutation(contributions.Length).AsSpan() [](start = 49, length = 59)
what is going on here? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ganik
if (args.Top <= 0 || args.Top > MaxTopBottom) | ||
throw env.Except($"Number of top contribution must be in range (0,{MaxTopBottom}]"); | ||
if (args.Bottom <= 0 || args.Bottom > MaxTopBottom) | ||
throw env.Except($"Number of bottom contribution must be in range (0,{MaxTopBottom}]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, I would move this into a helper function. ValidateTopBottomArguments? #WontFix
|
||
public readonly IFeatureContributionMapper Predictor; | ||
public readonly ISchemaBindableMapper GenericMapper; | ||
public readonly bool Stringify; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit - I would group all the properties together (above VersionInfo). #Resolved
|
||
Func<IRow, int, VBuffer<ReadOnlyMemory<char>>, ValueGetter<ReadOnlyMemory<char>>> del = GetTextValueGetter<int>; | ||
var meth = del.GetMethodInfo().GetGenericMethodDefinition().MakeGenericMethod(typeSrc.RawType); | ||
return (Delegate)meth.Invoke(this, new object[] { input, colSrc, slotNames }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you can use Utils.MarshalInvoke here since there is only one generic argument that is needed in the function. #WontFix
/// ]]> | ||
/// </format> | ||
/// </example> | ||
public sealed class FeatureContributionCalculationTransform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im guessing there will need to be an estimator too at some point. What would that be called? Does that affect the name of the transform at all. Note for Estimators have the format of ItemEstimator (ColumnCopyingEstimator). #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there will be, name yet to be determined. Dont think name will be affected, also should be ok to change transform name if needed
In reply to: 236903151 [](ancestors = 236903151)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
featureGetter(ref features); | ||
map(in features, ref contributions); | ||
var indices = new Span<int>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is pretty wrong. You are making an empty span, and then trying to copy things into it (which certainly will throw an exception if the thing you are trying to copy has any elements).
Are we testing this code? I don't see how it could be working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the tests cover this part of the code?
fixes #1644