-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Update Permutation Feature Importance Samples #3247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3247 +/- ##
==========================================
- Coverage 72.62% 72.62% -0.01%
==========================================
Files 807 807
Lines 145080 145080
Branches 16213 16213
==========================================
- Hits 105369 105368 -1
Misses 35294 35294
- Partials 4417 4418 +1
|
using System.Collections.Generic; | ||
using System.Linq; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we tend to have namespace Samples.Dynamic
#Resolved
// Fit the pipeline to the data. | ||
var model = pipeline.Fit(data); | ||
|
||
// Compute the permutation metrics for the linear model using the normalized data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Compute the permutation metrics for the linear model using the normalized data. [](start = 12, length = 82)
put it before var permutationMetrics =
? #Resolved
Data.FeatureColumns[i], | ||
linearPredictor.Model.SubModel.Weights[i], | ||
auc[i].Mean, | ||
1.96 * auc[i].StandardError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.96 * auc[i].StandardError [](start = 20, length = 27)
const number magic! I'm scared and confused! #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number to convert the interval to the 95%-percentile.
In reply to: 273720232 [](ancestors = 273720232)
// Expected output: | ||
// Feature Model Weight Change in RMSE 95% Confidence in the Mean Change in RMSE | ||
// Feature2 9.00 4.009 0.008304 | ||
// Feature1 4.48 1.901 0.003351 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would align Expected output and Feature.
One space instead of two.
Same across all files. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two spaces for printed verbatim output is the style across the Samples project.
In reply to: 273720618 [](ancestors = 273720618)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
private static double Sigmoid(double x) | ||
{ | ||
return 1.0 / (1.0 + Math.Exp(-1 * x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could remove the braces, since it's just one line, and use =>
public static readonly string[] FeatureColumns = new string[] { nameof(Feature1), nameof(Feature2) }; | ||
|
||
public static IEnumerable<Data> GenerateData(int nExamples = 10000, | ||
double bias = 0, double weight1 = 1, double weight2 = 2, int seed = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a one line explanation of how you are generating the data?
|
||
public float Feature1 { get; set; } | ||
|
||
public float Feature2 { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually the class that defines the data does not do anything else. I like that pattern, because it makes it easy for users to understand what is happening.
I would suggest to move the GenerateData
method outside of the Data class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, please do address the comments about the data generation code.
* Updating PFI Docs. (cherry picked from commit e0c029c)
This PR updates the samples for
PermutationFeatureImportance
and adds samples forMulticlassClassification
andRanking
tasks.Fixes #3242