Explainability doc #2901
Codecov Report
@@ Coverage Diff @@
## master #2901 +/- ##
==========================================
+ Coverage 72.7% 72.71% +0.01%
==========================================
Files 807 807
Lines 145172 145301 +129
Branches 16225 16227 +2
==========================================
+ Hits 105541 105662 +121
- Misses 35217 35223 +6
- Partials 4414 4416 +2
Thanks for the good work @jwood803!
docs/code/ExplainabilityCookBook.md
All of these samples will use the [housing data](https://github.com/dotnet/machinelearning/blob/master/test/data/housing.txt) and will reference the below data schema class and pipeline.
```csharp
We should try to make sure that the cookbooks code compiles and runs properly even with the ongoing changes.
The way we do this is by adding all the code and utilities in a test in https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/CookbookSamples/CookbookSamplesDynamicApi.cs
We then copy the most important parts of the CookbookSamplesDynamicApi.cs to the .md file.
If you could do that it would be great!
docs/code/ExplainabilityCookBook.md
MLContext context = new MLContext();
IDataView data = context.Data.LoadFromTextFile("./housing.txt", new[]
So this code for loading and specifying the data class would be necessary in the .cs file to run your tests. However, I think we should not add it to the .md file. We have other sections of the cookbook and samples in which we illustrate that.
docs/code/ExplainabilityCookBook.md
var model = pipeline.Fit(data);
```
## How do I look at the global feature importance?
I would suggest not to write these under a separate cookbook, but rather to add them to the main MlNetCookBook.md under a separate model explainability section after the model inspection section: "I want to look at my model's coefficients".
docs/code/MlNetCookBook.md
@@ -578,6 +578,48 @@ var biases = modelParameters.GetBiases();
``` | |||
## How do I look at the global feature importance?
The below snippet shows how to get a glimpse of the feature importance, or how much each column of data impacts the performance of the model.
> column of data
"feature" rather than "column of data". The end features in the model might not be exactly the input columns. #Resolved
docs/code/MlNetCookBook.md
foreach (var metricsStatistics in featureImportance)
{
    Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");
> Console.WriteLine($"Root Mean Squared - {metricsStatistics.Rms.Mean}");
Explain a bit above about what this is calculating. It's not giving the RMS, but the difference in RMS for each feature if the feature were to be replaced with a random value.
Also, I would print "Feature I: Difference in RMS" rather than just the RMS.
docs/code/MlNetCookBook.md
} | ||
``` | ||
## How do I get a model's weights to look at the global feature importance?
Move this above PFI, as it's the most naïve way we have to ask this question. #Resolved
docs/code/MlNetCookBook.md
``` | ||
## How do I get a model's weights to look at the global feature importance?
The below snippet shows how to get a model's weights to help determine the feature importance of the model.
> The below
Note that for a linear model, the weights are only an approximation. It helps to standardize the variables before the fit, so that they are all on the same scale, and even then, the linear regression solution does not account for correlations between the variables, and therefore this isn't a great measure of explainability.
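A small, library-free C# sketch of the standardization point (not ML.NET code; `StandardizedWeights` and its methods are hypothetical names). For the same fitted linear function, rewriting each input as z_j = (x_j - mean_j) / sd_j turns weight w_j into w_j * sd_j, so the rescaled weights can be compared across features. It deliberately does nothing about correlated features, which is the remaining caveat raised above.

```csharp
using System;
using System.Linq;

public static class StandardizedWeights
{
    // Population standard deviation of column j across all rows.
    public static double StdDev(double[][] rows, int j)
    {
        double mean = rows.Average(r => r[j]);
        double variance = rows.Average(r => (r[j] - mean) * (r[j] - mean));
        return Math.Sqrt(variance);
    }

    // Rewriting y = b + sum(w_j * x_j) in terms of z_j = (x_j - mean_j) / sd_j
    // yields coefficients w_j * sd_j, which are on a comparable scale.
    public static double[] Standardize(double[] rawWeights, double[][] rows)
    {
        return rawWeights.Select((w, j) => w * StdDev(rows, j)).ToArray();
    }

    // Feature indices ordered by absolute standardized weight, largest first.
    public static int[] RankByMagnitude(double[] weights)
    {
        return Enumerable.Range(0, weights.Length)
            .OrderByDescending(j => Math.Abs(weights[j]))
            .ToArray();
    }
}
```

For example, with the two rows {0, 0} and {1000, 1} and raw weights {0.01, 5}, the standardized weights are {5, 2.5}: the raw weights make feature 1 look more important, while after standardizing feature 0 ranks first.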
docs/code/MlNetCookBook.md
var linearModel = model.LastTransformer.Model;
var weights = new VBuffer<float>();
linearModel.GetFeatureWeights(ref weights);
Add for trees as well -- see the functional tests.
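For trees, one common notion of a feature's weight is the total split gain credited to it across the ensemble. The sketch below assumes a hypothetical flattened list of splits (`TreeSplit`) rather than any real ML.NET tree type, and only illustrates that aggregation; it is not a description of how ML.NET's tree models compute their feature weights.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of one split somewhere in a tree ensemble.
public readonly struct TreeSplit
{
    public TreeSplit(int featureIndex, double gain)
    {
        FeatureIndex = featureIndex;
        Gain = gain;
    }

    public int FeatureIndex { get; }
    public double Gain { get; }
}

public static class TreeGainImportance
{
    // Sums the gain of every split per feature; features never split on stay at 0.
    public static double[] TotalGain(IEnumerable<TreeSplit> splits, int featureCount)
    {
        var totals = new double[featureCount];
        foreach (var split in splits)
            totals[split.FeatureIndex] += split.Gain;
        return totals;
    }

    // Rescales the totals so the most important feature has weight 1.
    public static double[] NormalizeToMax(double[] totals)
    {
        double max = totals.Max();
        return max > 0 ? totals.Select(t => t / max).ToArray() : (double[])totals.Clone();
    }
}
```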
docs/code/MlNetCookBook.md
linearModel.GetFeatureWeights(ref weights);
```
## How do I look at the feature importance per row?
"Local feature importance"
"row" => "example" (the language we've shifted to using) #Resolved
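For a linear model, a simple form of local (per-example) feature importance is each feature's contribution w_j * x_j to that example's raw score. This is a hypothetical, library-free sketch (`LinearContributions` is not an ML.NET type) and not ML.NET's Feature Contribution Calculation implementation; ranking by absolute contribution picks out the features that moved this particular prediction the most.

```csharp
using System;
using System.Linq;

public static class LinearContributions
{
    // Contribution of feature j to one example's score: weights[j] * example[j].
    // For a linear model, these contributions plus the bias add up to the raw score.
    public static double[] Contributions(double[] weights, double[] example)
    {
        if (weights.Length != example.Length)
            throw new ArgumentException("weights and example must have the same length");
        return weights.Zip(example, (w, x) => w * x).ToArray();
    }

    // Indices of the k features with the largest absolute contribution, largest first.
    // OrderByDescending is stable, so ties keep their original feature order.
    public static int[] TopK(double[] contributions, int k)
    {
        return Enumerable.Range(0, contributions.Length)
            .OrderByDescending(j => Math.Abs(contributions[j]))
            .Take(k)
            .ToArray();
    }
}
```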
docs/code/MlNetCookBook.md
var shuffledSubset = context.Data.TakeRows(context.Data.ShuffleRows(featureContributionData), 10);
var preview = shuffledSubset.Preview();
Manually print the rows of data after converting them to an enumerable. You can copy/paste from the FCC (Feature Contribution Calculation) sample.
This looks great! Just a few comments.
docs/code/MlNetCookBook.md
``` | ||
## How do I look at the global feature importance?
The below snippet shows how to get a glimpse of the feature importance, or how much each feature impacts the performance of the model. It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.
> It also outputs the difference in root mean squared for each feature as though the feature were replaced with a random value.
"Permutation Feature Importance works by computing the change in the evaluation metrics when each feature is replaced by a random value. In this case, we are investigating the change in the root mean squared error".
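A minimal, library-free sketch of the permutation idea, assuming a fixed hypothetical linear model and a tiny in-memory dataset; it is not ML.NET's `PermutationFeatureImportance`. In permutation feature importance the "random value" comes from shuffling one feature's values across examples, which keeps that feature's distribution intact while breaking its link to the label. Each feature is shuffled in turn, and the increase in RMSE over the unshuffled baseline is reported under its zero-based index.

```csharp
using System;
using System.Linq;

public static class PermutationImportanceSketch
{
    // Hypothetical model under inspection: y = bias + sum_j(weights[j] * x[j]).
    public static double Predict(double[] weights, double bias, double[] x)
    {
        double y = bias;
        for (int j = 0; j < weights.Length; j++)
            y += weights[j] * x[j];
        return y;
    }

    public static double Rmse(double[] weights, double bias, double[][] xs, double[] labels)
    {
        double sumSq = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            double err = Predict(weights, bias, xs[i]) - labels[i];
            sumSq += err * err;
        }
        return Math.Sqrt(sumSq / xs.Length);
    }

    // For each feature j: RMSE with column j shuffled minus the baseline RMSE.
    // A larger increase means the model leans more heavily on that feature.
    public static double[] DeltaRmse(double[] weights, double bias, double[][] xs,
                                     double[] labels, int seed = 0)
    {
        var rng = new Random(seed);
        double baseline = Rmse(weights, bias, xs, labels);
        var deltas = new double[weights.Length];
        for (int j = 0; j < weights.Length; j++)
        {
            // Work on a copy so every feature is permuted against the original data.
            double[][] permuted = xs.Select(row => (double[])row.Clone()).ToArray();
            // Fisher-Yates shuffle of column j only; the other columns stay in place.
            for (int i = permuted.Length - 1; i > 0; i--)
            {
                int k = rng.Next(i + 1);
                (permuted[i][j], permuted[k][j]) = (permuted[k][j], permuted[i][j]);
            }
            deltas[j] = Rmse(weights, bias, permuted, labels) - baseline;
        }
        return deltas;
    }

    // Labels each result with its zero-based feature index ("Feature 0: ...").
    public static string[] Describe(double[] deltas) =>
        deltas.Select((d, j) => $"Feature {j}: Difference in RMSE - {d}").ToArray();
}
```

A single shuffle is noisy; repeating with several seeds and averaging is the usual remedy, and the `.Mean` in the cookbook snippet suggests the library reports statistics over repeated permutations in the same spirit.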
docs/code/MlNetCookBook.md
foreach (var metricsStatistics in featureImportance)
{
    Console.WriteLine($"Feature I: Difference in RMS - {metricsStatistics.Rms.Mean}");
> I
Sorry, I meant the number of the feature, like 0, 1, 2, ... "Feature 0: " "Feature 1: " etc.
``` | ||
## How do I look at the local feature importance per example?
The below snippet shows how to get feature importance for each example of data.
> feature importance for each example of data
Can you link to the appropriate place in docs for more information for all of these? Maybe we don't actually need to go into major details here.
The best doc I could find is this one. Is this ok to link to in each of these sections or would there be a doc for each of these?
Oh, I was thinking we could link to the code samples in the repo. But this is a moving target, so let's revisit later.
Sounds good! Apologies for the misunderstanding. Was there anything else I missed for the PR? Just making sure no one is waiting for me to make more updates 😄
LGTM!
Initial draft to add explainability documentation.
Fix for #2438.