Add V1 Introspective Training Tests #2859
6d3fdb6 to 55e7966
@@ -9,11 +9,17 @@
using Microsoft.ML.Trainers.FastTree;
using Microsoft.ML.Trainers;
using Xunit;
using Microsoft.ML.Functional.Tests.Datasets;
sort usings #Resolved
/// Verify that a numerical array has no NaNs or infinities.
/// </summary>
/// <param name="array">An array of doubles.</param>
public static void AssertFiniteNumbers(double[] array, int ignoreElementAt = -1)
AssertFiniteNumbers
Where is this function being used? #Resolved
That's right. I put it in Common because I imagine that I'll use it again, although ignoreElementAt is definitely a binning-only kind of thing.
In reply to: 262705673
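For readers following this thread, here is a minimal sketch of what a helper with this signature might look like. The method body is not shown in the diff above, so everything inside it is an assumption: the class name `FiniteChecks` is hypothetical, and the sketch throws an exception rather than calling xUnit so that it stays self-contained.

```csharp
using System;

// Hypothetical container class; the PR places its helper in a shared "Common" test class.
public static class FiniteChecks
{
    // Assumed behavior: fail on the first NaN or infinity, except at ignoreElementAt.
    public static void AssertFiniteNumbers(double[] array, int ignoreElementAt = -1)
    {
        for (int i = 0; i < array.Length; i++)
        {
            // Skip the single index the caller asked to ignore (-1 ignores nothing).
            if (i == ignoreElementAt)
                continue;

            if (double.IsNaN(array[i]) || double.IsInfinity(array[i]))
                throw new InvalidOperationException($"Element {i} is not finite: {array[i]}");
        }
    }
}
```

Under these assumptions, a caller that expects a non-finite value at a known position (for example, the last entry of an array of bin boundaries) could pass that index as `ignoreElementAt` and still have every other element checked.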
}

/// <summary>
/// I can take an existing model file and inspect what transformers were included in the pipeline.
I can take an existing model file
You are not taking a model file. You are constructing the pipeline in the test. #Resolved
Good point. I am updating the summary. I changed this test to just look at pipelines, and not necessarily at serialization / deserialization. There will be model-file-specific tests that test serialization and deserialization, so I decided to not test that here.
In reply to: 262709151
    var column = currentSchema.GetColumnOrNull(expectedColumn);
    Assert.Null(column);
}
i++;
This seems a bit complex and overkill. We only have two transforms in the chain, so this will run for the first transform only and will check that the output schema does not contain Score. #Resolved
// Transform the data.
var transformedData = model.Transform(data);

// Verify that the slotnames cane be used to backtrack by confirming that
can #Resolved
}

[Fact]
public void InspectNestedPipeline()
InspectNestedPipeline
Missing summary. #Resolved
After you address the comments I think it's ready to go!
var model = pipeline.Fit(data);

// Extract the normalizer from the trained pipeline.
// TODO #2854: Extract the normalizer parameters.
2854
See the issue and the sample on normalizers; I think we can extract the parameters. #Resolved
public float HoursPerWeek { get; set; }

/// <summary>
/// The list of columns commonly used as numerical features.
Suggested change:
- /// The list of columns commonly used as numerical features.
+ /// The list of columns commonly used as categorical features.
#Resolved
29371f4 to 4f7d8f5
Codecov Report
@@ Coverage Diff @@
## master #2859 +/- ##
=========================================
Coverage ? 71.72%
=========================================
Files ? 812
Lines ? 142678
Branches ? 16124
=========================================
Hits ? 102330
Misses ? 35936
Partials ? 4412
This PR adds tests to cover the Introspective Training scenarios we want fully supported in V1.
- I can take an existing model file and inspect what transformers were included in the pipeline.
- I can inspect the coefficients (weights and bias) of a linear model without much work. Easy to find via auto-complete.
- I can inspect the normalization coefficients of a normalizer in my pipeline without much work. Easy to find via auto-complete.
- I can inspect the trees of a boosted decision tree model without much work. Easy to find via auto-complete.
- I can inspect the topics after training an LDA transform. Easy to find via auto-complete.
- I can inspect a categorical transform and see which feature values map to which key values. Easy to find via auto-complete.
- P1: I can access the GAM feature histograms through APIs.
Fixes: #2498