Add V1 Introspective Training Tests #2859

Merged

Conversation

rogancarr
Contributor

This PR adds tests to cover the Introspective Training scenarios we want fully supported in V1.

I can take an existing model file and inspect what transformers were included in the pipeline
I can inspect the coefficients (weights and bias) of a linear model without much work. Easy to find via auto-complete.
I can inspect the normalization coefficients of a normalizer in my pipeline without much work. Easy to find via auto-complete.
I can inspect the trees of a boosted decision tree model without much work. Easy to find via auto-complete.
I can inspect the topics after training an LDA transform. Easy to find via auto-complete.
I can inspect a categorical transform and see which feature values map to which key values. Easy to find via auto-complete.
P1: I can access the GAM feature histograms through APIs

Fixes: #2498

@rogancarr rogancarr requested review from artidoro and sfilipi March 5, 2019 20:54
@rogancarr rogancarr force-pushed the 2817_introspective_training_scenarios branch from 6d3fdb6 to 55e7966 Compare March 5, 2019 21:15
@@ -9,11 +9,17 @@
using Microsoft.ML.Trainers.FastTree;
using Microsoft.ML.Trainers;
using Xunit;
using Microsoft.ML.Functional.Tests.Datasets;
Member

@singlis singlis Mar 5, 2019

sort usings #Resolved

/// Verify that a numerical array has no NaNs or infinities.
/// </summary>
/// <param name="array">An array of doubles.</param>
public static void AssertFiniteNumbers(double[] array, int ignoreElementAt = -1)
Member

@singlis singlis Mar 5, 2019

AssertFiniteNumbers

Where is this function being used? #Resolved

Contributor

It's used here: IntrospectGamShapeFunctions

In reply to: 262695483

Contributor Author

That's right. I put it in Common because I imagine that I'll use it again. Although ignoreElementAt is definitely a binning-only kind of thing.

In reply to: 262705673

Member

@singlis singlis left a comment

:shipit:

}

/// <summary>
/// I can take an existing model file and inspect what transformers were included in the pipeline.
Contributor

@artidoro artidoro Mar 5, 2019

I can take an existing model file

You are not taking a model file. You are constructing the pipeline in the test. #Resolved

Contributor Author

Good point. I am updating the summary. I changed this test to just look at pipelines, and not necessarily at serialization / deserialization. There will be model-file-specific tests that test serialization and deserialization, so I decided to not test that here.

In reply to: 262709151

var column = currentSchema.GetColumnOrNull(expectedColumn);
Assert.Null(column);
}
i++;
Contributor

@artidoro artidoro Mar 5, 2019

This seems a bit complex and overkill. We only have two transforms in the chain, so this will run for the first transform and will check that the output schema does not contain Score. #Resolved

// Transform the data.
var transformedData = model.Transform(data);

// Verify that the slotnames cane be used to backtrack by confirming that
Contributor

@artidoro artidoro Mar 5, 2019

can #Resolved

}

[Fact]
public void InspectNestedPipeline()
Contributor

@artidoro artidoro Mar 5, 2019

InspectNestedPipeline

Missing summary. #Resolved

Contributor

@artidoro artidoro left a comment

After you address the comments I think it's ready to go!

var model = pipeline.Fit(data);

// Extract the normalizer from the trained pipeline.
// TODO #2854: Extract the normalizer parameters.
Contributor

@artidoro artidoro Mar 5, 2019

2854

See the issue and the sample on normalizers; I think we can extract the parameters. #Resolved

public float HoursPerWeek { get; set; }

/// <summary>
/// The list of columns commonly used as numerical features.
Member

@wschin wschin Mar 6, 2019

Suggested change
- /// The list of columns commonly used as numerical features.
+ /// The list of columns commonly used as categorical features.
#Resolved

@rogancarr rogancarr force-pushed the 2817_introspective_training_scenarios branch from 29371f4 to 4f7d8f5 Compare March 6, 2019 20:50

codecov bot commented Mar 6, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0075757).
The diff coverage is 99.65%.

@@            Coverage Diff            @@
##             master    #2859   +/-   ##
=========================================
  Coverage          ?   71.72%           
=========================================
  Files             ?      812           
  Lines             ?   142678           
  Branches          ?    16124           
=========================================
  Hits              ?   102330           
  Misses            ?    35936           
  Partials          ?     4412
Flag           Coverage Δ
#Debug         71.72% <99.65%> (?)
#production    67.9% <ø> (?)
#test          85.99% <99.65%> (?)

Impacted Files                                           Coverage Δ
...osoft.ML.Functional.Tests/IntrospectiveTraining.cs    100% <100%> (ø)
...st/Microsoft.ML.Functional.Tests/Datasets/Adult.cs    100% <100%> (ø)
test/Microsoft.ML.TestFramework/Datasets.cs              100% <100%> (ø)
test/Microsoft.ML.Functional.Tests/Evaluation.cs         100% <100%> (ø)
test/Microsoft.ML.Functional.Tests/Validation.cs         100% <100%> (ø)
test/Microsoft.ML.Functional.Tests/Common.cs             98.06% <94.44%> (ø)

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022

4 participants