Create model file V1 scenario tests #2899

Merged
merged 13 commits into from
Mar 25, 2019

rogancarr
Contributor

As laid out in #2498, we need scenarios to cover the Model Files functionality we want fully supported in V1.

This PR adds tests for the following scenarios:

  • I can train a model and save it as a file. This model includes the learner as well as the transforms
  • I can use a model file in a completely different process to make predictions
  • I can easily figure out which NuGets (and versions) I need to score an ML.NET model
  • I can export ML.NET models to ONNX (limited to the existing internal functionality)

Fixes #2896

@codecov

codecov bot commented Mar 9, 2019

Codecov Report

Merging #2899 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2899      +/-   ##
==========================================
- Coverage   72.53%   72.52%   -0.01%     
==========================================
  Files         805      806       +1     
  Lines      144243   144500     +257     
  Branches    16175    16191      +16     
==========================================
+ Hits       104620   104800     +180     
- Misses      35222    35292      +70     
- Partials     4401     4408       +7
Flag Coverage Δ
#Debug 72.52% <100%> (-0.01%) ⬇️
#production 68.13% <ø> (-0.02%) ⬇️
#test 88.78% <100%> (+0.02%) ⬆️
Impacted Files Coverage Δ
...soft.ML.Functional.Tests/Datasets/CommonColumns.cs 100% <ø> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/ModelFiles.cs 96.07% <100%> (ø)
...crosoft.ML.StaticPipe/EvaluatorStaticExtensions.cs 86.15% <0%> (-13.85%) ⬇️
.../Microsoft.ML.Data/Model/ModelOperationsCatalog.cs 88.42% <0%> (-3.31%) ⬇️
...ft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs 77.19% <0%> (-2.09%) ⬇️
src/Microsoft.ML.Data/TrainCatalog.cs 82.91% <0%> (-1.28%) ⬇️
...soft.ML.Data/DataLoadSave/DataOperationsCatalog.cs 72.92% <0%> (-0.32%) ⬇️
...StandardTrainers/Standard/LinearModelParameters.cs 60.05% <0%> (-0.27%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...soft.ML.Data/DataView/DataViewConstructionUtils.cs 85.09% <0%> (-0.18%) ⬇️
... and 21 more

@rogancarr
Contributor Author

Waiting for #2858 to be checked in; will incorporate those tests.

internal sealed class ScoreColumn
{
public float Score { get; set; }
}
Member

@sfilipi sfilipi Mar 14, 2019

Can this go somewhere other than a separate file, such as where the other data models are? There might even be one already. #Resolved

Contributor Author

Yes. I've consolidated all the single-column classes into one file. I had been following the Apache REEF style by habit.


public void DetermineNugetVersionFromModel()
{
var modelFile = GetDataPath(@"backcompat" + Path.DirectorySeparatorChar + @"keep-model.zip");
var versionFileName = @"TrainingInfo\Version.txt"; // Can't find this cross plat.
Member

@sfilipi sfilipi Mar 14, 2019

\ [](start = 48, length = 1)

Path.DirectorySeparatorChar? #Resolved

Contributor Author

@rogancarr rogancarr Mar 14, 2019

Strangely enough, since the zip archive was made on Windows, we need to specify a '\' here or the tests will fail on Linux and Mac. I've updated this cryptic comment to explain it better.
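The behavior described above can be reproduced without a model file. The following is a minimal sketch, assuming that `System.IO.Compression.ZipArchive` stores and looks up entry names verbatim (no separator normalization in `CreateEntry` or `GetEntry`); the class name `ZipEntryNames` and the version string are hypothetical and not part of this PR.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipEntryNames
{
    // Builds an in-memory zip with a single text entry stored under the given name.
    public static byte[] BuildArchive(string entryName, string content)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen keeps the buffer usable after the archive is disposed,
            // which is when the central directory gets written.
            using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                var entry = archive.CreateEntry(entryName);
                using (var writer = new StreamWriter(entry.Open()))
                    writer.Write(content);
            }
            return buffer.ToArray();
        }
    }

    // Returns the entry's text, or null when no entry has exactly this name.
    public static string ReadEntryOrNull(byte[] zipBytes, string entryName)
    {
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            var entry = archive.GetEntry(entryName);
            if (entry == null)
                return null;
            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

With an archive built as `ZipEntryNames.BuildArchive(@"TrainingInfo\Version.txt", "0.0.0-example")`, looking up `@"TrainingInfo\Version.txt"` should find the entry on any OS, while `"TrainingInfo/Version.txt"` should not, because the stored name is a plain string rather than a file-system path.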


[Fact]
public void DetermineNugetVersionFromModel()
{
var modelFile = GetDataPath(@"backcompat" + Path.DirectorySeparatorChar + @"keep-model.zip");
Member

@sfilipi sfilipi Mar 14, 2019

@ [](start = 40, length = 1)

I don't think you need the @.

You could also go with interpolation:

$"backcompat{Path.DirectorySeparatorChar}keep-model.zip" #Resolved
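For comparison, here is a small sketch of the options discussed in this thread, plus `Path.Combine` as a third alternative that was not raised in the review. The class name `TestDataPaths` is hypothetical. None of the literals contain a backslash, so the verbatim `@` prefix changes nothing.

```csharp
using System;
using System.IO;

public static class TestDataPaths
{
    // Concatenation, as in the diff under review (with the unneeded '@' dropped).
    public static string ByConcatenation() =>
        "backcompat" + Path.DirectorySeparatorChar + "keep-model.zip";

    // String interpolation, as suggested in the review comment.
    public static string ByInterpolation() =>
        $"backcompat{Path.DirectorySeparatorChar}keep-model.zip";

    // Path.Combine inserts the platform separator between the segments.
    public static string ByCombine() =>
        Path.Combine("backcompat", "keep-model.zip");
}
```

All three produce the same relative path on a given platform, so the choice is mostly about readability.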

@rogancarr rogancarr requested review from sfilipi and yaeldekel March 22, 2019 20:52

var modelPath = DeleteOutputPath("fitPipelineSaveModelAndPredict.zip");
// Save model to a file.
using (var file = File.Create(modelPath))

@yaeldekel yaeldekel Mar 22, 2019

Create [](start = 35, length = 6)

There are overloads that take the file path; would this be an appropriate place to use them? #Resolved

/// </summary>
internal sealed class FeatureContributionOutput
{
public float[] FeatureContributions { get; set; }
}

/// <summary>
/// A class to hold the Score column.
/// A class to hold a feature column.

@yaeldekel yaeldekel Mar 22, 2019

feature [](start = 26, length = 7)

Is it a feature column or a score column? #Resolved

// Load model from a file.
ITransformer serializedModel;
using (var file = File.OpenRead(modelPath))
serializedModel = mlContext.Model.Load(file, out var serializedSchema);

@yaeldekel yaeldekel Mar 22, 2019

serializedSchema [](start = 69, length = 16)

You can verify that this is the same as data.Schema. #Resolved
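A rough sketch of what the two suggestions in this review (file-path overloads and a schema check) might look like together. It assumes the ML.NET 1.x API surface (`Model.Save(model, schema, path)` and `Model.Load(path, out schema)`), which postdates the pre-1.0 code in this PR, and it needs the `Microsoft.ML` NuGet package. The `Row` type, the toy data, and the method name are hypothetical and not taken from the PR.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

public static class ModelFileRoundTrip
{
    // Hypothetical input type for this sketch.
    public sealed class Row
    {
        public float Label { get; set; }
        public float X { get; set; }
    }

    // Trains a tiny pipeline, saves it by file path, reloads it, and reports
    // whether the schema stored in the file matches the training data's schema.
    public static bool SaveLoadAndCompareSchema(string modelPath)
    {
        var mlContext = new MLContext(seed: 1);

        // Toy data: Label = 2 * X.
        var rows = Enumerable.Range(0, 20)
            .Select(i => new Row { X = i, Label = 2f * i })
            .ToArray();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // A transform plus a learner, so the saved file contains both.
        var pipeline = mlContext.Transforms.Concatenate("Features", nameof(Row.X))
            .Append(mlContext.Regression.Trainers.Sdca());
        ITransformer model = pipeline.Fit(data);

        // File-path overloads: no explicit File.Create / File.OpenRead needed.
        mlContext.Model.Save(model, data.Schema, modelPath);
        ITransformer loaded = mlContext.Model.Load(modelPath, out DataViewSchema loadedSchema);

        // Compare column names and types position by position.
        return loaded != null
            && loadedSchema.Count == data.Schema.Count
            && loadedSchema.Zip(data.Schema, (a, b) => a.Name == b.Name && a.Type.Equals(b.Type))
                           .All(same => same);
    }
}
```

In a test this could back an assertion such as `Assert.True(ModelFileRoundTrip.SaveLoadAndCompareSchema(modelPath))`; the actual check added in the PR may differ.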


@yaeldekel yaeldekel left a comment

:shipit:

@@ -30,6 +32,84 @@ private class InputData
public float[] Features { get; set; }
}

/// <summary>
/// Model Files: The (minimum) nuget version can be found in the model file.

model file [](start = 73, length = 10)

Is this an old model file or a current one? If it's current, shouldn't we create the model on the fly (similar to the other scenario below) instead of reading a static keep-model.zip file?

/// 1. I can train a model and save it to a file, including transforms.
/// 2. Training and prediction happen in different processes (or even different machines).
/// The actual test will not run in different processes, but will simulate the idea that the
/// "communication pipe" is just a serialized model of some form.
Member

Not necessary for test files, but FYI: lists need to be in XML style as well.
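For reference, one way the numbered list above could be written with the standard C# documentation-comment `<list>` element. This is only an illustration: the class and method names are hypothetical, and the wording is copied from the snippet under review.

```csharp
using System;

public static class ModelFileScenarioDocs
{
    /// <summary>
    /// Model Files: scenarios covered by this test.
    /// <list type="number">
    /// <item><description>I can train a model and save it to a file, including transforms.</description></item>
    /// <item><description>Training and prediction happen in different processes (or even different machines).</description></item>
    /// </list>
    /// The actual test will not run in different processes, but will simulate the idea that the
    /// "communication pipe" is just a serialized model of some form.
    /// </summary>
    public static void FitPipelineSaveModelAndPredict()
    {
        // Test body omitted; only the documentation comment matters here.
    }
}
```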

Member

@sfilipi sfilipi left a comment

:shipit:

@rogancarr rogancarr merged commit f342403 into dotnet:master Mar 25, 2019
@rogancarr rogancarr deleted the 2896_model_file_scenarios branch March 25, 2019 18:58
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022