
Remove model saving/loading inconsistencies #3044


Merged
5 commits merged into dotnet:master on Mar 22, 2019
Conversation

TomFinley
Contributor

Fixes #3025.

/// <param name="stream">A writeable, seekable stream to save to.</param>
public void Save<TSource>(IDataLoader<TSource> model, Stream stream)
public void Save<TSource>(ITransformer model, IDataLoader<TSource> loader, Stream stream)
Contributor Author


Major thing on the top of my mind: the save/load methods are still inconsistent w.r.t. where they put the file or stream argument. Here we see it is the last argument, whereas during loading it is the first argument.

One possibility is we leave it as is, since in this case it is where the model is going (so maybe last is most clear), whereas during loading it is where the model is coming from.
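The asymmetry under discussion can be sketched like this (a hypothetical usage sketch; `ml`, `model`, `loader`, and `stream` are placeholder variables, and the call shapes follow the overloads quoted in this thread):

```csharp
// Saving: the destination stream comes LAST.
ml.Model.Save(model, loader, stream);

// Loading: the source stream comes FIRST, and the loader
// comes back through an out parameter.
ITransformer loaded = ml.Model.LoadWithDataLoader(
    stream, out IDataLoader<IMultiStreamSource> loadedLoader);
```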


I kind of like this possibility.
If we're going with the "intuitive" ordering, then maybe loader should come before model?


In reply to: 267790407

Contributor Author


Yes, I considered it. Actually, "intuitive" cuts two ways here, since this Save method interacts with IntelliSense in a way that I feel makes this arrangement most natural. But I don't insist on it.

/// <param name="filePath">Path where model should be saved.</param>
public void Save<TSource>(IDataLoader<TSource> loader, ITransformer model, string filePath)
public void Save<TSource>(ITransformer model, IDataLoader<TSource> loader, string filePath)
Contributor Author


I'm also curious about whether we agree that what is effectively the descriptor of input to the model should come after the model itself. The API had been inconsistent, in that in one save method we had loader/model (so input to model came first), whereas in another we had model/schema (so input to model came second).

In all cases I feel like the load methods should always just return ITransformer, so maybe just always putting ITransformer front-and-center for saving was probably the best approach, so I resolved the inconsistency that way, but I don't insist on it.

I kept it as a separate commit in case we wanted to go the other way, maybe... so in that case, the schema.
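Under the model-first resolution described here, saving and loading become symmetric in the sense that the ITransformer occupies the primary position in both directions (a hedged sketch; file name and variable names are placeholders, and the file-path loading overload is assumed from the discussion below):

```csharp
// Save: the transformer comes first, its input descriptor (the loader) second.
ml.Model.Save(model, loader, "model.zip");

// Load: the transformer is the return value; the loader is an out parameter.
ITransformer loaded = ml.Model.LoadWithDataLoader(
    "model.zip", out IDataLoader<IMultiStreamSource> loadedLoader);
```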

@TomFinley TomFinley requested a review from codemzs March 21, 2019 14:47
@TomFinley TomFinley changed the title Model Remove model saving/loading inconsistencies Mar 21, 2019
@codecov

codecov bot commented Mar 21, 2019

Codecov Report

Merging #3044 into master will decrease coverage by <.01%.
The diff coverage is 96.21%.

@@            Coverage Diff             @@
##           master    #3044      +/-   ##
==========================================
- Coverage   72.52%   72.51%   -0.01%     
==========================================
  Files         804      804              
  Lines      144150   144143       -7     
  Branches    16179    16176       -3     
==========================================
- Hits       104542   104530      -12     
- Misses      35195    35203       +8     
+ Partials     4413     4410       -3
Flag Coverage Δ
#Debug 72.51% <96.21%> (-0.01%) ⬇️
#production 68.15% <85.71%> (-0.01%) ⬇️
#test 88.73% <100%> (ø) ⬆️
Impacted Files Coverage Δ
test/Microsoft.ML.Functional.Tests/ModelLoading.cs 95.19% <100%> (+1.49%) ⬆️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <100%> (-0.01%) ⬇️
...enarios/Api/Estimators/TrainSaveModelAndPredict.cs 95.23% <100%> (-0.22%) ⬇️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.49% <100%> (-0.05%) ⬇️
.../Microsoft.ML.Data/Model/ModelOperationsCatalog.cs 91.73% <85.71%> (-5.54%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...Microsoft.ML.Data/DataLoadSave/TransformerChain.cs 87.66% <0%> (+0.64%) ⬆️
...rosoft.ML.Data/DataLoadSave/CompositeDataLoader.cs 91.66% <0%> (+2.08%) ⬆️

@glebuk
Contributor

glebuk commented Mar 21, 2019

Should we also update ConvertToOnnx methods in the same way? #WontFix

@@ -16,7 +16,7 @@

namespace Microsoft.ML.Functional.Tests
{
public partial class ModelLoadingTests : BaseTestClass
public partial class ModelLoadingTests : TestDataPipeBase
{
private MLContext _ml;


(Referring to `_ml`.)

TestDataPipeBase may have an MLContext field.

Contributor Author


Ah good point thanks.

Contributor Author


I will have two commits in my next push -- one to resolve other comments, and one to do this rename, since it winds up being a little annoying and distracting in this file.

Contributor Author

@TomFinley TomFinley Mar 21, 2019


Actually, no sorry, I don't have time to do this right now and need to run to another meeting, so I'll just leave as is and fix later (since I think other more "public facing" changes might be blocking the PR more than this). I'll do it later this evening, if I have time and approvals aren't forthcoming. (If I do get approvals I'll probably just check this in, since I'd view that as more useful, probably?)

Contributor Author


OK! I have finally done this; sorry for not doing it earlier @yaeldekel, but it was one of those situations where I had literally 30 seconds before I had to go somewhere and the change would take me 1 minute. 😄 But I just got that 1 minute.

var t = _ml.Model.LoadWithDataLoader(fs, out IDataLoader<IMultiStreamSource> l);
var t = _ml.Model.LoadWithDataLoader(fs, out loadedCompositeLoader);
// This is a bit strange, as it seems to test that it can reload from the same
// stream twice opened only once, which as far as I know is not really a requirement


It also tests that you can use the same file to load using both LoadWithDataLoader and Load (which loads with the schema). Maybe we should open a separate stream for that then, to make sure that if it fails we know that it's a real problem and not just a problem with reading twice from the same stream?
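A sketch of the rewind-between-reads approach suggested here, assuming `modelPath` points at a saved model file and `ml` is an MLContext:

```csharp
using (var fs = File.OpenRead(modelPath))
{
    // First read: recover the transformer together with its data loader.
    var t1 = ml.Model.LoadWithDataLoader(
        fs, out IDataLoader<IMultiStreamSource> loader);

    // Rewind so the second, schema-based load starts from a known position;
    // a failure after this point is a real problem, not a stream-position artifact.
    fs.Seek(0, SeekOrigin.Begin);
    var t2 = ml.Model.Load(fs, out DataViewSchema inputSchema);
}
```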


By the way, we already have overloads for loading that take file names, don't we?


In reply to: 267891929

Contributor Author

@TomFinley TomFinley Mar 21, 2019


Sure! I changed some tests to use those so both paths are tested, but I agree this seems like a fine place to use them.

Contributor Author

@TomFinley TomFinley Mar 21, 2019


Um, sorry, never mind -- this is a little annoying, dealing with all the local variables and changes of scope. Let's just leave it for now. If this were a sample I'd probably agree (and I hope you see I did change some of those), but since it's just test code, I think it's fine.

@TomFinley
Contributor Author

TomFinley commented Mar 21, 2019

Should we also update ConvertToOnnx methods in the same way?

Hi @glebuk. Definitely not in scope, and possibly more involved than you might think. Anyway, ONNX is one of the nugets to be released in preview mode. #Closed

/// <param name="loader">The loader that was used to create data to train the model</param>
/// <param name="model">The trained model to be saved</param>
/// <param name="model">The trained model to be saved. Note that this can be <see langword="null"/>, as a shorthand
/// for an empty data set. Upon loading with <see cref="LoadWithDataLoader(Stream, out IDataLoader{IMultiStreamSource})"/>
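As the doc comment describes, passing null for the model saves just the loader. A hedged sketch of that round trip (path and variable names are placeholders):

```csharp
// Save only the loader: null stands in as shorthand for an empty model.
ml.Model.Save(null, loader, "loader-only.zip");

// On load, the loader is recovered through the out parameter and the
// returned transformer is the empty stand-in.
ITransformer empty = ml.Model.LoadWithDataLoader(
    "loader-only.zip", out IDataLoader<IMultiStreamSource> loadedLoader);
```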


(Referring to "data set".)

Oops. "transformer chain" here too? And in the other two overloads of Save too.

Contributor Author

@TomFinley TomFinley Mar 22, 2019


D'oh! Who would have guessed that when you have documentation duplicated in multiple places, if you fix it one place you should fix it all places? Not me for sure! (Yet strangely this did occur to me here.)


@yaeldekel yaeldekel left a comment


:shipit:

Member

@eerhardt eerhardt left a comment


:shipit:

@TomFinley TomFinley merged commit 75fc055 into dotnet:master Mar 22, 2019
@TomFinley TomFinley deleted the Model branch March 26, 2019 16:35
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022

6 participants