-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Remove model saving/loading inconsistencies #3044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/// <param name="stream">A writeable, seekable stream to save to.</param> | ||
public void Save<TSource>(IDataLoader<TSource> model, Stream stream) | ||
public void Save<TSource>(ITransformer model, IDataLoader<TSource> loader, Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major things on the top of my mind, the save/load methods are still inconsistent w.r.t. where they put the file or stream argument... here we see, it is the last argument, whereas during loading it is the first argument.
One possibility is we leave it as is, since in this case it is where the model is going (so maybe last is most clear), whereas during loading it is where the model is coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like this possibility.
If we're going with the "intuitive" ordering, then maybe loader
should come before model
?
In reply to: 267790407 [](ancestors = 267790407)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I considered it. Actually intuitive cuts two ways here, since this Save
method interacts with intellisense in a way that I feel makes this arrangement most natural. But I don't insist on it.
/// <param name="filePath">Path where model should be saved.</param> | ||
public void Save<TSource>(IDataLoader<TSource> loader, ITransformer model, string filePath) | ||
public void Save<TSource>(ITransformer model, IDataLoader<TSource> loader, string filePath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also curious about whether we agree that what is effectively the descriptor of input to the model should come after the model itself. The API had been inconsistent, in that in one save method we had loader/model (so input to model came first), whereas in another we had model/schema (so input to model came second).
In all cases I feel like the load methods should always just return ITransform
, so maybe just always putting ITransform
front-and-center for saving was probably the best approach, so I resolved the inconsistency that way, but I don't insist on it.
I kept it as a separate commit in case we wanted to go the other way, maybe... so in that case, the schema.
Codecov Report
@@ Coverage Diff @@
## master #3044 +/- ##
==========================================
- Coverage 72.52% 72.51% -0.01%
==========================================
Files 804 804
Lines 144150 144143 -7
Branches 16179 16176 -3
==========================================
- Hits 104542 104530 -12
- Misses 35195 35203 +8
+ Partials 4413 4410 -3
|
Should we also update ConvertToOnnx methods in the same way? #WontFix |
@@ -16,7 +16,7 @@ | |||
|
|||
namespace Microsoft.ML.Functional.Tests | |||
{ | |||
public partial class ModelLoadingTests : BaseTestClass | |||
public partial class ModelLoadingTests : TestDataPipeBase | |||
{ | |||
private MLContext _ml; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ml [](start = 26, length = 3)
TestDataPipeBase
may have an MLContext
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have two commits in my next push -- one to resolve other comments, and one to do this rename, since it winds up being a little annoying and distracting in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, no sorry, I don't have time to do this right now and need to run to another meeting, so I'll just leave as is and fix later (since I think other more "public facing" changes might be blocking the PR more than this). I'll do it later this evening, if I have time and approvals aren't forthcoming. (If I do get approvals I'll probably just check this in, since I'd view that as more useful, probably?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! I have finally done this, sorry for not doing this earlier @yaeldekel but it was one of these situations where i had literally 30 seconds before I had to go somewhere and the change would take me 1 minute. 😄 But I just got that 1 minute.
var t = _ml.Model.LoadWithDataLoader(fs, out IDataLoader<IMultiStreamSource> l); | ||
var t = _ml.Model.LoadWithDataLoader(fs, out loadedCompositeLoader); | ||
// This is a bit strange, as it seems to test that it can reload from the same | ||
// stream twice opened only once, which as far as I know is not really a requirement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also tests that you can use the same file to load using both LoadWithDataLoader
and Load
(which loads with the schema). Maybe we should open a separate stream for that then, to make sure that if it fails we know that it's a real problem and not just a problem with reading twice from the same stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, we already have overloads for loading that take file names, don't we?
In reply to: 267891929 [](ancestors = 267891929)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I changed some tests to use those so both paths are tested, but I agree this seems like a fine place to use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, sorry, nevermind -- this is a little annoying dealing with all the local variables and changing of scope. Let's just leave it for now... if this was a sample I'd probably agree (and I hope you see I did change some of those), but since it's just test code, I think it's fine...
Hi @glebuk. Definitely not in scope, and possibly more involved than you might think. Anyway, ONNX is one of the nugets to be released in preview mode. #Closed |
/// <param name="loader">The loader that was used to create data to train the model</param> | ||
/// <param name="model">The trained model to be saved</param> | ||
/// <param name="model">The trained model to be saved. Note that this can be <see langword="null"/>, as a shorthand | ||
/// for an empty data set. Upon loading with <see cref="LoadWithDataLoader(Stream, out IDataLoader{IMultiStreamSource})"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data set [](start = 25, length = 8)
Oops. "transformer chain" here too? And in the other two overloads of Save
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! Who would have guessed that when you have documentation duplicated in multiple places, if you fix it one place you should fix it all places? Not me for sure! (Yet strangely this did occur to me here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…h ITransformer always first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #3025.