-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add API to save/load models with their input schema #2850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…neric on IPredictorProducing<>.
… to save/load it in ModelOperationsCatalog.
@@ -58,19 +57,19 @@ | |||
"Naive Calibration Executor", | |||
NaiveCalibrator.LoaderSignature)] | |||
|
|||
[assembly: LoadableClass(typeof(ValueMapperCalibratedModelParameters<IPredictorProducing<float>, ICalibrator>), null, typeof(SignatureLoadModel), | |||
[assembly: LoadableClass(typeof(ValueMapperCalibratedModelParameters<object, ICalibrator>), null, typeof(SignatureLoadModel), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
object [](start = 69, length = 6)
This is a bit stronger medicine than what I was suggesting! (I don't object per se, but it did entail a bit more work.) I suppose it's a bit more "honest" than the old way -- marker interfaces are more or less a promise you can't keep -- but they are still useful on internal code where you have complete control over the implementation (as we do as they are internal).
So I think this should be fine, I just wonder, are we fully comfortable with this? #Resolved
@@ -45,13 +45,34 @@ protected SubCatalogBase(ModelOperationsCatalog owner) | |||
/// <param name="stream">A writeable, seekable stream to save to.</param> | |||
public void Save(ITransformer model, Stream stream) => model.SaveTo(Environment, stream); | |||
|
|||
public void Save<TSource>(IDataLoader<TSource> model, Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need xml comments for new public API. #Resolved
/// <summary> | ||
/// Load the model from the stream. | ||
/// </summary> | ||
/// <param name="stream">A readable, seekable stream to load from.</param> | ||
/// <returns>The loaded model.</returns> | ||
public ITransformer Load(Stream stream) => TransformerChain.LoadFrom(Environment, stream); | ||
|
||
public IDataLoader<IMultiStreamSource> LoadAsCompositeDataLoader(Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDataLoader [](start = 15, length = 11)
The subtle differences between usage here and usage of the above might be a bit hard of a pill to swallow. #Resolved
/// <summary> | ||
/// Load the model from the stream. | ||
/// </summary> | ||
/// <param name="stream">A readable, seekable stream to load from.</param> | ||
/// <returns>The loaded model.</returns> | ||
public ITransformer Load(Stream stream) => TransformerChain.LoadFrom(Environment, stream); | ||
|
||
public IDataLoader<IMultiStreamSource> LoadAsCompositeDataLoader(Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this return CompositeDataLoader
instead of IDataLoader
? The name is: LoadAsCompositeDataLoader
, which makes me think it should return a CompositeDataLoader
#Resolved
@@ -45,13 +45,34 @@ protected SubCatalogBase(ModelOperationsCatalog owner) | |||
/// <param name="stream">A writeable, seekable stream to save to.</param> | |||
public void Save(ITransformer model, Stream stream) => model.SaveTo(Environment, stream); | |||
|
|||
public void Save<TSource>(IDataLoader<TSource> model, Stream stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I think this is not correct... we need to save both the loader (or failing that, input schema) as well as the transform model to a stream. Not two separate things. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall the mistake that was made with predictor models and transform models in entry-points, where we viewed these as single things saved to a stream, as opposed to having to capture the entire pipeline. This mistake was inherited by ML.NET v0.1, which led to all sorts of grief when people realized, yes, saving the loader is really freakin' important. So I think we're just making the same mistake again. A model isn't just a transform model, or a loader, or an input schema, it I think needs to capture all of them, and I think we need a change a bit more fundamental than merely adding a few methods here and there. We could have done that after v1.0. I think the problem is that the API here is just misleading.
In reply to: 262638162 [](ancestors = 262638162)
Were we going to try to address the problem of saving schemas? Not all data sources we train a transform chain with will be coming from a loader. #Resolved |
Closing this PR and replacing it with #2858 that was created from the correct fork. #Resolved |
Fixes #2735.
Currently, this PR adds a public API that saves and load an IDataLoader. Later iterations will add APIs to save/load the schema directly.