Skip to content

Add API to save/load models with their input schema #2850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

yaeldekel
Copy link

Fixes #2735.

Currently, this PR adds a public API that saves and load an IDataLoader. Later iterations will add APIs to save/load the schema directly.

@eerhardt
Copy link
Member

eerhardt commented Mar 5, 2019

    public void SaveTo(IHostEnvironment env, Stream outputStream)

Do we want this public API on CompositeDataLoader? #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/CompositeDataLoader.cs:104 in 2cf7ee4. [](commit_id = 2cf7ee4, deletion_comment = False)

@@ -58,19 +57,19 @@
"Naive Calibration Executor",
NaiveCalibrator.LoaderSignature)]

[assembly: LoadableClass(typeof(ValueMapperCalibratedModelParameters<IPredictorProducing<float>, ICalibrator>), null, typeof(SignatureLoadModel),
[assembly: LoadableClass(typeof(ValueMapperCalibratedModelParameters<object, ICalibrator>), null, typeof(SignatureLoadModel),
Copy link
Contributor

@TomFinley TomFinley Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object [](start = 69, length = 6)

This is a bit stronger medicine than what I was suggesting! (I don't object per se, but it did entail a bit more work.) I suppose it's a bit more "honest" than the old way -- marker interfaces are more or less a promise you can't keep -- but they are still useful on internal code where you have complete control over the implementation (as we do as they are internal).

So I think this should be fine, I just wonder, are we fully comfortable with this? #Resolved

@@ -45,13 +45,34 @@ protected SubCatalogBase(ModelOperationsCatalog owner)
/// <param name="stream">A writeable, seekable stream to save to.</param>
public void Save(ITransformer model, Stream stream) => model.SaveTo(Environment, stream);

public void Save<TSource>(IDataLoader<TSource> model, Stream stream)
Copy link
Member

@eerhardt eerhardt Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need xml comments for new public API. #Resolved

/// <summary>
/// Load the model from the stream.
/// </summary>
/// <param name="stream">A readable, seekable stream to load from.</param>
/// <returns>The loaded model.</returns>
public ITransformer Load(Stream stream) => TransformerChain.LoadFrom(Environment, stream);

public IDataLoader<IMultiStreamSource> LoadAsCompositeDataLoader(Stream stream)
Copy link
Contributor

@TomFinley TomFinley Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataLoader [](start = 15, length = 11)

The subtle differences between usage here and usage of the above might be a bit hard of a pill to swallow. #Resolved

/// <summary>
/// Load the model from the stream.
/// </summary>
/// <param name="stream">A readable, seekable stream to load from.</param>
/// <returns>The loaded model.</returns>
public ITransformer Load(Stream stream) => TransformerChain.LoadFrom(Environment, stream);

public IDataLoader<IMultiStreamSource> LoadAsCompositeDataLoader(Stream stream)
Copy link
Member

@eerhardt eerhardt Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return CompositeDataLoader instead of IDataLoader? The name is: LoadAsCompositeDataLoader, which makes me think it should return a CompositeDataLoader #Resolved

@@ -45,13 +45,34 @@ protected SubCatalogBase(ModelOperationsCatalog owner)
/// <param name="stream">A writeable, seekable stream to save to.</param>
public void Save(ITransformer model, Stream stream) => model.SaveTo(Environment, stream);

public void Save<TSource>(IDataLoader<TSource> model, Stream stream)
Copy link
Contributor

@TomFinley TomFinley Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I think this is not correct... we need to save both the loader (or failing that, input schema) as well as the transform model to a stream. Not two separate things. #Resolved

Copy link
Contributor

@TomFinley TomFinley Mar 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall the mistake that was made with predictor models and transform models in entry-points, where we viewed these as single things saved to a stream, as opposed to having to capture the entire pipeline. This mistake was inherited by ML.NET v0.1, which led to all sorts of grief when people realized, yes, saving the loader is really freakin' important. So I think we're just making the same mistake again. A model isn't just a transform model, or a loader, or an input schema, it I think needs to capture all of them, and I think we need a change a bit more fundamental than merely adding a few methods here and there. We could have done that after v1.0. I think the problem is that the API here is just misleading.


In reply to: 262638162 [](ancestors = 262638162)

@TomFinley
Copy link
Contributor

TomFinley commented Mar 5, 2019

Were we going to try to address the problem of saving schemas? Not all data sources we train a transform chain with will be coming from a loader. #Resolved

@yaeldekel
Copy link
Author

yaeldekel commented Mar 5, 2019

Closing this PR and replacing it with #2858 that was created from the correct fork. #Resolved

@yaeldekel yaeldekel closed this Mar 5, 2019
@yaeldekel yaeldekel deleted the loadmodel branch March 5, 2019 20:34
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants