Skip to content

Add API to save/load models with their input schema #2735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yaeldekel opened this issue Feb 26, 2019 · 3 comments
Closed

Add API to save/load models with their input schema #2735

yaeldekel opened this issue Feb 26, 2019 · 3 comments
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@yaeldekel
Copy link

Reasons for this are listed in issue #2663.

Currently, ModelOperationsCatalog offers the following API:

public void Save(ITransformer model, Stream stream) 
public ITransformer Load(Stream stream)

So when using a loaded model, users have to create the IDataView to be passed to the ITransformer themselves by creating a new TextLoader, (or another way?).
I suggest adding these new APIs to ModelOperationsCatalog:

public void Save<TSource>(IDataReader<TSource> model, Stream stream);
public void Save<TSource>(IDataReader<TSource> reader, ITransformer model, Stream stream);
public IDataReader<TSource> Load<TSource>(Stream stream);

The last one would return a CompositeDataReader containing the loader and the ITransformer chain, so we could also add new APIs to DataOperationsCatalog to only load the reader:

public TextLoader CreateTextLoader(Stream stream);

Another option is to add an API that creates a PredictionEngine from a Stream, or an API that creates a SchemaDefinition from a Stream (that way users can use the existing API to create a PredictionEngine).

@TomFinley, what do you think?

@TomFinley
Copy link
Contributor

Hi @yaeldekel, this this seems like a good first step. We will need at least what you've proposed here I think, so adding these would certainly not be harmful. I'd also add that saving the input schema itself in the case where the loaded data is, say, programmatically defined may also be necessary. (Sometimes you aren't loading using a loader at all, but this does not mean preserving the input schema is any less important.)

There's also a few more interesting things. You'll note the presence of this IDataReader<TSource> reader. Yet, IDataReader<in TSource> does not depend on ICanSaveModel (we made transformers descend from it, per @artidoro's #2431. It seems like it probably should.

It also seems to me that the presence of this <in TSource> is an interesting wrinkle; back when we had pipelines based on the old IDataLoader, there was no confusion about what the type of that was -- always an IMultiStreamSource. Yet in the new world, it could be practically anything. Or it could be nothing at all, but we might still want to be able to load the schema. (But this seems to me to have devolved into the dual of the point I made in the first paragraph. It seems like we'll need a way to load a model file and get only what had been the GetOutputSchema() on the reader if specified, or the data view schema if that was what was saved with the pipeline.)

Anyway: think the work you've proposed is a positive first step, and I think we should give it a shot, But it seems to me we need to develop this idea more fully. Those are just the most obvious holes in the idea I see right off the bat, there may be more, or solutions might become more obvious once we start practically working on it, as I find is often the case.

@eerhardt
Copy link
Member

eerhardt commented Mar 1, 2019

If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.

Do you view this as something that cannot be fixed after v1?

@TomFinley
Copy link
Contributor

If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.

Do you view this as something that cannot be fixed after v1?

I consider the APIs need to change, since they are saving "incomplete" models. So I'd like to remove and rework the APIs in their current form, since they are leading people into "pits of failure."

@shauheen shauheen added the API Issues pertaining the friendly API label Mar 7, 2019
@shauheen shauheen added this to the 0319 milestone Mar 13, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

4 participants