Add API to save/load models with their input schema #2735

yaeldekel · 2019-02-26T16:29:04Z

Reasons for this are listed in issue #2663.

Currently, ModelOperationsCatalog offers the following API:

```csharp
public void Save(ITransformer model, Stream stream);
public ITransformer Load(Stream stream);
```

So when using a loaded model, users must create the IDataView to pass to the ITransformer themselves, for example by creating a new TextLoader (or is there another way?).
I suggest adding these new APIs to ModelOperationsCatalog:

```csharp
public void Save<TSource>(IDataReader<TSource> model, Stream stream);
public void Save<TSource>(IDataReader<TSource> reader, ITransformer model, Stream stream);
public IDataReader<TSource> Load<TSource>(Stream stream);
```
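To make the underlying idea concrete, here is a minimal, self-contained C# sketch of saving a model together with its input schema in one stream, so that loading gives back both. `InputColumn`, `ModelWithSchema`, and the binary layout are hypothetical illustrations, not ML.NET types or ML.NET's model format, and the model itself is treated as an opaque byte array. Writing the schema first also means it can be read back on its own, without touching the model payload.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for one input column; NOT an ML.NET type.
public sealed record InputColumn(string Name, string Type);

// Hypothetical helper: stores the input schema and the model in a single stream.
public static class ModelWithSchema
{
    // Writes the input schema first, then the length-prefixed model bytes.
    public static void Save(IReadOnlyList<InputColumn> schema, byte[] model, Stream stream)
    {
        using var w = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
        w.Write(schema.Count);
        foreach (var col in schema)
        {
            w.Write(col.Name);
            w.Write(col.Type);
        }
        w.Write(model.Length);
        w.Write(model);
    }

    // Reads only the schema prefix; the model payload is never read.
    public static List<InputColumn> LoadInputSchema(Stream stream)
    {
        using var r = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        return ReadSchema(r);
    }

    // Reads the schema and the model, so the caller need not rebuild the input by hand.
    public static (List<InputColumn> Schema, byte[] Model) Load(Stream stream)
    {
        using var r = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
        var schema = ReadSchema(r);
        int length = r.ReadInt32();
        byte[] model = r.ReadBytes(length);
        return (schema, model);
    }

    private static List<InputColumn> ReadSchema(BinaryReader r)
    {
        int count = r.ReadInt32();
        var schema = new List<InputColumn>(count);
        for (int i = 0; i < count; i++)
            schema.Add(new InputColumn(r.ReadString(), r.ReadString()));
        return schema;
    }
}
```

Usage: call `Save` into a `MemoryStream`, set its `Position` back to 0, then call `Load` to get both parts, or `LoadInputSchema` to get only the columns.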

The last one would return a CompositeDataReader containing the loader and the ITransformer chain, so we could also add new APIs to DataOperationsCatalog to load only the reader:

```csharp
public TextLoader CreateTextLoader(Stream stream);
```

Another option is to add an API that creates a PredictionEngine from a Stream, or an API that creates a SchemaDefinition from a Stream (so that users can use the existing API to create a PredictionEngine).

@TomFinley, what do you think?


TomFinley · 2019-02-26T19:02:03Z

Hi @yaeldekel, this seems like a good first step. We will need at least what you've proposed here, I think, so adding these would certainly not be harmful. I'd also add that saving the input schema itself may be necessary in cases where the loaded data is, say, programmatically defined. (Sometimes you aren't loading with a loader at all, but that does not make preserving the input schema any less important.)

There are also a few more interesting things. Note the presence of this IDataReader<TSource> reader. Yet IDataReader<in TSource> does not descend from ICanSaveModel (we made transformers descend from it, per @artidoro's #2431). It seems like it probably should.

It also seems to me that the presence of this <in TSource> is an interesting wrinkle. Back when we had pipelines based on the old IDataLoader, there was no confusion about what that type was: it was always an IMultiStreamSource. In the new world, though, it could be practically anything. Or there could be nothing at all, and we might still want to be able to load the schema. (This seems to have devolved into the dual of the point I made in the first paragraph. It seems we'll need a way to load a model file and get only what had been the reader's GetOutputSchema(), if a reader was specified, or the data view schema, if that was what was saved with the pipeline.)

Anyway, I think the work you've proposed is a positive first step, and I think we should give it a shot. But it seems to me we need to develop this idea more fully. Those are just the most obvious holes I see right off the bat; there may be more, or solutions may become more obvious once we start working on it in practice, as I find is often the case.

eerhardt · 2019-03-01T22:19:12Z

If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.

Do you view this as something that cannot be fixed after v1?

TomFinley · 2019-03-06T16:18:39Z

> If this is strictly "adding" APIs, I don't think this is "Project 13" work. We can add those APIs after v1.
>
> Do you view this as something that cannot be fixed after v1?

I consider that the APIs need to change, since they save "incomplete" models. So I'd like to remove and rework the APIs in their current form, since they are leading people into "pits of failure."

yaeldekel mentioned this issue Feb 26, 2019: SlotNames for TextLoader are lost #2663 (Closed)

artidoro mentioned this issue Feb 27, 2019: Rename IDataLoader, IDataReader and IDataReaderEstimator #2731 (Merged)

yaeldekel self-assigned this Feb 27, 2019

This was referenced Mar 5, 2019:
Add API to save/load models with their input schema #2850 (Closed)
Add save/load APIs for IDataLoader #2858 (Merged)

shauheen added the API label ("Issues pertaining the friendly API") Mar 7, 2019

rogancarr mentioned this issue Mar 9, 2019: Two Ways to Save a Model #2897 (Closed)

shauheen added this to the 0319 milestone Mar 13, 2019

yaeldekel closed this as completed in #2858 Mar 18, 2019

This was referenced Mar 19, 2019:
About models, ITransform, and IDataLoader, saving/loading #3025 (Closed)
Added an extension method for saving statically typed model (#1286) #2924 (Closed)

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
