Three major concepts: Estimators, Transformers and Data #581
And this is approximately how we can dissect the pipeline (before or after training) and recompose it. Here, I'm going to strip out a loader and make it into a prediction engine:

```csharp
ITransformer<IMultiStreamSource> loader;
IEnumerable<IDataTransformer> steps;
(loader, steps) = model.GetParts();

var engine = new MyPredictionEngine<IrisData, IrisPrediction>(env, loader.GetOutputSchema(), steps);
IrisPrediction prediction = engine.Predict(new IrisData()
{
    SepalLength = 5.1f,
    SepalWidth = 3.3f,
    PetalLength = 1.6f,
    PetalWidth = 0.2f,
});
```

And this is how I can take out a normalizer, because I'm crazy:

```csharp
var bogusEngine = new MyPredictionEngine<IrisData, IrisPrediction>(env, loader.GetOutputSchema(), new[] { steps.First(), steps.Last() });
IrisPrediction bogusPrediction = bogusEngine.Predict(new IrisData()
{
    SepalLength = 5.1f,
    SepalWidth = 3.3f,
    PetalLength = 1.6f,
    PetalWidth = 0.2f,
});
```
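For concreteness, here is a hedged sketch of what a `MyPredictionEngine` along these lines might look like. Everything in it (`MyPredictionEngine`, `IDataTransformer`, the one-row-view helpers) is an assumption inferred from the snippet above, not the actual ML.NET surface:

```csharp
// Hypothetical sketch only: a prediction engine assembled from an input
// schema and a sequence of fitted transformers. IDataTransformer, ISchema,
// IHostEnvironment, and the two helpers are assumed, not real APIs.
public sealed class MyPredictionEngine<TSrc, TDst>
    where TSrc : class
    where TDst : class, new()
{
    private readonly IHostEnvironment _env;
    private readonly ISchema _inputSchema;
    private readonly IEnumerable<IDataTransformer> _steps;

    public MyPredictionEngine(IHostEnvironment env, ISchema inputSchema, IEnumerable<IDataTransformer> steps)
    {
        _env = env;
        _inputSchema = inputSchema;
        _steps = steps;
    }

    public TDst Predict(TSrc example)
    {
        // Wrap the single example as a one-row IDataView matching the input
        // schema, push it through every transformer in order, then read the
        // prediction columns back into TDst.
        IDataView data = CreateFromExamples(_env, new[] { example }, _inputSchema); // hypothetical helper
        foreach (var step in _steps)
            data = step.Transform(data);
        return ReadSingleRow<TDst>(data); // hypothetical helper
    }
}
```

The point of the sketch is only that, once a model decomposes into `(loader, steps)`, a prediction engine needs nothing beyond the loader's output schema and the transformer list.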
Hi @Zruty0, this seems positive. The "estimator" logic is what we consider the "ideal" solution to #267. It could also be a declarative structure. I would like input from @interesaaat and @tcondie, if they can be persuaded to provide notes. Also, separating out the conflation between model and data would avoid #580. Strong typing seems like a problem in the current proposal. To take an example: a linear trainer produces a linear predictor (
Regarding MakeTextLoaderArgs: inside the constructor, you can do such a thing, which ensures that you'll always have a correctly initialized object to deal with for further processing:
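As a hedged sketch of that idea (the type and member names here are stand-ins, not the real `TextLoader`), the constructor can own the `Args` instance and only let callers mutate it through a delegate:

```csharp
// Hypothetical sketch: the component owns its Args and applies a
// caller-supplied delegate inside the constructor, so no half-initialized
// Args object is ever observable from outside.
public sealed class TextLoaderArgs
{
    public bool HasHeader = true;
    public char Separator = '\t';
}

public sealed class TextLoader
{
    public TextLoaderArgs Args { get; } = new TextLoaderArgs();

    public TextLoader(Action<TextLoaderArgs> configure = null)
    {
        // The delegate mutates the defaults in place; Args is always
        // fully constructed by the time the constructor returns.
        configure?.Invoke(Args);
    }
}

// Usage: the caller never constructs Args directly.
// var loader = new TextLoader(args => { args.Separator = ','; args.HasHeader = false; });
```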
Oh that's an interesting idea @alexdegroot ... that way you don't have to expose the details of constructing this little object at all. Hmmm. Something about that is very appealing. |
Yes @alexdegroot, this sounds like a great idea to me. I am a little bit suspicious of introducing yet another level of indirection into the API (data ->
There's also a chance to do both: simply inject the object as an argument, or use the Args-mutating delegate. When it comes to consistency, I'd opt for a single way to produce Args across all these objects. If you want to do things fluently, then basically you should never have to leave your stack of calls. As a bonus, you can simply comment out a few lines.
This might be just a typo, but what is the difference between
Will the difference of

For example, the TextTransform is trainable if using the dictionary method, but not when using hashing. I'm unsure how NAHandleTransform is coded, but simply replacing the default value for the datatype doesn't need a trainable transform, whereas replacing with the mean value would.
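A hedged sketch of that distinction (interface and class names are illustrative, not the real ML.NET types): a hashing featurizer is a pure function of its input, while a dictionary featurizer must first scan the data to learn its vocabulary, so only the latter needs an estimator/`Fit` step:

```csharp
// Hypothetical sketch. IDataView is assumed from the discussion; the rest
// of the names are invented for illustration.
public interface IDataTransformer
{
    IDataView Transform(IDataView input);
}

public interface IDataEstimator
{
    IDataTransformer Fit(IDataView input);
}

// Non-trainable: hashing needs no pass over the data, so it can be
// constructed directly as a transformer.
public sealed class HashingFeaturizer : IDataTransformer
{
    public IDataView Transform(IDataView input)
    {
        // Map each token to a bucket with a fixed hash function; no learned state.
        return ApplyHashing(input); // hypothetical helper
    }
}

// Trainable: the dictionary method must see the data once to build its
// term dictionary, so it is exposed as an estimator whose Fit returns
// a transformer carrying the learned vocabulary.
public sealed class DictionaryFeaturizerEstimator : IDataEstimator
{
    public IDataTransformer Fit(IDataView input)
    {
        var vocabulary = BuildVocabulary(input); // hypothetical helper: one scan over input
        return new DictionaryFeaturizer(vocabulary); // hypothetical transformer type
    }
}
```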
@eerhardt, it's not a typo.
@justinormont, generally speaking, yes. For both trainable and non-trainable transforms, there will also be corresponding

For example, if you try to instantiate a

For
Do we need two completely separate schema types? It would be unfortunate if we had two parallel "schema" type graphs, and developers had to duplicate code to inspect/construct/etc. the two different schema graphs we had.
The current framework separates the 'relaxed schema' into a separate collection. I don't really like it, and I would much rather have one, but it would be a lot of work to reconcile the two: mainly, the existing schema-handling code somehow needs to define what it will do with a relaxed schema.
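For illustration, a hedged sketch of what such a 'relaxed schema' might carry (all names are invented; the real design may differ): before `Fit`, an estimator may know each column's name and item type but not its full concrete type, e.g. a vector column whose length is only known after training.

```csharp
// Hypothetical sketch of a relaxed schema kept as a separate, lighter-weight
// collection alongside the concrete ISchema.
public enum ColumnShape { Scalar, Vector, VariableVector }

public sealed class RelaxedColumn
{
    public string Name;
    public ColumnShape Shape;
    public DataKind ItemType; // item type is known; the exact vector size is not
}

public sealed class RelaxedSchema
{
    public IReadOnlyList<RelaxedColumn> Columns;
}

// An estimator would map RelaxedSchema -> RelaxedSchema (what it *will*
// produce); the transformer returned by Fit maps concrete ISchema -> ISchema.
```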
Regarding the two schema types: it seems to me that
@Zruty0 @TomFinley - how much work is left for this issue? Do you think this can be closed?
Yep, I think we can close it. |
This is still an incomplete proposal, but I played for a bit with what I had, and it looks promising to me so far.
The general idea is that we narrow our 'zoo' of components (transforms, predictors, scorers, loaders etc) down to three kinds:
- Data: IDataView with schema, like before.

Obviously, a chain of transformers can itself behave as a transformer, and a chain of estimators can behave like an estimator.
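As a hedged sketch of that composition property (interface names here are illustrative, not the real API): fitting a chain walks the estimators in order, feeding each one the data as transformed by the steps fitted so far, and the result is itself a single transformer.

```csharp
// Hypothetical sketch; IDataView is assumed from the discussion.
public interface IDataTransformer
{
    IDataView Transform(IDataView input);
}

public interface IDataEstimator
{
    IDataTransformer Fit(IDataView input);
}

public sealed class EstimatorChain : IDataEstimator
{
    private readonly List<IDataEstimator> _steps = new List<IDataEstimator>();

    public EstimatorChain Append(IDataEstimator step)
    {
        _steps.Add(step);
        return this; // fluent, so pipelines read top to bottom
    }

    public IDataTransformer Fit(IDataView data)
    {
        var fitted = new List<IDataTransformer>();
        foreach (var step in _steps)
        {
            var transformer = step.Fit(data);
            fitted.Add(transformer);
            data = transformer.Transform(data); // the next step trains on transformed data
        }
        return new TransformerChain(fitted); // hypothetical: applies each fitted step in order
    }
}
```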
We also introduce a 'data reader' (and its estimator), responsible for bringing the data 'from outside' (think loaders):
I have gone through the motions of creating a 'pipeline estimator' and 'pipeline transformer' objects, which then allows me to write this code to train and test:
Here, the only catch is the 'MakeTextLoaderArgs', which is an obnoxiously long way to define the original schema of the text loader. But it is obviously subject to improvement.
The full 'playground' is available at https://github.com/Zruty0/machinelearning/tree/feature/estimators