API: Binary Classification Training Context #949
Having the thing be an actual instance is pretty appealing, as it turns out, since then you can "smuggle" an … to the component-instantiating object. It may be that contexts can also carry things like estimators, e.g., a …
The idea that a "task-specific context" serves as a "catalog of things that are reasonable to do if you solve this type of problem" appears powerful at first glance. For example, one could write

    pipeline.Add(context.Transforms.ImageOperations.GlobalContrastNormalizer("ImagePixels")); // env smuggled via context

and static typing could have

    pipeline.Append(row => (Image: context.Transforms.ImageOperations.GlobalContrastNormalizer(row.ImagePixels), ...));

and the same pattern could extend to readers, pipeline construction, and model loading:

    var reader = context.DataReaders.TextLoader(ctx => (label: ctx.LoadBool(0), features: ctx.LoadFloat(1, 10)));
    // rather than static TextLoader.CreateReader
    var estimator = context.LearningPipeline.StartWith(reader)
        .Append(row => (features: row.features.Normalize(), row.label));
    // rather than extension reader.MakeNewPipeline
    var trainedModel = estimator.Fit(reader.Read("data.tsv"));
    trainedModel.SaveTo("model.zip");
    var scoringModel = context.LoadModel("model.zip");
    // rather than static TransformerChain.LoadFrom

I agree that this is very promising; let's take some baby steps in this direction to validate it.
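To make the "smuggling" idea concrete, here is a minimal, self-contained C# sketch of a context instance that holds the environment and hands it to every component it creates, so the caller never passes it explicitly. Every type here (IHostEnvironmentSketch, LocalEnvironmentSketch, TaskContextSketch, TransformsCatalogSketch, ImageOperationsCatalogSketch, GlobalContrastNormalizerSketch) is hypothetical and only mirrors the shape of the calls above; none of it is the real ML.NET API.

```csharp
using System;

// Hypothetical stand-in for an environment such as IHostEnvironment (not the real interface).
public interface IHostEnvironmentSketch
{
    string Name { get; }
}

public sealed class LocalEnvironmentSketch : IHostEnvironmentSketch
{
    public LocalEnvironmentSketch(string name) => Name = name;
    public string Name { get; }
}

// Hypothetical component whose constructor requires an environment.
public sealed class GlobalContrastNormalizerSketch
{
    public GlobalContrastNormalizerSketch(IHostEnvironmentSketch env, string inputColumn)
    {
        Env = env ?? throw new ArgumentNullException(nameof(env));
        InputColumn = inputColumn ?? throw new ArgumentNullException(nameof(inputColumn));
    }

    public IHostEnvironmentSketch Env { get; }
    public string InputColumn { get; }
}

// Each catalog captures the environment once; its factory methods forward it.
public sealed class ImageOperationsCatalogSketch
{
    private readonly IHostEnvironmentSketch _env;
    internal ImageOperationsCatalogSketch(IHostEnvironmentSketch env) => _env = env;

    public GlobalContrastNormalizerSketch GlobalContrastNormalizer(string inputColumn)
        => new GlobalContrastNormalizerSketch(_env, inputColumn);
}

public sealed class TransformsCatalogSketch
{
    internal TransformsCatalogSketch(IHostEnvironmentSketch env)
        => ImageOperations = new ImageOperationsCatalogSketch(env);

    public ImageOperationsCatalogSketch ImageOperations { get; }
}

// The context is an ordinary instance: constructing it is the only place the environment appears.
public sealed class TaskContextSketch
{
    public TaskContextSketch(IHostEnvironmentSketch env)
    {
        Env = env ?? throw new ArgumentNullException(nameof(env));
        Transforms = new TransformsCatalogSketch(env);
    }

    public IHostEnvironmentSketch Env { get; }
    public TransformsCatalogSketch Transforms { get; }
}
```

With this shape, new TaskContextSketch(env).Transforms.ImageOperations.GlobalContrastNormalizer("ImagePixels") yields a component that already carries env, which is the sense in which the environment travels through the context rather than through each call.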
The one thing that is kind of annoying is that we now have three ways of doing practically everything. For the sake of statically typed pipelines we already had "two" ways of doing things, and now we have "three": one for the static API and two for the dynamic one. Take evaluation as one example. We have the following ways:
SDCA is another example:
This improves the discoverability of components a lot, but it entails some degree of duplication, especially in documentation, which is actually the most annoying part of this so far.
There seems to be something appealing about a convenience object whose purpose is to help "guide" people on the path to a successful experiment. For example, someone might have a pipeline where they featurize, then learn, then evaluate on a test set. Each of these is naturally implemented in a separate class, which is good. But it also means that the ingredients necessary to compose a successful experiment are spread hither and yon.
You might imagine that, in addition to the components, there might be some sort of "task context" object, for example a BinaryClassifierContext. This might have common facilities: for example, a common way to "browse" binary classifier trainers and to evaluate binary classification outputs.

There is something appealing about doing this:
vs. this
The latter case is certainly no less powerful, but if I imagine someone tooling around in IntelliSense, the sheer number of things they'll get by including the key namespaces and typing new is absolutely dizzying, whereas this context can be very, very focused.

In the case of static pipelines the story is a little better: "we provide extension methods on Scalar<bool>", which is OK if you know that, but if you don't happen to know it, I see no reasonable way you could discover it without reading documentation and samples. (Of course, for that matter, I see …) But requiring knowledge at the level of "if you want to do something related to binary classifiers, say new BinaryClassifierContext" or something seems kind of reasonable to me.

This hypothetical Context object would contain at least two things. The first is a … property. (It must be an actual instance, because the only way external assemblies could "add" their learners to it would be via extension methods.) The second is one or more Evaluate methods to produce metrics.

These "objects" do have state, in the sense that they must have an IHostEnvironment, but aside from this they are more or less like "namespaces," with the important difference, possibly, that you can't have a top-level function in a namespace. (Though perhaps we don't care about doing functions.) There was some thought that if we also defined pipelines through them, we could avoid having environments in the dynamic pipelines altogether (as we already do for static pipelines), but how this would be accomplished is not clear to me.

Also, because the only reasonable way things can add themselves is via an extension method, this Trainers object would have to be an actual instance. Now, it needn't actually be instantiable: one can call an extension method on a null reference just as well as on a real instance, so long as the method doesn't need to get any information out of it. But that is a little awkward. If we could just put extension methods on, say, a static class, that would be nice, but we can't.

Work Item
The first thing I will do is create a binary classification training context object, as an exploration of the idea. If we like the idea, we can extend it to the other tasks as well.
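As a rough illustration only, here is a minimal C# sketch of what such a binary classification context could look like under the design described above: it holds an environment, exposes a Trainers property that is a real instance so that another assembly can extend it with an extension method, and offers an Evaluate method that returns metrics. All names (HostEnvironmentSketch, BinaryClassificationContextSketch, BinaryTrainersCatalog, SdcaTrainerSketch, SdcaCatalogExtensions, BinaryMetricsSketch) are hypothetical, the "SDCA" trainer is an empty placeholder with no training logic, and the metrics are simply accuracy, positive precision, and positive recall computed from predicted labels.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IHostEnvironment.
public sealed class HostEnvironmentSketch
{
    public HostEnvironmentSketch(int? seed = null) => Seed = seed;
    public int? Seed { get; }
}

public sealed class BinaryMetricsSketch
{
    public BinaryMetricsSketch(double accuracy, double positivePrecision, double positiveRecall)
    {
        Accuracy = accuracy;
        PositivePrecision = positivePrecision;
        PositiveRecall = positiveRecall;
    }

    public double Accuracy { get; }
    public double PositivePrecision { get; }
    public double PositiveRecall { get; }
}

// Nearly empty catalog object: it exists so extension methods have something to hang off.
public sealed class BinaryTrainersCatalog
{
    internal BinaryTrainersCatalog(BinaryClassificationContextSketch owner) => Owner = owner;
    public BinaryClassificationContextSketch Owner { get; }
}

public sealed class BinaryClassificationContextSketch
{
    public BinaryClassificationContextSketch(HostEnvironmentSketch env)
    {
        Env = env ?? throw new ArgumentNullException(nameof(env));
        Trainers = new BinaryTrainersCatalog(this);
    }

    public HostEnvironmentSketch Env { get; }

    // Must be an instance: other assemblies can only add trainers via extension methods.
    public BinaryTrainersCatalog Trainers { get; }

    // Compares true labels with predicted labels and returns simple metrics.
    public BinaryMetricsSketch Evaluate(IReadOnlyList<bool> labels, IReadOnlyList<bool> predicted)
    {
        if (labels.Count != predicted.Count || labels.Count == 0)
            throw new ArgumentException("labels and predictions must be non-empty and the same length");

        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < labels.Count; i++)
        {
            if (predicted[i]) { if (labels[i]) tp++; else fp++; }
            else { if (labels[i]) fn++; else tn++; }
        }

        double accuracy = (double)(tp + tn) / labels.Count;
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);
        return new BinaryMetricsSketch(accuracy, precision, recall);
    }
}

// Imagine this part living in a separate assembly that ships its own learner.
public sealed class SdcaTrainerSketch
{
    public SdcaTrainerSketch(HostEnvironmentSketch env, int maxIterations)
    {
        Env = env;
        MaxIterations = maxIterations;
    }

    public HostEnvironmentSketch Env { get; }
    public int MaxIterations { get; }
}

public static class SdcaCatalogExtensions
{
    // Reads the environment through the catalog, so the catalog must be a real instance here.
    public static SdcaTrainerSketch Sdca(this BinaryTrainersCatalog catalog, int maxIterations = 10)
        => new SdcaTrainerSketch(catalog.Owner.Env, maxIterations);

    // Reads nothing from its receiver, so it can even be called on a null catalog reference.
    public static string TrainerName(this BinaryTrainersCatalog catalog) => "SDCA";
}
```

Here ctx.Trainers.Sdca(maxIterations: 5) picks up the context's environment without the caller passing it, and ctx.Evaluate(labels, predictions) keeps evaluation next to the trainers, which is the focused discoverability argued for in the issue. The TrainerName extension shows the awkward point raised earlier: an extension method that reads nothing from its receiver still works when called on a null catalog reference.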