
State of CalibratedPredictorBase in v1 #2378

@TomFinley

Description


There have been some issues concerning calibrator estimators (#1871 and #1622) but not calibrators themselves.

So, calibrated models are basically wrappers for models: they pair an underlying predictor with a calibrator that maps its raw score into a probability.
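To make the "wrapper" idea concrete, here is a minimal, hedged sketch of the shape involved. The names (IScoringModel, SigmoidCalibrator, CalibratedModel) are illustrative stand-ins, not ML.NET's actual types; the calibrator shown is a Platt-style sigmoid.

using System;

// Conceptual sketch only: a calibrated model pairs an underlying scoring
// model with a calibrator that maps the raw score into a probability.
public interface IScoringModel { float Score(float[] features); }

public sealed class SigmoidCalibrator
{
    private readonly double _slope, _offset;
    public SigmoidCalibrator(double slope, double offset) { _slope = slope; _offset = offset; }
    public float PredictProbability(float rawScore)
        => (float)(1.0 / (1.0 + Math.Exp(_slope * rawScore + _offset)));
}

public sealed class CalibratedModel
{
    public IScoringModel SubModel { get; }
    public SigmoidCalibrator Calibrator { get; }
    public CalibratedModel(IScoringModel subModel, SigmoidCalibrator calibrator)
    {
        SubModel = subModel;
        Calibrator = calibrator;
    }
    public float PredictProbability(float[] features)
        => Calibrator.PredictProbability(SubModel.Score(features));
}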

They are ultimately something akin to CalibratedPredictorBase. The trouble with CalibratedPredictorBase is its SubPredictor property, which we will get to in a moment.

So, consider this code.

var pipeline = mlContext.Transforms.Concatenate("Features", featureNames)
    .Append(mlContext.Transforms.Normalize("Features"))
    .Append(mlContext.Regression.Trainers.OrdinaryLeastSquares(
        labelColumn: labelName, featureColumn: "Features"));
var model = pipeline.Fit(data);
// Extract the model from the pipeline
var linearPredictor = model.LastTransformer;
var weights = PfiHelper.GetLinearModelWeights(linearPredictor.Model);

Focus on the last part, where we're able to get the feature weights. That works cleanly here because, for this regression trainer, LastTransformer.Model is the linear model itself. With a calibrated model, such as the one a binary classifier produces, LastTransformer.Model only exposes the underlying model through its SubPredictor property.

What is this SubPredictor? It is this:

public IPredictorProducing<float> SubPredictor { get; }

Great news: it has a definite type! Bad news: that is just a marker interface. As a mechanism for the API, it is as useless as if it were just, say, of type object (which, incidentally, I will have to do anyway as part of #2251). For that reason, we see lots of code that looks like this:

var linearModel = transformerChain.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters;

var predictor = calibratedPredictor.SubPredictor as ICanSaveInIniFormat;

The reason is that the object by itself is not useful: to get the actual model parameters, you have to do a "magical cast" to somehow get it into the right type. This sort of worked in command-line land or entry-point land, since everything was more or less dynamically typed anyway.
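For concreteness, here is roughly what the binary-classification analogue of the earlier snippet ends up looking like today. The trainer name and the reuse of the PfiHelper sample helper are assumptions modeled on the OLS example above; the point is the .Model.SubPredictor access followed by the "magical cast."

var pipeline = mlContext.Transforms.Concatenate("Features", featureNames)
    .Append(mlContext.Transforms.Normalize("Features"))
    .Append(mlContext.BinaryClassification.Trainers.LogisticRegression(
        labelColumn: labelName, featureColumn: "Features"));
var model = pipeline.Fit(data);
// LastTransformer.Model is the calibrated predictor, so the linear model is
// only reachable via SubPredictor, and only by casting it to the right type.
var linearModel = model.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters;
var weights = PfiHelper.GetLinearModelWeights(linearModel);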

It seems desirable that, when I train a logistic regression binary classification model, I should be able to inspect the model weights in a type-safe fashion, without having to perform any "magical casts."
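Concretely, the goal is for a call site like the following to compile without any casts. SubModel is a hypothetical strongly typed property standing in for today's SubPredictor:

// Hypothetical desired call site: SubModel is statically typed, no casts.
var calibratedModel = model.LastTransformer.Model;
LinearBinaryModelParameters linearModel = calibratedModel.SubModel;
var weights = PfiHelper.GetLinearModelWeights(linearModel);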

The most obvious solution to me is the following: calibrated models become some sort of class that is generic over both the "sub-predictor" model parameters and the calibrator. Then things like logistic regression would return instances of that generic class (a rough sketch follows), or else some class derived from it, if we decide the generic class must be abstract for some reason.
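A hedged sketch of what that generic shape could look like; the class and member names are placeholders, not a committed design, and base interfaces such as IPredictorProducing<float> are elided.

// Sketch only: a calibrated model generic over both the sub-model parameters
// and the calibrator, so callers get static types for both.
public class CalibratedModelParameters<TSubModel, TCalibrator>
    where TSubModel : class
    where TCalibrator : class
{
    public TSubModel SubModel { get; }
    public TCalibrator Calibrator { get; }

    public CalibratedModelParameters(TSubModel subModel, TCalibrator calibrator)
    {
        SubModel = subModel;
        Calibrator = calibrator;
    }
}

// A calibrated logistic regression model would then surface as something like
// CalibratedModelParameters<LinearBinaryModelParameters, PlattCalibrator>,
// and its weights would be reachable without any cast.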

The alternative is that we accept "magical casts" as the way things work, which I would not like; that seems a little silly, since I view the "desirable" state described above as perfectly reasonable. But some people really hate generics.

I believe @yaeldekel had some thoughts on this, perhaps others do as well.
