There have been some issues concerning calibrator estimators (#1871 and #1622) but not calibrators themselves.
So, calibrated models are basically wrappers around a model that has a calibrator attached. They are ultimately something akin to `CalibratedPredictorBase`. The trouble with `CalibratedPredictorBase` is its `SubPredictor` property.
So, consider this code.
Lines 21 to 29 in 578c188:

```csharp
var pipeline = mlContext.Transforms.Concatenate("Features", featureNames)
    .Append(mlContext.Transforms.Normalize("Features"))
    .Append(mlContext.Regression.Trainers.OrdinaryLeastSquares(
        labelColumn: labelName, featureColumn: "Features"));
var model = pipeline.Fit(data);

// Extract the model from the pipeline
var linearPredictor = model.LastTransformer;
var weights = PfiHelper.GetLinearModelWeights(linearPredictor.Model);
```
Focus on the last part, where we're able to get the feature weights.
What is this `SubPredictor`? It is this:

```csharp
public IPredictorProducing<float> SubPredictor { get; }
```
Great news: it has a definite type! Bad news: that type is just a marker interface. As a mechanism for the API, it is as useless as if it were just, say, of type `object` (which, incidentally, I will have to do anyway as part of #2251). For that reason, we see lots of code that looks like this:
```csharp
var linearModel = transformerChain.LastTransformer.Model.SubPredictor as LinearBinaryModelParameters;
var predictor = calibratedPredictor.SubPredictor as ICanSaveInIniFormat;
```
The reason is that such an object by itself is not useful: to get the actual model parameters, you have to do a "magical cast" to somehow get it into the right format. This sort of worked in command-line land or entry-point land, since everything was more or less dynamically typed anyway.
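To make the failure mode concrete, here is a minimal self-contained sketch. The type names are hypothetical stand-ins, not real ML.NET API; the point is only that a marker interface forces an `as` cast whose correctness the compiler cannot check, so a wrong guess silently yields `null` at run time:

```csharp
using System;

// Hypothetical marker interface with no members, like the real one.
public interface IPredictorProducing<T> { }

// Two hypothetical concrete predictor types.
public sealed class LinearBinaryModelParameters : IPredictorProducing<float>
{
    public float[] Weights { get; } = { 0.5f, -1.2f };
}

public sealed class SomeTreeModelParameters : IPredictorProducing<float> { }

public static class Demo
{
    public static void Main()
    {
        // Statically, all we know is the marker interface.
        IPredictorProducing<float> subPredictor = new SomeTreeModelParameters();

        // The "magical cast": the compiler cannot verify this guess.
        var linearModel = subPredictor as LinearBinaryModelParameters;

        // Wrong guess: no compile-time error, just a silent null.
        Console.WriteLine(linearModel == null); // True
    }
}
```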
It might be desirable that, when training a logistic regression binary classification model, I should be able to inspect the model weights in a type-safe fashion, without having to perform any "magical casts."
The most obvious solution to me is the following: the calibrated model becomes some sort of class that is generic on both the "sub-predictor" model parameters and the calibrator. Things like logistic regression would then return instances of that generic class, or else some class derived from it, if we decide it must be an abstract class for some reason.
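A rough sketch of what that generic class could look like. All of the type names here are hypothetical illustrations of the proposal, not the actual ML.NET API; the key point is that both the sub-model and the calibrator are generic type parameters, so both are accessible with full static typing:

```csharp
using System;

// Hypothetical calibrator abstraction and one concrete example.
public interface ICalibrator
{
    float PredictProbability(float score);
}

public sealed class PlattCalibrator : ICalibrator
{
    private readonly float _slope, _offset;
    public PlattCalibrator(float slope, float offset) { _slope = slope; _offset = offset; }
    public float PredictProbability(float score)
        => 1f / (1f + (float)Math.Exp(-(_slope * score + _offset)));
}

// Hypothetical linear model parameters.
public sealed class LinearBinaryModelParameters
{
    public float[] Weights { get; }
    public float Bias { get; }
    public LinearBinaryModelParameters(float[] weights, float bias)
    { Weights = weights; Bias = bias; }
}

// The proposed shape: generic on both the sub-predictor and the calibrator.
public sealed class CalibratedModelParameters<TSubModel, TCalibrator>
    where TSubModel : class
    where TCalibrator : class, ICalibrator
{
    public TSubModel SubModel { get; }
    public TCalibrator Calibrator { get; }
    public CalibratedModelParameters(TSubModel subModel, TCalibrator calibrator)
    { SubModel = subModel; Calibrator = calibrator; }
}

public static class GenericDemo
{
    public static void Main()
    {
        var model = new CalibratedModelParameters<LinearBinaryModelParameters, PlattCalibrator>(
            new LinearBinaryModelParameters(new[] { 0.5f, -1.2f }, 0.1f),
            new PlattCalibrator(slope: -1f, offset: 0f));

        // Statically typed access to the weights: no "magical cast" needed.
        float[] weights = model.SubModel.Weights;
        Console.WriteLine(weights.Length); // 2
    }
}
```

With this shape, a logistic regression trainer could return `CalibratedModelParameters<LinearBinaryModelParameters, PlattCalibrator>` directly, and the weights inspection in the earlier example would need no cast at all.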
The alternative is that we accept "magical casts" as desirable, which I would not like: it seems a little silly, since I view the "desirable" state above as perfectly reasonable. But some people really hate generics.
I believe @yaeldekel had some thoughts on this, perhaps others do as well.