-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Configurable Threshold for binary models #2969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable Threshold for binary models #2969
Conversation
return new TransformerChain<BinaryPredictionTransformer<TModel>>(transformers.ToArray()); | ||
} | ||
|
||
public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs documentation
public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold) | ||
where TModel : class | ||
{ | ||
if (model.Threshold == threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (model.Threshold == threshold) [](start = 12, length = 33)
do you want to warn here? #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide the same warning that C# does when you have a variable like int a = 5
and then assign 5
to it later.
In reply to: 265862991 [](ancestors = 265862991)
if (chain.LastTransformer.Threshold == threshold) | ||
return chain; | ||
List<ITransformer> transformers = new List<ITransformer>(); | ||
var predictionTransformer = chain.LastTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chain.LastTransformer [](start = 40, length = 21)
I don't like the assumption that the predictor is the last one, it might not be.
IMO the only API existing for this should be the second one.
If we have to have this API, i think we should minimally take in the index of the predicitonTransformer, in the pipeline, and check whether that transformer is a binaryTransformer. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was to provide helper function for user since work with transform chain is kinda painful, at least from my point.
But I can have one method.
In reply to: 268206172 [](ancestors = 268206172,266034490)
{ | ||
} | ||
|
||
class Answer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answer [](start = 14, length = 6)
Prediction or DataWithPrediction #Resolved
//predictor.Threshold = 0.01; // Not possible | ||
var mlContext = new MLContext(seed: 1); | ||
|
||
var data = mlContext.Data.LoadFromTextFile<TweetSentiment>(GetDataPath(TestDatasets.Sentiment.trainFilename), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we try not to load file everywhere? It will be faster to just use in-memory data. #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have standard test datasets saved to files that we use in tests. #ByDesign
/// <param name="chain">Chain of transformers.</param> | ||
/// <param name="threshold">New threshold.</param> | ||
/// <returns></returns> | ||
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold) | |
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeDecisionThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold) |
Maybe? #WontFix
/// <param name="chain">Chain of transformers.</param> | ||
/// <param name="threshold">New threshold.</param> | ||
/// <returns></returns> | ||
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this should be a new function. Could we add a parameter, threshold
, to all binary trainers? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we can add that as parameter to binary trainer. Question is if you train your model, how you gonna change threshold? Retrain model?
I think this method has right to live.
In reply to: 266062138 [](ancestors = 266062138)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retrain looks fine to me. I really don't feel adding a helper function is a good idea. This is not a Transformer, so I expect it will become a orphan in the future. Like FFM, PFI and so on don't care about it because it's not a standard binary classifier.
In reply to: 266088129 [](ancestors = 266088129,266062138)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this should be a new function. Could we add a parameter,
threshold
, to all binary trainers? #Pending
Historically we have found that adding options to "all" trainers just invites inconsistency and is a nightmare from a maintainability perspective. For those reasons we no longer do that. So I strongly object to that. There is also the larger, more practical problem that choosing the right threshold is something that you can only really do once you have investigated it -- that is, it is very often a post training operation, not something you do pre-training.
This sort of "composable" nature of IDataView
is actually I think something we need to reiterate, since it was the key to making our development efforts scale; and that composability is built around having simple, comprehensible units of computation. Not big bundled components that tried to do everything themselves. We already tried that way, and life was a lot worse and more inconsistent before we had it, and reverting to the "old ways" of every conceivable functionality bundled into a single operation would just reintroduce the old problems that led us to move to many operations of simple operators in the first place. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I think about it more, there's something about this idea of getting ITransformer
implementors from existing ITransformer
implementors that I find very appealing. Not just for this (which is a worthy use of this idea), but many other scenarios as well.
So for example, certain regressor algorithms are parametric w.r.t. their labels (in fact, most are). But there's a problem with merely normalizing the label, because then the predicted label is according to that same scale. In sklearn
you could accomplish this fairly easily via the inverse_transform
method on their equivalent of what we call a normalizer, the StandardScalar
. So imagine you could get from a NormalizerTransformer
another NormalizerTransformer
that provides the inverse offset and scaling for any affine normalization, and whatnot. That would be pretty nice, would it not be?
So far from discouraging this pattern, I think we should do more of it. #Resolved
var predictionTransformer = chain.LastTransformer; | ||
foreach (var transform in chain) | ||
{ | ||
if (transform != predictionTransformer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predictionTransformer [](start = 33, length = 21)
Can we change this just a little please? I would prefer that we just add all transforms except the last unconditionally, which would be a fairly easy thing to do.
Edit: Actually no @sfilipi is right, I think operating over chains is misguided now that I see her argument... #Resolved
Helper function for |
I'm a bit struggle to understand what should it do. In reply to: 475655779 [](ancestors = 475655779) |
Codecov Report
@@ Coverage Diff @@
## master #2969 +/- ##
==========================================
+ Coverage 72.52% 72.52% +<.01%
==========================================
Files 807 807
Lines 144474 144513 +39
Branches 16192 16195 +3
==========================================
+ Hits 104780 104815 +35
- Misses 35288 35289 +1
- Partials 4406 4409 +3
|
Well, we can add it later if you like. But the idea is, an extension method on top of Just so you could add this thing to an estimator pipeline. But, if it's unclear to you, we can always add it later. It is not essential, it was just something we had talked about doing. In reply to: 476308360 [](ancestors = 476308360,475655779) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Ivanidzo4ka. I might prefer to have the similar mechanisms for ITrainerEstimator
or IEstimator
but I don't insist on it.
{ | ||
if (model.Threshold == threshold) | ||
return model; | ||
return new BinaryPredictionTransformer<TModel>(Environment, model.Model, model.TrainSchema, model.FeatureColumnName, threshold, model.ThresholdColumn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.ThresholdColumn [](start = 140, length = 21)
Technically, Issue #2465 was that we should be able to set the Threshold
and ThresholdColumn
properties of the BinaryPredictionTransformer
. That said, I think that the actual use cases that we care about are changing the Threshold
; the ThresholdColumn
seems more important when we are creating a new BinaryPredictionTransformer
. I actually don't think we need to modify that property. I'll update the issue that just setting a Threshold
would be nice.
(Note that in practice, we have a Score
column and a Probability
column; modulo floating point error, these are a 1:1 mapping, so we will get the same results no matter which one we threshold on. Setting a custom threshold column seems more like a backdoor for crazy things: e.g. creating a custom score based on a few models and / or heuristics and then modifying a BinaryPredictionTransformer
to score that column instead.) #Resolved
{ | ||
} | ||
|
||
class Prediction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class Prediction [](start = 8, length = 16)
Please add to Datasets/CommonColumns.cs
, and call it PredictionColumns
.
transformers.Add(transform); | ||
} | ||
transformers.Add(mlContext.BinaryClassification.ChangeModelThreshold(model.LastTransformer, 0.7f)); | ||
var newModel = new TransformerChain<BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>>(transformers.ToArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new TransformerChain<BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>> [](start = 27, length = 126)
Would it be better to do an in-place change rather than making a whole new chain? The new TransformerChain
that comes back after still has references to the previous objects anyways. The only new thing here is BinaryPredictionTransformer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to do an in-place change rather than making a whole new chain?
No. We rely upon ITransformer
s not being mutable objects in many, many places.
The reason why they must be immutable is one of the "practical corollaries" to the IDataView design principles. We rely on several places on the ITransformer
implementors being consistent. (Not least inside the ITransformer
implementations themselves!) We often form chains of ITransformer
s (pipelines, in other words), where each transform is the result of fitting to the result of the last. From a practical, user facing perspective, if these were suddenly to become mutable objects in their behavior w.r.t. transformation, the basic assumption that underlies why that is reliable at all would be compromised. So if you have a chain of transformers A
, B
, C
, if you can suddenly mutate B
's behavior, the assumptions under which C
was fit no longer hold, in ways that are impossible to detect. (Transformers when fit often depend on the distributional behavior of their inputs.)
Now then, nothing prevents someone from forming another transform B'
derived from B
, taking the original A
and C
, and forming the chain A
, B'
, and C
. But I think this is understood to be a generally more risky operation. We have, and I think users have, I think a strong assumption that if ITransformer
(including chains) works once, it shall continue to work of they don't do anything to it, something that would break if we were to allow them to be mutable.
|
||
Common.AssertMetrics(metrics); | ||
|
||
// Todo #2465: Allow the setting of threshold and thresholdColumn for scoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo #2465: [](start = 15, length = 11)
Thank you! #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as is — will approve after talking about the in-place vs. return issue.
Update: Spoke Offline. Tom will update with the design principles that lead to this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️
@@ -256,6 +256,14 @@ internal BinaryClassificationTrainers(BinaryClassificationCatalog catalog) | |||
Evaluate(x.Scores, labelColumnName), x.Scores, x.Fold)).ToArray(); | |||
} | |||
|
|||
public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should put XML comments on all public members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixes #2465