Skip to content

Configurable Threshold for binary models #2969

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

fixes #2465

return new TransformerChain<BinaryPredictionTransformer<TModel>>(transformers.ToArray());
}

public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs documentation

public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold)
where TModel : class
{
if (model.Threshold == threshold)
Copy link
Member

@sfilipi sfilipi Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (model.Threshold == threshold) [](start = 12, length = 33)

do you want to warn here? #WontFix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should provide the same warning that C# does when you have a variable like int a = 5 and then assign 5 to it later.


In reply to: 265862991 [](ancestors = 265862991)

if (chain.LastTransformer.Threshold == threshold)
return chain;
List<ITransformer> transformers = new List<ITransformer>();
var predictionTransformer = chain.LastTransformer;
Copy link
Member

@sfilipi sfilipi Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chain.LastTransformer [](start = 40, length = 21)

I don't like the assumption that the predictor is the last one, it might not be.

IMO the only API existing for this should be the second one.

If we have to have this API, i think we should minimally take in the index of the predicitonTransformer, in the pipeline, and check whether that transformer is a binaryTransformer. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point @sfilipi. I think you're probably right about this.


In reply to: 266034490 [](ancestors = 266034490)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was to provide helper function for user since work with transform chain is kinda painful, at least from my point.
But I can have one method.


In reply to: 268206172 [](ancestors = 268206172,266034490)

{
}

class Answer
Copy link
Member

@sfilipi sfilipi Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answer [](start = 14, length = 6)

Prediction or DataWithPrediction #Resolved

//predictor.Threshold = 0.01; // Not possible
var mlContext = new MLContext(seed: 1);

var data = mlContext.Data.LoadFromTextFile<TweetSentiment>(GetDataPath(TestDatasets.Sentiment.trainFilename),
Copy link
Member

@wschin wschin Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try not to load file everywhere? It will be faster to just use in-memory data. #WontFix

Copy link
Contributor

@rogancarr rogancarr Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have standard test datasets saved to files that we use in tests. #ByDesign

/// <param name="chain">Chain of transformers.</param>
/// <param name="threshold">New threshold.</param>
/// <returns></returns>
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold)
Copy link
Member

@wschin wschin Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold)
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeDecisionThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold)

Maybe? #WontFix

/// <param name="chain">Chain of transformers.</param>
/// <param name="threshold">New threshold.</param>
/// <returns></returns>
public TransformerChain<BinaryPredictionTransformer<TModel>> ChangeModelThreshold<TModel>(TransformerChain<BinaryPredictionTransformer<TModel>> chain, float threshold)
Copy link
Member

@wschin wschin Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this should be a new function. Could we add a parameter, threshold, to all binary trainers? #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can add that as parameter to binary trainer. Question is if you train your model, how you gonna change threshold? Retrain model?
I think this method has right to live.


In reply to: 266062138 [](ancestors = 266062138)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrain looks fine to me. I really don't feel adding a helper function is a good idea. This is not a Transformer, so I expect it will become a orphan in the future. Like FFM, PFI and so on don't care about it because it's not a standard binary classifier.


In reply to: 266088129 [](ancestors = 266088129,266062138)

Copy link
Contributor

@TomFinley TomFinley Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this should be a new function. Could we add a parameter, threshold, to all binary trainers? #Pending

Historically we have found that adding options to "all" trainers just invites inconsistency and is a nightmare from a maintainability perspective. For those reasons we no longer do that. So I strongly object to that. There is also the larger, more practical problem that choosing the right threshold is something that you can only really do once you have investigated it -- that is, it is very often a post training operation, not something you do pre-training.

This sort of "composable" nature of IDataView is actually I think something we need to reiterate, since it was the key to making our development efforts scale; and that composability is built around having simple, comprehensible units of computation. Not big bundled components that tried to do everything themselves. We already tried that way, and life was a lot worse and more inconsistent before we had it, and reverting to the "old ways" of every conceivable functionality bundled into a single operation would just reintroduce the old problems that led us to move to many operations of simple operators in the first place. #Resolved

Copy link
Contributor

@TomFinley TomFinley Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think about it more, there's something about this idea of getting ITransformer implementors from existing ITransformer implementors that I find very appealing. Not just for this (which is a worthy use of this idea), but many other scenarios as well.

So for example, certain regressor algorithms are parametric w.r.t. their labels (in fact, most are). But there's a problem with merely normalizing the label, because then the predicted label is according to that same scale. In sklearn you could accomplish this fairly easily via the inverse_transform method on their equivalent of what we call a normalizer, the StandardScalar. So imagine you could get from a NormalizerTransformer another NormalizerTransformer that provides the inverse offset and scaling for any affine normalization, and whatnot. That would be pretty nice, would it not be?

So far from discouraging this pattern, I think we should do more of it. #Resolved

var predictionTransformer = chain.LastTransformer;
foreach (var transform in chain)
{
if (transform != predictionTransformer)
Copy link
Contributor

@TomFinley TomFinley Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predictionTransformer [](start = 33, length = 21)

Can we change this just a little please? I would prefer that we just add all transforms except the last unconditionally, which would be a fairly easy thing to do.

Edit: Actually no @sfilipi is right, I think operating over chains is misguided now that I see her argument... #Resolved

@TomFinley
Copy link
Contributor

Helper function for IEstimator<...>?

@Ivanidzo4ka
Copy link
Contributor Author

I'm a bit struggle to understand what should it do.
Can you help me?


In reply to: 475655779 [](ancestors = 475655779)

@codecov
Copy link

codecov bot commented Mar 25, 2019

Codecov Report

Merging #2969 into master will increase coverage by <.01%.
The diff coverage is 96.22%.

@@            Coverage Diff             @@
##           master    #2969      +/-   ##
==========================================
+ Coverage   72.52%   72.52%   +<.01%     
==========================================
  Files         807      807              
  Lines      144474   144513      +39     
  Branches    16192    16195       +3     
==========================================
+ Hits       104780   104815      +35     
- Misses      35288    35289       +1     
- Partials     4406     4409       +3
Flag Coverage Δ
#Debug 72.52% <96.22%> (ø) ⬆️
#production 68.14% <60%> (-0.01%) ⬇️
#test 88.78% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
...Microsoft.ML.Data/Scorers/PredictionTransformer.cs 97.02% <ø> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Prediction.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 82.35% <60%> (-0.57%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...StandardTrainers/Standard/LinearModelParameters.cs 60.05% <0%> (-0.27%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.89% <0%> (+0.62%) ⬆️

@TomFinley
Copy link
Contributor

Well, we can add it later if you like. But the idea is, an extension method on top of ITrainerEstimator or IEstimator, that is producing one of these BinaryPredictionTransformer<TModel> things, and is just a thin wrapper (the implementation of this wrapped thing would be minimal, private, and really only a few lines long.) It would store the ITrainerEstimator is is wrapping, and pass through everything on it, except for the Fit method, which would get the fit method then call the method you've added to apply.

Just so you could add this thing to an estimator pipeline. But, if it's unclear to you, we can always add it later. It is not essential, it was just something we had talked about doing.


In reply to: 476308360 [](ancestors = 476308360,475655779)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Ivanidzo4ka. I might prefer to have the similar mechanisms for ITrainerEstimator or IEstimator but I don't insist on it.

{
if (model.Threshold == threshold)
return model;
return new BinaryPredictionTransformer<TModel>(Environment, model.Model, model.TrainSchema, model.FeatureColumnName, threshold, model.ThresholdColumn);
Copy link
Contributor

@rogancarr rogancarr Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model.ThresholdColumn [](start = 140, length = 21)

Technically, Issue #2465 was that we should be able to set the Threshold and ThresholdColumn properties of the BinaryPredictionTransformer. That said, I think that the actual use cases that we care about are changing the Threshold; the ThresholdColumn seems more important when we are creating a new BinaryPredictionTransformer. I actually don't think we need to modify that property. I'll update the issue that just setting a Threshold would be nice.

(Note that in practice, we have a Score column and a Probability column; modulo floating point error, these are a 1:1 mapping, so we will get the same results no matter which one we threshold on. Setting a custom threshold column seems more like a backdoor for crazy things: e.g. creating a custom score based on a few models and / or heuristics and then modifying a BinaryPredictionTransformer to score that column instead.) #Resolved

{
}

class Prediction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class Prediction [](start = 8, length = 16)

Please add to Datasets/CommonColumns.cs, and call it PredictionColumns.

transformers.Add(transform);
}
transformers.Add(mlContext.BinaryClassification.ChangeModelThreshold(model.LastTransformer, 0.7f));
var newModel = new TransformerChain<BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>>(transformers.ToArray());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new TransformerChain<BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>> [](start = 27, length = 126)

Would it be better to do an in-place change rather than making a whole new chain? The new TransformerChain that comes back after still has references to the previous objects anyways. The only new thing here is BinaryPredictionTransformer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to do an in-place change rather than making a whole new chain?

No. We rely upon ITransformers not being mutable objects in many, many places.

The reason why they must be immutable is one of the "practical corollaries" to the IDataView design principles. We rely on several places on the ITransformer implementors being consistent. (Not least inside the ITransformer implementations themselves!) We often form chains of ITransformers (pipelines, in other words), where each transform is the result of fitting to the result of the last. From a practical, user facing perspective, if these were suddenly to become mutable objects in their behavior w.r.t. transformation, the basic assumption that underlies why that is reliable at all would be compromised. So if you have a chain of transformers A, B, C, if you can suddenly mutate B's behavior, the assumptions under which C was fit no longer hold, in ways that are impossible to detect. (Transformers when fit often depend on the distributional behavior of their inputs.)

Now then, nothing prevents someone from forming another transform B' derived from B, taking the original A and C, and forming the chain A, B', and C. But I think this is understood to be a generally more risky operation. We have, and I think users have, I think a strong assumption that if ITransformer (including chains) works once, it shall continue to work of they don't do anything to it, something that would break if we were to allow them to be mutable.


Common.AssertMetrics(metrics);

// Todo #2465: Allow the setting of threshold and thresholdColumn for scoring.
Copy link
Contributor

@rogancarr rogancarr Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo #2465: [](start = 15, length = 11)

Thank you! #Resolved

rogancarr
rogancarr previously approved these changes Mar 25, 2019
Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as is — will approve after talking about the in-place vs. return issue.

Update: Spoke Offline. Tom will update with the design principles that lead to this behavior.

@rogancarr rogancarr dismissed their stale review March 25, 2019 19:26

revoking review

Copy link
Contributor

@rogancarr rogancarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚡️

@Ivanidzo4ka Ivanidzo4ka merged commit 0b638bf into dotnet:master Mar 25, 2019
@@ -256,6 +256,14 @@ internal BinaryClassificationTrainers(BinaryClassificationCatalog catalog)
Evaluate(x.Scores, labelColumnName), x.Scores, x.Fold)).ToArray();
}

public BinaryPredictionTransformer<TModel> ChangeModelThreshold<TModel>(BinaryPredictionTransformer<TModel> model, float threshold)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put XML comments on all public members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot set the threshold on a binary predictor
6 participants