-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Update calibrator estimators to be more suitable. #2526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -20,7 +19,8 @@ namespace Microsoft.ML | |||
/// </summary> | |||
public abstract class TrainCatalogBase | |||
{ | |||
protected internal readonly IHost Host; | |||
[BestFriend] | |||
private protected readonly IHost Host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private [](start = 8, length = 7)
is this attribute valid on a private field? ("A private protected member is accessible by types derived from the containing class, but only within its containing assembly.") #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a private field. See the modifier private protected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field seems to be also used in RecommenderCatalog.cs (which is in a separate assembly Microsoft.ML.Recommender).
So the [BestFriend] attribute is valid for both internal
and private protected
access modifiers. Am i right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a very quirky behaviour of C# 7.2 The reference doc does not do us any favors when saying that "private protected is ...only within its containing assembly." Apparently they treat private protected as "internal" from the point of view of the InternalsVisibleTo attribute. This is actual behaviour and under the hood the CLR marks the private protected as protectedAndInternal accesibility level. Very confusing and undocumented. But the bottom line is that "private protected" field is only visible to derived classes in the same assembly and any assembly that is a friend to it.
In reply to: 256224446 [](ancestors = 256224446)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, but it is confusing. Perhaps we can update the BestFriend attribute xml docs to say
///
/// Intended to be applied to types and members marked as internal OR PRIVATE PROTECTED to indicate that friend access of ...
In reply to: 256221035 [](ancestors = 256221035)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change "marked as internal" to "with internal scope," but I do not intend to push that change unless I have to update the PR anyway for some other reason. The root of your confusion, as far as I can tell, is not realizing that private protected
implied the internal scope in its definition. I agree it's confusing, but the XML docs here are definitely not the place to resolve this confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, merged dotnet/dotnet-api-docs#1844 to clarify .NET docs.
Codecov Report
@@ Coverage Diff @@
## master #2526 +/- ##
==========================================
+ Coverage 71.25% 71.25% +<.01%
==========================================
Files 797 797
Lines 141270 141271 +1
Branches 16112 16115 +3
==========================================
+ Hits 100659 100665 +6
+ Misses 36151 36148 -3
+ Partials 4460 4458 -2
|
@@ -280,65 +263,67 @@ protected override Delegate MakeGetter(Row input, int iinfo, Func<int, bool> act | |||
/// The PlattCalibratorEstimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PlattCalibratorEstimator. [](start = 8, length = 29)
/// <summary>
/// Creates Platt calibrator by fitting a logistic regression model to a classifier's scores
///</summary>
///<remarks>
/// Fits two parameters slope and offset for p(x) = 1 / (1 + exp(-slope * x + offset). Where P is probability and x is score.
///<remarks>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. I'll just fix the words a little, but this needs a lot more work than just this. I'm pretty sure those remarks are going to be utterly incomprehensible when written out also. I'm not quite equal to that task.
} | ||
|
||
// Factory method for SignatureLoadModel. | ||
internal NaiveCalibratorTransformer(IHostEnvironment env, ModelLoadContext ctx) | ||
: base(env, ctx, LoadName) | ||
{ | ||
|
||
} | ||
} | ||
|
||
/// <summary> | ||
/// The PavCalibratorEstimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PavCalibratorEstimator [](start = 12, length = 22)
/// <summary>
/// Estimator to fit <see cref="PavCalibrator"/>
///</summary>
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we put this header to samples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace Microsoft.ML.Calibrator
Just curious, are you happy with this namespace? Don't you think it should be ML.Training?
Maybe? That seems plausible enough. Or its own Microsoft.ML.Training.Calibrator
namespace. Who knows. I think @sfilipi is working on the 'master plan' for namespaces, but I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea why it put my reply here when I was replying to your other thing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go with two words (same with PAV also)... the information that the model being trained is this or that specific type is something they can just get from the type signature. My favorite constructor documentation is The documentation in this file is really poor. Maybe we'll be able to fix it up later. |
* Internalize infrastructure only interface ICalibratorTrainer. * Update calibrator estimators so they are a suitable replacement for calibrator trainers in the public surface, e.g., no longer take IPredictor.
f00b206
to
ac3a9e0
Compare
if this is public, does the other one need to be public as well? Do the trainers call that one? Refers to: src/Microsoft.ML.Data/Prediction/Calibrator.cs:936 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False) |
LabelColumn = TrainerUtils.MakeBoolScalarLabel(labelColumn); | ||
else | ||
env.CheckParam(!calibratorTrainer.NeedsTraining, nameof(labelColumn), "For trained calibrators, " + nameof(labelColumn) + " must be specified."); | ||
ScoreColumn = TrainerUtils.MakeR4ScalarColumn(scoreColumn); // Do we fanthom this being named anything else (renaming column)? Complete metadata? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Do we fanthom this being named anything else (renaming column)? Complete metadata? [](start = 72, length = 85)
want to keep this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the comment? I didn't quite know what the original author had in mind, so I just fixed the obvious typo and moved on.
/// <param name="featureColumn">The feature column name.</param> | ||
/// <param name="weightColumn">The weight column name.</param> | ||
/// <param name="scoreColumn">The score column name. This is consumed both when this estimator | ||
/// is fit and when the estimator is consumed.</param> | ||
public FixedPlattCalibratorEstimator(IHostEnvironment env, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public [](start = 8, length = 6)
are we leaving the ctors of the calibratorEstimators public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sfilipi, that's a good point. I don't see anything other than the other utility method that calls it. (Maybe Nimus or the internal tools, who knows what they do.) This one is kind of public, but it's a public member on an internal class, so I'm not too excited about it? Especially since I have that big, big green squash and merge button... tempting me... |
@@ -75,7 +75,7 @@ public static void Calibration() | |||
|
|||
// Let's train a calibrator estimator on this scored dataset. The trained calibrator estimator produces a transformer | |||
// that can transform the scored data by adding a new column names "Probability". | |||
var calibratorEstimator = new PlattCalibratorEstimator(mlContext, model, "Sentiment", "Features"); | |||
var calibratorEstimator = new PlattCalibratorEstimator(mlContext, "Sentiment", "Score"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var calibratorEstimator = new PlattCalibratorEstimator(mlContext, "Sentiment", "Score"); [](start = 12, length = 88)
Can you give a bit more detail what's happening here? The SDCA trainer produces a transformer, which is calibrated, then you pull the original model out and recalibrate it? If so, this sample is a bit confusing because it's not necessarily something we would need to do, at least not as outlined here. I would at least spell out the steps precisely.
This is a bit weird that we can do useful things with the debugger. I'm a bit worried people will use the debugger utils in production code, not for debugging. Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:98 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False) |
{ | ||
} | ||
|
||
public interface ICalibratorTrainer | ||
/// <summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
///
[](start = 4, length = 13)
Technically, this clarification is a <remark> and <summary> is a short description of what it does.
@@ -842,6 +864,64 @@ internal static class CalibratorUtils | |||
return CreateCalibratedPredictor(env, (IPredictorProducing<float>)predictor, trainedCalibrator); | |||
} | |||
|
|||
public static ICalibrator TrainCalibrator(IHostEnvironment env, IChannel ch, ICalibratorTrainer caliTrainer, IDataView scored, string labelColumn, string scoreColumn, string weightColumn = null, int maxRows = _maxCalibrationExamples) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string labelColumn, string scoreColumn, string weightColumn = null [](start = 135, length = 66)
Defaults?
env.CheckValue(ch, nameof(ch)); | ||
ch.CheckValue(scored, nameof(scored)); | ||
ch.CheckValue(caliTrainer, nameof(caliTrainer)); | ||
ch.CheckParam(!caliTrainer.NeedsTraining || !string.IsNullOrWhiteSpace(labelColumn), nameof(labelColumn), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!caliTrainer.NeedsTraining || !string.IsNullOrWhiteSpace(labelColumn) [](start = 26, length = 69)
Nit: This is a bit confusing to read. Maybe ! the And version?
Oh, nevermind. Saw that you already merged this :) |
Fixes #2515, contributes towards #1871 and indirectly towards #2251.
Note that I did not add a calibrator catalog, since:
Calibrator estimators now are configured by the score/label/optionally weight column, except for the fixed Platt estimator, which only takes score (since it is not trained).
Note that ultimately the trainer estimator constructors will be internal, pending #1871.