Skip to content

Update calibrator estimators to be more suitable. #2526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 13, 2019

Conversation

TomFinley
Copy link
Contributor

Fixes #2515, contributes towards #1871 and indirectly towards #2251.

  • Internalize infrastructure only interface ICalibratorTrainer.
  • Update calibrator estimators so they are a suitable replacement for calibrator trainers in the public surface, e.g., no longer take IPredictor.

Note that I did not add a calibrator catalog, since:

  1. I believe @sfilipi already has assigned herself the issue Follow up on Calibrator estimators #1871 to do so, and
  2. Real reason, I am lazy! 😄

Calibrator estimators now are configured by the score/label/optionally weight column, except for the fixed Platt estimator, which only takes score (since it is not trained).

Note that ultimately the trainer estimator constructors will be internal, pending #1871.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Feb 13, 2019
@TomFinley TomFinley self-assigned this Feb 13, 2019
@@ -20,7 +19,8 @@ namespace Microsoft.ML
/// </summary>
public abstract class TrainCatalogBase
{
protected internal readonly IHost Host;
[BestFriend]
private protected readonly IHost Host;
Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private [](start = 8, length = 7)

is this attribute valid on a private field? ("A private protected member is accessible by types derived from the containing class, but only within its containing assembly.") #ByDesign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a private field. See the modifier private protected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field seems to be also used in RecommenderCatalog.cs (which is in a separate assembly Microsoft.ML.Recommender).

So the [BestFriend] attribute is valid for both internal and private protected access modifiers. Am i right ?

Copy link
Contributor

@glebuk glebuk Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a very quirky behaviour of C# 7.2 The reference doc does not do us any favors when saying that "private protected is ...only within its containing assembly." Apparently they treat private protected as "internal" from the point of view of the InternalsVisibleTo attribute. This is actual behaviour and under the hood the CLR marks the private protected as protectedAndInternal accesibility level. Very confusing and undocumented. But the bottom line is that "private protected" field is only visible to derived classes in the same assembly and any assembly that is a friend to it.


In reply to: 256224446 [](ancestors = 256224446)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but it is confusing. Perhaps we can update the BestFriend attribute xml docs to say
///


/// Intended to be applied to types and members marked as internal OR PRIVATE PROTECTED to indicate that friend access of ...


In reply to: 256221035 [](ancestors = 256221035)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change "marked as internal" to "with internal scope," but I do not intend to push that change unless I have to update the PR anyway for some other reason. The root of your confusion, as far as I can tell, is not realizing that private protected implied the internal scope in its definition. I agree it's confusing, but the XML docs here are definitely not the place to resolve this confusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, merged dotnet/dotnet-api-docs#1844 to clarify .NET docs.

@codecov
Copy link

codecov bot commented Feb 13, 2019

Codecov Report

Merging #2526 into master will increase coverage by <.01%.
The diff coverage is 90.55%.

@@            Coverage Diff             @@
##           master    #2526      +/-   ##
==========================================
+ Coverage   71.25%   71.25%   +<.01%     
==========================================
  Files         797      797              
  Lines      141270   141271       +1     
  Branches    16112    16115       +3     
==========================================
+ Hits       100659   100665       +6     
+ Misses      36151    36148       -3     
+ Partials     4460     4458       -2
Flag Coverage Δ
#Debug 71.25% <90.55%> (ø) ⬆️
#production 67.58% <87.75%> (ø) ⬆️
#test 85.36% <100%> (ø) ⬆️

@Ivanidzo4ka
Copy link
Contributor

namespace Microsoft.ML.Calibrator

Just curious, are you happy with this namespace? Don't you think it should be ML.Training?


Refers to: src/Microsoft.ML.Data/Prediction/CalibratorCatalog.cs:25 in f00b206. [](commit_id = f00b206, deletion_comment = False)

@@ -280,65 +263,67 @@ protected override Delegate MakeGetter(Row input, int iinfo, Func<int, bool> act
/// The PlattCalibratorEstimator.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PlattCalibratorEstimator. [](start = 8, length = 29)

/// <summary>
/// Creates Platt calibrator by fitting a logistic regression model to a classifier's scores
///</summary>
///<remarks>
/// Fits two parameters slope and offset for p(x) = 1 / (1 + exp(-slope * x + offset). Where P is probability and x is score.
///<remarks>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I'll just fix the words a little, but this needs a lot more work than just this. I'm pretty sure those remarks are going to be utterly incomprehensible when written out also. I'm not quite equal to that task.

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 13, 2019

/// The naive binning-based calibratorEstimator.

two words? or maybe

/// <summary>
/// Estimator  to fit <see cref="NaiveCalibrator"/>
///</summary>

Refers to: src/Microsoft.ML.Data/Prediction/CalibratorCatalog.cs:350 in f00b206. [](commit_id = f00b206, deletion_comment = False)

}

// Factory method for SignatureLoadModel.
internal NaiveCalibratorTransformer(IHostEnvironment env, ModelLoadContext ctx)
: base(env, ctx, LoadName)
{

}
}

/// <summary>
/// The PavCalibratorEstimator.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PavCalibratorEstimator [](start = 12, length = 22)

/// <summary>
/// Estimator  to fit <see cref="PavCalibrator"/>
///</summary>

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we put this header to samples

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace Microsoft.ML.Calibrator

Just curious, are you happy with this namespace? Don't you think it should be ML.Training?

Maybe? That seems plausible enough. Or its own Microsoft.ML.Training.Calibrator namespace. Who knows. I think @sfilipi is working on the 'master plan' for namespaces, but I could be wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea why it put my reply here when I was replying to your other thing...

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley
Copy link
Contributor Author

/// The naive binning-based calibratorEstimator.

two words? or maybe

/// <summary>
/// Estimator  to fit <see cref="NaiveCalibrator"/>
///</summary>

Refers to: src/Microsoft.ML.Data/Prediction/CalibratorCatalog.cs:350 in f00b206. [](commit_id = f00b206, deletion_comment = False)

Let's go with two words (same with PAV also)... the information that the model being trained is this or that specific type is something they can just get from the type signature.

My favorite constructor documentation is public Foo() is <summary>Constructor for <see cref="Foo"/>.</summary>. 😆 Oh thank you doc writer, my cup runneth over.

The documentation in this file is really poor. Maybe we'll be able to fix it up later.

* Internalize infrastructure only interface ICalibratorTrainer.
* Update calibrator estimators so they are a suitable replacement
  for calibrator trainers in the public surface, e.g., no longer
  take IPredictor.
@sfilipi
Copy link
Member

sfilipi commented Feb 13, 2019

    public static ICalibrator TrainCalibrator(IHostEnvironment env, IChannel ch, ICalibratorTrainer caliTrainer, IPredictor predictor, RoleMappedData data, int maxRows = _maxCalibrationExamples)

if this is public, does the other one need to be public as well? Do the trainers call that one?


Refers to: src/Microsoft.ML.Data/Prediction/Calibrator.cs:936 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

LabelColumn = TrainerUtils.MakeBoolScalarLabel(labelColumn);
else
env.CheckParam(!calibratorTrainer.NeedsTraining, nameof(labelColumn), "For trained calibrators, " + nameof(labelColumn) + " must be specified.");
ScoreColumn = TrainerUtils.MakeR4ScalarColumn(scoreColumn); // Do we fanthom this being named anything else (renaming column)? Complete metadata?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Do we fanthom this being named anything else (renaming column)? Complete metadata? [](start = 72, length = 85)

want to keep this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the comment? I didn't quite know what the original author had in mind, so I just fixed the obvious typo and moved on.

/// <param name="featureColumn">The feature column name.</param>
/// <param name="weightColumn">The weight column name.</param>
/// <param name="scoreColumn">The score column name. This is consumed both when this estimator
/// is fit and when the estimator is consumed.</param>
public FixedPlattCalibratorEstimator(IHostEnvironment env,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

are we leaving the ctors of the calibratorEstimators public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. However, until #1871 is completed with the proper catalog for calibrators added, this should remain public. Incidentally this was what I meant by this sentence at the end of my PR description:

Note that ultimately the trainer estimator constructors will be internal, pending #1871.

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley
Copy link
Contributor Author

    public static ICalibrator TrainCalibrator(IHostEnvironment env, IChannel ch, ICalibratorTrainer caliTrainer, IPredictor predictor, RoleMappedData data, int maxRows = _maxCalibrationExamples)

if this is public, does the other one need to be public as well? Do the trainers call that one?

Hi @sfilipi, that's a good point. I don't see anything other than the other utility method that calls it. (Maybe Nimus or the internal tools, who knows what they do.)

This one is kind of public, but it's a public member on an internal class, so I'm not too excited about it? Especially since I have that big, big green squash and merge button... tempting me...

@TomFinley TomFinley merged commit 1663bc4 into dotnet:master Feb 13, 2019
@rogancarr
Copy link
Contributor

public class CalibratorExample

Calibrator


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:11 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

@rogancarr
Copy link
Contributor

    public static void Calibration()

Example


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:13 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

@@ -75,7 +75,7 @@ public static void Calibration()

// Let's train a calibrator estimator on this scored dataset. The trained calibrator estimator produces a transformer
// that can transform the scored data by adding a new column names "Probability".
var calibratorEstimator = new PlattCalibratorEstimator(mlContext, model, "Sentiment", "Features");
var calibratorEstimator = new PlattCalibratorEstimator(mlContext, "Sentiment", "Score");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var calibratorEstimator = new PlattCalibratorEstimator(mlContext, "Sentiment", "Score"); [](start = 12, length = 88)

Can you give a bit more detail what's happening here? The SDCA trainer produces a transformer, which is calibrated, then you pull the original model out and recalibrate it? If so, this sample is a bit confusing because it's not necessarily something we would need to do, at least not as outlined here. I would at least spell out the steps precisely.

@rogancarr
Copy link
Contributor

rogancarr commented Feb 13, 2019

    private static void PrintRowViewValues(Data.DataDebuggerPreview data)

This is a bit weird that we can do useful things with the debugger. I'm a bit worried people will use the debugger utils in production code, not for debugging.


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:98 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

@rogancarr
Copy link
Contributor

        //Preview of finalData.RowView

Expected output:


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:88 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

@rogancarr
Copy link
Contributor

        // Score - 0.458968    Probability 0.4670409

No break, two spaces indent


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Calibrator.cs:90 in ac3a9e0. [](commit_id = ac3a9e0, deletion_comment = False)

{
}

public interface ICalibratorTrainer
/// <summary>
Copy link
Contributor

@rogancarr rogancarr Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

///

[](start = 4, length = 13)

Technically, this clarification is a <remark> and <summary> is a short description of what it does.

@@ -842,6 +864,64 @@ internal static class CalibratorUtils
return CreateCalibratedPredictor(env, (IPredictorProducing<float>)predictor, trainedCalibrator);
}

public static ICalibrator TrainCalibrator(IHostEnvironment env, IChannel ch, ICalibratorTrainer caliTrainer, IDataView scored, string labelColumn, string scoreColumn, string weightColumn = null, int maxRows = _maxCalibrationExamples)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string labelColumn, string scoreColumn, string weightColumn = null [](start = 135, length = 66)

Defaults?

env.CheckValue(ch, nameof(ch));
ch.CheckValue(scored, nameof(scored));
ch.CheckValue(caliTrainer, nameof(caliTrainer));
ch.CheckParam(!caliTrainer.NeedsTraining || !string.IsNullOrWhiteSpace(labelColumn), nameof(labelColumn),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!caliTrainer.NeedsTraining || !string.IsNullOrWhiteSpace(labelColumn) [](start = 26, length = 69)

Nit: This is a bit confusing to read. Maybe ! the And version?

@rogancarr
Copy link
Contributor

Oh, nevermind. Saw that you already merged this :)

@TomFinley TomFinley deleted the NoPredictorCali branch February 20, 2019 23:15
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Calibrator estimators as successor to ICalibratorTrainer
6 participants