Skip to content

Calibrators catalog introduction #2766

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 5, 2019

Conversation

Ivanidzo4ka
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka commented Feb 27, 2019

should fix #1871

/// </summary>
public CalibratorsSubCatalog Calibrators { get; }

public sealed class CalibratorsSubCatalog : CatalogInstantiatorBase
Copy link
Contributor

@TomFinley TomFinley Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CalibratorsSubCatalog [](start = 28, length = 21)

If I look at other classes derived from CatalogInstantiatorBase, I don't see anything like SubCatalog. How about just CalibratorsCatalog? Why do we need the sub in there?

Don't forget to like and sub. #Resolved


namespace Microsoft.ML
{
public static class CalibratorsCatalog
Copy link
Contributor

@TomFinley TomFinley Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CalibratorsCatalog [](start = 24, length = 18)

So, maybe we can just have CalibratorsCatalog be a class, with actual methods, that is, not extension methods, given that it's in the same assembly. We sometimes do extension methods, but we often also put methods directly on the thing when the "scope" of what is to be done is fairly limited, or there isn't some great question of broader consistency, or somesuch. #Resolved

Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by saying that you mean something like mlContext.Calirbrators.Platt() or you mean CalibratorsCatalog.Platt(MlContext context, ...)
Second feels outlier to way how we currently handle everything else. #Resolved

Copy link
Contributor

@TomFinley TomFinley Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I am saying that these should be actual methods on the catalog class itself. #Resolved

Copy link
Contributor

@TomFinley TomFinley Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me be more explicit. I could write the following code:

public class Foo { }

public static class FooExtensions
{
    public static void Bar(this Foo foo, ...) { ... }
    public static void Biz(this Foo foo, ...) { ... }
    public static void Blam(this Foo foo, ...) { ... }
}

As in fact we do when we want to have separation between multiple things. But, in this particular case, everything we'd want to do with this catalog lives in the same assembly where the assembly is. So I could have written the following.

public class Foo
{
    public void Bar(...) { ... }
    public void Biz(...) { ... }
    public void Blam(...) { ... }
}

I would argue the second is easier and more clear. I understand the reasons why we sometimes phrase these catalog operations as extension methods, what I'm saying is, in this case, it is not necessary or helpful to do so. If you disagree that's fine I guess, it just seems strange. #Resolved

Copy link
Contributor Author

@Ivanidzo4ka Ivanidzo4ka Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I'm sorry you had to write down this explanation for me. I've been slightly tired lately and with all other distractions it looks like I lost ability to focus and understand other people. #Resolved

Copy link
Contributor

@TomFinley TomFinley Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha ha, that's fine Ivan. I remember yesterday when I was so exhausted and sick I confused the heck out of people with what were (if I look back on them) some insane ramblings. As I perhaps do even when I am whole and well, though I hope not. 😄 #Resolved

@codecov
Copy link

codecov bot commented Feb 28, 2019

Codecov Report

Merging #2766 into master will decrease coverage by 0.03%.
The diff coverage is 35%.

@@            Coverage Diff             @@
##           master    #2766      +/-   ##
==========================================
- Coverage    71.7%   71.66%   -0.04%     
==========================================
  Files         809      809              
  Lines      142491   142433      -58     
  Branches    16116    16120       +4     
==========================================
- Hits       102173   102078      -95     
- Misses      35885    35923      +38     
+ Partials     4433     4432       -1
Flag Coverage Δ
#Debug 71.66% <35%> (-0.04%) ⬇️
#production 67.9% <31.57%> (-0.03%) ⬇️
#test 85.86% <100%> (-0.04%) ⬇️
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Prediction/Calibrator.cs 72.67% <ø> (ø) ⬆️
.../Microsoft.ML.Data/Prediction/CalibratorCatalog.cs 90.71% <100%> (ø) ⬆️
...ML.Tests/TrainerEstimators/CalibratorEstimators.cs 100% <100%> (ø) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 85.71% <23.52%> (-4.37%) ⬇️
src/Microsoft.ML.TensorFlow/TensorflowCatalog.cs 83.33% <0%> (-16.67%) ⬇️
src/Microsoft.ML.Data/Transforms/CatalogUtils.cs 85.71% <0%> (-14.29%) ⬇️
...rosoft.ML.TensorFlow/TensorFlow/TensorflowUtils.cs 57.92% <0%> (-7.45%) ⬇️
src/Microsoft.ML.Transforms/GcnTransform.cs 75.78% <0%> (-1.8%) ⬇️
...c/Microsoft.ML.CpuMath/CpuMathUtils.netstandard.cs 93.4% <0%> (-0.35%) ⬇️
...oft.ML.Transforms/Text/TextFeaturizingEstimator.cs 88.73% <0%> (-0.28%) ⬇️
... and 48 more

/// </summary>
public CalibratorsCatalog Calibrators { get; }

public sealed class CalibratorsCatalog : CatalogInstantiatorBase
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CalibratorsCatalog [](start = 28, length = 18)

i need xml doc too. #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest none of other CatalogInstantiatorBase inherited classes has one. And I don't plan to fix it right now. Documentation can wait till next month.


In reply to: 261052357 [](ancestors = 261052357)

{
}
/// <summary>
/// Adds probability column by training naive binning-based calbirator.
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calbirator [](start = 72, length = 10)

spelling here and below. #Resolved

{
}
/// <summary>
/// Adds probability column by training naive binning-based calbirator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naive binning-based [](start = 52, length = 19)

can we link to any online resources? @yaeldekel, does this apply: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410090/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt they're the same. That article seems to have been published in 2015, whereas this calibrator has AFAIK existed at least since 2013, and probably goes back a fair bit further than that. I imagine Misha didn't just conjure up the method out of absolute thin air, but whatever work he based the implementation on has probably be lost to the sands of time.


In reply to: 261052618 [](ancestors = 261052618)

return new NaiveCalibratorEstimator(Owner.Environment, labelColumnName, scoreColumnName);
}
/// <summary>
/// Adds probability column by training platt calbirator.
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

training platt [](start = 43, length = 14)

https://en.wikipedia.org/wiki/Platt_scaling link #Resolved

// Let's train a calibrator estimator on this scored dataset. The trained calibrator estimator produces a transformer
// that can transform the scored data by adding a new column names "Probability".
var calibratorEstimator = mlContext.BinaryClassification.Calibrators.Platt();
var calibratorTransformer = calibratorEstimator.Fit(scoredData);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scoredData [](start = 64, length = 10)

we should probably train the calibrator on another dataset, not the testSet, in those samples. So have training set, testing set, and a calibrator training set

using System;
using System.Linq;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification.Calibrators
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification.Calibrators [](start = 10, length = 70)

please Microsoft.ML.Samples.Dynamic

no need to be too fancy #Pending

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I told same thing to @shmoradims and he told me I'm silly. I don't want be part of any fights regarding samples and how we organize them. So I would just follow what he is doing.


In reply to: 261054779 [](ancestors = 261054779)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka
Copy link
Contributor Author

@TomFinley would be nice if you can take a look one more time, thx!

@Ivanidzo4ka Ivanidzo4ka requested review from yaeldekel and wschin March 4, 2019 00:32

}

/// <summary>
/// The <see cref="ITransformer"/> implementation obtained by training a <see cref="PavCalibratorEstimator"/>
/// The <see cref="ITransformer"/> implementation obtained by training a <see cref="PoolAdjacentViolatorsCalibratorEstimator"/>
Copy link
Contributor

@TomFinley TomFinley Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PoolAdjacentViolatorsCalibratorEstimator [](start = 88, length = 40)

What is the difference between PoolAdjacentViolators and PairAdjacentViolators? They both seem to be an expansion of the single acronym that used to just be PAV, so I'm wondering if this distinction is on purpose, or if it's a mistake or something. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Isotonic to avert confusion


In reply to: 262238146 [](ancestors = 262238146)

@TomFinley
Copy link
Contributor

TomFinley commented Mar 4, 2019

public sealed class PavCalibrator : ICalibrator, ICanSaveInBinaryFormat

So, I see we have this PAV calibrator that are I guess the model parameters from pair-adjacent calibrator transformers which are in turn trained from pool adjacent calibrator estimators? Right? I kind of feel like consistency here would be less confusing. #Resolved


Refers to: src/Microsoft.ML.Data/Prediction/Calibrator.cs:1801 in 553aeb1. [](commit_id = 553aeb1, deletion_comment = False)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Ivanidzo4ka !

@TomFinley TomFinley merged commit 86d5dda into dotnet:master Mar 5, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Follow up on Calibrator estimators
3 participants