-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Calibrators catalog introduction #2766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calibrators catalog introduction #2766
Conversation
/// </summary> | ||
public CalibratorsSubCatalog Calibrators { get; } | ||
|
||
public sealed class CalibratorsSubCatalog : CatalogInstantiatorBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalibratorsSubCatalog [](start = 28, length = 21)
If I look at other classes derived from CatalogInstantiatorBase
, I don't see anything like SubCatalog
. How about just CalibratorsCatalog
? Why do we need the sub in there?
Don't forget to like and sub. #Resolved
|
||
namespace Microsoft.ML | ||
{ | ||
public static class CalibratorsCatalog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalibratorsCatalog [](start = 24, length = 18)
So, maybe we can just have CalibratorsCatalog
be a class, with actual methods, that is, not extension methods, given that it's in the same assembly. We sometimes do extension methods, but we often also put methods directly on the thing when the "scope" of what is to be done is fairly limited, or there isn't some great question of broader consistency, or somesuch. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by saying that you mean something like mlContext.Calirbrators.Platt()
or you mean CalibratorsCatalog.Platt(MlContext context, ...)
Second feels outlier to way how we currently handle everything else. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I am saying that these should be actual methods on the catalog class itself. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me be more explicit. I could write the following code:
public class Foo { }
public static class FooExtensions
{
public static void Bar(this Foo foo, ...) { ... }
public static void Biz(this Foo foo, ...) { ... }
public static void Blam(this Foo foo, ...) { ... }
}
As in fact we do when we want to have separation between multiple things. But, in this particular case, everything we'd want to do with this catalog lives in the same assembly where the assembly is. So I could have written the following.
public class Foo
{
public void Bar(...) { ... }
public void Biz(...) { ... }
public void Blam(...) { ... }
}
I would argue the second is easier and more clear. I understand the reasons why we sometimes phrase these catalog operations as extension methods, what I'm saying is, in this case, it is not necessary or helpful to do so. If you disagree that's fine I guess, it just seems strange. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I'm sorry you had to write down this explanation for me. I've been slightly tired lately and with all other distractions it looks like I lost ability to focus and understand other people. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha ha, that's fine Ivan. I remember yesterday when I was so exhausted and sick I confused the heck out of people with what were (if I look back on them) some insane ramblings. As I perhaps do even when I am whole and well, though I hope not. 😄 #Resolved
Codecov Report
@@ Coverage Diff @@
## master #2766 +/- ##
==========================================
- Coverage 71.7% 71.66% -0.04%
==========================================
Files 809 809
Lines 142491 142433 -58
Branches 16116 16120 +4
==========================================
- Hits 102173 102078 -95
- Misses 35885 35923 +38
+ Partials 4433 4432 -1
|
/// </summary> | ||
public CalibratorsCatalog Calibrators { get; } | ||
|
||
public sealed class CalibratorsCatalog : CatalogInstantiatorBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CalibratorsCatalog [](start = 28, length = 18)
i need xml doc too. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest none of other CatalogInstantiatorBase
inherited classes has one. And I don't plan to fix it right now. Documentation can wait till next month.
In reply to: 261052357 [](ancestors = 261052357)
{ | ||
} | ||
/// <summary> | ||
/// Adds probability column by training naive binning-based calbirator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
calbirator [](start = 72, length = 10)
spelling here and below. #Resolved
{ | ||
} | ||
/// <summary> | ||
/// Adds probability column by training naive binning-based calbirator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naive binning-based [](start = 52, length = 19)
can we link to any online resources? @yaeldekel, does this apply: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4410090/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt they're the same. That article seems to have been published in 2015, whereas this calibrator has AFAIK existed at least since 2013, and probably goes back a fair bit further than that. I imagine Misha didn't just conjure up the method out of absolute thin air, but whatever work he based the implementation on has probably be lost to the sands of time.
In reply to: 261052618 [](ancestors = 261052618)
return new NaiveCalibratorEstimator(Owner.Environment, labelColumnName, scoreColumnName); | ||
} | ||
/// <summary> | ||
/// Adds probability column by training platt calbirator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
training platt [](start = 43, length = 14)
https://en.wikipedia.org/wiki/Platt_scaling link #Resolved
// Let's train a calibrator estimator on this scored dataset. The trained calibrator estimator produces a transformer | ||
// that can transform the scored data by adding a new column names "Probability". | ||
var calibratorEstimator = mlContext.BinaryClassification.Calibrators.Platt(); | ||
var calibratorTransformer = calibratorEstimator.Fit(scoredData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scoredData [](start = 64, length = 10)
we should probably train the calibrator on another dataset, not the testSet, in those samples. So have training set, testing set, and a calibrator training set
using System; | ||
using System.Linq; | ||
|
||
namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification.Calibrators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification.Calibrators [](start = 10, length = 70)
please Microsoft.ML.Samples.Dynamic
no need to be too fancy #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. I told same thing to @shmoradims and he told me I'm silly. I don't want be part of any fights regarding samples and how we organize them. So I would just follow what he is doing.
In reply to: 261054779 [](ancestors = 261054779)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomFinley would be nice if you can take a look one more time, thx! |
|
||
} | ||
|
||
/// <summary> | ||
/// The <see cref="ITransformer"/> implementation obtained by training a <see cref="PavCalibratorEstimator"/> | ||
/// The <see cref="ITransformer"/> implementation obtained by training a <see cref="PoolAdjacentViolatorsCalibratorEstimator"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PoolAdjacentViolatorsCalibratorEstimator [](start = 88, length = 40)
What is the difference between PoolAdjacentViolators
and PairAdjacentViolators
? They both seem to be an expansion of the single acronym that used to just be PAV, so I'm wondering if this distinction is on purpose, or if it's a mistake or something. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I see we have this PAV calibrator that are I guess the model parameters from pair-adjacent calibrator transformers which are in turn trained from pool adjacent calibrator estimators? Right? I kind of feel like consistency here would be less confusing. #Resolved Refers to: src/Microsoft.ML.Data/Prediction/Calibrator.cs:1801 in 553aeb1. [](commit_id = 553aeb1, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Ivanidzo4ka !
should fix #1871