Cleanup calibrators #2601
Codecov Report
@@            Coverage Diff            @@
##             master    #2601   +/-   ##
=========================================
  Coverage          ?    71.5%
=========================================
  Files             ?      800
  Lines             ?   142059
  Branches          ?    16148
=========================================
  Hits              ?   101580
  Misses            ?    36008
  Partials          ?     4471
@@ -1180,7 +1180,7 @@ private void SaveCore(ModelSaveContext ctx)
 ctx.Writer.Write(sizeof(float));
 ctx.Writer.Write(BinSize);
 ctx.Writer.Write(Min);
-ctx.Writer.WriteSingleArray(BinProbs);
+ctx.Writer.WriteSingleArray(BinProbs as float[]);
as float[]
Let's not, please. Let's use a private backing field of type float[] that we actually operate over. This has a couple of advantages: most notably it avoids this code smell, and incidentally it makes access to this structure much faster, since it won't have to go through the virtual method table on every single usage.
Edit: Looked at the rest of the PR; everything else looks great except for this. #Resolved
Just to make it completely clear, we will have:
public IReadOnlyList<float> BinProbs => _binProbs;
private readonly float[] _binProbs;
And for loading and saving we will use _binProbs.
Does that sound good? #Resolved
And other operators too, yes. The cast is ugly, but going through the interface in code meant to process every single example means an unnecessary trip through the virtual table each time.
This is similar to the usual reasons against, say, declaring something as IList if you know it is List, etc. #Resolved
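The backing-field pattern agreed on above can be sketched as follows. This is a minimal illustration, not the code from this PR: the class name BinnedProbabilities and its members other than BinProbs and _binProbs are hypothetical, and the Save method only stands in for whatever the real serialization path does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class for illustration; not the calibrator from this PR.
public sealed class BinnedProbabilities
{
    // Concrete array used by all internal code paths (load, save, per-example lookups).
    private readonly float[] _binProbs;

    // Read-only view for external callers; no cast is needed anywhere.
    public IReadOnlyList<float> BinProbs => _binProbs;

    public BinnedProbabilities(float[] binProbs)
    {
        if (binProbs == null)
            throw new ArgumentNullException(nameof(binProbs));
        // Copy so later changes to the caller's array cannot leak in.
        _binProbs = (float[])binProbs.Clone();
    }

    // Hot path: indexes the array directly rather than going through the interface.
    public float GetProbability(int binIdx) => _binProbs[binIdx];

    // Serialization-style code works on the array itself, so "as float[]" is never needed.
    public void Save(BinaryWriter writer)
    {
        writer.Write(_binProbs.Length);
        foreach (float p in _binProbs)
            writer.Write(p);
    }
}
```

Internal code touches only _binProbs, while the public surface still exposes an immutable-looking IReadOnlyList<float>.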
@@ -1205,11 +1207,6 @@ internal static int GetBinIdx(float output, float min, float binSize, int numBin
return binIdx;
Ugh. This code makes my eyes bleed but if no one has complained after all these years we can probably let it stand for a bit. :)
Thank you @Ivanidzo4ka !!
@@ -1218,8 +1215,91 @@ public string GetSummary()
[BestFriend]
internal abstract class CalibratorTrainerBase : ICalibratorTrainer
{
    public sealed class DataStore : IEnumerable<DataStore.DataItem>
    {
        public readonly struct DataItem
public readonly struct DataItem
Really vague name for a class visible throughout the assembly. #Resolved
As far as I can see, it is a nested class inside the internal CalibratorTrainerBase? So someone would need to have referenced CalibratorTrainerBase to see it?
A) It's part of an internal class.
B) I'm making it private protected.
C) I'm just moving this code around to hide it; I don't intend to improve it right now.
In reply to: 258183106
@@ -1218,8 +1215,91 @@ public string GetSummary()
[BestFriend]
internal abstract class CalibratorTrainerBase : ICalibratorTrainer
{
    public sealed class DataStore : IEnumerable<DataStore.DataItem>
public sealed class DataStore
Really vague name. Can we think of something more precise?
private bool _dataSorted;

public DataStore()
    : this(1000000)
1000000
Any reason?
{
}

public DataStore(int capacity)
public DataStore(int capacity)
Maybe instead of two constructors, we just use a default parameter value, int capacity = defaultValue?
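A minimal sketch of the single-constructor suggestion, assuming an optional parameter backed by a named constant. The type ScoreStore, the constant DefaultCapacity, and the range check are hypothetical and only illustrate the shape of the change; the value 1000000 is the one from the diff above.

```csharp
using System;

// Hypothetical type for illustration; names are assumptions, not the PR's code.
public sealed class ScoreStore
{
    // Naming the constant also gives a place to document why this value was chosen.
    private const int DefaultCapacity = 1000000;

    private readonly int _capacity;

    // One constructor; callers that omit the argument get DefaultCapacity.
    public ScoreStore(int capacity = DefaultCapacity)
    {
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    public int Capacity => _capacity;
}
```

One trade-off to keep in mind: C# compiles optional-parameter defaults into each call site, so changing the default later only affects callers that are recompiled. For a type that is not visible outside its assembly, that is usually not a concern.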
_capacity = capacity;
_data = new DataItem[Math.Min(4, capacity)];
// REVIEW: Horrifying. At a point when we have the IHost stuff plumbed through
Horrifying
The horror
public void AddToStore(float score, bool isPositive, float weight)
{
    // Can't calibrate NaN scores.
// Can't calibrate NaN scores.
Isn't it BAD to get NaN scores?
@@ -1485,15 +1565,21 @@ private static VersionInfo GetVersionInfo()

private readonly IHost _host;

/// <summary>
/// Slope value for this calibrator.
Slope value for this calibrator.
Maybe use the terms bias and weight instead of Slope and Offset? This is more consistent with our public API.
@@ -1556,14 +1642,15 @@ private void SaveCore(ModelSaveContext ctx)
    }
}

/// <summary> Given a classifier output, produce the probability.</summary>
/ <summary
break pls
/// <item><description><see cref="Values"/>[0], if x < <see cref="Mins"/>[0]</description></item>
/// <item><description><see cref="Values"/>[n], if x > <see cref="Maxes"/>[n]</description></item>
///</list>
/// </remarks>
public sealed class PavCalibrator : ICalibrator, ICanSaveInBinaryFormat
If it's not too hard, can you provide a link to any resources about this kind of calibration? You could also file an issue and assign it to the docs folks.
@@ -1851,6 +1947,7 @@ private void SaveCore(ModelSaveContext ctx)
    _host.CheckDecode(valuePrev <= 1);
}

/// <summary> Given a classifier output, produce the probability.</summary>
break
I thought the style was to break the xml tags onto their own lines, but I could be mistaken.
In reply to: 258188999
Just a few nits, although I realize they're nits on copy/pasted code, so feel free to defer those to future maintenance fixes.
Fixes #2589