-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Exposing the confusion matrix #3250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I am sending this as not final, because i wanted to get early feedback on how much to expose, and where to place it. I still have to document it and add tests before checking in. I am not adding support to build the weighted confusion matrix, or adding capabilities to sample a subset of classes for it thinking that it might be enough for V1. The infrastructure is in there, let me know if it is desired. #Resolved |
Codecov Report
@@ Coverage Diff @@
## master #3250 +/- ##
=========================================
Coverage ? 72.71%
=========================================
Files ? 808
Lines ? 145289
Branches ? 16237
=========================================
Hits ? 105640
Misses ? 35232
Partials ? 4417
|
{ | ||
public double[] PrecisionSums { get; } | ||
public double[] RecallSums { get; } | ||
public ReadOnlyMemory<char>[] Labels => PredictedLabelNames.ToArray(); //sefilipi: return as string[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadOnlyMemory [](start = 15, length = 14)
Remember, as before, that we provide access to the columns, but do not copy out the annotations data. Remember the discussion around having names.
Remember also, as we've found in #3101 that the labels may not even be text, or even exist. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked a bit on how is the ConfusionMatrix idv build, and it looks like an indicator for the predictedLabels will always be there, and it will be of the ReadOnlyMemory<char>
type.
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs#L222
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs#L361
The metrics aggregators have the class names in the ctor arguments, and if the class names are missing, those get generated from the size of the Score column.
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs#L114
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs#L184
So given that, i left the labels as ReadOnlyMemory<char>
instead of providing the annotations of the Count column, and just renamed it to PredictedClassIndicators, since they are not going to be the class names all the time.
Let me know if I am missing something else.
In reply to: 273615151 [](ancestors = 273615151)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked a bit on how is the ConfusionMatrix idv build, and it looks like an indicator for the predictedLabels will always be there, and it will be of the
ReadOnlyMemory<char>
type.
Yes, but AFAIK that is mostly because it existed in order to provide a string to write to console, for the console application, back when this was purely a command line tool and not an API. So it has to have some string description, that doesn't mean that it's a good string description. I think in fact it'll just render the number as a string in case there is none. So... not something we want to expose as part of the API! #Closed
|
||
public override string ToString() => MetricWriter.GetConfusionTableAsString(this, false); | ||
|
||
public double GetCountForClassPair(string predictedLabel, string actualLabel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string predictedLabel, string actualLabel [](start = 43, length = 41)
As with all the remainder, we index slots by numeric index. They may optionally have a string label (or not), but that is not a requirement, and what the string label is (again, if it even exists) cannot be used as any sort of primary key. #Closed
Binary = binary; | ||
} | ||
|
||
public override string ToString() => MetricWriter.GetConfusionTableAsString(this, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetricWriter.GetConfusionTableAsString [](start = 45, length = 38)
This is dangerous. Do we consider ToString
overrides if they change, breaking changes? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it is a breaking change.
I'll create a separate API for it.
"Changing the returned values for a property, field, return or 'out' value, such as the value returned from ToString"
on
https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/breaking-change-rules.md
In reply to: 273615670 [](ancestors = 273615670)
{ | ||
public sealed class ConfusionMatrix | ||
{ | ||
public double[] PrecisionSums { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double[] [](start = 15, length = 8)
Arrays are mutable -- never ever make them part of the public surface of an API that's intended to be an immutable structure. #Closed
{ | ||
public double[] PrecisionSums { get; } | ||
public double[] RecallSums { get; } | ||
public ReadOnlyMemory<char>[] Labels => PredictedLabelNames.ToArray(); //sefilipi: return as string[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToArray(); [](start = 68, length = 10)
THis should not exist anyway, but note that we generally there is an expectation in C# APIs that property accesses are "lightweight" in terms of their memory footprint. #Closed
@@ -2,6 +2,7 @@ | |||
// The .NET Foundation licenses this file to you under the MIT license. | |||
// See the LICENSE file in the project root for more information. | |||
|
|||
using System.Linq; | |||
using Microsoft.ML.Runtime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I look through this file, it's unclear to me which of the changes require this. #Closed
var weightColumn = confusionDataView.Schema.GetColumnOrNull(MetricKinds.ColumnNames.Weight); | ||
if (getWeighted) | ||
host.CheckParam(weightColumn.HasValue, nameof(getWeighted), "There is no Weight column in the confusionMatrix data view."); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, is this something we anticipate might actually happen? We control the creation of this matrix, as far as I am aware, and this is a purely internal utility.
There are two possible answer to your questions, "no" in which case this should have been an assert (with the message as a comment, not a string!), or "yes" in which case this is something the user will potentially have to deal with, in which case having something that user might have a prayer of understanding would be required. (I do not view this message as actionable or even diagnostic in any way from a user's perspective.)
Same for other check below... #Closed
|
||
public double GetCountForClassPair(string predictedLabel, string actualLabel) | ||
{ | ||
int predictedLabelIndex = PredictedLabelNames.IndexOf(predictedLabel.AsMemory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predictedLabel.AsMemory() [](start = 66, length = 25)
This API is not what we want anyway (since it relies on strings to index when there's no requirement that the strings even exist much less be unique), but in point of interest, please note that this code absolutely will not work, at least, not reliably in and sensible or predictable fashion.
var m = "hello my friend".AsMemory();
var tokens = new List<ReadOnlyMemory<char>>();
tokens.Add(m.Slice(0, 5));
tokens.Add(m.Slice(6, 2));
tokens.Add(m.Slice(9, 6));
foreach (var token in tokens)
Console.WriteLine($"'{token}'");
Console.WriteLine(tokens.IndexOf("friend".AsMemory()));
Console.WriteLine(tokens.IndexOf(tokens[2]));
If compiled and run on .NET core 2.1, you get this output.
'hello'
'my'
'friend'
-1
2
So, relying on ReadOnlyMemory<char>
's equality w.r.t. other similar items just is not a safe thing to do at all. You'll also recall all the bugs and barriers we had to fix back when we did the initial switch from our own DvText
to ReadOnlyMemory<char>
. #Closed
@@ -440,7 +440,7 @@ public static IEnumerable<SchemaShape.Column> GetTrainerOutputAnnotation(bool is | |||
/// <param name="labelColumn">Label column.</param> | |||
public static IEnumerable<SchemaShape.Column> AnnotationsForMulticlassScoreColumn(SchemaShape.Column? labelColumn = null) | |||
{ | |||
var cols = new List<SchemaShape.Column>(); | |||
var cols = new List<SchemaShape.Column>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I see the only change in this file is this incorrect indentation. Perhaps we want to fix this? #Closed
using System; | ||
using System.Collections.Generic; | ||
using System.Collections.Immutable; | ||
using Microsoft.ML.Runtime; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the header lines #Resolved
/// <summary> | ||
/// The calculated value of <a href="https://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a> for each class. | ||
/// </summary> | ||
public ImmutableArray<double> PerClassPrecision { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImmutableArray [](start = 15, length = 14)
That's fine, note that elsewhere in similar situations we've been using IReadOnlyList
, but I guess this works too? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer to expose IReadOnlyList just for the sake of consistency. (I don't think we use ImmutableArray in public interface in metrics at all).
In reply to: 275005362 [](ancestors = 275005362)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So @sfilipi I notice this was marked as resolved, if you disagree with me and @Ivanidzo4ka that's fine, but would you mind sharing justification? It is somewhat inconsistent?
In reply to: 275082837 [](ancestors = 275082837,275005362)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ivan has commented after i marked it as resolved, and i marked it as resolved because i thought your comment was just an observation, not a suggestion for change. I'll swap it.
We do use the ImmutableArrays in the Normalizers.
In reply to: 275472667 [](ancestors = 275472667,275082837,275005362)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the change. Just wanted to mention that the reason i used arrays in the first place, is because i believe arrays are preferred over lists when it comes to small collections that don't get updated frequently.
In reply to: 275504885 [](ancestors = 275504885,275472667,275082837,275005362)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sfilipi. You may be right, but for things like this I feel like using the more "consistent" solution with what we've done elsewhere, may be more desirable. (Even if one or two places we do use immutable arrays. See also VectorDataViewType.Dimensions
, but that one was deliberate.)
It may be that in future we may decide using immutable arrays is more desirable. Incidentally this is allowed by the rules is "Returning a value of a more derived type for a property, field, return or out value", so in principle we could change what we think of as read only lists to immutable arrays.
In reply to: 275554556 [](ancestors = 275554556,275504885,275472667,275082837,275005362)
/// <param name="predictedClassIndicatorIndex">The index of the predicted label indicator, in the <see cref="PredictedClassesIndicators"/>.</param> | ||
/// <param name="actualClassIndicatorIndex">The index of the actual label indicator, in the <see cref="PredictedClassesIndicators"/>.</param> | ||
/// <returns></returns> | ||
public double GetCountForClassPair(uint predictedClassIndicatorIndex, uint actualClassIndicatorIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actualClassIndicatorIndex [](start = 83, length = 25)
Indices are usually int
, as far as I understand. I understand that as far as they are key types they are represented as int
, but when it comes down to, say, doing slot indexing or whatever, they're int
s. What you think? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, might be nice if there was a special property to indicate the number of classes, maybe. I guess they can get it from the above lists, but, might be nice otherwise.
In reply to: 275005689 [](ancestors = 275005689)
|
||
private readonly IHost _host; | ||
|
||
internal ConfusionMatrix(IHost host, double[] precision, double[] recall, double[][] confusionTableCounts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
precision [](start = 54, length = 9)
Might be nice to have some asserts here to make poor readers of the code (like me!) aware of the intent here -- I assume precision
and recall
have the same length, that confusionTableCounts
is a jagged array with that same length along both dimensions, but these are just guesses based on my understanding of the setting, which many people coming to the code fresh would not have. And even I'm not sure. #Closed
/// It contains information about the predicted classes, if that information is available. | ||
/// It can be the classes names, or their indices. | ||
/// </summary> | ||
public DataViewSchema.Annotations ClassIndicators { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public DataViewSchema.Annotations ClassIndicators { get; } [](start = 7, length = 59)
@[email protected] is exposing this desirable? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not? AFAIK you get the class indicators out of the column they pass to the evaluate method. So why store it here? Why not just say, "get the indicators?" What does this API buy you? So my initial reaction is, "no, why is this object so complicated?"
In reply to: 275067993 [](ancestors = 275067993)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, if I think about a confusion matrix, all I really want is the matrix, maybe a few summary statistics. Making this API for this limited object responsible for querying the metadata on the predicted label column when we already have APIs to do that that are more general seems like it is just making the story harder than it needs to be.
In reply to: 275473328 [](ancestors = 275473328,275067993)
/// The indicators of the predicted classes. | ||
/// It might be the classes names, or just indices of the predicted classes, if the name mapping is missing. | ||
/// </summary> | ||
internal IReadOnlyList<ReadOnlyMemory<char>> PredictedClassesIndicators; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PredictedClassesIndicators [](start = 53, length = 26)
if the above is ok, i can remove this, and just pass in the annotations everywhere. #Resolved
// if there is a Weight column, return the weighted confusionMatrix as well, from this function. | ||
if (isWeighted) | ||
{ | ||
confusionMatrix = GetConfusionTableAsType(host, confusionDataView, binary, sample, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confusionMatrix = GetConfusionTableAsType(host, confusionDataView, binary, sample, false); [](start = 16, length = 90)
why we doing this again?
We already did that in line 1359 #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return confusionTableString; | ||
} | ||
|
||
public static ConfusionMatrix GetConfusionTableAsType(IHost host, IDataView confusionDataView, bool binary = true, int sample = -1, bool getWeighted = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetConfusionTableAsType [](start = 38, length = 23)
Maybe it make sense to rename this method to GetConfusionMatrix
and GetConfusionTable
to GetConfusionMatrixAsString
or GetConfusionString
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetConfusionMatrixAsString exists. So I used GetConfusionTableAsFormattedString.
and I hate it already. lmk if anything better comes to mind.
In reply to: 275080299 [](ancestors = 275080299)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you at least rename this one to GetConfusionMatrix
?
In reply to: 275088372 [](ancestors = 275088372,275080299)
@@ -1553,13 +1581,13 @@ private static string GetFoldMetricsAsString(IHostEnvironment env, IDataView dat | |||
} | |||
|
|||
// Get a string representation of a confusion table. | |||
private static string GetConfusionTableAsString(double[][] confusionTable, double[] rowSums, double[] columnSums, | |||
List<ReadOnlyMemory<char>> predictedLabelNames, string prefix = "", bool sampled = false, bool binary = true) | |||
internal static string GetConfusionTableAsString(ConfusionMatrix confusionMatrix, bool isWeighted) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetConfusionTableAsString [](start = 31, length = 25)
Mix of ConfusionMatrix
and ConfusionTable
makes me
...
http://i.imgur.com/lgSDkvC.gif
...
Confuse!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1591,10 +1619,11 @@ private static string GetFoldMetricsAsString(IHostEnvironment env, IDataView dat | |||
else | |||
rowLabelFormat = string.Format("{{1,{0}}} ||", paddingLen); | |||
|
|||
var confusionTable = confusionMatrix.ConfusionTableCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var confusionTable = confusionMatrix.ConfusionTableCounts; [](start = 12, length = 58)
SO MUCH CONFUSION!
no, seriously, how confusionTable can be ConfusionTableCounts in ConfusionMatrix? #Resolved
var sb = new StringBuilder(); | ||
if (numLabels == 2 && binary) | ||
if (numLabels == 2 && confusionMatrix.Binary) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsBinary
? #Resolved
@@ -74,6 +74,8 @@ public class BinaryClassificationMetrics | |||
/// </remarks> | |||
public double AreaUnderPrecisionRecallCurve { get; } | |||
|
|||
public ConfusionMatrix ConfusionMatrix { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfusionMatrix [](start = 31, length = 15)
need documentation #Closed
@@ -82,9 +82,12 @@ public sealed class MulticlassClassificationMetrics | |||
/// </remarks> | |||
public IReadOnlyList<double> PerClassLogLoss { get; } | |||
|
|||
internal MulticlassClassificationMetrics(IExceptionContext ectx, DataViewRow overallResult, int topKPredictionCount) | |||
// The confusion matrix. | |||
public ConfusionMatrix ConfusionMatrix { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfusionMatrix [](start = 31, length = 15)
need documentation #Resolved
PerClassLogLoss = perClassLogLoss.DenseValues().ToImmutableArray(); | ||
ConfusionMatrix = MetricWriter.GetConfusionTableAsType(host, confusionMatrix, binary: false, perClassLogLoss.Length, getWeighted:false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false [](start = 141, length = 5)
it's false by default, you can probably omit it. #Resolved
// Get the counts names. | ||
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count]; | ||
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType; | ||
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count [](start = 100, length = 5)
{MetricKinds.ColumnNames.Count} #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Get the counts names. | ||
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count]; | ||
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType; | ||
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SlotNames [](start = 158, length = 9)
AnnotationUtils.Kinds.SlotNames #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.InRange(recall, 0, 1); | ||
|
||
|
||
// Get the values in the annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DOUBLE LINES! DOUBLE SHOCK! #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ofc it'll happen, there is not compiler complaining about it :)
In reply to: 275464491 [](ancestors = 275464491)
private readonly IHost _host; | ||
|
||
/// <summary> | ||
/// The confusion matrix as a structured type, built from the counts of the confusion table idv. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idv [](start = 100, length = 3)
We don't have idv here. Or what is idv
? #Pending
/// <param name="labelNames">The predicted classes names, or the indexes of the classes, if the names are missing.</param> | ||
/// <param name="isSampled">Whether the classes are sampled.</param> | ||
/// <param name="isBinary">Whether the confusion table is the result of a binary classification. </param> | ||
/// <param name="classIndicators">The Annotations of the Count column, in the confusionTable idv.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confusionTable idv [](start = 86, length = 18)
<paramref name="confusionTableCounts"/>
? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the Annotations in the original IDV that gets created by the evaluator. Expanding the comment with the above note. See if what i put in now is too much.
In reply to: 275468277 [](ancestors = 275468277)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually removed the param, so description went away too.
In reply to: 275527727 [](ancestors = 275527727,275468277)
/// Returns a human readable representation of the confusion table. | ||
/// </summary> | ||
/// <returns></returns> | ||
public string GetFormattedConfusionTable() => MetricWriter.GetConfusionTableAsString(this, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetricWriter.GetConfusionTableAsString [](start = 54, length = 38)
Can we cache it?
Or just calculate it constructor?
I'm not sure in which case this would produce you different results in different times of call. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
caching it and calculating it when first asked for.
In reply to: 275469566 [](ancestors = 275469566)
// Get the counts names. | ||
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count]; | ||
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType; | ||
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, "The Count column does not have a text vector metadata of kind SlotNames." [](start = 93, length = 76)
Assert mesages are pointless so I'd prefer to not have them, and instead have a comment, but that's OK. #Closed
internal ConfusionMatrix(IHost host, double[] precision, double[] recall, double[][] confusionTableCounts, | ||
List<ReadOnlyMemory<char>> labelNames, bool isSampled, bool isBinary, DataViewSchema.Annotations classIndicators) | ||
{ | ||
_host = host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_host = host; [](start = 12, length = 13)
Not a big deal but while you're at it, might help to Contracts.AssertValue(host)
before assigning it. #Closed
@Ivanidzo4ka @TomFinley as you find time. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sfilipi for putting up with my awesome commentary. :) Looks good.
|
||
/// <summary> | ||
/// The confsion matrix counts for the combinations actual class/predicted class. | ||
/// The actual classes are in the rows of the table, and the predicted classes in the columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rows of the table, and the predicted classes in the columns [](start = 42, length = 59)
I guess that what is a row and what is a column is subjective. Maybe we could talk about the outer/inner index or first/second index? #Resolved
/// The indicators of the predicted classes. | ||
/// It might be the classes names, or just indices of the predicted classes, if the name mapping is missing. | ||
/// </summary> | ||
public int NumberOfPredictedClasses { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just meant to give the number of classes?
Maybe we could simplify both the name and the comment.
Suggestion: NumberOfClasses
#Resolved
var confusionMatrix = GetConfusionMatrix(host, confusionDataView, binary, sample, false); | ||
var confusionTableString = GetConfusionTableAsString(confusionMatrix, false); | ||
|
||
// if there is a Weight column, return the weighted confusionMatrix as well, from this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capital letter. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ts can run in WIndows debug.
c05a80c
to
415937d
Compare
Fixes #2009 by exposing the confusion matrix.