Skip to content

Exposing the confusion matrix #3250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 20, 2019
Merged

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Apr 8, 2019

Fixes #2009 by exposing the confusion matrix.

@sfilipi sfilipi changed the title Exposing the confusion matrix [WIP] Exposing the confusion matrix Apr 8, 2019
@sfilipi
Copy link
Member Author

sfilipi commented Apr 8, 2019

I am sending this as not final, because i wanted to get early feedback on how much to expose, and where to place it. I still have to document it and add tests before checking in.

I am not adding support to build the weighted confusion matrix, or adding capabilities to sample a subset of classes for it thinking that it might be enough for V1. The infrastructure is in there, let me know if it is desired. #Resolved

@codecov
Copy link

codecov bot commented Apr 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a48dcef). Click here to learn what that means.
The diff coverage is 81.43%.

@@            Coverage Diff            @@
##             master    #3250   +/-   ##
=========================================
  Coverage          ?   72.71%           
=========================================
  Files             ?      808           
  Lines             ?   145289           
  Branches          ?    16237           
=========================================
  Hits              ?   105640           
  Misses            ?    35232           
  Partials          ?     4417
Flag Coverage Δ
#Debug 72.71% <81.43%> (?)
#production 68.23% <72.56%> (?)
#test 88.99% <100%> (?)
Impacted Files Coverage Δ
.../Evaluators/Metrics/BinaryClassificationMetrics.cs 95.23% <100%> (ø)
test/Microsoft.ML.Functional.Tests/Common.cs 98.38% <100%> (ø)
...luators/Metrics/MulticlassClassificationMetrics.cs 100% <100%> (ø)
...sts/Scenarios/Api/Estimators/PredictAndMetadata.cs 100% <100%> (ø)
test/Microsoft.ML.Functional.Tests/Evaluation.cs 100% <100%> (ø)
...s/Metrics/CalibratedBinaryClassificationMetrics.cs 100% <100%> (ø)
...ta/Evaluators/MulticlassClassificationEvaluator.cs 77.73% <100%> (ø)
src/Microsoft.ML.Data/Evaluators/EvaluatorUtils.cs 73.65% <66%> (ø)
...soft.ML.Data/Evaluators/Metrics/ConfusionMatrix.cs 70% <70%> (ø)
...ft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs 77.18% <77.77%> (ø)

{
public double[] PrecisionSums { get; }
public double[] RecallSums { get; }
public ReadOnlyMemory<char>[] Labels => PredictedLabelNames.ToArray(); //sefilipi: return as string[]
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadOnlyMemory [](start = 15, length = 14)

Remember, as before, that we provide access to the columns, but do not copy out the annotations data. Remember the discussion around having names.

Remember also, as we've found in #3101 that the labels may not even be text, or even exist. #Pending

Copy link
Member Author

@sfilipi sfilipi Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a bit on how is the ConfusionMatrix idv build, and it looks like an indicator for the predictedLabels will always be there, and it will be of the ReadOnlyMemory<char> type.

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs#L222
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs#L361

The metrics aggregators have the class names in the ctor arguments, and if the class names are missing, those get generated from the size of the Score column.

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs#L114
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs#L184

So given that, i left the labels as ReadOnlyMemory<char> instead of providing the annotations of the Count column, and just renamed it to PredictedClassIndicators, since they are not going to be the class names all the time.

Let me know if I am missing something else.


In reply to: 273615151 [](ancestors = 273615151)

Copy link
Contributor

@TomFinley TomFinley Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked a bit on how is the ConfusionMatrix idv build, and it looks like an indicator for the predictedLabels will always be there, and it will be of the ReadOnlyMemory<char> type.

Yes, but AFAIK that is mostly because it existed in order to provide a string to write to console, for the console application, back when this was purely a command line tool and not an API. So it has to have some string description, that doesn't mean that it's a good string description. I think in fact it'll just render the number as a string in case there is none. So... not something we want to expose as part of the API! #Closed


public override string ToString() => MetricWriter.GetConfusionTableAsString(this, false);

public double GetCountForClassPair(string predictedLabel, string actualLabel)
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string predictedLabel, string actualLabel [](start = 43, length = 41)

As with all the remainder, we index slots by numeric index. They may optionally have a string label (or not), but that is not a requirement, and what the string label is (again, if it even exists) cannot be used as any sort of primary key. #Closed

Binary = binary;
}

public override string ToString() => MetricWriter.GetConfusionTableAsString(this, false);
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricWriter.GetConfusionTableAsString [](start = 45, length = 38)

This is dangerous. Do we consider ToString overrides if they change, breaking changes? #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it is a breaking change.
I'll create a separate API for it.

"Changing the returned values for a property, field, return or 'out' value, such as the value returned from ToString"
on
https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/breaking-change-rules.md


In reply to: 273615670 [](ancestors = 273615670)

{
public sealed class ConfusionMatrix
{
public double[] PrecisionSums { get; }
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double[] [](start = 15, length = 8)

Arrays are mutable -- never ever make them part of the public surface of an API that's intended to be an immutable structure. #Closed

{
public double[] PrecisionSums { get; }
public double[] RecallSums { get; }
public ReadOnlyMemory<char>[] Labels => PredictedLabelNames.ToArray(); //sefilipi: return as string[]
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToArray(); [](start = 68, length = 10)

THis should not exist anyway, but note that we generally there is an expectation in C# APIs that property accesses are "lightweight" in terms of their memory footprint. #Closed

@@ -2,6 +2,7 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Linq;
using Microsoft.ML.Runtime;
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I look through this file, it's unclear to me which of the changes require this. #Closed

var weightColumn = confusionDataView.Schema.GetColumnOrNull(MetricKinds.ColumnNames.Weight);
if (getWeighted)
host.CheckParam(weightColumn.HasValue, nameof(getWeighted), "There is no Weight column in the confusionMatrix data view.");

Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, is this something we anticipate might actually happen? We control the creation of this matrix, as far as I am aware, and this is a purely internal utility.

There are two possible answer to your questions, "no" in which case this should have been an assert (with the message as a comment, not a string!), or "yes" in which case this is something the user will potentially have to deal with, in which case having something that user might have a prayer of understanding would be required. (I do not view this message as actionable or even diagnostic in any way from a user's perspective.)

Same for other check below... #Closed


public double GetCountForClassPair(string predictedLabel, string actualLabel)
{
int predictedLabelIndex = PredictedLabelNames.IndexOf(predictedLabel.AsMemory());
Copy link
Contributor

@TomFinley TomFinley Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predictedLabel.AsMemory() [](start = 66, length = 25)

This API is not what we want anyway (since it relies on strings to index when there's no requirement that the strings even exist much less be unique), but in point of interest, please note that this code absolutely will not work, at least, not reliably in and sensible or predictable fashion.

var m = "hello my friend".AsMemory();
var tokens = new List<ReadOnlyMemory<char>>();
tokens.Add(m.Slice(0, 5));
tokens.Add(m.Slice(6, 2));
tokens.Add(m.Slice(9, 6));

foreach (var token in tokens)
    Console.WriteLine($"'{token}'");

Console.WriteLine(tokens.IndexOf("friend".AsMemory()));
Console.WriteLine(tokens.IndexOf(tokens[2]));

If compiled and run on .NET core 2.1, you get this output.

'hello'
'my'
'friend'
-1
2

So, relying on ReadOnlyMemory<char>'s equality w.r.t. other similar items just is not a safe thing to do at all. You'll also recall all the bugs and barriers we had to fix back when we did the initial switch from our own DvText to ReadOnlyMemory<char>. #Closed

@sfilipi sfilipi changed the title [WIP] Exposing the confusion matrix Exposing the confusion matrix Apr 12, 2019
@@ -440,7 +440,7 @@ public static IEnumerable<SchemaShape.Column> GetTrainerOutputAnnotation(bool is
/// <param name="labelColumn">Label column.</param>
public static IEnumerable<SchemaShape.Column> AnnotationsForMulticlassScoreColumn(SchemaShape.Column? labelColumn = null)
{
var cols = new List<SchemaShape.Column>();
var cols = new List<SchemaShape.Column>();
Copy link
Contributor

@TomFinley TomFinley Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see the only change in this file is this incorrect indentation. Perhaps we want to fix this? #Closed

using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using Microsoft.ML.Runtime;
Copy link
Contributor

@shauheen shauheen Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the header lines #Resolved

/// <summary>
/// The calculated value of <a href="https://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a> for each class.
/// </summary>
public ImmutableArray<double> PerClassPrecision { get; }
Copy link
Contributor

@TomFinley TomFinley Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableArray [](start = 15, length = 14)

That's fine, note that elsewhere in similar situations we've been using IReadOnlyList, but I guess this works too? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer to expose IReadOnlyList just for the sake of consistency. (I don't think we use ImmutableArray in public interface in metrics at all).


In reply to: 275005362 [](ancestors = 275005362)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @sfilipi I notice this was marked as resolved, if you disagree with me and @Ivanidzo4ka that's fine, but would you mind sharing justification? It is somewhat inconsistent?


In reply to: 275082837 [](ancestors = 275082837,275005362)

Copy link
Member Author

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ivan has commented after i marked it as resolved, and i marked it as resolved because i thought your comment was just an observation, not a suggestion for change. I'll swap it.

We do use the ImmutableArrays in the Normalizers.


In reply to: 275472667 [](ancestors = 275472667,275082837,275005362)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the change. Just wanted to mention that the reason i used arrays in the first place, is because i believe arrays are preferred over lists when it comes to small collections that don't get updated frequently.


In reply to: 275504885 [](ancestors = 275504885,275472667,275082837,275005362)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sfilipi. You may be right, but for things like this I feel like using the more "consistent" solution with what we've done elsewhere, may be more desirable. (Even if one or two places we do use immutable arrays. See also VectorDataViewType.Dimensions, but that one was deliberate.)

It may be that in future we may decide using immutable arrays is more desirable. Incidentally this is allowed by the rules is "Returning a value of a more derived type for a property, field, return or out value", so in principle we could change what we think of as read only lists to immutable arrays.


In reply to: 275554556 [](ancestors = 275554556,275504885,275472667,275082837,275005362)

/// <param name="predictedClassIndicatorIndex">The index of the predicted label indicator, in the <see cref="PredictedClassesIndicators"/>.</param>
/// <param name="actualClassIndicatorIndex">The index of the actual label indicator, in the <see cref="PredictedClassesIndicators"/>.</param>
/// <returns></returns>
public double GetCountForClassPair(uint predictedClassIndicatorIndex, uint actualClassIndicatorIndex)
Copy link
Contributor

@TomFinley TomFinley Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actualClassIndicatorIndex [](start = 83, length = 25)

Indices are usually int, as far as I understand. I understand that as far as they are key types they are represented as int, but when it comes down to, say, doing slot indexing or whatever, they're ints. What you think? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, might be nice if there was a special property to indicate the number of classes, maybe. I guess they can get it from the above lists, but, might be nice otherwise.


In reply to: 275005689 [](ancestors = 275005689)


private readonly IHost _host;

internal ConfusionMatrix(IHost host, double[] precision, double[] recall, double[][] confusionTableCounts,
Copy link
Contributor

@TomFinley TomFinley Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precision [](start = 54, length = 9)

Might be nice to have some asserts here to make poor readers of the code (like me!) aware of the intent here -- I assume precision and recall have the same length, that confusionTableCounts is a jagged array with that same length along both dimensions, but these are just guesses based on my understanding of the setting, which many people coming to the code fresh would not have. And even I'm not sure. #Closed

/// It contains information about the predicted classes, if that information is available.
/// It can be the classes names, or their indices.
/// </summary>
public DataViewSchema.Annotations ClassIndicators { get; }
Copy link
Member Author

@sfilipi sfilipi Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public DataViewSchema.Annotations ClassIndicators { get; } [](start = 7, length = 59)

@[email protected] is exposing this desirable? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to not? AFAIK you get the class indicators out of the column they pass to the evaluate method. So why store it here? Why not just say, "get the indicators?" What does this API buy you? So my initial reaction is, "no, why is this object so complicated?"


In reply to: 275067993 [](ancestors = 275067993)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, if I think about a confusion matrix, all I really want is the matrix, maybe a few summary statistics. Making this API for this limited object responsible for querying the metadata on the predicted label column when we already have APIs to do that that are more general seems like it is just making the story harder than it needs to be.


In reply to: 275473328 [](ancestors = 275473328,275067993)

/// The indicators of the predicted classes.
/// It might be the classes names, or just indices of the predicted classes, if the name mapping is missing.
/// </summary>
internal IReadOnlyList<ReadOnlyMemory<char>> PredictedClassesIndicators;
Copy link
Member Author

@sfilipi sfilipi Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PredictedClassesIndicators [](start = 53, length = 26)

if the above is ok, i can remove this, and just pass in the annotations everywhere. #Resolved

// if there is a Weight column, return the weighted confusionMatrix as well, from this function.
if (isWeighted)
{
confusionMatrix = GetConfusionTableAsType(host, confusionDataView, binary, sample, false);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusionMatrix = GetConfusionTableAsType(host, confusionDataView, binary, sample, false); [](start = 16, length = 90)

why we doing this again?
We already did that in line 1359 #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be true instead of false.


In reply to: 275079866 [](ancestors = 275079866)

return confusionTableString;
}

public static ConfusionMatrix GetConfusionTableAsType(IHost host, IDataView confusionDataView, bool binary = true, int sample = -1, bool getWeighted = false)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetConfusionTableAsType [](start = 38, length = 23)

Maybe it make sense to rename this method to GetConfusionMatrix and GetConfusionTable to GetConfusionMatrixAsString or GetConfusionString? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetConfusionMatrixAsString exists. So I used GetConfusionTableAsFormattedString.

and I hate it already. lmk if anything better comes to mind.


In reply to: 275080299 [](ancestors = 275080299)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you at least rename this one to GetConfusionMatrix ?


In reply to: 275088372 [](ancestors = 275088372,275080299)

@@ -1553,13 +1581,13 @@ private static string GetFoldMetricsAsString(IHostEnvironment env, IDataView dat
}

// Get a string representation of a confusion table.
private static string GetConfusionTableAsString(double[][] confusionTable, double[] rowSums, double[] columnSums,
List<ReadOnlyMemory<char>> predictedLabelNames, string prefix = "", bool sampled = false, bool binary = true)
internal static string GetConfusionTableAsString(ConfusionMatrix confusionMatrix, bool isWeighted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetConfusionTableAsString [](start = 31, length = 25)

Mix of ConfusionMatrix and ConfusionTable makes me
...
http://i.imgur.com/lgSDkvC.gif
...
Confuse!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is also confusionDataView


In reply to: 275081264 [](ancestors = 275081264)

@@ -1591,10 +1619,11 @@ private static string GetFoldMetricsAsString(IHostEnvironment env, IDataView dat
else
rowLabelFormat = string.Format("{{1,{0}}} ||", paddingLen);

var confusionTable = confusionMatrix.ConfusionTableCounts;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var confusionTable = confusionMatrix.ConfusionTableCounts; [](start = 12, length = 58)

SO MUCH CONFUSION!
no, seriously, how confusionTable can be ConfusionTableCounts in ConfusionMatrix? #Resolved

var sb = new StringBuilder();
if (numLabels == 2 && binary)
if (numLabels == 2 && confusionMatrix.Binary)
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsBinary? #Resolved

@@ -74,6 +74,8 @@ public class BinaryClassificationMetrics
/// </remarks>
public double AreaUnderPrecisionRecallCurve { get; }

public ConfusionMatrix ConfusionMatrix { get; }
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfusionMatrix [](start = 31, length = 15)

need documentation #Closed

@@ -82,9 +82,12 @@ public sealed class MulticlassClassificationMetrics
/// </remarks>
public IReadOnlyList<double> PerClassLogLoss { get; }

internal MulticlassClassificationMetrics(IExceptionContext ectx, DataViewRow overallResult, int topKPredictionCount)
// The confusion matrix.
public ConfusionMatrix ConfusionMatrix { get; }
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConfusionMatrix [](start = 31, length = 15)

need documentation #Resolved

PerClassLogLoss = perClassLogLoss.DenseValues().ToImmutableArray();
ConfusionMatrix = MetricWriter.GetConfusionTableAsType(host, confusionMatrix, binary: false, perClassLogLoss.Length, getWeighted:false);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

false [](start = 141, length = 5)

it's false by default, you can probably omit it. #Resolved

// Get the counts names.
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count];
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType;
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames.");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Count [](start = 100, length = 5)

{MetricKinds.ColumnNames.Count} #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to a comment, per Tom's suggestion.


In reply to: 275463006 [](ancestors = 275463006)

// Get the counts names.
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count];
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType;
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames.");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SlotNames [](start = 158, length = 9)

AnnotationUtils.Kinds.SlotNames #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to a comment, per Tom's suggestion.


In reply to: 275463153 [](ancestors = 275463153)

Assert.InRange(recall, 0, 1);


// Get the values in the annotations
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOUBLE LINES! DOUBLE SHOCK! #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ofc it'll happen, there is not compiler complaining about it :)


In reply to: 275464491 [](ancestors = 275464491)

private readonly IHost _host;

/// <summary>
/// The confusion matrix as a structured type, built from the counts of the confusion table idv.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idv [](start = 100, length = 3)

We don't have idv here. Or what is idv? #Pending

/// <param name="labelNames">The predicted classes names, or the indexes of the classes, if the names are missing.</param>
/// <param name="isSampled">Whether the classes are sampled.</param>
/// <param name="isBinary">Whether the confusion table is the result of a binary classification. </param>
/// <param name="classIndicators">The Annotations of the Count column, in the confusionTable idv.</param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusionTable idv [](start = 86, length = 18)

<paramref name="confusionTableCounts"/>? #Pending

Copy link
Member Author

@sfilipi sfilipi Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the Annotations in the original IDV that gets created by the evaluator. Expanding the comment with the above note. See if what i put in now is too much.


In reply to: 275468277 [](ancestors = 275468277)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually removed the param, so description went away too.


In reply to: 275527727 [](ancestors = 275527727,275468277)

/// Returns a human readable representation of the confusion table.
/// </summary>
/// <returns></returns>
public string GetFormattedConfusionTable() => MetricWriter.GetConfusionTableAsString(this, false);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricWriter.GetConfusionTableAsString [](start = 54, length = 38)

Can we cache it?
Or just calculate it constructor?
I'm not sure in which case this would produce you different results in different times of call. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caching it and calculating it when first asked for.


In reply to: 275469566 [](ancestors = 275469566)

// Get the counts names.
var countColumn = confusionDataView.Schema[MetricKinds.ColumnNames.Count];
var type = countColumn.Annotations.Schema.GetColumnOrNull(AnnotationUtils.Kinds.SlotNames)?.Type as VectorDataViewType;
host.Assert(type != null && type.IsKnownSize && type.ItemType is TextDataViewType, "The Count column does not have a text vector metadata of kind SlotNames.");
Copy link
Contributor

@TomFinley TomFinley Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, "The Count column does not have a text vector metadata of kind SlotNames." [](start = 93, length = 76)

Assert mesages are pointless so I'd prefer to not have them, and instead have a comment, but that's OK. #Closed

internal ConfusionMatrix(IHost host, double[] precision, double[] recall, double[][] confusionTableCounts,
List<ReadOnlyMemory<char>> labelNames, bool isSampled, bool isBinary, DataViewSchema.Annotations classIndicators)
{
_host = host;
Copy link
Contributor

@TomFinley TomFinley Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_host = host; [](start = 12, length = 13)

Not a big deal but while you're at it, might help to Contracts.AssertValue(host) before assigning it. #Closed

@sfilipi
Copy link
Member Author

sfilipi commented Apr 16, 2019

@Ivanidzo4ka @TomFinley as you find time. Thanks.

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sfilipi for putting up with my awesome commentary. :) Looks good.

@sfilipi sfilipi requested a review from eerhardt April 19, 2019 07:08

/// <summary>
/// The confsion matrix counts for the combinations actual class/predicted class.
/// The actual classes are in the rows of the table, and the predicted classes in the columns.
Copy link
Contributor

@artidoro artidoro Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rows of the table, and the predicted classes in the columns [](start = 42, length = 59)

I guess that what is a row and what is a column is subjective. Maybe we could talk about the outer/inner index or first/second index? #Resolved

/// The indicators of the predicted classes.
/// It might be the classes names, or just indices of the predicted classes, if the name mapping is missing.
/// </summary>
public int NumberOfPredictedClasses { get; }
Copy link
Contributor

@artidoro artidoro Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just meant to give the number of classes?
Maybe we could simplify both the name and the comment.

Suggestion: NumberOfClasses #Resolved

var confusionMatrix = GetConfusionMatrix(host, confusionDataView, binary, sample, false);
var confusionTableString = GetConfusionTableAsString(confusionMatrix, false);

// if there is a Weight column, return the weighted confusionMatrix as well, from this function.
Copy link
Contributor

@artidoro artidoro Apr 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capital letter. #Resolved

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi merged commit 610ffcb into dotnet:master Apr 20, 2019
@sfilipi sfilipi deleted the confusionMatrix branch April 20, 2019 05:57
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bring back support for ConfusionMatrix in the new API
5 participants