
Cannot produce a Precision-Recall Curve from our Binary Classification Evaluator #2645


Closed
rogancarr opened this issue Feb 20, 2019 · 13 comments · Fixed by #3039
Comments

@rogancarr
Contributor

In the Supported Scenarios for V1 (see #2498) we want to be able to easily plot a PR curve from output of the evaluator.

This is not currently possible with our set of APIs*. As detailed in #2465, we cannot set a threshold on a binary classification prediction, so we cannot loop over the various thresholds. Furthermore, while the Evaluator has enough information to return lists of precisions and recalls at the various cutoffs, it has no avenue to return this information.

* Well, actually this is possible using custom mappers like so:

  1. Get a list of all unique probability scores
    (e.g. by reading the IDataView as an enumerable and creating a dictionary of observed values)
  2. For each unique value of the probability:
    a. Write a custom mapper to produce PredictedLabel at that probability threshold.
    b. Calculate Precision and Recall with these labels.

However, it doesn't seem like this workaround qualifies as satisfying the V1 scenario: it's a more advanced use of the APIs, and it requires an individual to understand the ins and outs of calculating PR curves.
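
For reference, here is a rough sketch of that workaround. It assumes a scored IDataView (here called scoredData) with the usual Label and Probability columns and uses mlContext.Data.CreateEnumerable; for brevity it computes the confusion counts directly at each threshold instead of going through a custom mapper plus the evaluator, which is equivalent for the purpose of the curve. The ScoredRow class is purely illustrative.

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.ML;

    // Illustrative POCO matching the scored output of a calibrated binary classifier.
    public class ScoredRow
    {
        public bool Label { get; set; }
        public float Probability { get; set; }
    }

    // `scoredData` is the IDataView produced by model.Transform(testData).
    var rows = mlContext.Data
        .CreateEnumerable<ScoredRow>(scoredData, reuseRowObject: false)
        .ToList();

    // 1. Get the list of unique probability scores to use as thresholds.
    var thresholds = rows.Select(r => r.Probability).Distinct().OrderBy(p => p).ToList();

    // 2. For each threshold, predict positive when Probability >= threshold,
    //    then compute precision and recall from the confusion counts.
    var prCurve = new List<(float Threshold, double Precision, double Recall)>();
    foreach (var t in thresholds)
    {
        int tp = rows.Count(r => r.Probability >= t && r.Label);
        int fp = rows.Count(r => r.Probability >= t && !r.Label);
        int fn = rows.Count(r => r.Probability < t && r.Label);
        double precision = (tp + fp) == 0 ? 1.0 : (double)tp / (tp + fp);
        double recall = (tp + fn) == 0 ? 0.0 : (double)tp / (tp + fn);
        prCurve.Add((t, precision, recall));
    }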

@rogancarr rogancarr added the API (Issues pertaining the friendly API) and usability (Smoothing user interaction or experience) labels Feb 20, 2019
@shauheen
Contributor

Thanks for filing the issue, @rogancarr. This is clearly a gap in functionality, but we can surely do this post-March.

@rogancarr
Contributor Author

@shauheen Sounds good! The scenario is technically blocked, but there is a workaround (documented above) because our APIs are so flexible.

@CESARDELATORRE
Contributor

That workaround looks pretty complex for a getting started sample.
We're hitting this issue in one of the samples (Spam detection):

https://github.com/dotnet/machinelearning-samples/blob/73803ce861e16b48edefaad97771bcd721e268d6/samples/csharp/getting-started/BinaryClassification_SpamDetection/SpamDetectionConsoleApp/Program.cs#L61-L66

We cannot use the new BinaryPredictionTransformer() any longer.

I hope we can fix or simplify this for 0.12, or by v1.0 in any case.

@rogancarr
Contributor Author

@CESARDELATORRE The current plan is to do this post-v1. @shauheen

@CESARDELATORRE
Contributor

@asthana86 @shauheen @eerhardt - So, shall we remove the "Spam Detection" sample from the repo?
Or @rogancarr, can you help implement the workaround mentioned above in the "Spam Detection" sample? I'm not sure whether the workaround would be too complex for a getting started sample, though...
Thoughts?

@rogancarr
Contributor Author

I'd be happy to help, but let's first resolve whether this is in or out for V1.

@shauheen
Contributor

shauheen commented Mar 1, 2019

This, along with #2465, would be a stretch goal. At the moment we should not take these on.

@ganik ganik self-assigned this Mar 14, 2019
@shauheen shauheen added this to the 0319 milestone Mar 15, 2019
@ganik
Member

ganik commented Mar 19, 2019

We do have functionality that calculates the PR curve; we just don't expose it yet (see the internal class below):

internal sealed class BinaryClassifierEvaluator : RowToRowEvaluatorBase<BinaryClassifierEvaluator.Aggregator>
    {
        public sealed class Arguments
        {
            [Argument(ArgumentType.AtMostOnce, HelpText = "Probability value for classification thresholding")]
            public Single Threshold;

            [Argument(ArgumentType.AtMostOnce, HelpText = "Use raw score value instead of probability for classification thresholding", ShortName = "useRawScore")]
            public bool UseRawScoreThreshold = true;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for p/r curve generation. Specify 0 for no p/r curve generation", ShortName = "numpr")]
            public int NumRocExamples;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUC calculation. If 0, AUC is not computed. If -1, the whole dataset is used", ShortName = "numauc")]
            public int MaxAucExamples = -1;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUPRC calculation. Specify 0 for no AUPRC calculation", ShortName = "numauprc")]
            public int NumAuPrcExamples = 100000;
        }

So the Arguments class needs to be made public, and our BinaryClassifierEvaluator should accept it:

        /// <summary>
        /// Evaluates scored binary classification data, if the predictions are not calibrated.
        /// </summary>
        /// <typeparam name="T">The shape type for the input data.</typeparam>
        /// <param name="catalog">The binary classification catalog.</param>
        /// <param name="data">The data to evaluate.</param>
        /// <param name="label">The index delegate for the label column.</param>
        /// <param name="pred">The index delegate for columns from uncalibrated prediction of a binary classifier.
        /// Under typical scenarios, this will just be the same tuple of results returned from the trainer.</param>
        /// <returns>The evaluation results for these uncalibrated outputs.</returns>
        public static BinaryClassificationMetrics Evaluate<T>(
            this BinaryClassificationCatalog catalog,
            DataView<T> data,
            Func<T, Scalar<bool>> label,
            Func<T, (Scalar<float> score, Scalar<bool> predictedLabel)> pred)

@rogancarr
Contributor Author

I don't think we need to make the arguments public necessarily. These mostly have to do with how we calculate the AU(PR|RO)C. I'd hold off on those for v1.0.

What we need is a way to somehow return the calculated points for the curve through the API.
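
Purely as an illustration of what that could look like (these type names and members are hypothetical, not an existing ML.NET API), the evaluator could hand back a list of per-threshold points alongside the metrics it already computes:

    using System.Collections.Generic;
    using Microsoft.ML.Data;

    // Hypothetical: one point on the precision-recall (or ROC) curve.
    public sealed class BinaryPrecisionRecallDataPoint
    {
        public double Threshold { get; set; }
        public double Precision { get; set; }
        public double Recall { get; set; }
        public double FalsePositiveRate { get; set; }
    }

    // Hypothetical: the usual metrics plus the curve points, so callers can plot
    // precision vs. recall directly from the evaluation result.
    public sealed class BinaryClassificationMetricsWithCurve
    {
        public BinaryClassificationMetrics Metrics { get; set; }
        public IReadOnlyList<BinaryPrecisionRecallDataPoint> PrCurve { get; set; }
    }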

@ganik
Member

ganik commented Mar 19, 2019

@rogancarr I am specifically pointing out this one:

[Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for p/r curve generation. Specify 0 for no p/r curve generation", ShortName = "numpr")]
public int NumRocExamples;

If it's not set (by default it's 0), then no PR calculations happen. It has to be set for the points to be calculated.

@rogancarr
Contributor Author

Let's see how the API comes along, and we can work through it once we have running code.

I do think that this set of arguments seems a bit weird. I haven't seen the code, but I would expect the AU(RO|PR)C calculation to need a (RO|PR)-curve calculation :)

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for p/r curve generation. Specify 0 for no p/r curve generation", ShortName = "numpr")]
            public int NumRocExamples;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUC calculation. If 0, AUC is not computed. If -1, the whole dataset is used", ShortName = "numauc")]
            public int MaxAucExamples = -1;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUPRC calculation. Specify 0 for no AUPRC calculation", ShortName = "numauprc")]
            public int NumAuPrcExamples = 100000;

@ganik
Member

ganik commented Mar 20, 2019

@rogancarr @Ivanidzo4ka @TomFinley
For a short-term fix (for V1), the suggestion is to have an additional parameter, numRocExamples, passed into public static BinaryClassificationMetrics Evaluate(...).
Post-V1 we will need to discuss how to streamline Arguments passing for the API, the command line, and entry points. It would be great to have a single Arguments class for all these cases.

@TomFinley
Contributor

TomFinley commented Mar 20, 2019

> @rogancarr @Ivanidzo4ka @TomFinley
> For a short-term fix (for V1), the suggestion is to have an additional parameter, numRocExamples, passed into public static BinaryClassificationMetrics Evaluate(...).

That seems fine. In fact, that is the correct solution, not just in the short term: probably as an overload, or even a differently named evaluate method on the context, something like EvaluateWithPRCurve or some such.
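
As a rough sketch of that idea (the method name follows the EvaluateWithPRCurve suggestion above; the signature, the out parameter, and the default column names are illustrative, not an actual ML.NET signature):

    using System.Collections.Generic;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    public static class PrCurveEvaluationExtensions
    {
        // Hypothetical overload: evaluate as usual, but also return the
        // per-threshold precision/recall points so the caller can plot the curve.
        public static CalibratedBinaryClassificationMetrics EvaluateWithPRCurve(
            this BinaryClassificationCatalog catalog,
            IDataView data,
            out IReadOnlyList<(double Threshold, double Precision, double Recall)> prCurve,
            string labelColumnName = "Label",
            string probabilityColumnName = "Probability")
        {
            // Design sketch only; a real implementation would thread a
            // numRocExamples-style option through to BinaryClassifierEvaluator.
            throw new System.NotImplementedException();
        }
    }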

> Post-V1 we will need to discuss how to streamline Arguments passing for the API, the command line, and entry points. It would be great to have a single Arguments class for all these cases.

I disagree with that rather entirely, and I feel that we need to be clear on this point. Entry points and the command line have their own little "interface" due to their own unique infrastructure -- we produce Arguments and Options classes specifically because that's what the entry points and the command line need to use.

"Unifying" them is what we did with the ML.NET v0.1 API: a "straight" conversion of this same infrastructure. This we know results in a terrible API. In fact all of the work we've been doing for the past year starting with #371 and onwards in one way or another has been making an actual real honest-to-god C# API, rather than going down what we know is that bad path.

So, this is not something we will be doing post v1 unless we have forgotten our experiences.
