Exception when trying to Evaluate AveragedPerceptronTrainer, LinearSvm #1579

Closed
abgoswam opened this issue Nov 8, 2018 · 5 comments

@abgoswam
Member

abgoswam commented Nov 8, 2018

For a couple of learners, we get an exception during Evaluate:

  • AveragedPerceptronTrainer
  • LinearSvm

Exception:

Message: System.ArgumentOutOfRangeException : Probability column 'Probability' not found
Parameter name: name

Sample:

    [Fact]
    public void OVA_BC_AP()
    {
        string dataPath = GetDataPath("breast-cancer.txt");

        // Create a new context for ML.NET operations. It can be used for exception tracking and logging, 
        // as a catalog of available operations and as the source of randomness.
        var mlContext = new MLContext(seed: 1);
        var reader = new TextLoader(mlContext, new TextLoader.Arguments()
        {
            Column = new[]
                    {
                        new TextLoader.Column("Label", DataKind.R4, 0),
                        new TextLoader.Column("Features", DataKind.R4, new [] { new TextLoader.Range(1, 9) }),
                    }
        });

        // Data
        var data = reader.Read(dataPath);

        // Pipeline
        var pipeline = new AveragedPerceptronTrainer(mlContext, "Label", "Features");

        var model = pipeline.Fit(data);
        var predictions = model.Transform(data);

        // Metrics
        var metrics = mlContext.BinaryClassification.Evaluate(predictions);
    }
@yaeldekel

Not sure this is a bug - AveragedPerceptron does not produce calibrated models. If we don't currently expose calibration APIs, we should probably add them. Also, we may want to consider warning when the probability column isn't there instead of throwing.

@Zruty0
Contributor

Zruty0 commented Nov 8, 2018

...or call mlContext.BinaryClassification.EvaluateNonCalibrated.

Exposing the calibration API is fine. I would just make a calibration estimator for that, though, one that trains towards one parameter. Embedding a calibrator into the learner seems somewhat unnecessary.

@abgoswam
Member Author

abgoswam commented Nov 8, 2018

Thanks for the comments. Using mlContext.BinaryClassification.EvaluateNonCalibrated got me the metrics I was looking for.
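
For reference, the only change to the repro above was the evaluation call (a sketch against the same pipeline and column names as in the original test):

    // Same pipeline as in the repro above; only the evaluation call changes.
    var model = pipeline.Fit(data);
    var predictions = model.Transform(data);

    // EvaluateNonCalibrated does not require a 'Probability' column, so it
    // works for uncalibrated learners such as AveragedPerceptron and LinearSvm.
    var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictions);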

I have a few follow-up questions, based on the comments above:

  • What do we mean by "make a calibration estimator that trains towards one parameter"?

  • My understanding is that currently some learners have a calibrator embedded (e.g. FastTree) while other learners do not (e.g. AveragedPerceptron). Is that by design?

  • From a user perspective, is there a way to know whether I should use Evaluate or EvaluateNonCalibrated? I feel this may confuse users of ML.NET.

@Zruty0 @yaeldekel

@Zruty0
Contributor

Zruty0 commented Nov 8, 2018

What do we mean by "make a calibration estimator that trains towards one parameter"?

I mean that a calibrator is just one peculiar form of trainer: it learns a monotonic function that transforms 'scores' into 'probabilities', with the goal of minimizing the log-loss against the 'target label'. So it is actually a univariate classification trainer. We should create a PlattCalibrationEstimator to train Platt calibrators and a PavCalibrationEstimator to train PAV calibrators.
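
To make that concrete, a Platt calibrator has roughly the shape below (illustration only, not the ML.NET API; 'slope' and 'offset' stand in for the learned parameters):

    // Illustration only: a Platt calibrator is a monotonic sigmoid over the raw
    // score, whose two parameters are learned by minimizing log-loss against the
    // label (typically slope < 0, so larger scores map to larger probabilities).
    static double PlattProbability(float score, double slope, double offset)
        => 1.0 / (1.0 + Math.Exp(slope * score + offset));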

My understanding is that currently some learners have a calibrator embedded (e.g. FastTree) while other learners do not (e.g. AveragedPerceptron). Is that by design?

Some learners, under some conditions, are essentially learning a calibrated model. For example, the FastTree classifier already minimizes the log-loss of Sigmoid(Score) against the target label. So, if we just take a sigmoid of the score, we already have a calibrated output (even though it's calibrated against the training set). For such learners, we produce models that are 'self-calibrated'. For other learners that don't have this property, we don't.

From a user perspective, is there a way to know whether I should use Evaluate or EvaluateNonCalibrated? I feel this may confuse users of ML.NET.

You can inspect the schema and see if there is a Probability column. If there is, you can use Evaluate; if there isn't, you can only use EvaluateNonCalibrated.
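
A rough sketch of that check (assuming the IDataView schema API of that release, where TryGetColumnIndex looks up a column by name):

    // Sketch only: pick the evaluator based on whether the scored data
    // actually carries a 'Probability' column.
    if (predictions.Schema.TryGetColumnIndex("Probability", out int probCol))
    {
        var calibratedMetrics = mlContext.BinaryClassification.Evaluate(predictions);
    }
    else
    {
        var nonCalibratedMetrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictions);
    }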

@abgoswam
Member Author

Closing this issue, since we verified that mlContext.BinaryClassification.EvaluateNonCalibrated gave us the desired metrics for AveragedPerceptron.

Created a separate issue #1622 for adding calibration estimators to ML.NET.

@ghost locked as resolved and limited conversation to collaborators Mar 26, 2022