Sample Breaks When Change Learner from FastTree to AveragedPerceptron #2477

Closed
daholste opened this issue Feb 8, 2019 · 7 comments

daholste (Contributor) commented Feb 8, 2019

@srsaggam and I were looking at the sentiment analysis sample:

https://github.com/dotnet/machinelearning-samples/blob/a79ced6c6bb788c2189d81e5993863e15cf8be0c/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/SentimentAnalysisConsoleApp/Program.cs#L59

When you change the learner from 'FastTree' to 'AveragedPerceptron', the sample throws the exception:

System.ArgumentOutOfRangeException: 'Probability column 'Probability' not found'

This is probably because AveragedPerceptron is not calibrated, but FastTree is. Any thoughts on how to handle this scenario? We've successfully made use of PlattCalibratorEstimator, but I don't think this is supported, because (1) it's not exposed / hanging off of MLContext, and (2) it takes an IPredictor.
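
For concreteness, the repro looks roughly like the sketch below. This is a minimal sketch, not the sample's exact code: it is written against the newer MLContext-style API, and the SentimentIssue class and data file name are illustrative stand-ins.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Illustrative stand-in for the sample's input type.
public class SentimentIssue
{
    [LoadColumn(0)] public bool Label { get; set; }
    [LoadColumn(1)] public string Text { get; set; }
}

class Program
{
    static void Main()
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromTextFile<SentimentIssue>(
            "sentiment-data.tsv", hasHeader: true);

        var pipeline = mlContext.Transforms.Text
            .FeaturizeText("Features", nameof(SentimentIssue.Text))
            // Original sample: ...Trainers.FastTree(...) -- calibrated, emits "Probability".
            // Swapped-in learner -- uncalibrated, emits only "Score":
            .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron(
                labelColumnName: "Label", featureColumnName: "Features"));

        var model = pipeline.Fit(data);

        // Evaluate() expects a "Probability" column by default, so this throws
        // ArgumentOutOfRangeException: "Probability column 'Probability' not found".
        var metrics = mlContext.BinaryClassification.Evaluate(model.Transform(data));
    }
}
```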

srsaggam (Member) commented Feb 8, 2019

@CESARDELATORRE This is blocking part of the CLI tool.

srsaggam (Member) commented Feb 8, 2019

@justinormont

TomFinley (Contributor) commented Feb 8, 2019

The error message says it cannot find the column Probability. Some binary classifiers naturally provide, in addition to the actual predictions, a calibrated probabilistic score; most do not. FastTree, being a form of LogitBoost, does, but AveragedPerceptron does not. (For an analogous situation, compare the perceptron and the gradient-boosted classifiers in sklearn: both have predict, but only one of them has predict_proba.)

If your scenario requires probabilities, please use one of our calibrator estimators to fit one. PlattCalibratorEstimator is a good default choice. Note that it is often important to fit the calibrator on a different dataset than the one the original model was trained on: the distribution of scores on the training data often differs from the distribution on unseen data (see the concept of "overfitting" if you're curious as to why), leading to a distorted version of the probabilities.
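
To sketch the shape of this, assuming a Platt calibrator reachable from MLContext (which is where the API is headed; the exact entry point may differ from what ships today, and `pipeline`/`data` are carried over from the repro sketch above):

```csharp
// Hold out a calibration set distinct from the training data, per the note above.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Fit the uncalibrated pipeline (e.g. AveragedPerceptron) on the training partition.
var model = pipeline.Fit(split.TrainSet);

// Fit a Platt calibrator on scores produced over the held-out partition.
var scoredCalibrationData = model.Transform(split.TestSet);
var calibratorEstimator = mlContext.BinaryClassification.Calibrators.Platt(
    labelColumnName: "Label", scoreColumnName: "Score");
var calibrator = calibratorEstimator.Fit(scoredCalibrationData);

// Applying the calibrator adds a "Probability" column derived from "Score".
IDataView calibrated = calibrator.Transform(scoredCalibrationData);
```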

Nonetheless, I do apologize; we attempted to communicate, months ago in issue #1604 and through other channels outside of GitHub, that many learners would not be auto-calibrated, but clearly that path of communication failed. I wonder if you share my opinion that other paths of communication might prove more reliable and capable, and if so, what you might suggest?

If this answers your question to your satisfaction, please feel free to close. If not, follow-up questions are of course always welcome.

daholste (Contributor, Author) commented Feb 8, 2019

Thank you, @TomFinley!
Sorry, re: communication, we were definitely aware that this was a result of the difference in auto-calibration between learners. My fault for not making this clearer.
Good to know that PlattCalibratorEstimator is the way to go! Thank you! Would you have any suggestions for a supported way to generate an IPredictor from an EstimatorChain (to be able to use PlattCalibratorEstimator)? Also, do you have any plans to hang PlattCalibratorEstimator off MLContext?

TomFinley (Contributor) commented Feb 8, 2019

The relevant issue for MLContext specifically is #1871, an issue opened and assigned to @sfilipi. If you have thoughts on what would be useful there, that would be a great place to provide that feedback, since that work is not yet complete. (Similarly for the other issues.) As you've noted, and as also noted in #1871, the current estimator does not conform to the new "idioms": it does not work via MLContext, it uses the IPredictor interface, and so on.

The result, however, will ultimately still be, like everything else in the API, an estimator that returns a transformer. It will also most probably take a model, much as it does today. If you want to get started with the estimator right now to unblock whatever scenario you have in mind, that is the way to do it.
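
Concretely, continuing the sketch from the earlier comments, composing the fitted pieces by hand is one way to unblock the prediction path today. CreatePredictionEngine here is the newer MLContext-attached form, and the SentimentPrediction output class is an illustrative assumption:

```csharp
// Illustrative output class; AveragedPerceptron alone would not fill Probability,
// but the appended calibrator does.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")] public bool Prediction { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}

// Chain the trained transformer with the fitted calibrator transformer...
ITransformer fullModel = model.Append(calibrator);

// ...and consume the composite like any other model.
var engine = mlContext.Model
    .CreatePredictionEngine<SentimentIssue, SentimentPrediction>(fullModel);
var prediction = engine.Predict(new SentimentIssue { Text = "This is a terrible movie." });
Console.WriteLine($"Prediction: {prediction.Prediction}, probability: {prediction.Probability}");
```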

daholste (Contributor, Author) commented Feb 8, 2019

Awesome, thanks a lot, @TomFinley! I'm going to close out this issue as a duplicate of #1871.

daholste closed this as completed Feb 8, 2019
daholste (Contributor, Author) commented Feb 8, 2019

@TomFinley -- also, I need to search through existing issues before filing new ones and consuming your time. Thanks for the really helpful responses; I'll try to do better in the future.

ghost locked this issue as resolved and limited the conversation to collaborators on Mar 24, 2022