Skip to content

Improve TrainerEstimatorBase exception messages for types #2044

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomFinley opened this issue Jan 6, 2019 · 0 comments
Closed

Improve TrainerEstimatorBase exception messages for types #2044

TomFinley opened this issue Jan 6, 2019 · 0 comments
Assignees
Labels
API Issues pertaining the friendly API usability Smoothing user interaction or experience

Comments

@TomFinley
Copy link
Contributor

Consider this error message:

if (!FeatureColumn.IsCompatibleWith(featureCol))
throw Host.Except($"Feature column '{FeatureColumn.Name}' is not compatible");

This is quite a charming message. Most of the time (not always, perhaps) a trainer expects a known size vector of float. What if the user feeds in something else? They get this, and are told that it is "not compatible." It is absolutely true. That's not compatible. They try something else, if it doesn't happen to be exactly right, they get the same error message. Not compatible. So, while absolutely true, it sort of reminds me of this game:

Calvin and Hobbes "Nope guess again"

This error message is in the TrainerEstimatorBase. Which is to say, nearly all trainers (except specialized trainers) will check the features column through this, which means that a user's likelihood of seeing this message is fairly high.

Consider the following far more helpful example. Note that we even have a convenience function for standard reporting and formatting of this type mismatch error, since it is so common!

if (t != NumberType.Float && t.KeyCount <= 0)
throw Host.ExceptSchemaMismatch(nameof(schema), "label", schema.Label.Value.Name, "float or a known-cardinality key", t.ToString());

Of course, it may be that there are plenty of places that we are not being less descriptive with types than we should be, but since trainers (and especially the base class!) will be used by nearly everyone, we might identify this as deserving special prioritization.

We should probably change TrainerEstimatorBase so that it or its subclasses use these more descriptive helpers, throughout the type checks. Ideally we'd also scour the codebase for typical checks of this form to make sure they are likewise descriptive.

That is at least what we should do, but, in a super ideal world, we might be able to go one step further and even provide advice on what should be done to get it into the required form (that is, be not just descriptive, but prescriptive, to borrow a phrase from @stephentoub), but how to do that is unclear to me. (The best thing I can think of is, we, by which I mean a data scientist, probably ought to write a web article on typical strategies for featurization, and why it's important to give it some thought, but that's certainly not something you want to put in an exception message. Though it might be a nice thing to link from the exception, if possible.)

/cc @stephentoub

@TomFinley TomFinley added API Issues pertaining the friendly API usability Smoothing user interaction or experience labels Jan 6, 2019
@artidoro artidoro self-assigned this Jan 23, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API usability Smoothing user interaction or experience
Projects
None yet
Development

No branches or pull requests

2 participants