You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
throwHost.Except($"Feature column '{FeatureColumn.Name}' is not compatible");
This is quite a charming message. Most of the time (not always, perhaps) a trainer expects a known size vector of float. What if the user feeds in something else? They get this, and are told that it is "not compatible." It is absolutely true. That's not compatible. They try something else, if it doesn't happen to be exactly right, they get the same error message. Not compatible. So, while absolutely true, it sort of reminds me of this game:
This error message is in the TrainerEstimatorBase. Which is to say, nearly all trainers (except specialized trainers) will check the features column through this, which means that a user's likelihood of seeing this message is fairly high.
Consider the following far more helpful example. Note that we even have a convenience function for standard reporting and formatting of this type mismatch error, since it is so common!
throwHost.ExceptSchemaMismatch(nameof(schema),"label",schema.Label.Value.Name,"float or a known-cardinality key",t.ToString());
Of course, it may be that there are plenty of places that we are not being less descriptive with types than we should be, but since trainers (and especially the base class!) will be used by nearly everyone, we might identify this as deserving special prioritization.
We should probably change TrainerEstimatorBase so that it or its subclasses use these more descriptive helpers, throughout the type checks. Ideally we'd also scour the codebase for typical checks of this form to make sure they are likewise descriptive.
That is at least what we should do, but, in a super ideal world, we might be able to go one step further and even provide advice on what should be done to get it into the required form (that is, be not just descriptive, but prescriptive, to borrow a phrase from @stephentoub), but how to do that is unclear to me. (The best thing I can think of is, we, by which I mean a data scientist, probably ought to write a web article on typical strategies for featurization, and why it's important to give it some thought, but that's certainly not something you want to put in an exception message. Though it might be a nice thing to link from the exception, if possible.)
Consider this error message:
machinelearning/src/Microsoft.ML.Data/Training/TrainerEstimatorBase.cs
Lines 98 to 99 in faffd17
This is quite a charming message. Most of the time (not always, perhaps) a trainer expects a known size vector of float. What if the user feeds in something else? They get this, and are told that it is "not compatible." It is absolutely true. That's not compatible. They try something else, if it doesn't happen to be exactly right, they get the same error message. Not compatible. So, while absolutely true, it sort of reminds me of this game:
This error message is in the
TrainerEstimatorBase
. Which is to say, nearly all trainers (except specialized trainers) will check the features column through this, which means that a user's likelihood of seeing this message is fairly high.Consider the following far more helpful example. Note that we even have a convenience function for standard reporting and formatting of this type mismatch error, since it is so common!
machinelearning/src/Microsoft.ML.Data/Evaluators/MultiClassClassifierEvaluator.cs
Lines 82 to 83 in faffd17
Of course, it may be that there are plenty of places that we are not being less descriptive with types than we should be, but since trainers (and especially the base class!) will be used by nearly everyone, we might identify this as deserving special prioritization.
We should probably change
TrainerEstimatorBase
so that it or its subclasses use these more descriptive helpers, throughout the type checks. Ideally we'd also scour the codebase for typical checks of this form to make sure they are likewise descriptive.That is at least what we should do, but, in a super ideal world, we might be able to go one step further and even provide advice on what should be done to get it into the required form (that is, be not just descriptive, but prescriptive, to borrow a phrase from @stephentoub), but how to do that is unclear to me. (The best thing I can think of is, we, by which I mean a data scientist, probably ought to write a web article on typical strategies for featurization, and why it's important to give it some thought, but that's certainly not something you want to put in an exception message. Though it might be a nice thing to link from the exception, if possible.)
/cc @stephentoub
The text was updated successfully, but these errors were encountered: