-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improve exception message and make consistent with ExceptSchemaMismatch #2217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var instanceGetter = cursor.GetGetter<ReadOnlyMemory<char>>(index); | ||
if (!top.Schema.TryGetColumnIndex(AnomalyDetectionEvaluator.TopKResultsColumns.AnomalyScore, out index)) | ||
throw Host.Except("Data view does not contain the 'Anomaly Score' column"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anomaly Score' [](start = 71, length = 14)
some people would get picky about the capitalization :)
Does the exception handler change everything to lowercase? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it changes to lowercase.
However, the second argument of ExceptSchemaMismatch
is string columnRole
and the message becomes:
return $"Could not find {columnRole} column '{columnName}'";
So columnRole only serves the purpose of qualifying "column", while the parameter columnName
will give the exact name. That's why I have made it lowercase everywhere.
In reply to: 250474268 [](ancestors = 250474268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to leave the column names as-is unless we are updating this everywhere (everywhere being any place that we refer to a specific column name.)
In reply to: 250823347 [](ancestors = 250823347,250474268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the column does not include a space, so the current parameter columnRole
is already not matching the actual column name. As you see in my version, I include both the parameter column role
which is a description of the role of the column "anomaly score", and the actual column name AnomalyDetectionEvaluator.TopKResultsColumns.AnomalyScore
as part of the exception message.
In reply to: 251125939 [](ancestors = 251125939,250823347,250474268)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is meant to be human readable. What @Zruty0 was doing was trying to unify the error messages, I believe. And they were things like "the label column named 'Bubba' should be like this" or "the score column named 'Bizblam' should be like that," and whatnot. So I think the lowercase names for these roles specifically is appropriate, since it is just meant to be a textual message read and understood by a human.
For that same reason, I'm not too worried about these little strings everywhere. Considering that we have hundreds, possibly thousands of hand-written error messages, this seems fairly minor. So I'm not sure the situation requires something as elaborate as yet-another-static-class-full-of-strings. (I feel differently about something like MetadataUtils.Kinds
where correctness of algorithms is at stake if we get those wrong.) At a certain point these things lend less clarity, not more, I think? #Resolved
src/Microsoft.ML.Data/Evaluators/MultiOutputRegressionEvaluator.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Data/Evaluators/MultiOutputRegressionEvaluator.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #2217 +/- ##
==========================================
- Coverage 71.13% 71.13% -0.01%
==========================================
Files 779 779
Lines 140271 140275 +4
Branches 16046 16047 +1
==========================================
+ Hits 99788 99789 +1
- Misses 36032 36035 +3
Partials 4451 4451
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to Contracts
I am very unenthusiastic about. This is "convenience" taken too far.
@@ -454,14 +455,22 @@ public static Exception ExceptSchemaMismatch(this IExceptionContext ctx, string | |||
=> Process(new ArgumentOutOfRangeException(paramName, MakeSchemaMismatchMsg(columnRole, columnName)), ctx); | |||
public static Exception ExceptSchemaMismatch(string paramName, string columnRole, string columnName, string expectedType, string actualType) | |||
=> Process(new ArgumentOutOfRangeException(paramName, MakeSchemaMismatchMsg(columnRole, columnName, expectedType, actualType))); | |||
public static Exception ExceptSchemaMismatch(string paramName, string columnRole, string columnName, Type expectedType, string actualType) | |||
=> Process(new ArgumentOutOfRangeException(paramName, MakeSchemaMismatchMsg(columnRole, columnName, expectedType.ToString(), actualType))); | |||
public static Exception ExceptSchemaMismatch(string paramName, string columnRole, string columnName, Type[] expectedType, string actualType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like this very much. First of all it is using the wrong type... the IDataView
type system is based on its own type system, which is distinct from the .NET type system. (E.g., key values and unsigned integers have the same .NET type, but different ColumnType
instances.) Also one of the most common instances of this is something abstract like, "a known sized vector of float," which is not a concept expressible in this. Also it seems to be introducing a "convenience" for a very specialized case.
So, all this is to say, this last addition I am unenthusiastic about. Could we revert that please, and reconsider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good, will do. What do you think of the use of typeof(float).ToString()
instead of using "float"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with Tom, I am reverting my changes only keeping the switch from "input" to "source" column in transformers. @singlis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @artidoro . I think I should probably explain my thinking here, since it may seem counterintuitive.
So generally I've found that having conveniences to make things more consistent is a good thing. However, one thing that I've observed, even in this PR, is that for exception messages specifically, I am a little wary of it. I consider the problems of #2185. One of the problems is the error messages are wrong, but to a casual reader it is not obvious that the error messages are wrong, since it is buried behind layers of "convenience" to make the messages consistent, while making it very, very hard for someone to figure out what the message actually is. I don't think the original author had ill intent, but they just got confused about what "convenience" led to what message and what the meaning of the inputs is. (As in fact we also saw here, with the controversy about "label" vs. "Label", and whether something was supposed to be a column name or not!!)
I'm not sure how to solve that issue, but it seems to me like this ExceptSchemaMismatch
has some problem that makes it badly conceived if its effect of "consistency" is that people are writing wrong error messages, and being fundamentally confused about it. I don't know how to solve that issue. But the change to introduce more layers of abstraction would only compound the problem that these "conveniences" are hard to use and people evidently find them very confusing.
This is not to say I think we should not use ExceptSchemaMismatch
, but I think it needs to be clarified in some fashion. Right now to a reviewer it is really, really hard to tell what is a "right" vs. "wrong" message (as seen by all the discussions in this PR), and the suggestion to get out of that hole by continuing to dig was not one I liked very much. Which is why I had to disagree with some other suggestions in this PR, sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @artidoro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #2044.
I make the exception message consistent using
ExceptSchemaMismatch
when possible.In particular, as mentioned in the issue #2044, I did it for the base class for TrainerEstimators.