Skip to content

Mismatch in Label expectations for Multiclass Learners #2628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rogancarr opened this issue Feb 19, 2019 · 7 comments · Fixed by #2804
Closed

Mismatch in Label expectations for Multiclass Learners #2628

rogancarr opened this issue Feb 19, 2019 · 7 comments · Fixed by #2804
Assignees
Labels
bug Something isn't working usability Smoothing user interaction or experience
Milestone

Comments

@rogancarr
Copy link
Contributor

rogancarr commented Feb 19, 2019

When training a multiclass classifier, the label must be converted to KeyType. Some learners, such as SDCA, do the conversion from R4 automatically, while some, like LogisticRegression do not. This behavior is confusing from an end-user perspective, as one cannot simply interchange learners without modifying the input pipeline.

I suggest adding auto-label-conversion for all multiclass learners in ML.NET.

@rogancarr rogancarr added API Issues pertaining the friendly API usability Smoothing user interaction or experience and removed API Issues pertaining the friendly API labels Feb 19, 2019
@Ivanidzo4ka
Copy link
Contributor

I can be wrong, but it looks like our decision making regarding auto-working-magically workflow for API is quite opposite. We no longer have auto cache, no auto normalization.
Maybe we should remove auto conversion to R4 as well from SDCA.

@Ivanidzo4ka
Copy link
Contributor

I have list of things I want to do to all learners #2613 if you during your discovery find something which doesn't fit into your list, can you extend it?

@TomFinley
Copy link
Contributor

TomFinley commented Feb 19, 2019

SDCA in fact does not convert from float automatically. It was instead written at a time years ago when the only type of label supported was float. Now that we have a more specific type we should shift to using it. It does not convert at all, it just supports both types.

If you prefer it to be consistent, it should be consistent in the other direction, that is, SDCA should stop doing its internal magic. That is, @Ivanidzo4ka's interpretation is correct.

We do not do "auto-conversion" for anything in ML.NET because (1) conversion is the job of transforms and (2) adding the smarts "anywhere" requires that we add them "everywhere," which is a maintainability nightmare. (In fact, it has proven to be practically impossible.)

We have an internal style guide that treats on this subject at some length. Well intentioned but utterly misguided people often think, "hey I know, I'll be super helpful and just support a bunch of types," but all it leads to is an inconsistent, broken feeling API. It is not precisely the same situation, since that deals with IDataView implementations, but the same philosophy and the same dangers of inconsistency (which is the subject of your issue!!) arise just the same.

@rogancarr
Copy link
Contributor Author

I like having consistency. Let's consider this a bug on SDCA for supporting floats instead of only KeyTypes as multiclass labels.

@Ivanidzo4ka
Copy link
Contributor

Just to make it clear, are we fixing this now or post v1?

@rogancarr
Copy link
Contributor Author

We should fix now, as it breaks a key training scenario that any trainer for a task can substituted in a training pipeline.

@TomFinley
Copy link
Contributor

Correct, we must fix now.

@shauheen shauheen added this to the 0319 milestone Mar 5, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working usability Smoothing user interaction or experience
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants