Sweep Range of L2RegularizerWeight in AveragedPerceptron #579
Thank you @SolyarA, this looks OK to me. @justinormont, is it along the lines of what you wanted in #567?
@TomFinley, @Zruty0: What are the implications of missing the range of 0.4 to 0.5 in L2RegularizerWeight for AveragedPerceptron? A better fix for this would be to add a param to the sweep range to note whether the range boundaries are inclusive or exclusive.
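For illustration, the inclusive vs. exclusive idea could look roughly like the sketch below. It is written in Python rather than the project's C#, and `FloatSweepRange`, `include_low`, `include_high`, and `grid` are hypothetical names, not part of the ML.NET sweeper API. It also assumes, as the discussion suggests, that 0.5 itself is not a usable value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FloatSweepRange:
    """Hypothetical sweep range with per-bound inclusivity (not the ML.NET API)."""
    low: float
    high: float
    include_low: bool = True
    include_high: bool = True

    def contains(self, x: float) -> bool:
        # Apply a strict or non-strict comparison depending on each flag.
        above = x >= self.low if self.include_low else x > self.low
        below = x <= self.high if self.include_high else x < self.high
        return above and below

    def grid(self, steps: int) -> List[float]:
        """Return `steps` evenly spaced values that respect the bound flags."""
        if steps < 1:
            raise ValueError("steps must be at least 1")
        # Reserve one extra slot per excluded endpoint, then skip those slots.
        first = 0 if self.include_low else 1
        total = steps + first + (0 if self.include_high else 1)
        if total == 1:
            return [self.low]
        width = (self.high - self.low) / (total - 1)
        return [self.low + width * i for i in range(first, first + steps)]


# Assumed example: sweep up to, but never including, 0.5.
l2_range = FloatSweepRange(0.0, 0.5, include_high=False)
candidates = l2_range.grid(5)
```

With an exclusive upper bound the sweep can approach 0.5 without ever proposing it, so the range would not have to be cut back to 0.4 just to stay clear of the boundary.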
I cannot say what the implications are; I don't have an intuition for how the optimizer would behave close to the 0.5 boundary. As for inclusive vs. exclusive boundaries: I believe they were originally included in the code (as Min/Max and also Inf/Lim, or something along these lines). But I assume it was deemed unnecessary complexity and removed? I don't find any trace of this anymore.
I'm not sure "close" to 0.5 is actually a completely sensible value. See here: machinelearning/src/Microsoft.ML.StandardLearners/Standard/Online/AveragedLinear.cs Lines 82 to 83 in 5e08fa1
and here: machinelearning/src/Microsoft.ML.StandardLearners/Standard/Online/AveragedLinear.cs Line 209 in 5e08fa1
Indeed I feel like this is all somewhat haphazard, and whoever introduced this sweep range was making the mistake of confusing the sweep range with the definition of valid values, which is not the point at all. Anyway, I'm inclined to just accept, @justinormont, if that is all right. It does seem, though, that if we are going to have continuous values, the notion of inclusive vs. exclusive bounds needs to be accounted for somehow; not sure why such a concept would be removed. 😦
Thanks for pushing in. And thanks @SolyarA for your PR. @TomFinley: Agreed; we will want to reduce the range of the sweep params from the valid ranges to the useful ranges. This will speed up the hyperparameter optimization. The only reason I see to keep the ranges as wide as the valid ranges is if we can find examples where the extreme values led to good scores. We have further ideas on how to focus the sweeper's energy towards useful ranges of hyperparameters, so perhaps the work of figuring out the useful ranges won't be needed.
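To make the valid vs. useful distinction concrete, here is a rough Python sketch that keeps the validity check separate from the sweep candidates. The interval `[0, 0.5)` and the candidate values are assumptions chosen for illustration, and `validate_l2_weight` and `sweep_candidates` are hypothetical helpers, not code from this repository.

```python
# Assumed "useful" values for the sweep; purely illustrative.
SWEEP_CANDIDATES = [0.0, 0.1, 0.2, 0.3, 0.4]


def validate_l2_weight(value: float) -> float:
    """Reject values outside the assumed valid interval [0, 0.5)."""
    if not (0.0 <= value < 0.5):
        raise ValueError(f"L2 regularizer weight must be in [0, 0.5), got {value}")
    return value


def sweep_candidates() -> list:
    """Return the sweep values, each checked against the validity rule."""
    return [validate_l2_weight(v) for v in SWEEP_CANDIDATES]
```

Keeping the two apart means the sweep list can be narrowed or widened without touching what the learner itself accepts.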
* Changed range of L2RegularizerWeight parameter in AveragedPerceptron
Fixes #567