Microsoft.ML nuget package no longer has a way to specify number of bins for binning normalization #3109
Comments
I don't understand why I would have to use an experimental NuGet package in order to use binning normalization. Especially since in the "stable" API there exists a
I'll let @artidoro comment on why this change was made.
At first glance this seems like an oversight we should correct. If you want to configure nothing else on your bin normalizer, you'd still want to configure the number of discretization points. Add it back in, and maybe write something in the functional tests to cover it?
Currently blocked by: dotnet/machinelearning#3090 dotnet/machinelearning#3119 Also found: dotnet/machinelearning#3109, which requires the Microsoft.ML.Experimental NuGet package in order to use a binning normalizer.
It seems like this is insufficient: machinelearning/src/Microsoft.ML.Transforms/NormalizerCatalog.cs Lines 26 to 29 in 3663320
The way we solved this from the POV of the command line is we had minmax, GCN, and bin look like separate transforms. Maybe we ought to reflect that here. So: maybe there should be one method for minmax, another for GCN, another for bin, thereby allowing more detailed configuration, since that's often useful. While we're at it, I see we don't have the fix-zero configuration option settable, just an FYI, and that's pretty important.
@TomFinley What do you feel about this?
Looks good at first glance. Might need some input-output column pair overloads for the multicolumn mapping, but I don't insist on it. The people I view as having the most relevant feedback on this though are @artidoro (since he introduced it, I believe) and @eerhardt (since he's the one that raised this issue).
In v0.11.0, it was possible to write code like the following:
This allowed you to create a binning normalizer with the number of bins set to 2.
However, this API is no longer public in Microsoft.ML. There is a public API where you can specify mode: NormalizingEstimator.NormalizationMode.Binning, but there is no way to set the number of bins. So when you use this mode, you always get MaximumBinCount set to the default 1024.
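To illustrate why the bin count matters, here is a minimal Python sketch of binning normalization with a configurable maximum number of bins. It is not ML.NET's implementation: the function names fit_bin_upper_bounds and bin_normalize and the max_bin_count parameter are hypothetical, and the cut points are simply placed so that the distinct training values are spread roughly evenly across bins.

```python
from bisect import bisect_left


def fit_bin_upper_bounds(values, max_bin_count):
    """Return the interior cut points for at most max_bin_count bins.

    Hypothetical helper for illustration only. Cut points are midpoints
    between neighbouring distinct training values, chosen so that the
    distinct values are split roughly evenly across the bins.
    """
    if max_bin_count < 2:
        raise ValueError("max_bin_count must be at least 2")
    distinct = sorted(set(values))
    # There can never be more bins than distinct training values.
    bin_count = min(max_bin_count, len(distinct))
    bounds = []
    for i in range(1, bin_count):
        cut = i * len(distinct) // bin_count
        bounds.append((distinct[cut - 1] + distinct[cut]) / 2)
    return bounds


def bin_normalize(value, bounds):
    """Map a value to its bin index, scaled into the range [0, 1]."""
    index = bisect_left(bounds, value)  # number of cut points below value
    return index / max(1, len(bounds))


training = [1, 2, 3, 4, 5, 6, 7, 8]

# Two bins: every value collapses to either 0.0 or 1.0.
two_bins = fit_bin_upper_bounds(training, 2)
print([bin_normalize(v, two_bins) for v in training])

# A large cap such as 1024: each distinct value keeps its own bin.
many_bins = fit_bin_upper_bounds(training, 1024)
print([round(bin_normalize(v, many_bins), 3) for v in training])
```

With a cap of 2 the output carries one bit of information per value, while a cap of 1024 on this small data set is effectively a rank transform. That difference is why a fixed default is not a substitute for a settable bin count.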