Microsoft.ML nuget package no longer has a way to specify number of bins for binning normalization #3109

Closed
eerhardt opened this issue Mar 27, 2019 · 7 comments · Fixed by #3118
@eerhardt
Member

eerhardt commented Mar 27, 2019

In v0.11.0, it was possible to write code like the following:

var normalizer = mLContext.Transforms.Normalize(
    new NormalizingEstimator.BinningColumnOptions(outputColumnName: "Label", numBins: 2));

This allowed you to create a binning normalizer with the number of bins set to 2.

However, this API is no longer public in Microsoft.ML. There is a public API where you can specify mode: NormalizingEstimator.NormalizationMode.Binning, but there is no way to set the number of bins. So when you use this mode, you always get MaximumBinCount set to the default of 1024.

@codemzs
Member

codemzs commented Mar 27, 2019

@eerhardt You can set the maximum number of bins via the experimental NuGet package; more here

@eerhardt
Member Author

I don't understand why I would have to use an experimental nuget package in order to use binning normalization. Especially since in the "stable" API there exists a Binning mode, but that mode is virtually useless if I can't specify how many bins I want.

@codemzs
Member

codemzs commented Mar 27, 2019

I'll let @artidoro comment on why this change was made.

@TomFinley
Contributor

At first glance this seems like an oversight we should correct. It seems like if you want nothing else out of your bin normalizer, you'd want to configure the number of discretization points. Add it back in, and maybe write something in the functional tests to cover it?

eerhardt added a commit to eerhardt/UWP-MachineLearning-Sample that referenced this issue Mar 27, 2019
Currently blocked by:

dotnet/machinelearning#3090
dotnet/machinelearning#3119

Also found:

dotnet/machinelearning#3109, which requires the usage of the Microsoft.ML.Experimental NuGet package to use a binning normalizer.
@TomFinley
Contributor

TomFinley commented Mar 27, 2019

It seems like this is insufficient:

public static NormalizingEstimator Normalize(this TransformsCatalog catalog,
    string outputColumnName, string inputColumnName = null,
    NormalizingEstimator.NormalizationMode mode = NormalizingEstimator.NormalizationMode.MinMax)
    => new NormalizingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName ?? outputColumnName, mode);

The way we solved this from the POV of the command line is we had minmax, GCN, and bin look like separate transforms. Maybe we ought to reflect that through this here. So: maybe there should be one method for minmax, another for GCN, another for bin, thereby allowing more detailed configuration since that's often useful.

While we're at it, I see we don't have the fix-zero configuration option settable, just an FYI, and that's pretty important.
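
For illustration, here is a rough sketch of how the "one method per normalizer" shape might look from the caller's side. It assumes a `NormalizeBinning` extension method with `maximumBinCount` and `fixZero` parameters, which, as far as I know, later ML.NET releases expose; nothing in this thread confirms those names, so treat them as assumptions. The `Row` class and the sample values are hypothetical.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type, used only for this sketch.
public class Row
{
    public float Label { get; set; }
}

public static class BinningSketch
{
    // Fits a two-bin normalizer on four sample values and returns the transformed column.
    public static float[] BinLabels()
    {
        var mlContext = new MLContext(seed: 0);
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new Row { Label = 1f }, new Row { Label = 2f },
            new Row { Label = 8f }, new Row { Label = 9f },
        });

        // Assumed API: a dedicated binning method where the bin count and
        // the fix-zero option are ordinary parameters rather than a mode enum.
        var normalizer = mlContext.Transforms.NormalizeBinning(
            outputColumnName: "Label",
            maximumBinCount: 2,
            fixZero: true);

        var transformed = normalizer.Fit(data).Transform(data);

        // Read back the normalized "Label" column, one value per input row.
        return transformed.GetColumn<float>("Label").ToArray();
    }
}
```

Compared with the single `Normalize(..., mode)` overload quoted above, a dedicated method leaves room for options that only make sense for one normalizer, such as the bin count.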

@codemzs
Member

codemzs commented Mar 27, 2019

@TomFinley What do you feel about this?

@TomFinley
Contributor

@TomFinley What do you feel about this?

Looks good at first glance. Might need some input-output column pair overloads for the multicolumn mapping, but I don't insist on it. The people I view as having the most relevant feedback on this though are @artidoro (since he introduced it I believe) and @eerhardt (since he's the one that raised this issue).

@eerhardt eerhardt added this to the 0319 milestone Mar 29, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
4 participants