Skip to content

Normalizer parameters require a manual cast #2854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rogancarr opened this issue Mar 5, 2019 · 12 comments
Open

Normalizer parameters require a manual cast #2854

rogancarr opened this issue Mar 5, 2019 · 12 comments
Assignees
Labels
API Issues pertaining the friendly API enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.

Comments

@rogancarr
Copy link
Contributor

rogancarr commented Mar 5, 2019

A trained normalizer does not have public methods or properties that allows access to the normalization parameters. For example, a minmax normalizer doesn't have the min, max, or zero-level available.

I would expect that the trained transformer would have a .Model property, similar to a trained learner.

This is required V1 Scenario (#2498):

  • I can inspect the normalization coefficients of a normalizer in my pipeline without much work. Easy to find via auto-complete.

@shauheen @eerhardt @TomFinley I marked this as P13, but it could be a V1.1 as it does not change APIs, but needs new APIs. This is similar to LDA (#2197), and I think we have a more global problem where trained transforms don't expose a .Model property similar to our learners.

@rogancarr rogancarr added enhancement New feature or request API Issues pertaining the friendly API labels Mar 5, 2019
@artidoro
Copy link
Contributor

artidoro commented Mar 5, 2019

I think it is actually possible to extract the normalizer parameters, but it is not well exposed and requires some casting. The fact that this issue was initially posted highlights that it is quite confusing.

The NormalizingTransformer has a Columns field which exposes ColumnOptions.

public readonly ImmutableArray<ColumnOptions> Columns;

These are not the usual ColumnOptions, we actually have different ones that are used as the arguments of the estimator. These ColumnOptions are meant to expose the the NormalizerParameters and have the following field:

public readonly NormalizerModelParametersBase ModelParameters;

As you will notice NormalizerModelParametersBase needs to be casted to the correct sealed class: AffineNormalizerModelParameters, CdfNormalizerModelParameters or BinNormalizerModelParameters.

Once that is done it's possible to access the parameters.

@artidoro
Copy link
Contributor

artidoro commented Mar 5, 2019

@artidoro artidoro assigned artidoro and unassigned artidoro Mar 5, 2019
@TomFinley
Copy link
Contributor

TomFinley commented Mar 6, 2019

Wait, is the issue that we inappropriately named something ColumnOptions, even though it is not actually an Options object, per se? That is, #2709 got a little too happy with its renamings, even if 95% of the renamings might have been correct?

I think something like this that has info should not be named options.

This suggests @shauheen that this belongs in project 13.

@rogancarr
Copy link
Contributor Author

rogancarr commented Mar 6, 2019

@artidoro Thanks for investigating this!

This is a bit confusing. I can't imagine that we really want to do something like:

var modelParams = transformer.Columns
    .First(x => x.Name == "Induced")
    .ModelParameters as NormalizingTransformer.AffineNormalizerModelParameters<float>;

I do realize that we have to get the parameters out per column, but the searching and casting is a bit awkward. Can we have something like:

var normalizerParameters = normalizer.Columns["MyColumn"].Model;

Where the type for normalizerParameters is given automatically? (Note that the Columns["MyColumn"] is also not possible.)

I originally made a recommendation not to put this in Project13 because I didn't know about the Columns option. I agree with @TomFinley that this needs a breaking change (although I think we still need to talk it through). I'll add Project 13 back on.

@TomFinley
Copy link
Contributor

Where the type for normalizerParameters is given automatically? (Note that the Columns["MyColumn"] is also not possible.)

Without having separate transform types for each sort, this is not possible. So I have a normalizer transform, one column normalized via CDF, the other via affine, the other some other way like binning or whatever. The type is impossible. You must cast, the way it is designed.

If it had been designed that a single application of the estimator could only normalize a single way, and there were different ITransformer implementing classes for each of those ways, then what you say could be possible, but that's not how this thing was designed. Not saying that's good or bad (I actually would prefer it in this strongly typed way), but it's not a simple fix, at all. Otherwise we would have done that already as we already did with, say, things like calibrated vs. uncalibrated predictors.

So, we are doomed to somewhat weak typing here, sorry to say.

@artidoro
Copy link
Contributor

artidoro commented Mar 7, 2019

I can at least rename ColumnOptions to something more intuitive.
How does ColumnNormalizationInfos sound?

@rogancarr
Copy link
Contributor Author

rogancarr commented Mar 7, 2019

Actually, I don't have a problem with Columns here. It's more that you have to search it for a name and that you have to do a manual cast. I'd suggest

  • Adding a convenience lookup, like normalizer.Columns["MyColumn"]
  • Rename the ModelParameters classes. .AffineNormalizerModelParameters<float> might be a bit much? It's not that clear that that's the class you want to cast to from the name.

@TomFinley
Copy link
Contributor

TomFinley commented Mar 7, 2019

Hi @rogancarr, since I misinterpreted your words, perhaps you could provide more detail. What you said is this.

var normalizerParameters = normalizer.Columns["MyColumn"].Model;

According to my best interpretation now, I believe now that you are under the impression that it is somehow possible for, at compile time, the type of normalizerParameters to be known. Is my interpretation of your belief correct? I did not think this was what you were saying originally, but if you are I can provide a different answer.

@rogancarr
Copy link
Contributor Author

rogancarr commented Mar 7, 2019

Well, I am simply describing what I want to have happen. I realize that we can't know the type because it's an option to the learner we haven't trained yet. That said, I don't want to have to cast :) I think manual casting is a huge barrier for someone getting started, especially when it's not clear what the cast needs to be.

So maybe what I want is helper extensions for each type of normalization.

@sfilipi
Copy link
Member

sfilipi commented Mar 9, 2019

we should probably change the title of the issue, since the parameters are actually publicly available.

@rogancarr rogancarr changed the title Normalizer parameters are not publicly available Normalizer parameters require a manual cast Mar 12, 2019
@abgoswam abgoswam self-assigned this Mar 21, 2019
@abgoswam
Copy link
Member

abgoswam commented Mar 21, 2019

As point by @TomFinley above, we have weak typing when it comes to getting the normalization parameters.

For v1.0 the expectation is that users will :

  • use the GetNormalizerModelParameters API to get the NormalizerModelParametersBase
  • cast it to the appropriate type e.g. AffineNormalizerModelParameters, BinNormalizerModelParameters etc to get the parameters

@rogancarr . It seems we would have to re-design this transform to enable the example above (of what you would have wanted)

  1. expose the Columns property in the public surface
  2. internally, Columns is an ImmutableArray . We would have to enable support to get Column by name.
  3. enable strong typing.

Seems this is something we should probably move to backlog or perhaps close ? It certainly looks out of scope for Project 13

@rogancarr
Copy link
Contributor Author

@abgoswam Let's move this to the backlog. Let's see how it lands with v1.0 and see if we want to make improvements for v1.1.

I think the solution for v1.1 is to make lots of explicit samples showing the casting.

@ganik ganik added the P2 Priority of the issue for triage purpose: Needs to be fixed at some point. label Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Issues pertaining the friendly API enhancement New feature or request P2 Priority of the issue for triage purpose: Needs to be fixed at some point.
Projects
None yet
Development

No branches or pull requests

6 participants