Skip to content

Move Normalizer extension method from experimental to stable nuget and remove Normalizer generic APIs #3118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 2, 2019

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Mar 27, 2019

fixes #3109
fixes #3161

@codemzs codemzs requested review from wschin and artidoro March 27, 2019 20:12
@TomFinley TomFinley requested review from eerhardt and TomFinley March 28, 2019 00:30
@eerhardt
Copy link
Member

eerhardt commented Mar 28, 2019

Do we also want to get rid of this method?

public static NormalizingEstimator Normalize(this TransformsCatalog catalog,
           string outputColumnName, string inputColumnName = null,
            NormalizingEstimator.NormalizationMode mode = NormalizingEstimator.NormalizationMode.MinMax)
            => new NormalizingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName ?? outputColumnName, mode);

Do we really need a public method that takes an enum (but doesn't allow you to specify the specific values) as well as other public methods that basically just extract that enum into individual overloads. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Mar 28, 2019

Wonder what @artidoro thinks, should we add the multi-column mapping capability here? It seems more or less harmless to do so. #Resolved

@artidoro
Copy link
Contributor

artidoro commented Mar 28, 2019

Regarding multi-column mapping, I am not opposed to it here. My only concern is that we will have a lot of overloads for the normalizers, especially if we add another one with an options class in the future. But since they are widely used it might be worth it to add them. They were on the list of transforms that @TomFinley and @glebuk identified as candidates for multi-column mapping with InputOutputColumnPair. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Mar 29, 2019

Do we also want to get rid of this method?

public static NormalizingEstimator Normalize(this TransformsCatalog catalog,
           string outputColumnName, string inputColumnName = null,
            NormalizingEstimator.NormalizationMode mode = NormalizingEstimator.NormalizationMode.MinMax)
            => new NormalizingEstimator(CatalogUtils.GetEnvironment(catalog), outputColumnName, inputColumnName ?? outputColumnName, mode);

Do we really need a public method that takes an enum (but doesn't allow you to specify the specific values) as well as other public methods that basically just extract that enum into individual overloads.

@eerhardt I'd totally get behind that... only trouble is that's definitely a breaking change, so wondering what @shauheen thinks. #Resolved

@eerhardt
Copy link
Member

eerhardt commented Mar 29, 2019

If we really think that it's the right thing to do, we have time to do it now.

The alternative is to live with this API forever. #Resolved

@codemzs
Copy link
Member Author

codemzs commented Apr 1, 2019

@[email protected] @[email protected] This PR removes such redundant methods, please review and leave your autograph here: #3116


In reply to: 478105437 [](ancestors = 478105437)

@TomFinley
Copy link
Contributor

As discussed elsewhere it is usually a bad idea to have PRs that leave the codebase in a suboptimal state until they're all checked in, this is usually a sign that they belonged together in the first place. Anyway, thanks for combining them.


In reply to: 478783520 [](ancestors = 478783520,478105437)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Apr 2, 2019

using Microsoft.ML.Experimental;

is this one still needed? #Resolved


Refers to: test/Microsoft.ML.Tests/Transformers/NormalizerTests.cs:10 in d705dc2. [](commit_id = d705dc2, deletion_comment = False)

@codecov
Copy link

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3118 into master will increase coverage by <.01%.
The diff coverage is 90.62%.

@@            Coverage Diff             @@
##           master    #3118      +/-   ##
==========================================
+ Coverage   72.53%   72.54%   +<.01%     
==========================================
  Files         808      807       -1     
  Lines      144775   144774       -1     
  Branches    16209    16208       -1     
==========================================
+ Hits       105012   105020       +8     
+ Misses      35348    35341       -7     
+ Partials     4415     4413       -2
Flag Coverage Δ
#Debug 72.54% <90.62%> (ø) ⬆️
#production 68.12% <75%> (ø) ⬆️
#test 88.82% <100%> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Transforms/Normalizer.cs 86.03% <ø> (ø) ⬆️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <ø> (ø) ⬆️
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 24.46% <0%> (ø) ⬆️
...icrosoft.ML.Functional.Tests/DataTransformation.cs 100% <100%> (ø) ⬆️
...est/Microsoft.ML.Tests/FeatureContributionTests.cs 98.55% <100%> (ø) ⬆️
...ios/IrisPlantClassificationWithStringLabelTests.cs 98.63% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/OnnxConversionTest.cs 97.22% <100%> (ø) ⬆️
...soft.ML.Tests/PermutationFeatureImportanceTests.cs 100% <100%> (ø) ⬆️
...irectInstantiation/IrisPlantClassificationTests.cs 100% <100%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Training.cs 100% <100%> (ø) ⬆️
... and 10 more

@codemzs
Copy link
Member Author

codemzs commented Apr 2, 2019

using Microsoft.ML.Experimental;

Yep for GetColumnPairs API...


In reply to: 478799620 [](ancestors = 478799620)


Refers to: test/Microsoft.ML.Tests/Transformers/NormalizerTests.cs:10 in d705dc2. [](commit_id = d705dc2, deletion_comment = False)

@codemzs codemzs changed the title Move Normalizer extension method from experimental to stable nuget. Move Normalizer extension method from experimental to stable nuget and remove Normalizer generic APIs Apr 2, 2019
@codemzs
Copy link
Member Author

codemzs commented Apr 2, 2019

Spoke offline. We will add it in a separate PR with its own issue and it needs to go through shiproom.


In reply to: 477744551 [](ancestors = 477744551)

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @codemzs !

@codemzs codemzs merged commit fde1ab7 into dotnet:master Apr 2, 2019
shauheen pushed a commit to shauheen/machinelearning that referenced this pull request Apr 2, 2019
…d remove Normalizer generic APIs (dotnet#3118)

* Move Normalizer extension method from experimental to stable nuget.

* Cleanup unused method in Normalizer Estimator.

* remove normalizer estimator catalog methods that take enum as parameter.

* Remove Microsoft.ML.Experimental references in CS files.

* merge fix.

* cleanup.

* cleanup.

* PR feedback.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
6 participants