Skip to content

Added tests for text featurizer options (Part2). #3036

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 21, 2019

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Mar 20, 2019

This PR finally fixes #2967. Test created in this PR are for the following parameters in options class

  • WordNgramExtractor
  • CharNgramExtractor
  • Numeric Feature Normalizer (L1, L2, etc).

The intend here is to test that TextFeaturizer is instantiated for every parameter in the options class. Here, we are not testing the internal components of TextFeaturizer.


var prediction = engine.Predict(data[0]);
Assert.Equal("this is some text in english", string.Join(" ", prediction.OutputTokens));
Assert.Equal(1.0f, prediction.Features[0]);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.Equal(1.0f, prediction.Features[0]) [](start = 12, length = 42)

Doesn't Assert has option to compare to arrays or enumerables? #Resolved


prediction = engine.Predict(data[1]);
Assert.Equal("xyz", string.Join(" ", prediction.OutputTokens));
Assert.Equal(1.0f, prediction.Features[0]);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert.Equal(1.0f, prediction.Features[0]); [](start = 12, length = 43)

So i expect ngrams to be a,b,c,e,f,g,x,y,z, and end of string and empty string, not sure honestly, I guess. total 12 ngrams.
feature 0 and feature 8 is this end of string and empty string, right? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index 0 is start marker and index 8 is end marker. Then there are total 10 characters including space.


In reply to: 267486364 [](ancestors = 267486364)

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #3036 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3036      +/-   ##
==========================================
+ Coverage    72.5%    72.5%   +<.01%     
==========================================
  Files         804      804              
  Lines      144077   144150      +73     
  Branches    16179    16179              
==========================================
+ Hits       104462   104519      +57     
- Misses      35198    35220      +22     
+ Partials     4417     4411       -6
Flag Coverage Δ
#Debug 72.5% <100%> (ø) ⬆️
#production 68.13% <ø> (-0.02%) ⬇️
#test 88.72% <100%> (+0.02%) ⬆️
Impacted Files Coverage Δ
...osoft.ML.Tests/Transformers/TextFeaturizerTests.cs 99.78% <100%> (+0.03%) ⬆️
src/Microsoft.ML.Core/Data/ProgressReporter.cs 70.95% <0%> (-6.99%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
.../Standard/LogisticRegression/LbfgsPredictorBase.cs 71.26% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <0%> (ø) ⬆️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 72.68% <0%> (ø) ⬆️
src/Microsoft.ML.Data/TrainCatalog.cs 84.18% <0%> (+0.07%) ⬆️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.89% <0%> (+0.62%) ⬆️
... and 1 more

var engine = model.CreatePredictionEngine<TestClass, TestClass>(ML);

var prediction = engine.Predict(data[0]);
Assert.Equal("this is some text in english", string.Join(" ", prediction.OutputTokens));
Copy link
Member

@sfilipi sfilipi Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this is some text in english" [](start = 25, length = 30)

nit, but for maintainability i'd create a var for this. #Resolved


var prediction = engine.Predict(data[0]);
Assert.Equal("abc efg", string.Join(" ", prediction.OutputTokens));
var expected = new float[] { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 0.0f, 0.0f, 0.0f };
Copy link
Member

@sfilipi sfilipi Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.0f [](start = 89, length = 4)

should this be 0? or is this the end marker? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its the end marker.


In reply to: 267967436 [](ancestors = 267967436)


var options = new TextFeaturizingEstimator.Options()
{
CharFeatureExtractor = new WordBagEstimator.Options() { NgramLength = 1},
Copy link
Member

@sfilipi sfilipi Mar 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CharFeatureExtractor = new WordBagEstimator.Options() [](start = 16, length = 54)

Is this correct? it doesn't read right to initialize a CharExtractor with the options of a WordBagEstimator... #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I have added this as a part of issue in #2895.


In reply to: 267968148 [](ancestors = 267968148)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed zeahmed merged commit 1e71b57 into dotnet:master Mar 21, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More tests for TextFeaturizer.
3 participants