Created samples for 'FeaturizeText' API. #3120

Merged: zeahmed merged 3 commits into dotnet:master on Mar 28, 2019

@zeahmed zeahmed commented Mar 27, 2019

Related to #1209.

// as well as the source of randomness.
var mlContext = new MLContext();

// Get a small dataset as an IEnumerable.
@shmoradims shmoradims Mar 27, 2019

Get → Create / Define #Resolved

// Get a small dataset as an IEnumerable.
var samples = new List<TextData>()
{
new TextData(){ Text ="ML.NET's FeaturizeText API uses a composition of several basic transforms to convert text into numeric features." },
@shmoradims shmoradims Mar 27, 2019

extra space? = " #Resolved

codecov bot commented Mar 27, 2019

Codecov Report

Merging #3120 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3120      +/-   ##
==========================================
- Coverage   72.52%   72.51%   -0.01%     
==========================================
  Files         808      808              
  Lines      144665   144665              
  Branches    16198    16198              
==========================================
- Hits       104913   104910       -3     
- Misses      35342    35344       +2     
- Partials     4410     4411       +1

| Flag | Coverage Δ |
|---|---|
| #Debug | 72.51% <ø> (-0.01%) ⬇️ |
| #production | 68.11% <ø> (-0.01%) ⬇️ |
| #test | 88.81% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.26% <0%> (-0.63%) ⬇️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | 60.05% <0%> (-0.27%) ⬇️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.7% <0%> (-0.21%) ⬇️ |
| src/Microsoft.ML.Maml/MAML.cs | 26.21% <0%> (+1.45%) ⬆️ |

codecov bot commented Mar 27, 2019

Codecov Report

Merging #3120 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3120   +/-   ##
=======================================
  Coverage   72.52%   72.52%           
=======================================
  Files         808      808           
  Lines      144665   144665           
  Branches    16198    16198           
=======================================
  Hits       104913   104913           
+ Misses      35342    35341    -1     
- Partials     4410     4411    +1

| Flag | Coverage Δ |
|---|---|
| #Debug | 72.52% <ø> (ø) ⬆️ |
| #production | 68.12% <ø> (ø) ⬆️ |
| #test | 88.81% <ø> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.Transforms/Text/TextCatalog.cs | 41.66% <ø> (ø) ⬆️ |
| ...StandardTrainers/Standard/LinearModelParameters.cs | 60.05% <0%> (-0.27%) ⬇️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | 84.7% <0%> (-0.21%) ⬇️ |
| ...ML.Transforms/Text/StopWordsRemovingTransformer.cs | 86.1% <0%> (-0.16%) ⬇️ |
| src/Microsoft.ML.Maml/MAML.cs | 26.21% <0%> (+1.45%) ⬆️ |

// Use ML.NET's built-in stop word remover
StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options() { Language = TextFeaturizingEstimator.Language.English },
WordFeatureExtractor = new WordBagEstimator.Options() { NgramLength = 1 },
CharFeatureExtractor = new WordBagEstimator.Options() { NgramLength = 1 },
@shmoradims shmoradims Mar 27, 2019

NgramLength = 1: Is this single-character tokenization? It would just be the letters of the alphabet. Is it ever useful? #Resolved

Contributor Author


Yes, it can be useful in some cases, but let's change it to 3-grams, which are more useful.
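
To make the 1-gram versus 3-gram point concrete, here is a minimal, dependency-free sketch of character n-gram counting. It is not ML.NET's implementation: `CharNgramSketch`, `Extract`, and `Count` are hypothetical names introduced only for illustration, and the real featurizer may normalize text or treat boundaries differently.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper only; this is NOT how ML.NET implements its featurizer.
public static class CharNgramSketch
{
    // Returns every contiguous run of n characters in text, in order of appearance.
    public static List<string> Extract(string text, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        var ngrams = new List<string>();
        for (int i = 0; i + n <= text.Length; i++)
        {
            ngrams.Add(text.Substring(i, n));
        }
        return ngrams;
    }

    // Counts how often each distinct n-gram occurs (a "bag" of n-grams).
    public static Dictionary<string, int> Count(string text, int n)
    {
        var counts = new Dictionary<string, int>();
        foreach (var gram in Extract(text, n))
        {
            // TryGetValue leaves current at 0 when the n-gram is not present yet.
            counts.TryGetValue(gram, out int current);
            counts[gram] = current + 1;
        }
        return counts;
    }
}
```

For the input "banana", `Count(text, 1)` produces only per-letter counts (a: 3, n: 2, b: 1), while `Count(text, 3)` produces ban: 1, ana: 2, nan: 1, which retains some local character order. That is the sense in which 3-grams carry more information than single characters.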



@shmoradims shmoradims left a comment

:shipit:

Member

@singlis singlis left a comment

:shipit:

@zeahmed zeahmed merged commit 233bc2d into dotnet:master Mar 28, 2019
zeahmed commented Mar 28, 2019

Thanks!

zeahmed added a commit to zeahmed/machinelearning that referenced this pull request Apr 8, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022