Created samples for 'ProduceWordBags' and 'ProduceHashedWordBags' API. #3183

zeahmed · 2019-04-02T22:45:40Z

Related to #1209.

codecov · 2019-04-02T23:26:26Z

Codecov Report

Merging #3183 into master will increase coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3183      +/-   ##
==========================================
+ Coverage   72.54%   72.58%   +0.04%     
==========================================
  Files         807      807              
  Lines      144774   144956     +182     
  Branches    16208    16212       +4     
==========================================
+ Hits       105021   105223     +202     
+ Misses      35339    35318      -21     
- Partials     4414     4415       +1

| Flag | Coverage Δ | |
|---|---|---|
| #Debug | `72.58% <ø> (+0.04%)` | ⬆️ |
| #production | `68.15% <ø> (+0.01%)` | ⬆️ |
| #test | `88.89% <ø> (+0.06%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/Microsoft.ML.DataView/KeyDataViewType.cs | `74.57% <0%> (-3.76%)` | ⬇️ |
| test/Microsoft.ML.Tests/ImagesTests.cs | `98.69% <0%> (-0.13%)` | ⬇️ |
| ...Microsoft.ML.Tests/Transformers/NormalizerTests.cs | `100% <0%> (ø)` | ⬆️ |
| ...ML.Data/Transforms/ConversionsExtensionsCatalog.cs | `44.87% <0%> (ø)` | ⬆️ |
| ...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs | `74.03% <0%> (+0.33%)` | ⬆️ |
| ...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs | `85.11% <0%> (+0.4%)` | ⬆️ |
| ...rosoft.ML.ImageAnalytics/VectorToImageTransform.cs | `76.77% <0%> (+4.53%)` | ⬆️ |
| .../Microsoft.ML.Data/Transforms/ExtensionsCatalog.cs | `100% <0%> (+4.76%)` | ⬆️ |
| ...c/Microsoft.ML.ImageAnalytics/ExtensionsCatalog.cs | `16.66% <0%> (+5.55%)` | ⬆️ |
| src/Microsoft.ML.Transforms/NormalizerCatalog.cs | `84.78% <0%> (+8.11%)` | ⬆️ |

Ivanidzo4ka · 2019-04-03T19:29:50Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceHashedWordBags.cs

+                new TextData(){ Text = "This is an example to compute bag-of-word features using hashing." },
+                new TextData(){ Text = "ML.NET's ProduceHashedWordBags API produces count of Ngrams and hashes it as an index into a vector of given bit length." },
+                new TextData(){ Text = "It does so by first tokenizing text/string into words/tokens then " },
+                new TextData(){ Text = "computing Ngram and hash them to the index given by hash value." },


Ngram

Ngrams ? #Resolved

Ivanidzo4ka · 2019-04-03T19:30:03Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceHashedWordBags.cs

+                new TextData(){ Text = "ML.NET's ProduceHashedWordBags API produces count of Ngrams and hashes it as an index into a vector of given bit length." },
+                new TextData(){ Text = "It does so by first tokenizing text/string into words/tokens then " },
+                new TextData(){ Text = "computing Ngram and hash them to the index given by hash value." },
+                new TextData(){ Text = "The hashing schem reduces the size of the output feature vector" },


schem

schema #Resolved

Ivanidzo4ka · 2019-04-03T19:32:27Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceHashedWordBags.cs

+
+            //  Expected output:
+            //   Number of Features: 256
+            //   Features:    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000


0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

This is kinda anticlimactic.
Maybe change numberOfBits to a smaller value to get some results different from zero here? #Resolved
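To make the point about `numberOfBits` concrete, here is a minimal, dependency-free sketch of hashed bag-of-words counting. It is an illustration only: the `HashedBagSketch` class, the FNV-1a hash, and the whitespace tokenizer are assumptions made for this example and are not how ML.NET implements `ProduceHashedWordBags`. It only shows that a vector of `2^numberOfBits` slots holding a handful of token counts is mostly zeros when `numberOfBits` is large, and denser when it is small.

```csharp
using System;
using System.Linq;

public static class HashedBagSketch
{
    // FNV-1a: an assumption chosen for brevity and determinism, not ML.NET's hash.
    private static uint Fnv1a(string token)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (char c in token)
            {
                hash ^= c;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    // Counts whitespace-separated tokens (unigrams only) into 2^numberOfBits slots.
    public static float[] HashedCounts(string text, int numberOfBits)
    {
        var features = new float[1 << numberOfBits];
        uint mask = (uint)features.Length - 1u;
        foreach (string token in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Keep only the low numberOfBits bits of the hash as the slot index.
            features[Fnv1a(token) & mask] += 1f;
        }
        return features;
    }

    public static void Main()
    {
        const string text = "This is an example to compute bag-of-word features using hashing.";
        foreach (int bits in new[] { 8, 4 })
        {
            float[] features = HashedCounts(text, bits);
            int nonZero = features.Count(v => v != 0f);
            Console.WriteLine($"numberOfBits = {bits}: {features.Length} slots, {nonZero} non-zero");
            Console.WriteLine("First 10: " + string.Join("  ", features.Take(10).Select(v => v.ToString("F4"))));
        }
    }
}
```

With 8 bits the ten tokens of the sample sentence are spread over 256 slots, so the first ten slots are likely to be empty; with 4 bits the same counts land in 16 slots, which is the kind of change the comment asks for.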

Ivanidzo4ka · 2019-04-03T19:34:05Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceWordBags.cs

+                new TextData(){ Text = "ML.NET's ProduceWordBags API produces bag-of-word features from input text." },
+                new TextData(){ Text = "It does so by first tokenizing text/string into words/tokens then " },
+                new TextData(){ Text = "computing Ngram and their neumeric values." },
+                new TextData(){ Text = "Each position in the output vector corresponds to a particular Ngram." },


Ngram

https://en.wikipedia.org/wiki/N-gram
We call it NGram in code because you can't use `-` in a variable name, but for explanation maybe we should use n-gram? #Resolved

Ivanidzo4ka · 2019-04-03T19:36:02Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceWordBags.cs

+                new TextData(){ Text = "Each position in the output vector corresponds to a particular Ngram." },
+                new TextData(){ Text = "The value at each position corresponds to," },
+                new TextData(){ Text = "the number of times Ngram occured in the data (Tf), or" },
+                new TextData(){ Text = "the inverse of the number of documents contain the Ngram (Idf), or." },


or.

Double "or", plus a period at the end. I would omit this "or". #Resolved

rogancarr · 2019-04-04T00:19:42Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceWordBags.cs

+            var transformedDataView = textTransformer.Transform(dataview);
+
+            // Create the prediction engine to get the bag-of-word features extracted from the text.
+            var predictionEngine = mlContext.Model.CreatePredictionEngine<TextData, TransformedTextData>(textTransformer);


Why predictionEngine rather than TakeRows and ConvertToEnumerable? I would use the one we would recommend people to use to inspect data in practice.
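For reference, the alternative suggested here might look roughly like the sketch below. It assumes that "ConvertToEnumerable" refers to `mlContext.Data.CreateEnumerable`, that the Microsoft.ML package is referenced, and that the output column is named `BagOfWordFeatures`; the sample sentences, class names, and column name are illustrative and not taken from the PR.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class InspectWithEnumerable
{
    private class TextData
    {
        public string Text { get; set; }
    }

    private class TransformedTextData : TextData
    {
        // Hypothetical output column name, chosen for this sketch.
        public float[] BagOfWordFeatures { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Illustrative in-memory data.
        var samples = new List<TextData>
        {
            new TextData { Text = "This is an example to compute bag-of-word features." },
            new TextData { Text = "Each position in the output vector corresponds to a particular n-gram." },
        };
        IDataView dataview = mlContext.Data.LoadFromEnumerable(samples);

        // Fit the bag-of-words estimator and transform the data.
        var textPipeline = mlContext.Transforms.Text.ProduceWordBags("BagOfWordFeatures", "Text");
        var textTransformer = textPipeline.Fit(dataview);
        IDataView transformedDataView = textTransformer.Transform(dataview);

        // Inspect the first row directly from the IDataView, without a prediction engine.
        IDataView firstRows = mlContext.Data.TakeRows(transformedDataView, 1);
        foreach (var row in mlContext.Data.CreateEnumerable<TransformedTextData>(firstRows, reuseRowObject: false))
        {
            Console.WriteLine($"Number of Features: {row.BagOfWordFeatures.Length}");
        }
    }
}
```

This reads rows straight from the transformed `IDataView`, which is closer to how data would be inspected in a pipeline than creating a prediction engine for a single example.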

rogancarr · 2019-04-04T00:21:38Z

docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/Text/ProduceHashedWordBags.cs

+
+            // Preview of the produced n-grams.
+            // Get the slot names from the column's metadata.
+            // If the column is a vector column the slot names corresponds to the names associated with each position in the vector.


If the column is a vector column

The If clause is confusing here. I would just drop it. #Resolved

rogancarr

Approved with comments.

🔵

Ivanidzo4ka

Created samples for 'ProduceWordBags' and 'ProduceHashedWordBags' API.

9471b5d

zeahmed requested review from Ivanidzo4ka, shmoradims, sfilipi and rogancarr April 2, 2019 22:45

sfilipi mentioned this pull request Apr 2, 2019

API reference - Samples for Transforms #1209

Closed

Updated comments!

1b2daca

Ivanidzo4ka reviewed Apr 3, 2019

zeahmed added 2 commits April 3, 2019 14:15

Addressed reviewers' comments.

202f051

Addressed reviewers' comments.

d288792

rogancarr reviewed Apr 4, 2019

rogancarr approved these changes Apr 4, 2019

Changed input/output classes to private.

bdca2a5

Ivanidzo4ka approved these changes Apr 4, 2019

Addressed reviewers' comments.

bc60f09

zeahmed merged commit 24645ff into dotnet:master Apr 4, 2019

zeahmed added a commit to zeahmed/machinelearning that referenced this pull request Apr 8, 2019

Created samples for 'ProduceWordBags' and 'ProduceHashedWordBags' API. (dotnet#3183)

e87ae08

zeahmed mentioned this pull request Apr 8, 2019

Cherry pick for samples (Text) #3240

Closed

ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
