Added a test showing example of text classification using TensorFlow in ML.Net #2302

zeahmed · 2019-01-29T05:14:54Z

This PR fixes #2301.

Also updated the TensorFlow runtime from 1.10.0 -> 1.12.0

zeahmed · 2019-01-29T05:16:36Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+        }
+
+        [ConditionalFact(typeof(Environment), nameof(Environment.Is64BitProcess))]
+        public void TensorFlowSentimentClassificationTest()


This test is going to fail because the Microsoft.ML.TensorFlow.TestModels NuGet package has not been updated yet. #Resolved

zeahmed · 2019-01-29T05:17:00Z

test/Microsoft.ML.Tests/Transformers/ValueMappingTests.cs

@@ -11,6 +11,7 @@
 using Microsoft.ML.Model;
 using Microsoft.ML.RunTests;
 using Microsoft.ML.Tools;
+using Microsoft.ML.Transforms;


remove it. #Resolved

codecov · 2019-01-29T05:51:51Z

Codecov Report

Merging #2302 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2302      +/-   ##
==========================================
+ Coverage   71.13%   71.15%   +0.01%     
==========================================
  Files         779      779              
  Lines      140271   140308      +37     
  Branches    16046    16047       +1     
==========================================
+ Hits        99788    99834      +46     
+ Misses      36032    36023       -9     
  Partials     4451     4451

Flag	Coverage Δ
#Debug	`71.15% <100%> (+0.01%)`	⬆️
#production	`67.57% <100%> (ø)`	⬆️
#test	`85.07% <100%> (+0.01%)`	⬆️

eerhardt · 2019-01-29T15:47:46Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+                   separatorChar: ','
+               );
+
+            var estimator = new WordTokenizingEstimator(mlContext, new[]{


Can you use the mlContext methods to create these estimators? I think #2100 is making all these constructors internal. #Resolved

eerhardt · 2019-01-29T15:49:22Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            Array.Resize(ref processedData.Features, 600);
+            var prediction = tfEnginePipe.Predict(processedData);
+
+            Assert.Equal(2, prediction.Prediction.Length);


Can we verify that the predictions were somewhat correct? #Resolved

abgoswam · 2019-01-29T16:41:38Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            var estimator = new WordTokenizingEstimator(mlContext, new[]{
+                    new WordTokenizingTransformer.ColumnInfo("Sentiment_Text", "TokenizedWords")
+                }).Append(new ValueMappingEstimator(mlContext, lookupMap, "Words", "Ids", new[] { ("TokenizedWords", "Features") }));
+            var dataPipe = estimator.Fit(dataView)


dataPipe

Is there a particular reason why dataPipe and tfEnginePipe are separate? #Resolved

Added comments in the code.

In reply to: 251916406

abgoswam · 2019-01-29T16:44:12Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            var dataPipe = estimator.Fit(dataView)
+                .CreatePredictionEngine<TensorFlowSentiment, TensorFlowSentiment>(mlContext);
+
+            string modelLocation = @"sentiment_model";


sentiment_model

So this TF model takes a vector of floats as input. Am I right?

Perhaps we should add a comment explaining how the model was created, etc. #Resolved

No, it takes integers as input.

In reply to: 251917695

abgoswam · 2019-01-29T18:39:39Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+
+            var estimator = new WordTokenizingEstimator(mlContext, new[]{
+                    new WordTokenizingTransformer.ColumnInfo("Sentiment_Text", "TokenizedWords")
+                }).Append(new ValueMappingEstimator(mlContext, lookupMap, "Words", "Ids", new[] { ("TokenizedWords", "Features") }));


lookupMap

So we are using the word embeddings from the lookupMap -> using the embeddings to construct the Features vector -> using the Features vector as input to the TF model.

Am I right? #Resolved

Yes, correct!

In reply to: 251963454
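
For illustration, the tokenize-then-look-up flow described in this thread can be sketched in plain C#, without ML.NET. Everything here is an assumption made for the example: the `ToIds` helper, the toy vocabulary, and the choice to map unknown words to `0` are hypothetical and are not taken from the PR, where the lookup map is an IDataView.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIdLookupSketch
{
    // Hypothetical helper: splits text on spaces and maps each word to its
    // integer id. Words missing from the vocabulary map to unknownId.
    public static int[] ToIds(string text, IReadOnlyDictionary<string, int> vocabulary, int unknownId = 0)
    {
        return text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => vocabulary.TryGetValue(word, out var id) ? id : unknownId)
            .ToArray();
    }

    public static void Main()
    {
        // Toy vocabulary with made-up ids, for illustration only.
        var vocabulary = new Dictionary<string, int>
        {
            ["this"] = 14, ["film"] = 22, ["is"] = 9, ["great"] = 87
        };

        int[] ids = ToIds("this film is great", vocabulary);
        Console.WriteLine(string.Join(", ", ids));
    }
}
```

The result is a variable-length integer vector, one id per token, which is why the resizing question comes up later in the review.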

abgoswam · 2019-01-29T18:40:40Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+
+            var estimator = new WordTokenizingEstimator(mlContext, new[]{
+                    new WordTokenizingTransformer.ColumnInfo("Sentiment_Text", "TokenizedWords")
+                }).Append(new ValueMappingEstimator(mlContext, lookupMap, "Words", "Ids", new[] { ("TokenizedWords", "Features") }));


ValueMappingEstimator

We should have an MLContext extension for it. It looks like it's currently missing. #Resolved

abgoswam · 2019-01-29T19:14:05Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            // Then this integer vector is retrieved from the pipeline and resized to fixed length.
+            // The second pipeline takes the resized integer vector and passed to TensoFlow and get the classification scores.
+            var estimator = mlContext.Transforms.Text.TokenizeWords("Sentiment_Text", "TokenizedWords")
+                .Append(mlContext.Transforms.Conversion.ValueMap(lookupMap, "Words", "Ids", new[] { ("TokenizedWords", "Features") }));


ValueMap

Is this transform doing the resizing to the fixed length?

Does it matter what the fixed length is? I presume the model was built with a fixed-length input shape, but I do not see the shape specified. #Resolved

Resizing is done at line 893 in the C# code.

In reply to: 251976925

I do not see the shape specified in line 893.

In reply to: 251977773

Sorry, not 893. It's line 899.

In reply to: 251978523

We are calling the Predict API, and hence at line 899 we can resize the output of the first pipeline.

How would this work for the Transform API, where the test data has, say, 2 rows?

row1 -> "Hi" -> dimension 50 (say)
row2 -> "(some long sentence)" -> dimension 5000 (say)

Will it work?

In reply to: 251978928

Here we are not training the TF model at all; it is just the prediction pipeline. The case you mention would require the same resize operation on the data view instead of on a single prediction.

In reply to: 251981308

I understand we are not training the TF model. The Fit() for the TFTransform would not do anything in this example.

I wanted to know whether the resize operation on a data view would be supported. If it is, can we add it to the unit test with at least 2 rows of text data, plus use of the Transform API?

This test case does a single prediction (using the Predict API) but does not show use of the Transform API, where we would have to resize more than one row of variable-length vectors.

In reply to: 252003040

This is actually not in the scope of this test. I will try to add more training-related tests later, but not in this PR.

In reply to: 252020455
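
As a rough illustration of the resize step discussed in this thread, the sketch below pads or truncates several variable-length integer vectors to one fixed length in plain C#. It works on ordinary arrays outside ML.NET, so it says nothing about whether an IDataView supports the same operation. The `PadOrTruncate` helper and the sample rows are hypothetical; only the length 600 comes from the test's `Array.Resize` call.

```csharp
using System;
using System.Linq;

public static class FixedLengthSketch
{
    // Returns a copy of ids that is exactly `length` long:
    // longer inputs are truncated, shorter ones are padded with zeros.
    public static int[] PadOrTruncate(int[] ids, int length)
    {
        var result = new int[length];
        Array.Copy(ids, result, Math.Min(ids.Length, length));
        return result;
    }

    public static void Main()
    {
        // Two rows of very different lengths, as in the question above.
        int[][] rows =
        {
            new[] { 7 },
            Enumerable.Range(1, 5000).ToArray()
        };

        // 600 is the length the test passes to Array.Resize.
        int[][] resized = rows.Select(r => PadOrTruncate(r, 600)).ToArray();

        foreach (var row in resized)
            Console.WriteLine(row.Length);
    }
}
```

Copying into a fresh array, rather than calling `Array.Resize` on the input, leaves the original rows untouched, which may matter if they are reused elsewhere.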

abgoswam · 2019-01-29T19:16:34Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+                   separatorChar: ','
+               );
+
+            // We cannot resize variable length vector to fixed lenght vector in ML.Net


lenght

Typo. #Resolved

ericstj

Tensorflow update LGTM

wschin · 2019-01-29T20:32:11Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            var prediction = tfEnginePipe.Predict(processedData);
+
+            Assert.Equal(2, prediction.Prediction.Length);
+            Assert.InRange(prediction.Prediction[1], 0.650032759 - 0.01, 0.650032759 + 0.01);


Please use Assert.Equal. If there are only two prediction values, can we check them all? #Resolved

What do you mean by Assert.Equal? Here we are checking that the value is within a particular threshold.
There is no need to check the other value; these are probabilities.

In reply to: 252005383

OK, I see what you meant by Assert.Equal. I actually want to check that my value is in a range, e.g. 0.64 <= prediction <= 0.66, which I cannot do with Assert.Equal, can I?
Also, I find InRange more readable when asserting thresholds.

In reply to: 252006645

You have tolerance in Assert.Equal. #Resolved

It uses a number of decimal places, which is not applicable here.

In reply to: 252050957

We could trim 0.650032759 to 0.65, if we're comparing as ± 0.01.
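
The two styles of check debated above differ in what "close" means. The sketch below reimplements both in plain C# so that it runs without xUnit. `EqualAtPrecision` is an assumption about how a decimal-places comparison behaves (round both sides, then compare) and is not a copy of xUnit's code; the `actual` value is made up for the example.

```csharp
using System;

public static class ToleranceSketch
{
    // Assumed behaviour of a precision-based equality check: round both
    // values to `precision` decimal places, then compare the results.
    public static bool EqualAtPrecision(double expected, double actual, int precision)
        => Math.Round(expected, precision) == Math.Round(actual, precision);

    // Range check: expected - tolerance <= actual <= expected + tolerance.
    public static bool WithinTolerance(double expected, double actual, double tolerance)
        => actual >= expected - tolerance && actual <= expected + tolerance;

    public static void Main()
    {
        double expected = 0.65;
        double actual = 0.6449; // hypothetical score, about 0.005 below expected

        Console.WriteLine(EqualAtPrecision(expected, actual, 2));
        Console.WriteLine(WithinTolerance(expected, actual, 0.01));
    }
}
```

Under these assumptions the two checks can disagree for values that sit near a rounding boundary, which is one reason to prefer an explicit range when the intent is "within ± 0.01".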

wschin · 2019-01-29T20:33:02Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+                   separatorChar: ','
+               );
+
+            // We cannot resize variable length vector to fixed length vector in ML.Net


ML.Net? ML.NET? @shauheen, any comment? #Resolved

.NET is always stylized as .NET.

https://twitter.com/terrajobst/status/1064638607707631616 #Resolved

ok.

In reply to: 252005632

wschin · 2019-01-29T20:34:31Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+
+            // We cannot resize variable length vector to fixed length vector in ML.Net
+            // The trick here is to create two pipelines.
+            // The first pipeline tokenzies the strings into words and maps the words to an integer which is an index in the dictionary.


"strings"? I only see one string. Also, should it be "each word" instead of "words"? #Resolved

wschin · 2019-01-29T20:36:26Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            // The trick here is to create two pipelines.
+            // The first pipeline tokenzies the strings into words and maps the words to an integer which is an index in the dictionary.
+            // Then this integer vector is retrieved from the pipeline and resized to fixed length.
+            // The second pipeline takes the resized integer vector and passed to TensoFlow and get the classification scores.


It's not straightforward for a user to understand what the "first pipeline" and "second pipeline" are. Please cite their variable names here, or move those statements to the associated pipeline declarations. #Resolved

wschin · 2019-01-29T20:38:15Z

src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs

@@ -141,5 +141,19 @@ public static KeyToValueMappingEstimator MapKeyToValue(this TransformsCatalog.Co
            IEnumerable<TOutputType> values,
            params (string source, string name)[] columns)
            => new ValueMappingEstimator<TInputType, TOutputType>(CatalogUtils.GetEnvironment(catalog), keys, values, columns);
+
+        /// <summary>
+        /// Maps specified keys to specified values


This doesn't explain how this transform works. #Resolved

wschin

Overall looks good but I have some comments around docs and I hope they can be addressed before merging.

eerhardt · 2019-01-29T20:59:43Z

src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs

+        /// <returns></returns>
+        public static ValueMappingEstimator ValueMap(
+            this TransformsCatalog.ConversionTransforms catalog,
+            IDataView lookupMap, string keyColumn, string valueColumn, params (string input, string output)[] columns)


@TomFinley @sfilipi - is this consistent with the order we've decided on with #2064? #Resolved

Any thoughts, guys? I saw that the method above has the same pattern, so I followed it.

In reply to: 252014795

(string outputColumnName, string inputColumnName)

You'll see that if you update to latest.

In reply to: 252027697

wschin · 2019-01-29T22:53:33Z

src/Microsoft.ML.Data/Transforms/ConversionsExtensionsCatalog.cs

-        /// Maps specified keys to specified values
+        /// Maps the <paramref name="columns.input"/> using the keys in the dictionary to the values of dictionary.
+        /// In this case, the <paramref name="lookupMap"/> is used as a dictionary where <paramref name="keyColumn"/>
+        /// and <paramref name="valueColumn"/> specify the keys and values of dictionary respectively.


Can you add that a value x in the input would be mapped to the value stored in dictionary[x]? #Resolved

sfilipi · 2019-01-29T23:27:14Z

test/Microsoft.ML.Tests/ScenariosWithDirectInstantiation/TensorflowTests.cs

+            // The first pipeline 'dataPipe' tokenzies the string into words and maps each word to an integer which is an index in the dictionary.
+            // Then this integer vector is retrieved from the pipeline and resized to fixed length.
+            // The second pipeline 'tfEnginePipe' takes the resized integer vector and passed to TensoFlow and get the classification scores.
+            var estimator = mlContext.Transforms.Text.TokenizeWords("Sentiment_Text", "TokenizedWords")


okenizeWords("Sentiment_Text", "TokenizedWords"

If you rebase to latest, you'll have to swap those. #Resolved

sfilipi

zeahmed · 2019-01-30T00:53:04Z

Thanks all for the useful comments/feedback.

zeahmed added 5 commits January 24, 2019 11:18

Added support for loading map from file through dataview.

47b757a

Added test to show tensorflow text classification scenario.

8415c9a

Updated TensorFlow version.

3e4bbcd

Resolved merge conflicts.

2d83c15

Corrected file paths.

8397ac0

zeahmed requested review from Ivanidzo4ka, eerhardt, abgoswam, yaeldekel and sfilipi January 29, 2019 05:14

zeahmed commented Jan 29, 2019

zeahmed changed the title ~~Create a test showing example of text classification in TensorFlow~~ Added a test showing example of text classification using TensorFlow in ML.Net Jan 29, 2019

eerhardt reviewed Jan 29, 2019

abgoswam reviewed Jan 29, 2019

Addressed reviewers' comments.

0eb434b

abgoswam reviewed Jan 29, 2019

Addressed reviewers' comments.

fdc0868

eerhardt mentioned this pull request Jan 29, 2019

Add a Functional.Tests project that doesn't have InternalsVisibleTo #2306

Closed

ericstj approved these changes Jan 29, 2019

Fixing a problem in path.

daa4333

wschin reviewed Jan 29, 2019

wschin suggested changes Jan 29, 2019

Addressed reviewers' comments.

0cc516e

eerhardt reviewed Jan 29, 2019

Addressed reviewers' comments.

ddbd9da

wschin reviewed Jan 29, 2019

Addressed reviewers' comments.

57e730c

sfilipi reviewed Jan 29, 2019

sfilipi approved these changes Jan 29, 2019

Merge remote-tracking branch 'upstream/master' into TF_txt_sample

18f5f78

wschin approved these changes Jan 29, 2019

Merged with base and addressed reviewers' comments.

f984e0f

zeahmed merged commit b4c1066 into dotnet:master Jan 30, 2019

ghost locked as resolved and limited conversation to collaborators Mar 25, 2022

justinormont Jan 30, 2019
