
Add NaiveBayes sample & docs #3246


Merged
merged 7 commits on Apr 16, 2019

Conversation

@ganik (Member) commented Apr 8, 2019

repros #3226

@ganik ganik requested a review from codemzs April 8, 2019 21:49
codecov bot commented Apr 8, 2019

Codecov Report

Merging #3246 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3246      +/-   ##
==========================================
+ Coverage   72.61%   72.62%   +<.01%     
==========================================
  Files         804      807       +3     
  Lines      145025   145080      +55     
  Branches    16213    16213              
==========================================
+ Hits       105314   105366      +52     
- Misses      35294    35297       +3     
  Partials     4417     4417
Flag Coverage Δ
#Debug 72.62% <ø> (ø) ⬆️
#production 68.17% <ø> (+0.01%) ⬆️
#test 88.93% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 89.26% <0%> (-0.63%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...OnnxTransformer.StaticPipe/OnnxStaticExtensions.cs 100% <0%> (ø)
...L.DnnImageFeaturizer.ResNet18/ResNet18Extension.cs 100% <0%> (ø)
...r.StaticPipe/DnnImageFeaturizerStaticExtensions.cs 100% <0%> (ø)
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.26% <0%> (+0.15%) ⬆️
...StandardTrainers/Standard/LinearModelParameters.cs 60.31% <0%> (+0.26%) ⬆️
...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs 74.03% <0%> (+0.33%) ⬆️

codecov bot commented Apr 8, 2019

Codecov Report

Merging #3246 into master will increase coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3246      +/-   ##
==========================================
+ Coverage   72.61%   72.69%   +0.07%     
==========================================
  Files         804      807       +3     
  Lines      145025   145172     +147     
  Branches    16213    16225      +12     
==========================================
+ Hits       105314   105529     +215     
+ Misses      35294    35227      -67     
+ Partials     4417     4416       -1
Flag Coverage Δ
#Debug 72.69% <ø> (+0.07%) ⬆️
#production 68.22% <ø> (+0.06%) ⬆️
#test 88.97% <ø> (+0.05%) ⬆️
Impacted Files Coverage Δ
...classClassification/MulticlassNaiveBayesTrainer.cs 87.17% <ø> (ø) ⬆️
...oft.ML.StandardTrainers/StandardTrainersCatalog.cs 92.34% <ø> (ø) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...oft.ML.Transforms/Text/TextFeaturizingEstimator.cs 90.57% <0%> (-1.41%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...osoft.ML.Tests/Transformers/TextFeaturizerTests.cs 99.58% <0%> (-0.2%) ⬇️
...StandardTrainers/Standard/Simple/SimpleTrainers.cs 77.61% <0%> (-0.17%) ⬇️
src/Microsoft.ML.Recommender/RecommenderCatalog.cs 70.83% <0%> (ø) ⬆️
...dardTrainers/Standard/Online/AveragedPerceptron.cs 89.7% <0%> (ø) ⬆️
... and 29 more

@ganik ganik changed the title Add NaiveBayes sample [WIP] Add NaiveBayes sample Apr 9, 2019
Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");

// Expected output:
// Label: 1, Prediction: 2
@Ivanidzo4ka (Contributor) commented Apr 9, 2019


It basically assigns one class to every prediction, which looks like a bug to me.
Have you created an issue about that?
I'm not sure it's worth having a sample that shows a broken learner. #Resolved

@ganik (Member, Author) commented Apr 9, 2019

Yes, this is a repro for issue #3226, and @codemzs is looking into it. #Resolved

@codemzs (Member) commented Apr 9, 2019

This is not a bug. In our implementation, Naive Bayes treats features as binary; that is how features are binned. In this sample pipeline all of your feature values are greater than or equal to zero, which means the feature histogram for each class will be the same size, hence the behavior you are seeing. Please modify your code so that feature values take either negative or positive values.

When we implemented Naive Bayes we considered the case of features taking continuous values; handling that would require implementing a Gaussian distribution to bin the features, but it wasn't a requirement at the time.

CC: @glebuk @TomFinley @justinormont #Resolved
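The binning rule described above can be sketched outside ML.NET. Below is a hypothetical Python toy (not the ML.NET implementation): a Bernoulli-style Naive Bayes that binarizes each feature as `value > 0`. When every feature value is non-negative, every binarized vector is identical, the per-class histograms carry no signal, and every input collapses to a single predicted class — the behavior reported in #3226.

```python
import math

def binarize(x):
    # Mirror of the rule described above: a feature counts as "true" iff > 0
    return [1 if v > 0 else 0 for v in x]

def train(data):
    """data: list of (features, label). Count binarized feature hits per class."""
    counts, totals = {}, {}
    for x, y in data:
        totals[y] = totals.get(y, 0) + 1
        c = counts.setdefault(y, [0] * len(x))
        for i, v in enumerate(binarize(x)):
            c[i] += v
    return counts, totals

def predict(counts, totals, x):
    n = sum(totals.values())
    best, best_lp = None, float("-inf")
    for y, c in counts.items():
        lp = math.log(totals[y] / n)  # log class prior
        for i, v in enumerate(binarize(x)):
            p1 = (c[i] + 1) / (totals[y] + 2)  # Laplace-smoothed P(f_i = 1 | y)
            lp += math.log(p1 if v else 1.0 - p1)
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

With all-positive features every class ends up with the same smoothed likelihoods, so the argmax degenerates to one class for every input; once features can also be negative, the per-class histograms differ and the predictions separate.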

@justinormont (Contributor) commented Apr 10, 2019

Do we have samples where Naive Bayes works well? #Resolved

@ganik (Member, Author) replied

This is such a sample.


In reply to: 274157820

@ganik ganik changed the title [WIP] Add NaiveBayes sample Add NaiveBayes sample Apr 10, 2019
@ganik ganik changed the title Add NaiveBayes sample Add NaiveBayes sample & docs Apr 10, 2019
// Label: 2, Prediction: 2
// Label: 3, Prediction: 3
// Label: 2, Prediction: 2
// Label: 3, Prediction: 3
@codemzs (Member) commented Apr 11, 2019

NICE! #Resolved

@@ -75,7 +75,7 @@ namespace Samples.Dynamic.Trainers.MulticlassClassification
 private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed=0)
 {
     var random = new Random(seed);
-    float randomFloat() => (float)random.NextDouble();
+    float randomFloat() => (float)(random.NextDouble() - 0.5);
@codemzs (Member) commented Apr 11, 2019

float randomFloat() => (float)(random.NextDouble() - 0.5);

This is great, but did you regenerate all the .tt-generated files by running the custom tool, to make sure the samples are not broken? #Resolved

@ganik (Member, Author) replied

Yes, only 3 .tt files depend on this one; they are regenerated.


In reply to: 274642625

@codemzs (Member) left a comment

:shipit:

@natke (Contributor) left a comment

Were there some extra assumptions that we were going to explicitly document for this trainer?

@@ -75,7 +75,7 @@ namespace Samples.Dynamic.Trainers.MulticlassClassification
 private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed=0)
 {
     var random = new Random(seed);
-    float randomFloat() => (float)random.NextDouble();
+    float randomFloat() => (float)(random.NextDouble() - 0.5);
@natke (Contributor) commented Apr 11, 2019

Why do we do this? #Resolved

@codemzs (Member) commented Apr 11, 2019

@natke It's to make sure feature values are evenly distributed between -0.5 and +0.5, which gives an even mix of positive and negative feature values. Naive Bayes considers two types of feature values: 1) greater than zero, and 2) less than or equal to zero, and you want a sample with both kinds of values to get a sensible prediction. I believe @ganik has talked about it briefly in the doc that he has attached here. #Resolved
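The effect of the `- 0.5` shift is easy to check numerically. A Python sketch (mirroring the idea of the C# helper, not the helper itself): shifting uniform values from [0, 1) to [-0.5, 0.5) turns an all-"true" feature stream into a roughly even mix of positive and non-positive values.

```python
import random

random.seed(0)  # deterministic for the checks below

# Unshifted helper: every value lands in [0, 1), so every feature
# binarizes to "true" under the "greater than zero" rule.
unshifted = [random.random() for _ in range(1000)]

# Shifted helper (the fix in this PR's sample): values land in [-0.5, 0.5),
# so roughly half the features binarize to "true" and half to "false".
shifted = [random.random() - 0.5 for _ in range(1000)]

print(all(v > 0 for v in unshifted))       # no negative feature values at all
print(sum(v > 0 for v in shifted) / 1000)  # roughly 0.5: an even split of signs
```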

@natke (Contributor) commented Apr 11, 2019

Ok, great. Is it worth adding a comment to the code? Also, which doc? #ByDesign

@codemzs (Member) commented Apr 11, 2019

@natke I believe he already has, here, and this should show up in the docs, right? #Resolved

@natke (Contributor) commented Apr 11, 2019

Ok, so yes, it is spelled out in the trainer code comments. I wonder if we should add a comment to this sample code too, to be absolutely clear. #Resolved

@ganik (Member, Author) replied

I can't add it to the sample code, since this code is shared (generated from a shared .tt file) by 3 other trainers that don't have this NaiveBayes problem.


In reply to: 274708597

@ganik (Member, Author) replied

So we have random values in the -0.5 to 0.5 range; some trainers like NaiveBayes need that, and others like OVA don't, but will be fine with it.


In reply to: 274651105

@ganik (Member, Author) replied

Actually, I think I know how to do it; I'll send the next iteration.


In reply to: 275193160

using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.SamplesUtils;
@shmoradims commented Apr 12, 2019

Let's remove this as per the checklist. #Resolved

@shmoradims commented Apr 12, 2019

using Microsoft.ML.SamplesUtils;

ditto #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:6 in 8831b0f.

@shmoradims commented Apr 12, 2019

public static class OneVersusAll

this one doesn't have a .tt file? #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:10 in 8831b0f.

/// in a class even though they may be dependent on each other. It is a multi-class trainer that accepts
/// binary feature values of type float, i.e., feature values are either true or false, specifically a
/// feature value greater than zero is treated as true.
/// </summary>
@shmoradims commented Apr 12, 2019

Info: this is good for the first pass of the docs. Please leave the second pass empty, so that we can improve it next week. #Resolved

@ganik (Member, Author) commented Apr 15, 2019

public static class OneVersusAll

It does.


In reply to: 482643079


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:10 in 8831b0f.

@@ -3,7 +3,7 @@
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.SamplesUtils;


namespace Samples.Dynamic.Trainers.MulticlassClassification
@shmoradims commented Apr 15, 2019

extra line? #Resolved

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/NaiveBayes.cs)]
@shmoradims commented Apr 15, 2019

SDCA

rename #Resolved

@shmoradims left a comment

:shipit:

@ganik ganik merged commit 66ff419 into dotnet:master Apr 16, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022