Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise and OnnxTransformer #3387


Merged

ganik merged 13 commits into dotnet:master on Apr 21, 2019

Conversation

ganik
Member

@ganik ganik commented Apr 17, 2019

part of #2522

@ganik ganik requested review from shmoradims, artidoro and natke April 17, 2019 23:18
@@ -26,7 +26,18 @@

namespace Microsoft.ML.Trainers
{
/// <include file='./doc.xml' path='doc/members/member[@name="KMeans++"]/*' />
/// <summary>
Member

@wschin wschin Apr 18, 2019

Do you think the scoring function of the trained model should be explained? #WontFix

Member Author

You mean the Predict function? The one that predicts which cluster a data point goes to?


In reply to: 276525272 [](ancestors = 276525272)

Member Author

Or do you mean how the distance is calculated when the algorithm converges to the centroids?


In reply to: 277107516 [](ancestors = 277107516,276525272)

Member Author

@ganik ganik Apr 19, 2019

In either case I don't think so. For the former, it's not specific to KMeans++. For the latter, first, how am I supposed to do that? Should I deduce it from the code? Secondly, I don't think this information is important for .NET devs to know. All they need to know is that this is an improved version of KMeans, along with the high-level details of the improvements described in the wiki links.


In reply to: 277107690 [](ancestors = 277107690,277107516,276525272)

Member

@wschin wschin Apr 20, 2019

That's what I did. Debugging an ML model is debugging math equations. Without them, how can users improve their models? We keep saying that .NET devs can ignore the details; however, that doesn't look entirely true in ML.NET's Gitter channel.

Anyway, I am not blocking your PR. Just wanted to pass along some of my observations. :) #Resolved

Member Author

@ganik ganik Apr 21, 2019

Sure, once we have more time we could prioritize such tasks to the top; currently this is lower priority than the rest of the things that need to be done.


In reply to: 277148107 [](ancestors = 277148107)

/// <a href ='https://en.wikipedia.org/wiki/Naive_Bayes_classifier'>Naive Bayes</a>
/// is a probabilistic classifier that can be used for multiclass problems.
/// Using Bayes' theorem, the conditional probability for a sample belonging to a class
/// can be calculated based on the sample counts for each feature combination group.
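For readers who want the scoring rule that the review below asks about, the textbook naive Bayes decision rule is the following (a general sketch; it is not quoted from the ML.NET source):

$$\hat{y} = \arg\max_{c} \; P(c)\,\prod_{i=1}^{n} P(x_i \mid c)$$

where the class prior $P(c)$ and the per-feature likelihoods $P(x_i \mid c)$ are estimated from training-set counts, and the feature-independence assumption is what allows the product form.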
Member

@wschin wschin Apr 18, 2019

I'd recommend adding a math equation to describe the scoring function. It's not easy to figure out how this model makes predictions given only a text description. #ByDesign

Member Author

I'll need to read a K-Means++ paper to figure that out. I don't think users care about the exact formula. What they would care about is that it is an improved version of K-Means and can be used as such (wherever K-Means is used).


In reply to: 276525896 [](ancestors = 276525896)

@codemzs
Member

codemzs commented Apr 18, 2019

Do we not need a table in the documentation like the other XML PRs? Plus you also need to specify the uid. Best to check mine or Wei-Sheng's PRs for reference. #Resolved

@ganik ganik added the documentation Related to documentation of ML.NET label Apr 18, 2019
/// <summary>
/// K-means is a popular clustering algorithm. With K-means, the data is clustered into a specified
/// number of clusters in order to minimize the within-cluster sum of squares.
/// K-means++ improves upon K-means by using the <a href='https://research.microsoft.com/apps/pubs/default.aspx?id=252149'>Yinyang K-Means</a>
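As a usage illustration only (not part of this PR's diff), the snippet below sketches how the K-means++ trainer might be invoked from ML.NET; the Point row type and the sample data are hypothetical, made up for this example.

using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class KMeansSketch
{
    // Hypothetical input row type: a 3-dimensional point.
    public class Point
    {
        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static void Run()
    {
        var mlContext = new MLContext(seed: 0);
        var samples = new List<Point>
        {
            new Point { Features = new float[] { 0f, 0f, 0f } },
            new Point { Features = new float[] { 10f, 10f, 10f } },
            new Point { Features = new float[] { 0.1f, 0.2f, 0.1f } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Train the K-means++ trainer on the Features column, asking for 2 clusters.
        var trainer = mlContext.Clustering.Trainers.KMeans(featureColumnName: "Features", numberOfClusters: 2);
        var model = trainer.Fit(data);

        // Scoring adds PredictedLabel (the assigned cluster) and Score (distances to each centroid).
        IDataView clustered = model.Transform(data);
    }
}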
Contributor

@natke natke Apr 18, 2019

Should we say "this algorithm" or "this implementation" instead of K-means++? Is K-means++ meaningful to a user? #Resolved

Member Author

I think K-means++ conveys a good message about the algorithm: that it's an improved variation of KMeans. I would rather leave it as it is.


In reply to: 276843341 [](ancestors = 276843341)

Contributor

@natke natke Apr 19, 2019

Sure, that's fine, as long as it is clear that this trainer is the K-means++ trainer :-) #Resolved

/// number of clusters in order to minimize the within-cluster sum of squares.
/// K-means++ improves upon K-means by using the <a href='https://research.microsoft.com/apps/pubs/default.aspx?id=252149'>Yinyang K-Means</a>
/// method for choosing the initial cluster centers.
/// YYK-Means accelerates K-Means by up to an order of magnitude while producing exactly the same clustering results (modulo floating-point precision issues).
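For reference, the within-cluster sum of squares mentioned above is the standard K-means objective, minimized over the cluster assignments $C_1, \dots, C_k$ with centroids $\mu_j$:

$$\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$$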
Contributor

@natke natke Apr 18, 2019

Same question: is YYK-Means meaningful to a user? Do they configure it here, or it is a fixed property of this trainer? #Resolved

Member Author

This is an abbreviation of "Yinyang K-Means". I'll change it to K-Means++.


In reply to: 276843606 [](ancestors = 276843606)

/// they may be dependent on each other.
/// This multi-class trainer accepts binary feature values of type float, i.e.,
/// feature values are either true or false.
/// Specifically a feature value greater than zero is treated as true.
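As an illustrative sketch only (not taken from the PR), the following shows how this trainer might be wired up; the Row type and data are hypothetical, and the point to note is that the trainer only checks whether each feature value is greater than zero.

using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class NaiveBayesSketch
{
    // Hypothetical input row: a text label plus a float feature vector.
    // The trainer effectively binarizes each feature value (greater than zero => true).
    public class Row
    {
        public string Label { get; set; }

        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static void Run()
    {
        var mlContext = new MLContext(seed: 0);
        var rows = new List<Row>
        {
            new Row { Label = "A", Features = new float[] { 1f, 0f, 2.5f } },
            new Row { Label = "B", Features = new float[] { 0f, 3f, 0f } },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Multiclass trainers expect a key-typed label, so map the string label first.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
            .Append(mlContext.MulticlassClassification.Trainers.NaiveBayes(
                labelColumnName: "Label", featureColumnName: "Features"));

        var model = pipeline.Fit(data);
    }
}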
Contributor

@natke natke Apr 18, 2019

I think this needs more clarification. How and when would someone use this in practice? (let's work on some words here: @ganik and @codemzs) #Resolved

Contributor

@natke natke Apr 19, 2019

After much discussion I'm happy to leave this as is for now and work on some good examples either for the samples or the conceptual docs. #Resolved

Member Author

sounds good thx


In reply to: 277108028 [](ancestors = 277108028)

/// can be different from LightGbmClassifier, which develops a multi-class classifier directly.
/// Note that even if the classifier indicates that it does not need caching, OneVersusAll will always
/// request caching, as it will be performing multiple passes over the data set.
/// This learner will request normalization from the data pipeline if the classifier indicates it would benefit from it.
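As a hedged sketch (not part of the diff), OneVersusAll wraps any binary trainer; AveragedPerceptron below is chosen only for illustration, and the Label/Features column names are the ML.NET defaults.

using Microsoft.ML;

public static class OneVersusAllSketch
{
    // Build a multiclass estimator by training one binary classifier per class.
    public static IEstimator<ITransformer> Build(MLContext mlContext)
    {
        var binary = mlContext.BinaryClassification.Trainers.AveragedPerceptron();
        return mlContext.MulticlassClassification.Trainers.OneVersusAll(binary, labelColumnName: "Label");
    }
}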
Contributor

@natke natke Apr 18, 2019

learner -> trainer or algorithm? #Resolved



@codecov

codecov bot commented Apr 19, 2019

Codecov Report

Merging #3387 into master will increase coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3387      +/-   ##
==========================================
+ Coverage   72.69%   72.76%   +0.06%     
==========================================
  Files         807      808       +1     
  Lines      145172   145452     +280     
  Branches    16225    16244      +19     
==========================================
+ Hits       105533   105835     +302     
+ Misses      35223    35196      -27     
- Partials     4416     4421       +5
Flag Coverage Δ
#Debug 72.76% <ø> (+0.06%) ⬆️
#production 68.27% <ø> (+0.04%) ⬆️
#test 89.04% <ø> (+0.06%) ⬆️
Impacted Files Coverage Δ
...osoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs 84.64% <ø> (ø) ⬆️
...classClassification/MulticlassNaiveBayesTrainer.cs 87.17% <ø> (ø) ⬆️
...ulticlassClassification/PairwiseCouplingTrainer.cs 90.07% <ø> (ø) ⬆️
src/Microsoft.ML.OnnxTransformer/OnnxTransform.cs 86.24% <ø> (ø) ⬆️
...rd/MulticlassClassification/OneVersusAllTrainer.cs 74.87% <ø> (ø) ⬆️
src/Microsoft.ML.OnnxTransformer/OnnxCatalog.cs 100% <ø> (ø) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0%> (-20.52%) ⬇️
src/Microsoft.ML.Data/Transforms/KeyToValue.cs 79.16% <0%> (-0.65%) ⬇️
...ML.Transforms/MutualInformationFeatureSelection.cs 78.58% <0%> (-0.33%) ⬇️
...ft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs 77.18% <0%> (-0.01%) ⬇️
... and 75 more

/// | | |
/// | -- | -- |
/// | Machine learning task | Clustering |
/// | Is normalization required? | Yes |
Member

@codemzs codemzs Apr 19, 2019

Yes [](start = 39, length = 3)

Are you sure? #Resolved

Member Author

Yep. By default it uses the default TrainerInfo(), which has these default settings.


In reply to: 277113743 [](ancestors = 277113743)

/// | Machine learning task | Clustering |
/// | Is normalization required? | Yes |
/// | Is caching required? | Yes |
/// | Required NuGet in addition to Microsoft.ML | None |
Member

@codemzs codemzs Apr 19, 2019

None [](start = 55, length = 4)

please check this. #Resolved

Member Author

Yep, verified, it's true.


In reply to: 277113763 [](ancestors = 277113763)

/// K-Means++ observes that there is a lot of redundancy across iterations in the KMeans algorithm and most points do not change their clusters during an iteration.
/// It uses various bounding techniques to identify this redundancy and eliminate many distance computations and optimize centroid computations.
/// For more information on K-means, and K-means++ see:
/// <a href='https://en.wikipedia.org/wiki/K-means_clustering'>K-means</a>
Member

@codemzs codemzs Apr 19, 2019

K-means [](start = 8, length = 70)

We have been using text format for md stuff... #Resolved

Member Author

thx


In reply to: 277113813 [](ancestors = 277113813)

/// It assumes independence among the presence of features in a class even though
/// they may be dependent on each other.
/// This multi-class trainer accepts binary feature values of type float, i.e.,
/// feature values are either true or false.
Member

@codemzs codemzs Apr 19, 2019

Dude, this is not true, it can accept whatever. It just treats features as greater than zero or not greater than zero... you should explain this more. #Resolved

Member Author

thx done.


In reply to: 277114008 [](ancestors = 277114008)

Member

@codemzs codemzs left a comment

:shipit:

/// <remarks>
/// <format type="text/markdown"><![CDATA[
/// To create this trainer, use [KMeansTrainer](xref:Microsoft.ML.Trainers.KMeansTrainer).
/// ### Trainer Characteristics
@shmoradims shmoradims Apr 20, 2019

/// [](start = 3, length = 5)

input/output table is missing #Resolved

@ganik ganik changed the title Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise [WIP] Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise Apr 21, 2019
@ganik ganik changed the title [WIP] Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise [WIP] Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise and OnnxTransformer Apr 21, 2019
@ganik ganik changed the title [WIP] Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise and OnnxTransformer Docs 2nd pass for NaiveBayes, KMeans, OVA, Pairwise and OnnxTransformer Apr 21, 2019
@ganik ganik merged commit 3389f1d into dotnet:master Apr 21, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022