Added samples & docs for BinaryClassification.StochasticGradientDescent #2688

shmoradims · 2019-02-21T23:15:03Z

Added samples & docs for BinaryClassification.StochasticGradientDescent
Bunch of typo fixes

…nt, plus a bunch of typo fixing.

codecov · 2019-02-22T00:05:06Z

Codecov Report

Merging #2688 into master will decrease coverage by 0.01%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #2688      +/-   ##
==========================================
- Coverage   71.71%    71.7%   -0.02%     
==========================================
  Files         808      808              
  Lines      142218   142227       +9     
  Branches    16131    16131              
==========================================
- Hits       101996   101982      -14     
- Misses      35777    35805      +28     
+ Partials     4445     4440       -5

Flag	Coverage Δ
#Debug	`71.7% <0%> (-0.02%)`	⬇️
#production	`67.96% <0%> (-0.03%)`	⬇️
#test	`85.84% <ø> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
src/Microsoft.ML.StaticPipe/LbfgsStatic.cs	`53.15% <ø> (ø)`	⬆️
src/Microsoft.ML.StaticPipe/TreeTrainersStatic.cs	`66.34% <ø> (ø)`	⬆️
...crosoft.ML.StandardLearners/Standard/SdcaBinary.cs	`72.11% <ø> (ø)`	⬆️
src/Microsoft.ML.StaticPipe/SgdStatic.cs	`100% <ø> (ø)`	⬆️
src/Microsoft.ML.Data/EntryPoints/InputBase.cs	`70.96% <ø> (ø)`	⬆️
...L.StandardLearners/Standard/Online/OnlineLinear.cs	`79.43% <ø> (ø)`	⬆️
...StandardLearners/Standard/Online/AveragedLinear.cs	`59.5% <ø> (ø)`	⬆️
...oft.ML.StandardLearners/StandardLearnersCatalog.cs	`88.57% <ø> (ø)`	⬆️
...dardLearners/Standard/Online/AveragedPerceptron.cs	`85.18% <ø> (ø)`	⬆️
src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs	`0% <0%> (ø)`	⬆️
... and 9 more

wschin · 2019-02-22T00:48:51Z

...oft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescentWithOptions.cs

+            var mlContext = new MLContext(seed: 0);
+
+            // Download and featurize the dataset.
+            var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);


Please add examples using in-memory data structures if possible and show how to inspect those in-memory examples' predictions. If you search for "InMemory" in VS Text Explorer, you will find a few examples. #Resolved

Since these are in the API docs on the website, I actually would prefer to have the data loading part to be terse. Let's make other samples focused on the data loading aspects, and keep this spare.

In reply to: 259176526 [](ancestors = 259176526)

I agree with Rogan. That would be a scenario tutorial than API sample.

In reply to: 259451392 [](ancestors = 259451392,259176526)

So this is API doc for documentation website? If yes, it makes my feeling even stronger --- ideally, to fit this example into user's own scenario, user should be able to just make minor changes. Having a text loader decreases the flexibility of this example and forces user to go outside Visual Studio because they need to prepare a text file and make sure that file can be loaded correctly. Thinking about scikit-learn, I don't find many of them using text file and it's super easy to start working on their modules. #Pending

I'm not following this comment. LoadFeaturizedAdultDataset downloads the dataset and loads it into memory. It's functionally similar to sklearn datasets module which is used in many of the sklearn examples.

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

And we have SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset().

Users are able to copy-paste these samples and run them as-is. Please IM me if you want to further discuss this. :)

In reply to: 259558223 [](ancestors = 259558223)

I'll be using this same template for all the new API samples. If you think there's a better template, please let me know soon.

In reply to: 259947957 [](ancestors = 259947957,259558223)

wschin · 2019-02-22T00:50:03Z

...oft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescentWithOptions.cs

+            var model = pipeline.Fit(trainTestData.TrainSet);
+
+            // Evaluate how the model is doing on the test data.
+            var dataWithPredictions = model.Transform(trainTestData.TestSet);


Print out metric is a bit far from practical uses in production where we create prediction per example and then make decision based those prediction values. #Resolved

I think purpose of this samples is to show up how you can call specific trainer. And set different in options.
What you want is slightly different thing and should be covered by highlevel documentation and cookbook.

In reply to: 259176767 [](ancestors = 259176767)

Ivan is correct. The API samples have a narrow scope to showcase how to use a single API. We have tutorials, and end-to-end samples repo that cover the practical cases, which involve using multiple APIs.

In reply to: 259443273 [](ancestors = 259443273,259176767)

Ivanidzo4ka · 2019-02-22T17:36:02Z

...ples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescent.cs

@@ -0,0 +1,47 @@
+using Microsoft.ML;
+
+namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification


.Trainers.BinaryClassification [](start = 38, length = 30)

it's much easier to call StochasticGradientDescent.Example in Program.cs if it's a truncated namespace. #Resolved

the pattern I've been using is to have the class name be the same as the API name, in this case StochasticGradientDescent. In many cases we have the same trainer in multiple catalogs. So we either have to keep the namespaces distinct, or change the class names here to StochasticGradientDescentBinary or StochasticGradientDescentBinaryClassificaiton. I prefer to use the names spaces and mirror the catalog structure. Makes sense?

In reply to: 259442659 [](ancestors = 259442659)

Ivanidzo4ka · 2019-02-22T17:40:25Z

src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs

+        /// <param name="catalog">The binary classification catalog trainer object.</param>
+        /// <param name="labelColumnName">The name of the label column, or dependent variable.</param>
+        /// <param name="featureColumnName">The features, or independent variables.</param>
+        /// <param name="weightColumnName">The name of the example weight column.</param>


weightColumnName [](start = 25, length = 16)

#2665
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
#Resolved

thanks for the heads up. I'll leave this for when resolving merge conflicts.

In reply to: 259444080 [](ancestors = 259444080)

Ivanidzo4ka · 2019-02-22T17:41:59Z

src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs

@@ -19,15 +19,22 @@ namespace Microsoft.ML
    public static class StandardLearnersCatalog
    {
        /// <summary>
-        ///  Predict a target using logistic regression trained with the <see cref="SgdBinaryTrainer"/> trainer.
+        /// Predict a target using a linear classification model trained with <see cref="SgdBinaryTrainer"/>.


linear classification [](start = 37, length = 21)

Feels like word calibrated should be somewhere here. #Resolved

/// The trained model can produce probability by feeding the output value of the linear /// function to a <see cref="PlattCalibrator"/>.

or this one

In reply to: 259444565 [](ancestors = 259444565)

I added this to SgdBinaryTrainer docs. We've been pushing all the detailed remarks to the Trainer/Estimator class so that the info is shared across all the different overload versions.

In reply to: 259445278 [](ancestors = 259445278,259444565)

rogancarr · 2019-02-22T17:58:46Z

...ples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescent.cs

+    {
+        // In this examples we will use the adult income dataset. The goal is to predict
+        // if a person's income is above $50K or not, based on different pieces of information about that person.
+        // For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.


https://archive.ics.uci.edu/ml/datasets/adult [](start = 59, length = 45)

Broken link. Or the site is down right now.... #Resolved

I think they're just temporarily down. This is prominent link. It's the first google result for "uci adult".

In reply to: 259450232 [](ancestors = 259450232)

rogancarr · 2019-02-22T18:00:47Z

...ples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescent.cs

+    public static class StochasticGradientDescent
+    {
+        // In this examples we will use the adult income dataset. The goal is to predict
+        // if a person's income is above $50K or not, based on different pieces of information about that person.


based on different pieces [](start = 54, length = 26)

"demographic information" #Resolved

that's a better wording. i changed to that.

In reply to: 259450893 [](ancestors = 259450893)

rogancarr · 2019-02-22T18:01:33Z

...oft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescentWithOptions.cs

+    public static class StochasticGradientDescentWithOptions
+    {
+        // In this examples we will use the adult income dataset. The goal is to predict
+        // if a person's income is above $50K or not, based on different pieces of information about that person.


r not, based on different pieces of information [](start = 47, length = 47)

Ditto from the other sample. #Resolved

changes all samples to use "demographic information" instead.

In reply to: 259451108 [](ancestors = 259451108)

rogancarr · 2019-02-22T18:06:29Z

...oft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescentWithOptions.cs

+            // Define the trainer options.
+            var options = new SgdBinaryTrainer.Options()
+            {
+                MaxIterations = 30,


MaxIterations = 30, [](start = 16, length = 19)

Can you give a quick //comment on what the options do. Not as in depth as the API docs, but conversational, and what would change about the calculation over the defaults. #Resolved

I added some descriptive comments. I don't want to give too much specifics about defaults or behavior, because those could change and we'll end up with stale comments. Better to keep most details on API docs, where they'll stay in-sync with code.

In reply to: 259452561 [](ancestors = 259452561)

rogancarr · 2019-02-22T18:08:25Z

src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs

+        /// <summary>
+        /// Pretty-print CalibratedBinaryClassificationMetrics objects.
+        /// </summary>
+        /// <param name="metrics">Calibrated binary classification metrics.</param>


binary classification metrics [](start = 45, length = 29)

use a see tag so it can be discovered by the tooling. #Resolved

done

In reply to: 259453147 [](ancestors = 259453147)

rogancarr · 2019-02-22T18:17:50Z

src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs

+        /// [!code-csharp[StochasticGradientDescent](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticGradientDescent.cs)]
+        /// ]]>
+        /// </format>
+        /// </example>
        public static SgdBinaryTrainer StochasticGradientDescent(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,


StochasticGradientDescent [](start = 39, length = 25)

Are we going to have any docs on what SGD is and how this learner implements it? #Resolved

The details are in SgdBinaryTrainer. Please comment on SgdBinaryTrainer if you want more reference links there.

In reply to: 259456046 [](ancestors = 259456046)

rogancarr · 2019-02-22T18:19:27Z

src/Microsoft.ML.StandardLearners/Standard/SdcaBinary.cs

+    /// The Stochastic Gradient Descent (SGD) is one of the popular stochastic optimization procedures that can be integrated
+    /// into several machine learning tasks to achieve state-of-the-art performance. This trainer implements SGD for binary classification
+    /// that supports multi-threading without any locking. If the associated optimization problem is sparse, it achieves a nearly optimal
+    /// rate of convergence. For more details, please refer to http://arxiv.org/pdf/1106.5730v2.pdf.


e. For more details [](start = 26, length = 19)

Call out that this uses the Hogwild method. #Resolved

I though we are reducing Hogwild from our API names. This used to be called HogwildSgdTrainer. So I didn't mention Hogwild here and the only mentioning is in the reference. Do you want Hogwild to be mentioned in the text here?

In reply to: 259456602 [](ancestors = 259456602)

I think we don't want it in the API name so that we're free to change implementations, but I think it's good to be explicit in the docs on what style of SGD this is performing so that advanced users can take it into account. Also, just pointing to Arxiv is a bit confusing because it looks like "for information about SGD, look here" and then it's a paper on a very specific multithreaded implementation.

But in the bigger picture, I think it's better if we nerd out a bit in the remarks.

In reply to: 259480809 [](ancestors = 259480809,259456602)

mentioned Hogwild.

In reply to: 259495627 [](ancestors = 259495627,259480809,259456602)

rogancarr

Looks good! 🎈

Ivanidzo4ka

…mnName.

Added samples & docs for BinaryClassification.StochasticGradientDesce…

c3aebc7

…nt, plus a bunch of typo fixing.

shmoradims requested review from sfilipi, rogancarr, Ivanidzo4ka and yaeldekel February 21, 2019 23:15

This was referenced Feb 21, 2019

Docs and samples for the API reference site (P0 & P1 Trainers) #2522

Closed

Documentation samples for binary classifiers #1257

Closed

wschin reviewed Feb 22, 2019

View reviewed changes

Ivanidzo4ka reviewed Feb 22, 2019

View reviewed changes

rogancarr reviewed Feb 22, 2019

View reviewed changes

Shahab Moradi added 2 commits February 22, 2019 12:00

Addressed PR comments.

16d136e

Mentioned Hogwild

10a01c1

shmoradims added the documentation Related to documentation of ML.NET label Feb 22, 2019

rogancarr approved these changes Feb 23, 2019

View reviewed changes

Ivanidzo4ka approved these changes Feb 25, 2019

View reviewed changes

Shahab Moradi added 3 commits February 25, 2019 11:18

Merged with master and resolved conflicts regarding exampleWeightColu…

ef4168d

…mnName.

Updates to exampleWeightColumnName.

f2a42eb

Fixed trailing whitespaces.

6909bfc

shmoradims merged commit a16eb30 into dotnet:master Feb 25, 2019

shmoradims deleted the binary_trainers_doc branch March 11, 2019 20:51

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

		@@ -0,0 +1,47 @@
		using Microsoft.ML;

		namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification

Added samples & docs for BinaryClassification.StochasticGradientDescent #2688

Added samples & docs for BinaryClassification.StochasticGradientDescent #2688

Uh oh!

Conversation

shmoradims commented Feb 21, 2019

Uh oh!

codecov bot commented Feb 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

wschin Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wschin Feb 23, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wschin Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ivanidzo4ka Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ivanidzo4ka Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Ivanidzo4ka Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogancarr Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogancarr Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogancarr Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogancarr Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogancarr Feb 22, 2019 • edited by shmoradims Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

codecov bot commented Feb 22, 2019 •

edited

Loading

wschin Feb 22, 2019 •

edited by shmoradims

Loading

wschin Feb 23, 2019 •

edited by shmoradims

Loading

wschin Feb 22, 2019 •

edited by shmoradims

Loading

Ivanidzo4ka Feb 22, 2019 •

edited by shmoradims

Loading

Ivanidzo4ka Feb 22, 2019 •

edited by shmoradims

Loading

Ivanidzo4ka Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

rogancarr Feb 22, 2019 •

edited by shmoradims

Loading

shmoradims Feb 22, 2019 •

edited

Loading

rogancarr Feb 22, 2019 •

edited

Loading