Docs & samples for SDCA-based trainers #2771

shmoradims · 2019-02-27T22:06:39Z

Docs & samples for SDCA binary, multi-class, and regression.
Related to #2522

...L.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscentWithOptions.cs

wschin · 2019-02-28T01:17:50Z

...osoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/StochasticDualCoordinateAscent.cs

+            var mlContext = new MLContext(seed: 0);
+
+            // Create in-memory examples as C# native class.
+            var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);


Unfortunately, GenerateRandomMulticlassClassificationExamples is not searchable on the doc site, so the only way to fully learn this pipeline is to clone ML.NET. Because SDCA can work with very tiny data set, we could add things like this

private class DataPoint { [VectorType(3)] public float[] Features; } var samples = new List<DataPoint>() { new DataPoint(){ Features= new float[3] {1, 0, 0} }, new DataPoint(){ Features= new float[3] {0, 2, 1} }, new DataPoint(){ Features= new float[3] {1, 2, 3} }, new DataPoint(){ Features= new float[3] {0, 1, 0} }, new DataPoint(){ Features= new float[3] {0, 2, 1} }, new DataPoint(){ Features= new float[3] {-100, 50, -100} } };

into this file and use them. #Resolved

the type is visible if they use the example, and they can inspect the values with the debugger, BUT moving Featurization into the Samples Utils is a real problem..

In reply to: 261013871 [](ancestors = 261013871)

No, we can't expect user will have visual studio, on for example, Linux. I'd say the best case is that user knows everything they need after reading this example. Please take a look at an scikit-learn example.

X = [[0], [1], [2], [3]] Y = [0, 1, 2, 3] clf = svm.SVC(gamma='scale', decision_function_shape='ovo') clf.fit(X, Y)

Does scikit-learn ask user to go outside the text above to understand that example? In addition, those functions are not searchable on ML.NET doc site, which means a big hole to new users. Honestly, I am not sure if SamplesUtils should be used because it hides some vital information and therefore pushes our examples away from those scikit-learn ones (in terms of readibility). #Pending

the function and type are both searchable:

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.samplesutils.datasetutils.generaterandommulticlassclassificationexamples?view=ml-dotnet#Microsoft_ML_SamplesUtils_DatasetUtils_GenerateRandomMulticlassClassificationExamples_System_Int32_

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.samplesutils.datasetutils.multiclassclassificationexample?view=ml-dotnet

There no featurization here, b/c we're using randomly generated data. The featurization issue happens for loading data from file.

In reply to: 261051041 [](ancestors = 261051041,261013871)

Please see my latest comment on #2627. Removing SamplesUtils is a design decision that needs to be made first.

Please see also see my response in another comment where I have the doc links. All of DatasetUtils are searchable:
https://docs.microsoft.com/en-us/dotnet/api/?view=ml-dotnet&term=Microsoft.ML.SamplesUtils.DatasetUtils

In reply to: 261068842 [](ancestors = 261068842)

My bad. Even if it's searchable, it still has no meaningful document at this page. #Resolved

Our documentation coverage is low but we're actively working on it, hence this PR. So that page will become meaningful eventually.

In reply to: 261718177 [](ancestors = 261718177)

Please do not put things separated if they are considered a whole example. The organization of the entire documentation will never be organized and learned in a structured way--- this is how computer science world works. Let me give you another example. How would user learn the definition of a vector column by adding VectorType attribute? Assume that he already finds the doc of GenerateRandomMulticlass. He still need to click on the returned type of GenerateRandomMulticlass, which is List<DatasetUtils.MulticlassClassificationExample>. Then, another page will be opened. Where is the vector attribute of my Features? User needs to click on Fields again to open the 3rd page which contains

[Microsoft.ML.Data.VectorType(new System.Int32[] { 10 })] public float[] Features;

Hiding things in this hierarchical way is definitely a learning barrier. #WontFix

As pointed by Shaauheen, let's keep the samples as is for V1. Post V1, we can address this by proper discussions that were canceled in favor of API work. For now, some sample for V1, is better than no sample.

In reply to: 261763964 [](ancestors = 261763964)

sfilipi · 2019-02-28T05:26:37Z

...Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscent.cs

@@ -5,7 +5,7 @@

 namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification


Microsoft.ML.Samples.Dynamic [](start = 10, length = 28)

let's keep the namespace for all samples Microsoft.ML.Samples.Dynamic. The nesting is unecessary #Resolved

I've explained the reasoning here:
#2729 (comment)

btw, we don't need namespace nesting for transforms. only for trainers, because of naming conflicts.

In reply to: 261050465 [](ancestors = 261050465)

sfilipi · 2019-02-28T05:28:16Z

docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/LightGbm.cs

@@ -17,7 +17,7 @@ public static void Example()
            // Create in-memory examples as C# native class.
            var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

-            // Convert native C# class to IDataView, a consumble format to ML.NET functions.
+            // Convert native C# class to IDataView, a consumable format to ML.NET functions.


native C# class [](start = 23, length = 15)

please spell out the full type: List #Resolved

I added more specificity, without spelling out the full type.

If we want full-types, we should spell them out instead of using var. Because the compiler will make sure it's always correct and catch the changes. Having it in the comments will go stale in no time.

So far we've been using var in the samples, so I'll keep it for consistency.

In reply to: 261050690 [](ancestors = 261050690)

sfilipi · 2019-02-28T05:28:46Z

        // Create in-memory examples as C# native class.

if you can fix this on this file too.. #Resolved

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/LightGbmWithOptions.cs:19 in c5e034b. [](commit_id = c5e034b, deletion_comment = False)

sfilipi · 2019-02-28T05:32:23Z

src/Microsoft.ML.StandardLearners/Standard/doc.xml

+    The following text describes the SDCA algorithm details.
+    It's used for the remarks section of all SDCA-based trainers (binary, multiclass, regression)
+    -->
+    <member name="SDCA_remarks">


[](start = 3, length = 29)

so shall we keep the docs.xml for reuse, than? #Resolved

If we can reduce the usage to 1 by using cref links, we don't need doc.xml. We should use inline documentation.

I this case we have 4 SDCA trainers that all need to explain what SDCA is and we cannot cref them to each other. So I'm keeping doc.xml for it.

In reply to: 261051276 [](ancestors = 261051276)

codecov · 2019-02-28T22:15:56Z

Codecov Report

Merging #2771 into master will decrease coverage by <.01%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #2771      +/-   ##
==========================================
- Coverage   71.66%   71.65%   -0.01%     
==========================================
  Files         809      809              
  Lines      142378   142383       +5     
  Branches    16119    16119              
==========================================
- Hits       102030   102027       -3     
- Misses      35915    35922       +7     
- Partials     4433     4434       +1

Flag	Coverage Δ
#Debug	`71.65% <0%> (-0.01%)`	⬇️
#production	`67.9% <0%> (-0.01%)`	⬇️
#test	`85.84% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
...oft.ML.StandardLearners/StandardLearnersCatalog.cs	`88.57% <ø> (ø)`	⬆️
...crosoft.ML.StandardLearners/Standard/SdcaBinary.cs	`72.11% <ø> (ø)`	⬆️
...oft.ML.StandardLearners/Standard/SdcaRegression.cs	`96.05% <ø> (ø)`	⬆️
...oft.ML.StandardLearners/Standard/SdcaMultiClass.cs	`92.22% <ø> (ø)`	⬆️
src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs	`0% <0%> (ø)`	⬆️
src/Microsoft.ML.Maml/MAML.cs	`24.75% <0%> (-1.46%)`	⬇️

shmoradims · 2019-02-28T22:43:19Z

        // Create in-memory examples as C# native class.

please see my response on the other comments.

In reply to: 468143030 [](ancestors = 468143030)

Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/LightGbmWithOptions.cs:19 in c5e034b. [](commit_id = c5e034b, deletion_comment = False)

sfilipi

Shahab Moradi added 4 commits February 26, 2019 15:30

Updated SDCA trainers docs in StandardLearnersCatalog

3d65891

Updated trainer options docs.

27e9044

Added samples

22b2ff5

Resolved conflicts with master

89dbc83

shmoradims requested review from sfilipi and rogancarr February 27, 2019 22:06

Shahab Moradi added 2 commits February 27, 2019 14:14

Added samples to API xml

01c795c

Fixed build error

c5e034b

wschin reviewed Feb 27, 2019

View reviewed changes

...L.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscentWithOptions.cs Show resolved Hide resolved

wschin reviewed Feb 28, 2019

View reviewed changes

sfilipi reviewed Feb 28, 2019

View reviewed changes

Shahab Moradi added 3 commits February 28, 2019 10:59

Resolved conflict with master

d29f0ce

Fixed MultiClassClassifierMetrics.

84e01cd

Fixed build errors in ConsoleUtils.cs

dc1a8bf

Addressed PR comments.

731e4b0

sfilipi approved these changes Mar 4, 2019

View reviewed changes

wschin approved these changes Mar 4, 2019

View reviewed changes

shmoradims merged commit 09f40d0 into dotnet:master Mar 4, 2019

shmoradims mentioned this pull request Mar 4, 2019

Docs and samples for the API reference site (P0 & P1 Trainers) #2522

Closed

shauheen added this to the 0319 milestone Mar 4, 2019

shmoradims deleted the sdca_trainers_docs2 branch March 11, 2019 20:51

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Docs & samples for SDCA-based trainers #2771

Docs & samples for SDCA-based trainers #2771

shmoradims commented Feb 27, 2019

wschin Feb 28, 2019 •

edited by shmoradims

Loading

sfilipi Feb 28, 2019

wschin Feb 28, 2019 •

edited by shmoradims

Loading

shmoradims Feb 28, 2019

shmoradims Feb 28, 2019

wschin Mar 1, 2019 •

edited by shmoradims

Loading

shmoradims Mar 1, 2019

wschin Mar 1, 2019 •

edited by shmoradims

Loading

shmoradims Mar 4, 2019

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

shmoradims Feb 28, 2019 •

edited

Loading

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

shmoradims Feb 28, 2019

sfilipi commented Feb 28, 2019 •

edited by shmoradims

Loading

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

shmoradims Feb 28, 2019

codecov bot commented Feb 28, 2019 •

edited

Loading

shmoradims commented Feb 28, 2019

sfilipi left a comment

		@@ -5,7 +5,7 @@

		namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification

Docs & samples for SDCA-based trainers #2771

Docs & samples for SDCA-based trainers #2771

Conversation

shmoradims commented Feb 27, 2019

wschin Feb 28, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

sfilipi Feb 28, 2019

Choose a reason for hiding this comment

wschin Feb 28, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Feb 28, 2019

Choose a reason for hiding this comment

shmoradims Feb 28, 2019

Choose a reason for hiding this comment

wschin Mar 1, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Mar 1, 2019

Choose a reason for hiding this comment

wschin Mar 1, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Mar 4, 2019

Choose a reason for hiding this comment

sfilipi Feb 28, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Feb 28, 2019 • edited Loading

Choose a reason for hiding this comment

sfilipi Feb 28, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Feb 28, 2019

Choose a reason for hiding this comment

sfilipi commented Feb 28, 2019 • edited by shmoradims Loading

sfilipi Feb 28, 2019 • edited by shmoradims Loading

Choose a reason for hiding this comment

shmoradims Feb 28, 2019

Choose a reason for hiding this comment

codecov bot commented Feb 28, 2019 • edited Loading

Codecov Report

shmoradims commented Feb 28, 2019

sfilipi left a comment

Choose a reason for hiding this comment

wschin Feb 28, 2019 •

edited by shmoradims

Loading

wschin Feb 28, 2019 •

edited by shmoradims

Loading

wschin Mar 1, 2019 •

edited by shmoradims

Loading

wschin Mar 1, 2019 •

edited by shmoradims

Loading

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

shmoradims Feb 28, 2019 •

edited

Loading

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

sfilipi commented Feb 28, 2019 •

edited by shmoradims

Loading

sfilipi Feb 28, 2019 •

edited by shmoradims

Loading

codecov bot commented Feb 28, 2019 •

edited

Loading