Added OneVersusAll and PairwiseCoupling samples. #3159
Codecov Report
@@ Coverage Diff @@
## master #3159 +/- ##
=========================================
+ Coverage 72.53% 72.64% +0.1%
=========================================
Files 808 807 -1
Lines 144740 145080 +340
Branches 16202 16213 +11
=========================================
+ Hits 104986 105391 +405
+ Misses 35343 35271 -72
- Partials 4411 4418 +7
{
    public static class OneVersusAll
    {
        public static void Example()
Example [](start = 27, length = 7)
you probably want to link this file to the extension method.
```
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[…](…)]
/// ]]></format>
/// </example>
```
#Resolved
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply OneVersusAll multiclass trainer on top of SDCA Logistic Regression binary trainer.
.Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression()));
usually multiclass pipelines add a MapKeyToValue at the end. #ByDesign
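For readers unfamiliar with the suggestion, here is a minimal sketch of what appending MapKeyToValue to such a pipeline could look like, assuming ML.NET 1.x and a string label column. The `Row` and `Prediction` classes, the toy data, and the `Run` helper are hypothetical and not part of this PR. The comment was closed as #ByDesign, so the sample under review may well omit this step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class OneVersusAllKeyToValueSketch
{
    // Hypothetical input row; the PR's own DataPoint uses a uint label.
    private class Row
    {
        public string Label { get; set; }

        [VectorType(2)]
        public float[] Features { get; set; }
    }

    // Only the column we want to read back after MapKeyToValue.
    private class Prediction
    {
        public string PredictedLabel { get; set; }
    }

    public static List<string> Run()
    {
        var mlContext = new MLContext(seed: 0);

        // Toy data (an assumption for illustration): three string labels
        // with clearly separated feature values.
        var names = new[] { "red", "green", "blue" };
        var rows = Enumerable.Range(0, 60).Select(i => new Row
        {
            Label = names[i % 3],
            Features = new[] { (float)(i % 3), (float)(i % 3) * 2f + (i % 5) * 0.01f }
        }).ToList();
        var data = mlContext.Data.LoadFromEnumerable(rows);

        var pipeline =
            // Convert the string labels into key types.
            mlContext.Transforms.Conversion.MapValueToKey("Label")
            // One-versus-all on top of a binary SDCA logistic regression trainer.
            .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
                mlContext.BinaryClassification.Trainers.SdcaLogisticRegression()))
            // Map the predicted key back to the original string value.
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        var model = pipeline.Fit(data);
        var transformed = model.Transform(data);

        return mlContext.Data
            .CreateEnumerable<Prediction>(transformed, reuseRowObject: false)
            .Select(p => p.PredictedLabel)
            .ToList();
    }

    public static void Main()
    {
        foreach (var label in Run().Take(5))
            Console.WriteLine($"Prediction: {label}");
    }
}
```

Without the final MapKeyToValue step, `PredictedLabel` stays a key type and would be read back as a `uint` rather than as the original label value.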
// Train the model.
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
Do [](start = 15, length = 2)
Generate #Resolved
// Micro Accuracy: 0.77
// Macro Accuracy: 0.75
// Log Loss: 0.69
// Log Loss Reduction: 0.49
// Log Loss Reduction: 0.49 [](start = 12, length = 29)
add reading the PredictedLabel column. #Resolved
// Micro Accuracy: 0.75
// Macro Accuracy: 0.73
// Log Loss: 0.70
// Log Loss Reduction: 0.49
same here, add reading the PredictedLabel. #Resolved
var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);
GenerateRandomMulticlassClassificationExamples [](start = 40, length = 46)
is it possible to use something like GenerateRandomDataPoints that's used in binary classification?
machinelearning/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/FastTree.cs
Line 68 in 0ce5618
private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed=0)
This method generates data points for multiclass classification, with 4 labels. Apart from that, it works much like GenerateRandomDataPoints. Not sure what other similarity you would want?
In reply to: 271461899 [](ancestors = 271461899)
consider using T4 templates if you see a lot of duplicate code across multi-class samples. Below is a T4 we used for regression. All *.cs files will be autogenerated from the .tt template file.
Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:1 in 01dccde. [](commit_id = 01dccde, deletion_comment = False)
Add NaiveBayes
private class DataPoint
{
    public uint Label { get; set; }
    [VectorType(20)]
[VectorType(20)] [](start = 11, length = 17)
is the annotation necessary? #ByDesign
var metrics = mlContext.MulticlassClassification.Evaluate(transformedTestData);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Expected output: [](start = 12, length = 19)
how come this line is repeated? #Resolved
// Look at 5 predictions
foreach (var p in predictions.Take(5))
    Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
please add ExpectedOutputPerInstance after this #Resolved
string Comments= "";

string ExpectedOutputPerInstance= @"// Expected output:
// Label: 1, Prediction: 2
Label: 1, Prediction: 2 [](start = 17, length = 23)
how come generated labels are 0,1,2 but here I see 1,2,3. how did they get changed? #Resolved
string ExpectedOutput = @"// Expected output:
// Expected output:
// Micro Accuracy: 0.35
0.35 [](start = 30, length = 5)
can we get something above 60%? this is much worse than the other two. #ByDesign
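For context on the negative Log Loss Reduction in this output, here is a minimal sketch, assuming the metric is defined as (prior log loss - model log loss) / prior log loss, where the prior is a baseline that always predicts the empirical class frequencies. That definition, the 1e-15 probability clamp, and all class and method names below are assumptions for illustration and are not stated in this thread. A log loss of 34.54 is approximately -ln(1e-15), which would be consistent with the true class receiving a clamped probability of 1e-15 on every row, though nothing here confirms that this is what happens in the sample.

```csharp
using System;
using System.Linq;

public static class LogLossReductionSketch
{
    // Assumed clamp that keeps log(0) out of the average.
    private const double Epsilon = 1e-15;

    // Mean negative log of the probability assigned to the true class.
    public static double LogLoss(double[][] probabilities, int[] labels) =>
        Enumerable.Range(0, labels.Length)
            .Average(i => -Math.Log(Math.Max(probabilities[i][labels[i]], Epsilon)));

    // Log loss of a baseline that always predicts the empirical class frequencies.
    public static double PriorLogLoss(int[] labels, int classCount) =>
        Enumerable.Range(0, classCount)
            .Select(c => labels.Count(l => l == c) / (double)labels.Length)
            .Where(p => p > 0)
            .Sum(p => -p * Math.Log(p));

    // Relative improvement over the baseline; negative when the model is worse.
    public static double Reduction(double logLoss, double priorLogLoss) =>
        (priorLogLoss - logLoss) / priorLogLoss;

    public static void Main()
    {
        int[] labels = { 0, 1, 2, 0, 1, 2 };

        // A model that puts all probability mass on a wrong class every time.
        double[][] confidentlyWrong = labels
            .Select(l => Enumerable.Range(0, 3)
                .Select(c => c == (l + 1) % 3 ? 1.0 : 0.0)
                .ToArray())
            .ToArray();

        double prior = PriorLogLoss(labels, 3);
        double loss = LogLoss(confidentlyWrong, labels);
        Console.WriteLine($"Log Loss: {loss:F2}, Log Loss Reduction: {Reduction(loss, prior):F2}");
    }
}
```

Under this definition, a model that matches the baseline scores 0, a perfect model approaches 1, and a model that is confidently wrong scores far below 0.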
// Micro Accuracy: 0.35
// Macro Accuracy: 0.33
// Log Loss: 34.54
// Log Loss Reduction: -30.47
we usually indent the lines below Expected output with an extra space. #Resolved
<#=OptionsInclude#>
<# } #>

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
Microsoft.ML [](start = 10, length = 12)
please drop Microsoft.ML prefix as per #3205 #Resolved
Part of #2522.
Adds a sample for OneVersusAll classification.
Adds a sample for PairwiseCoupling classification.