Added OneVersusAll and PairwiseCoupling samples. #3159

Merged 11 commits on Apr 5, 2019
Conversation

Member

@ganik ganik commented Apr 1, 2019

Part of #2522.
Adds a sample for OneVersusAll classification.
Adds a sample for PairwiseCoupling classification.

@ganik ganik changed the title OneVersusAll sample Add OneVersusAll and PairwiseCoupling samples Apr 1, 2019
@ganik ganik changed the title Add OneVersusAll and PairwiseCoupling samples Added OneVersusAll and PairwiseCoupling samples Apr 1, 2019
@ganik ganik changed the title Added OneVersusAll and PairwiseCoupling samples Added OneVersusAll and PairwiseCoupling samples. Apr 1, 2019

codecov bot commented Apr 1, 2019

Codecov Report

Merging #3159 into master will increase coverage by 0.1%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master    #3159     +/-   ##
=========================================
+ Coverage   72.53%   72.64%   +0.1%     
=========================================
  Files         808      807      -1     
  Lines      144740   145080    +340     
  Branches    16202    16213     +11     
=========================================
+ Hits       104986   105391    +405     
+ Misses      35343    35271     -72     
- Partials     4411     4418      +7
| Flag | Coverage Δ |
| --- | --- |
| #Debug | 72.64% <ø> (+0.1%) ⬆️ |
| #production | 68.19% <ø> (+0.07%) ⬆️ |
| #test | 88.92% <ø> (+0.1%) ⬆️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| ...oft.ML.StandardTrainers/StandardTrainersCatalog.cs | 92.34% <ø> (+3.27%) ⬆️ |
| src/Microsoft.ML.DataView/KeyDataViewType.cs | 74.57% <0%> (-3.76%) ⬇️ |
| ...rosoft.ML.Data/Scorers/PredictedLabelScorerBase.cs | 81.71% <0%> (-0.62%) ⬇️ |
| src/Microsoft.ML.Data/Transforms/ValueMapping.cs | 84.26% <0%> (-0.14%) ⬇️ |
| test/Microsoft.ML.Tests/ImagesTests.cs | 98.69% <0%> (-0.13%) ⬇️ |
| src/Microsoft.ML.Transforms/CategoricalCatalog.cs | 68.42% <0%> (ø) ⬆️ |
| ...osoft.ML.Recommender/SafeTrainingAndModelBuffer.cs | 78.87% <0%> (ø) ⬆️ |
| ...ML.Tests/TrainerEstimators/MetalinearEstimators.cs | 100% <0%> (ø) ⬆️ |
| src/Microsoft.ML.Data/Transforms/Normalizer.cs | 86.03% <0%> (ø) ⬆️ |
| ...Microsoft.ML.Transforms/FeatureSelectionCatalog.cs | 60% <0%> (ø) ⬆️ |

... and 28 more

{
public static class OneVersusAll
{
public static void Example()
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 1, 2019

Example [](start = 27, length = 7)

You probably want to link this file to the extension method:

```
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
///  [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/StochasticDualCoordinateAscentWithOptions.cs)]
/// ]]></format>
/// </example>
```
#Resolved

// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply OneVersusAll multiclass trainer on top of SDCA Logistic Regression binary trainer.
.Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression()));
Member

@sfilipi sfilipi Apr 2, 2019

usually multiclass pipelines add a MapKeyToValue at the end. #ByDesign

Member Author

Maybe, but I don't need it here.


In reply to: 271136620 [](ancestors = 271136620)
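
For readers following the MapKeyToValue suggestion above, here is a minimal sketch of what such a pipeline could look like. It is not the sample merged in this PR: the `OneVersusAllWithKeyToValue` class, the data generator, and the row counts are hypothetical, and the code assumes the Microsoft.ML NuGet package (1.x) is referenced. Because `MapValueToKey("Label")` overwrites `Label` with its key encoding, the sketch maps both `PredictedLabel` and `Label` back so they are compared in the original label space.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class OneVersusAllWithKeyToValue
{
    // Input row: a numeric class label and a fixed-size feature vector.
    public class DataPoint
    {
        public uint Label { get; set; }

        // Declares the vector length so the feature column has a known size in the schema.
        [VectorType(20)]
        public float[] Features { get; set; }
    }

    // Output row read back from the transformed data.
    public class Prediction
    {
        public uint Label { get; set; }
        public uint PredictedLabel { get; set; }
    }

    // Hypothetical generator (an assumption, not the PR's helper):
    // labels are 1, 2 or 3 and the features are shifted by the label.
    public static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
    {
        var random = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            uint label = (uint)random.Next(1, 4);
            yield return new DataPoint
            {
                Label = label,
                Features = Enumerable.Range(0, 20)
                    .Select(_ => (float)random.NextDouble() + label * 0.2f)
                    .ToArray()
            };
        }
    }

    public static List<Prediction> Run(int trainCount = 1000, int testCount = 500)
    {
        var mlContext = new MLContext(seed: 0);
        var trainingData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(trainCount));
        var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(testCount, seed: 123));

        var pipeline =
            // Convert the labels into key types, as the trainer expects.
            mlContext.Transforms.Conversion.MapValueToKey("Label")
            // One-versus-all on top of the SDCA logistic regression binary trainer.
            .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
                mlContext.BinaryClassification.Trainers.SdcaLogisticRegression()))
            // Map the predicted key back to the original label value.
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"))
            // Label was overwritten with its key encoding; map it back as well.
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("Label"));

        var model = pipeline.Fit(trainingData);
        var transformed = model.Transform(testData);
        return mlContext.Data
            .CreateEnumerable<Prediction>(transformed, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        // Look at 5 predictions.
        foreach (var p in Run().Take(5))
            Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");
    }
}
```

Without the two trailing `MapKeyToValue` steps, both columns stay key-typed, which is what the author chose to keep in this sample.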

// Train the model.
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
Member

@sfilipi sfilipi Apr 2, 2019

Do [](start = 15, length = 2)

Generate #Resolved

// Micro Accuracy: 0.77
// Macro Accuracy: 0.75
// Log Loss: 0.69
// Log Loss Reduction: 0.49
Member

@sfilipi sfilipi Apr 2, 2019

// Log Loss Reduction: 0.49 [](start = 12, length = 29)

add reading the PredictedLabel column. #Resolved

// Micro Accuracy: 0.75
// Macro Accuracy: 0.73
// Log Loss: 0.70
// Log Loss Reduction: 0.49
Member

@sfilipi sfilipi Apr 2, 2019

same here, add reading the PredictedLabel. #Resolved

Member Author

done


In reply to: 271136850 [](ancestors = 271136850)

var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);
@shmoradims shmoradims Apr 2, 2019

GenerateRandomMulticlassClassificationExamples [](start = 40, length = 46)

is it possible to use something like GenerateRandomDataPoints that's used in binary classification?

private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed=0)
#Resolved

Member Author

This method generates data points for multiclass classification, with 4 labels. Other than that, it works similarly to GenerateRandomDataPoints. Not sure what other similarity you would want?


In reply to: 271461899 [](ancestors = 271461899)

Member Author

this is actually done, thx


In reply to: 271497239 [](ancestors = 271497239,271461899)


shmoradims commented Apr 2, 2019

using Microsoft.ML.Data;

consider using T4 templates if you see a lot of duplicate code across multi-class samples. Below is a T4 we used for regression. All *.cs files will be autogenerated from the .tt template file.
#3099 #Resolved


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:1 in 01dccde. [](commit_id = 01dccde, deletion_comment = False)

Member Author

ganik commented Apr 2, 2019

using Microsoft.ML.Data;

Good suggestions, done.


In reply to: 479163541 [](ancestors = 479163541)


Refers to: docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/OneVersusAll.cs:1 in 01dccde. [](commit_id = 01dccde, deletion_comment = False)

private class DataPoint
{
public uint Label { get; set; }
[VectorType(20)]
Member

@sfilipi sfilipi Apr 3, 2019

[VectorType(20)] [](start = 11, length = 17)

is the annotation necessary? #ByDesign

Member Author

yes, needed for schema check


In reply to: 271907075 [](ancestors = 271907075)

Member

@sfilipi sfilipi left a comment

:shipit:

var metrics = mlContext.MulticlassClassification.Evaluate(transformedTestData);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
@shmoradims shmoradims Apr 3, 2019

// Expected output: [](start = 12, length = 19)

how come this line is repeated? #Resolved

// Look at 5 predictions
foreach (var p in predictions.Take(5))
Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");

@shmoradims shmoradims Apr 3, 2019

please add ExpectedOutputPerInstance after this #Resolved

string Comments= "";

string ExpectedOutputPerInstance= @"// Expected output:
// Label: 1, Prediction: 2
@shmoradims shmoradims Apr 3, 2019

Label: 1, Prediction: 2 [](start = 17, length = 23)

How come the generated labels are 0, 1, 2 but here I see 1, 2, 3? How did they get changed? #Resolved

Member Author

No, the generated labels are 1, 2, 3.


In reply to: 271938136 [](ancestors = 271938136)


string ExpectedOutput = @"// Expected output:
// Expected output:
// Micro Accuracy: 0.35
@shmoradims shmoradims Apr 3, 2019

0.35 [](start = 30, length = 5)

Can we get something above 60%? This is much worse than the other two. #ByDesign

Member Author

I tried, we can't; this is a linear model :)


In reply to: 271938623 [](ancestors = 271938623)
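
As background for the negative Log Loss Reduction in this template's expected output (Log Loss 34.54, Log Loss Reduction -30.47): ML.NET documents log-loss reduction as the relative improvement over a prior that always predicts the class proportions. A worked reading of the reported numbers, assuming that definition applies to this output:

```latex
\[
\text{LogLossReduction}
  = \frac{\text{LogLoss}_{\text{prior}} - \text{LogLoss}_{\text{model}}}{\text{LogLoss}_{\text{prior}}}
  = 1 - \frac{\text{LogLoss}_{\text{model}}}{\text{LogLoss}_{\text{prior}}}
\]
\[
\text{LogLoss}_{\text{prior}}
  = \frac{\text{LogLoss}_{\text{model}}}{1 - \text{LogLossReduction}}
  = \frac{34.54}{1 + 30.47}
  \approx 1.10 \approx \ln 3
\]
```

Under that reading, the implied prior log-loss is what three roughly balanced classes would give, and a negative reduction simply means the model's log-loss is worse than the prior's. A micro accuracy of 0.35 is also close to the 1/3 expected from guessing among three classes.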

// Micro Accuracy: 0.35
// Macro Accuracy: 0.33
// Log Loss: 34.54
// Log Loss Reduction: -30.47
@shmoradims shmoradims Apr 3, 2019

we usually indent the lines below Expected output with an extra space. #Resolved

<#=OptionsInclude#>
<# } #>

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
@shmoradims shmoradims Apr 5, 2019

Microsoft.ML [](start = 10, length = 12)

please drop Microsoft.ML prefix as per #3205 #Resolved

Member Author

done


In reply to: 272754353 [](ancestors = 272754353)

@shmoradims shmoradims left a comment

:shipit:

@ganik ganik merged commit f19b560 into dotnet:master Apr 5, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
4 participants