Added samples & docs for BinaryClassification.StochasticGradientDescent #2688

Merged · 6 commits · Feb 25, 2019
@@ -5,7 +5,7 @@ namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
public static class AveragedPerceptron
{
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.
public static void Example()
{
@@ -6,7 +6,7 @@ namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
public static class AveragedPerceptronWithOptions
{
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.
public static void Example()
{
@@ -0,0 +1,47 @@
using Microsoft.ML;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
Ivanidzo4ka (Contributor), Feb 22, 2019:
Re: .Trainers.BinaryClassification

It's much easier to call StochasticGradientDescent.Example in Program.cs if the namespace is truncated. #Resolved

Author:

The pattern I've been using is to have the class name be the same as the API name, in this case StochasticGradientDescent. In many cases we have the same trainer in multiple catalogs, so we either have to keep the namespaces distinct, or change the class names here to StochasticGradientDescentBinary or StochasticGradientDescentBinaryClassification. I prefer to use the namespaces and mirror the catalog structure. Does that make sense?


In reply to: 259442659

{
public static class StochasticGradientDescent
{
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.
rogancarr (Contributor), Feb 22, 2019:
Re: https://archive.ics.uci.edu/ml/datasets/adult

Broken link. Or the site is down right now... #Resolved

Author:

I think they're just temporarily down. This is a prominent link; it's the first Google result for "uci adult".


In reply to: 259450232

public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Download and featurize the dataset.
var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);

// Leave out 10% of data for testing.
var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);

// Create data training pipeline.
var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent();

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(trainTestData.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Accuracy: 0.85
// AUC: 0.90
// F1 Score: 0.67
// Negative Precision: 0.90
// Negative Recall: 0.91
// Positive Precision: 0.68
// Positive Recall: 0.65
// LogLoss: 0.48
// LogLossReduction: 38.31
// Entropy: 0.78
}
}
}
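The LogLossReduction figure in the expected output above can be related back to the other printed metrics. As a hedged sketch (the exact ML.NET computation may differ in details), it is the percentage improvement of the model's log-loss over the entropy of the label distribution:

```python
# Relative improvement of the model's log-loss over the prior entropy,
# expressed as a percentage. This is a sketch of how LogLossReduction
# relates to LogLoss and Entropy; the inputs are the sample's reported
# metrics, not recomputed values.
def log_loss_reduction(log_loss: float, entropy: float) -> float:
    return (entropy - log_loss) / entropy * 100.0

# With the sample's reported LogLoss (0.48) and Entropy (0.78), the
# reduction is roughly 38%, matching the printed 38.31 up to rounding.
print(round(log_loss_reduction(0.48, 0.78), 2))  # → 38.46
```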
@@ -0,0 +1,59 @@
using Microsoft.ML;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class StochasticGradientDescentWithOptions
{
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Download and featurize the dataset.
var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);
wschin (Member), Feb 22, 2019:

Please add examples using in-memory data structures if possible, and show how to inspect those in-memory examples' predictions. If you search for "InMemory" in VS Text Explorer, you will find a few examples. #Resolved

Contributor:

Since these are in the API docs on the website, I would actually prefer to keep the data-loading part terse. Let's make other samples focus on the data-loading aspects, and keep this one spare.


In reply to: 259176526

Author:

I agree with Rogan. That would be more of a scenario tutorial than an API sample.


In reply to: 259451392

wschin (Member), Feb 23, 2019:

So this is API doc for the documentation website? If yes, that makes my feeling even stronger: ideally, to fit this example into their own scenario, users should be able to make only minor changes. Having a text loader decreases the flexibility of this example and forces users to go outside Visual Studio, because they need to prepare a text file and make sure it can be loaded correctly. Looking at scikit-learn, I don't find many of their examples using text files, and it's super easy to start working with their modules. #Pending

Author:

I'm not following this comment. LoadFeaturizedAdultDataset downloads the dataset and loads it into memory. It's functionally similar to the sklearn datasets module, which is used in many of the sklearn examples:

from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()

And we have SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset().

Users are able to copy-paste these samples and run them as-is. Please IM me if you want to discuss this further. :)


In reply to: 259558223

Author:

I'll be using this same template for all the new API samples. If you think there's a better template, please let me know soon.


In reply to: 259947957


// Leave out 10% of data for testing.
var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);

// Define the trainer options.
var options = new SgdBinaryTrainer.Options()
{
// Make the convergence tolerance tighter.
ConvergenceTolerance = 5e-5,
// Increase the maximum number of passes over training data.
MaxIterations = 30,
// Give the instances of the positive class slightly more weight.
PositiveInstanceWeight = 1.2f,
};

// Create data training pipeline.
var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent(options);

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(trainTestData.TestSet);
wschin (Member), Feb 22, 2019:

Printing out metrics is a bit removed from practical production use, where we create a prediction per example and then make decisions based on those prediction values. #Resolved

Contributor:

I think the purpose of these samples is to show how to call a specific trainer and set the different options. What you want is a slightly different thing and should be covered by the high-level documentation and cookbook.


In reply to: 259176767

Author:

Ivan is correct. The API samples have a narrow scope: showcasing how to use a single API. We have tutorials and an end-to-end samples repo that cover the practical cases, which involve using multiple APIs.


In reply to: 259443273

var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Accuracy: 0.85
// AUC: 0.90
// F1 Score: 0.67
// Negative Precision: 0.91
// Negative Recall: 0.89
// Positive Precision: 0.65
// Positive Recall: 0.70
// LogLoss: 0.48
// LogLossReduction: 37.52
// Entropy: 0.78
}
}
}
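The ConvergenceTolerance option set in the sample above stops training once the exponential moving average of per-pass loss improvements falls below the threshold. A minimal sketch of that stopping rule (the smoothing factor, loop structure, and function name here are illustrative assumptions, not ML.NET's internals):

```python
def converged(loss_history, tolerance=5e-5, smoothing=0.5):
    """Return True once the exponentially smoothed per-pass loss
    improvement drops below `tolerance` (an illustrative stopping
    rule, not the actual ML.NET implementation)."""
    ema = None
    for prev, curr in zip(loss_history, loss_history[1:]):
        improvement = prev - curr
        ema = improvement if ema is None else smoothing * ema + (1 - smoothing) * improvement
        if ema < tolerance:
            return True
    return False

# A loss curve that flattens out trips the tolerance; steady progress does not.
flat = [1.0, 0.5] + [0.5] * 20
steady = [1.0 - 0.05 * i for i in range(10)]
print(converged(flat), converged(steady))  # → True False
```

Tightening the tolerance (as the sample does with 5e-5) makes the trainer run longer before declaring convergence.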
@@ -4,7 +4,7 @@ public static class SymbolicStochasticGradientDescent
{
// This example requires installation of the additional NuGet package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>.
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult
public static void Example()
{
@@ -4,7 +4,7 @@ public static class SymbolicStochasticGradientDescentWithOptions
{
// This example requires installation of the additional NuGet package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>.
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult
public static void Example()
{
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Data/EntryPoints/InputBase.cs
@@ -95,7 +95,7 @@ public abstract class LearnerInputBaseWithLabel : LearnerInputBase
public abstract class LearnerInputBaseWithWeight : LearnerInputBaseWithLabel
{
/// <summary>
/// Column to use for example weight.
/// The name of the example weight column.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Column to use for example weight", ShortName = "weight", SortOrder = 4, Visibility = ArgumentAttribute.VisibilityType.EntryPointsOnly)]
public string WeightColumn = null;
12 changes: 12 additions & 0 deletions src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs
@@ -23,6 +23,18 @@ public static void PrintMetrics(BinaryClassificationMetrics metrics)
Console.WriteLine($"Positive Recall: {metrics.PositiveRecall:F2}");
}

/// <summary>
/// Pretty-print CalibratedBinaryClassificationMetrics objects.
/// </summary>
/// <param name="metrics"><see cref="CalibratedBinaryClassificationMetrics"/> object.</param>
public static void PrintMetrics(CalibratedBinaryClassificationMetrics metrics)
{
PrintMetrics(metrics as BinaryClassificationMetrics);
Console.WriteLine($"LogLoss: {metrics.LogLoss:F2}");
Console.WriteLine($"LogLossReduction: {metrics.LogLossReduction:F2}");
Console.WriteLine($"Entropy: {metrics.Entropy:F2}");
}

/// <summary>
/// Pretty-print RegressionMetrics objects.
/// </summary>
@@ -60,7 +60,7 @@ public abstract class AveragedLinearOptions : OnlineLinearOptions
public bool DoLazyUpdates = true;

/// <summary>
/// L2 weight for <a href='tmpurl_regularization'>regularization</a>.
/// The L2 weight for <a href='tmpurl_regularization'>regularization</a>.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "L2 Regularization Weight", ShortName = "reg", SortOrder = 50)]
[TGUI(Label = "L2 Regularization Weight")]
@@ -54,7 +54,7 @@ public sealed class AveragedPerceptronTrainer : AveragedLinearTrainer<BinaryPred
private readonly Options _args;

/// <summary>
/// Options for the averaged perceptron trainer.
/// Options for the <see cref="AveragedPerceptronTrainer"/>.
/// </summary>
public sealed class Options : AveragedLinearOptions
{
@@ -24,7 +24,7 @@ public abstract class OnlineLinearOptions : LearnerInputBaseWithLabel
/// <summary>
/// Number of passes through the training dataset.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of iterations", ShortName = "iter, numIterations", SortOrder = 50)]
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of iterations", ShortName = "iter,numIterations", SortOrder = 50)]
[TGUI(Label = "Number of Iterations", Description = "Number of training iterations through data", SuggestedSweeps = "1,10,100")]
[TlcModule.SweepableLongParamAttribute("NumIterations", 1, 100, stepSize: 10, isLogScale: true)]
public int NumberOfIterations = OnlineDefault.NumIterations;
@@ -43,7 +43,7 @@ public abstract class OnlineLinearOptions : LearnerInputBaseWithLabel
/// This property is only used if the provided value is positive and <see cref="InitialWeights"/> is not specified.
/// The weights and bias will be randomly selected from InitialWeights * [-0.5,0.5] interval with uniform distribution.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Init weights diameter", ShortName = "initwts, initWtsDiameter", SortOrder = 140)]
[Argument(ArgumentType.AtMostOnce, HelpText = "Init weights diameter", ShortName = "initwts,initWtsDiameter", SortOrder = 140)]
[TGUI(Label = "Initial Weights Scale", SuggestedSweeps = "0,0.1,0.5,1")]
[TlcModule.SweepableFloatParamAttribute("InitWtsDiameter", 0.0f, 1.0f, numSteps: 5)]
public float InitialWeightsDiameter = 0;
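The doc comment above says the initial weights and bias are drawn uniformly from InitialWeights * [-0.5, 0.5]. A small sketch of that sampling rule (the function name and structure are illustrative, not ML.NET's implementation):

```python
import random

def init_weights(num_features, diameter, seed=0):
    """Draw initial weights plus a bias term uniformly from
    diameter * [-0.5, 0.5], mirroring the documented behavior
    (a sketch, not the ML.NET implementation)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5 * diameter, 0.5 * diameter)
            for _ in range(num_features + 1)]  # +1 for the bias

weights = init_weights(num_features=4, diameter=1.0)
assert all(-0.5 <= w <= 0.5 for w in weights)
```

With the default diameter of 0 this degenerates to all-zero initial weights, which matches the field's default above.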
55 changes: 52 additions & 3 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaBinary.cs
@@ -1723,36 +1723,77 @@ public abstract class SgdBinaryTrainerBase<TModel> :
{
public class OptionsBase : LearnerInputBaseWithWeight
{
/// <summary>
/// The L2 weight for <a href='tmpurl_regularization'>regularization</a>.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "L2 Regularization constant", ShortName = "l2", SortOrder = 50)]
[TGUI(Label = "L2 Regularization Constant", SuggestedSweeps = "1e-7,5e-7,1e-6,5e-6,1e-5")]
[TlcModule.SweepableDiscreteParam("L2Const", new object[] { 1e-7f, 5e-7f, 1e-6f, 5e-6f, 1e-5f })]
public float L2Weight = Defaults.L2Weight;

/// <summary>
/// The degree of lock-free parallelism used by SGD.
/// </summary>
/// <value>
/// Defaults to automatic depending on data sparseness. Determinism is not guaranteed.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Degree of lock-free parallelism. Defaults to automatic depending on data sparseness. Determinism not guaranteed.", ShortName = "nt,t,threads", SortOrder = 50)]
[TGUI(Label = "Number of threads", SuggestedSweeps = "1,2,4")]
public int? NumThreads;

/// <summary>
/// The convergence tolerance. If the exponential moving average of loss reductions falls below this tolerance,
/// the algorithm is deemed to have converged and will stop.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Exponential moving averaged improvement tolerance for convergence", ShortName = "tol")]
[TGUI(SuggestedSweeps = "1e-2,1e-3,1e-4,1e-5")]
[TlcModule.SweepableDiscreteParam("ConvergenceTolerance", new object[] { 1e-2f, 1e-3f, 1e-4f, 1e-5f })]
public double ConvergenceTolerance = 1e-4;

/// <summary>
/// The maximum number of passes through the training dataset.
/// </summary>
/// <value>
/// Set to 1 to simulate online learning.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Maximum number of iterations; set to 1 to simulate online learning.", ShortName = "iter")]
[TGUI(Label = "Max number of iterations", SuggestedSweeps = "1,5,10,20")]
[TlcModule.SweepableDiscreteParam("MaxIterations", new object[] { 1, 5, 10, 20 })]
public int MaxIterations = Defaults.MaxIterations;

/// <summary>
/// The initial <a href="tmpurl_lr">learning rate</a> used by SGD.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Initial learning rate (only used by SGD)", ShortName = "ilr,lr")]
[TGUI(Label = "Initial Learning Rate (for SGD)")]
public double InitLearningRate = Defaults.InitLearningRate;

/// <summary>
/// Determines whether to shuffle data for each training iteration.
/// </summary>
/// <value>
/// <see langword="true" /> to shuffle data for each training iteration; otherwise, <see langword="false" />.
/// Default is <see langword="true" />.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Shuffle data every epoch?", ShortName = "shuf")]
[TlcModule.SweepableDiscreteParam("Shuffle", null, isBool: true)]
public bool Shuffle = true;

/// <summary>
/// The weight to be applied to the positive class. This is useful for training with imbalanced data.
/// </summary>
/// <value>
/// Default value is 1, which means no extra weight.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Apply weight to the positive class, for imbalanced data", ShortName = "piw")]
public float PositiveInstanceWeight = 1;

/// <summary>
/// Determines the frequency of checking for convergence in terms of number of iterations.
/// </summary>
/// <value>
/// Default equals <see cref="NumThreads"/>.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Convergence check frequency (in terms of number of iterations). Default equals number of threads", ShortName = "checkFreq")]
public int? CheckFrequency;

@@ -1802,7 +1843,7 @@ internal static class Defaults
/// <param name="env">The environment to use.</param>
/// <param name="featureColumn">The name of the feature column.</param>
/// <param name="labelColumn">The name of the label column.</param>
/// <param name="weightColumn">The name for the example weight column.</param>
/// <param name="weightColumn">The name of the example weight column.</param>
/// <param name="maxIterations">The maximum number of iterations; set to 1 to simulate online learning.</param>
/// <param name="initLearningRate">The initial learning rate used by SGD.</param>
/// <param name="l2Weight">The L2 regularizer constant.</param>
@@ -2077,13 +2118,21 @@ private protected override void CheckLabel(RoleMappedData examples, out int weig
}

/// <summary>
/// Train logistic regression using a parallel stochastic gradient method.
/// The <see cref="IEstimator{TTransformer}"/> for training logistic regression using a parallel stochastic gradient method.
/// The trained model is <a href='tmpurl_calib'>calibrated</a> and can produce probabilities by feeding the output value of the
/// linear function to a <see cref="PlattCalibrator"/>.
/// </summary>
/// <remarks>
/// Stochastic Gradient Descent (SGD) is a popular stochastic optimization procedure that can be integrated
/// into many machine learning tasks to achieve state-of-the-art performance. This trainer implements Hogwild SGD for binary classification,
/// which supports multi-threading without any locking. If the associated optimization problem is sparse, Hogwild SGD achieves a nearly optimal
/// rate of convergence. For more details about Hogwild SGD, please refer to http://arxiv.org/pdf/1106.5730v2.pdf.
/// </remarks>
public sealed class SgdBinaryTrainer :
SgdBinaryTrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
{
/// <summary>
/// Options available to training logistic regression using the implemented stochastic gradient method.
/// Options for the <see cref="SgdBinaryTrainer"/>.
/// </summary>
public sealed class Options : OptionsBase
{
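To make the SgdBinaryTrainer summary and remarks concrete, here is a hedged sketch of the two pieces they describe: a plain single-threaded SGD loop for logistic loss, and a Platt-style sigmoid that maps the raw linear score to a probability. Hogwild runs updates like these concurrently without locks; this sketch keeps only the sequential core, and all names, hyperparameters, and the toy data are illustrative assumptions, not ML.NET code:

```python
import math
import random

def train_sgd(data, labels, lr=0.1, epochs=30, l2=1e-5, seed=0):
    """Minimal SGD for logistic loss: the sequential core of what the
    trainer parallelizes Hogwild-style (illustrative only)."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0])
    b = 0.0
    idx = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(idx)  # the Shuffle option controls this step
        for i in idx:
            score = sum(wj * xj for wj, xj in zip(w, data[i])) + b
            p = 1.0 / (1.0 + math.exp(-score))
            g = p - labels[i]  # gradient of log-loss w.r.t. the score
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, data[i])]
            b -= lr * g
    return w, b

def platt_probability(score, a=-1.0, c=0.0):
    """Platt-style calibration: a sigmoid of an affine transform of the
    raw score. In practice a and c are fit on held-out data; the values
    here are placeholders."""
    return 1.0 / (1.0 + math.exp(a * score + c))

# Linearly separable toy data: label is positive iff x0 > x1.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
w, b = train_sgd(X, y)
score = sum(wj * xj for wj, xj in zip(w, [0.8, 0.2])) + b
assert platt_probability(score) > 0.5  # positive example gets p > 0.5
```

The PositiveInstanceWeight option from the diff above would scale the gradient term for positive-label examples; it is omitted here to keep the core loop short.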