Docs & samples for SDCA-based trainers #2771


Merged · 10 commits · Mar 4, 2019
43 changes: 0 additions & 43 deletions docs/samples/Microsoft.ML.Samples/Dynamic/SDCARegression.cs

This file was deleted.

@@ -5,7 +5,7 @@

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
@sfilipi (Member), Feb 28, 2019:

Microsoft.ML.Samples.Dynamic [](start = 10, length = 28)

Let's keep the namespace for all samples as Microsoft.ML.Samples.Dynamic. The nesting is unnecessary. #Resolved

@shmoradims (Author), Feb 28, 2019:

I've explained the reasoning here:
#2729 (comment)

By the way, we don't need namespace nesting for transforms, only for trainers, because of naming conflicts.


In reply to: 261050465 [](ancestors = 261050465)
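The naming conflict behind the nesting decision can be sketched in plain C#. This is an illustration with hypothetical namespaces, not the actual sample files: without per-task namespaces, two sample classes both named `StochasticDualCoordinateAscent` could not coexist.

```csharp
using System;

namespace Samples.Trainers.BinaryClassification
{
    // Binary classification sample class.
    public static class StochasticDualCoordinateAscent
    {
        public static void Example() => Console.WriteLine("binary sample");
    }
}

namespace Samples.Trainers.MulticlassClassification
{
    // Same class name is legal only because the namespace differs.
    public static class StochasticDualCoordinateAscent
    {
        public static void Example() => Console.WriteLine("multiclass sample");
    }
}

namespace Samples
{
    public static class Program
    {
        public static void Main()
        {
            // Qualified names disambiguate the two sample classes.
            Trainers.BinaryClassification.StochasticDualCoordinateAscent.Example();
            Trainers.MulticlassClassification.StochasticDualCoordinateAscent.Example();
        }
    }
}
```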

{
public static class SDCALogisticRegression
public static class StochasticDualCoordinateAscent
{
public static void Example()
{
@@ -4,7 +4,7 @@

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class SDCASupportVectorMachine
public static class StochasticDualCoordinateAscentNonCalibrated
{
public static void Example()
{
@@ -0,0 +1,59 @@
using Microsoft.ML;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
public static class StochasticDualCoordinateAscentWithOptions
{
// In this example we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on demographic information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult.
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Download and featurize the dataset.
var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);

// Leave out 10% of data for testing.
var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);

// Define the trainer options.
var options = new SdcaBinaryTrainer.Options()
{
// Make the convergence tolerance tighter.
ConvergenceTolerance = 0.05f,
// Increase the maximum number of passes over training data.
MaxIterations = 30,
// Give the instances of the positive class slightly more weight.
PositiveInstanceWeight = 1.2f,
};

// Create data training pipeline.
var pipeline = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(options);

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);

// Evaluate how the model is doing on the test data.
var dataWithPredictions = model.Transform(trainTestData.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Accuracy: 0.85
// AUC: 0.90
// F1 Score: 0.66
// Negative Precision: 0.89
// Negative Recall: 0.92
// Positive Precision: 0.70
// Positive Recall: 0.63
// LogLoss: 0.47
// LogLossReduction: 39.77
// Entropy: 0.78
}
}
}
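The trained model above could also serve single predictions through a `PredictionEngine`. The sketch below is an assumption-laden illustration: `AdultData` and `AdultPrediction` are hypothetical classes, since the real featurized schema lives inside `SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset`.

```csharp
using Microsoft.ML.Data;

// Hypothetical input schema; the actual featurized adult dataset
// schema is defined in SamplesUtils, not shown in this sample.
public class AdultData
{
    public bool Label { get; set; }

    [VectorType(10)] // illustrative vector size, not the real one
    public float[] Features { get; set; }
}

// Hypothetical output schema for a calibrated binary classifier.
public class AdultPrediction
{
    public bool PredictedLabel { get; set; }
    public float Probability { get; set; }
}

// Usage sketch, given the `mlContext` and `model` from the sample above:
// var engine = mlContext.Model.CreatePredictionEngine<AdultData, AdultPrediction>(model);
// var prediction = engine.Predict(new AdultData { Features = /* featurized values */ null });
```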
@@ -5,7 +5,7 @@

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
class LightGbm
public static class LightGbm
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()
@@ -14,10 +14,10 @@ public static void Example()
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext();

// Create in-memory examples as C# native class.
// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

// Convert native C# class to IDataView, a consumble format to ML.NET functions.
// Convert the examples list to an IDataView object, which is consumable by ML.NET API.
var dataView = mlContext.Data.LoadFromEnumerable(examples);

//////////////////// Data Preview ////////////////////
@@ -7,7 +7,7 @@

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
class LightGbmWithOptions
public static class LightGbmWithOptions
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.LightGBM/">Microsoft.ML.LightGBM</a>.
public static void Example()
@@ -16,10 +16,10 @@ public static void Example()
// as a catalog of available operations and as the source of randomness.
var mlContext = new MLContext(seed: 0);

// Create in-memory examples as C# native class.
// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

// Convert native C# class to IDataView, a consumble format to ML.NET functions.
// Convert the examples list to an IDataView object, which is consumable by ML.NET API.
var dataView = mlContext.Data.LoadFromEnumerable(examples);

//////////////////// Data Preview ////////////////////
@@ -0,0 +1,56 @@
using Microsoft.ML.Data;
using Microsoft.ML.SamplesUtils;

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
public static class StochasticDualCoordinateAscent
{
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);
@wschin (Member), Feb 28, 2019:

Unfortunately, GenerateRandomMulticlassClassificationExamples is not searchable on the doc site, so the only way to fully learn this pipeline is to clone ML.NET. Because SDCA can work with a very tiny data set, we could add something like this

        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features;
        }
        var samples = new List<DataPoint>() 
        { 
             new DataPoint(){ Features= new float[3] {1, 0, 0} }, 
             new DataPoint(){ Features= new float[3] {0, 2, 1} }, 
             new DataPoint(){ Features= new float[3] {1, 2, 3} }, 
             new DataPoint(){ Features= new float[3] {0, 1, 0} }, 
             new DataPoint(){ Features= new float[3] {0, 2, 1} },
             new DataPoint(){ Features= new float[3] {-100, 50, -100} } 
         };

into this file and use them. #Resolved
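The reviewer's suggestion could be wired into the sample roughly as follows. This is a sketch: the `Label` field and the class shape are assumptions extending the reviewer's snippet, which only covered `Features`.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TinyDatasetSketch
{
    private class DataPoint
    {
        public string Label { get; set; }

        [VectorType(3)]
        public float[] Features { get; set; }
    }

    public static void Example()
    {
        var mlContext = new MLContext(seed: 0);

        // A tiny in-memory data set replaces the SamplesUtils helper,
        // so the full input schema is visible in the sample itself.
        var samples = new List<DataPoint>
        {
            new DataPoint { Label = "AA", Features = new float[3] { 1, 0, 0 } },
            new DataPoint { Label = "BB", Features = new float[3] { 0, 2, 1 } },
            new DataPoint { Label = "CC", Features = new float[3] { -100, 50, -100 } },
        };

        var dataView = mlContext.Data.LoadFromEnumerable(samples);
    }
}
```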

Member:

The type is visible if they use the example, and they can inspect the values with the debugger, but moving featurization into SamplesUtils is a real problem.


In reply to: 261013871 [](ancestors = 261013871)

@wschin (Member), Feb 28, 2019:

No, we can't expect the user to have Visual Studio on, for example, Linux. I'd say the best case is that the user knows everything they need after reading this example. Please take a look at a scikit-learn example.

from sklearn import svm

X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(gamma='scale', decision_function_shape='ovo')
clf.fit(X, Y)

Does scikit-learn ask the user to go outside the text above to understand that example? In addition, those functions are not searchable on the ML.NET doc site, which is a big hole for new users. Honestly, I am not sure SamplesUtils should be used, because it hides some vital information and therefore pushes our examples away from those scikit-learn ones (in terms of readability). #Pending


@shmoradims (Author):

Please see my latest comment on #2627. Removing SamplesUtils is a design decision that needs to be made first.

Please also see my response in another comment where I have the doc links. All of DatasetUtils is searchable:
https://docs.microsoft.com/en-us/dotnet/api/?view=ml-dotnet&term=Microsoft.ML.SamplesUtils.DatasetUtils


In reply to: 261068842 [](ancestors = 261068842)

@wschin (Member), Mar 1, 2019:

My bad. Even if it's searchable, it still has no meaningful documentation on that page. #Resolved

@shmoradims (Author):

Our documentation coverage is low, but we're actively working on it; hence this PR. That page will become meaningful eventually.


In reply to: 261718177 [](ancestors = 261718177)

@wschin (Member), Mar 1, 2019:

Please do not pull things apart if they are considered a whole example; the documentation will never be organized and learned in a structured way otherwise. Let me give you another example: how would a user learn that a vector column is defined by adding the VectorType attribute? Assume they have already found the doc for GenerateRandomMulticlassClassificationExamples. They still need to click on its return type, List&lt;DatasetUtils.MulticlassClassificationExample&gt;, which opens another page. Where is the vector attribute on my Features? The user needs to click on Fields again, opening a third page, which contains

[Microsoft.ML.Data.VectorType(new System.Int32[] { 10 })]
public float[] Features;

Hiding things in this hierarchical way is definitely a learning barrier. #WontFix
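Inlining the annotated class, as argued above, would make the vector size visible without any doc-site navigation. A minimal sketch (the class name here is hypothetical):

```csharp
using Microsoft.ML.Data;

// With the class inline, a reader sees immediately that Features is a
// vector column of length 10 -- no need to chase the return type of a
// SamplesUtils helper through three doc pages.
public class MulticlassExample
{
    public string Label { get; set; }

    [VectorType(10)]
    public float[] Features { get; set; }
}
```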

@shmoradims (Author):

As pointed out by Shauheen, let's keep the samples as-is for V1. Post-V1, we can address this through the proper discussions that were canceled in favor of API work. For now, some sample for V1 is better than no sample.


In reply to: 261763964 [](ancestors = 261763964)


// Convert the examples list to an IDataView object, which is consumable by ML.NET API.
var dataView = mlContext.Data.LoadFromEnumerable(examples);

//////////////////// Data Preview ////////////////////
// Label Features
// AA 0.7262433,0.8173254,0.7680227,0.5581612,0.2060332,0.5588848,0.9060271,0.4421779,0.9775497,0.2737045
// BB 0.4919063,0.6673147,0.8326591,0.6695119,1.182151,0.230367,1.06237,1.195347,0.8771811,0.5145918
// CC 1.216908,1.248052,1.391902,0.4326252,1.099942,0.9262842,1.334019,1.08762,0.9468155,0.4811099
// DD 0.7871246,1.053327,0.8971719,1.588544,1.242697,1.362964,0.6303943,0.9810045,0.9431419,1.557455

// Create a pipeline.
var pipeline =
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply StochasticDualCoordinateAscent multiclass trainer.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test set.
var split = mlContext.MulticlassClassification.TrainTestSplit(dataView, testFraction: 0.1);

// Train the model.
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(split.TestSet);

// Evaluate the trained model using the test set.
var metrics = mlContext.MulticlassClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Micro Accuracy: 0.82
// Macro Accuracy: 0.81
// Log Loss: 0.43
// Log Loss Reduction: 67.93
}
}
}
@@ -0,0 +1,67 @@
using Microsoft.ML.Data;
using Microsoft.ML.SamplesUtils;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic.Trainers.MulticlassClassification
{
public static class StochasticDualCoordinateAscentWithOptions
{
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Create a list of data examples.
var examples = DatasetUtils.GenerateRandomMulticlassClassificationExamples(1000);

// Convert the examples list to an IDataView object, which is consumable by ML.NET API.
var dataView = mlContext.Data.LoadFromEnumerable(examples);

//////////////////// Data Preview ////////////////////
// Label Features
// AA 0.7262433,0.8173254,0.7680227,0.5581612,0.2060332,0.5588848,0.9060271,0.4421779,0.9775497,0.2737045
// BB 0.4919063,0.6673147,0.8326591,0.6695119,1.182151,0.230367,1.06237,1.195347,0.8771811,0.5145918
// CC 1.216908,1.248052,1.391902,0.4326252,1.099942,0.9262842,1.334019,1.08762,0.9468155,0.4811099
// DD 0.7871246,1.053327,0.8971719,1.588544,1.242697,1.362964,0.6303943,0.9810045,0.9431419,1.557455

var options = new SdcaMultiClassTrainer.Options
{
// Add custom loss
LossFunction = new HingeLoss.Options(),
// Make the convergence tolerance tighter.
ConvergenceTolerance = 0.05f,
// Increase the maximum number of passes over training data.
MaxIterations = 30,
};

// Create a pipeline.
var pipeline =
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply StochasticDualCoordinateAscent multiclass trainer.
.Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(options));

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test set.
var split = mlContext.MulticlassClassification.TrainTestSplit(dataView, testFraction: 0.1);

// Train the model.
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(split.TestSet);

// Evaluate the trained model using the test set.
var metrics = mlContext.MulticlassClassification.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// Micro Accuracy: 0.82
// Macro Accuracy: 0.81
// Log Loss: 0.64
// Log Loss Reduction: 52.51
}
}
}
@@ -0,0 +1,43 @@
using System;
using System.Linq;
using Microsoft.ML.Data;

namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
public static class StochasticDualCoordinateAscent
{
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Create in-memory examples as C# native class and convert to IDataView
var data = SamplesUtils.DatasetUtils.GenerateFloatLabelFloatFeatureVectorSamples(1000);
var dataView = mlContext.Data.LoadFromEnumerable(data);

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test set.
var split = mlContext.Regression.TrainTestSplit(dataView, testFraction: 0.1);

// Train the model.
var pipeline = mlContext.Regression.Trainers.StochasticDualCoordinateAscent();
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(split.TestSet);

// Evaluate the trained model using the test set.
var metrics = mlContext.Regression.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// L1: 0.27
// L2: 0.11
// LossFunction: 0.11
// RMS: 0.33
// RSquared: 0.56
}
}
}
@@ -0,0 +1,53 @@
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
public static class StochasticDualCoordinateAscentWithOptions
{
public static void Example()
{
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
// Setting the seed to a fixed number in this example to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Create in-memory examples as C# native class and convert to IDataView
var data = SamplesUtils.DatasetUtils.GenerateFloatLabelFloatFeatureVectorSamples(1000);
var dataView = mlContext.Data.LoadFromEnumerable(data);

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test set.
var split = mlContext.Regression.TrainTestSplit(dataView, testFraction: 0.1);

// Create trainer options.
var options = new SdcaRegressionTrainer.Options
{
// Make the convergence tolerance tighter.
ConvergenceTolerance = 0.02f,
// Increase the maximum number of passes over training data.
MaxIterations = 30,
// Increase learning rate for bias
BiasLearningRate = 0.1f
};

// Train the model.
var pipeline = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(options);
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
var dataWithPredictions = model.Transform(split.TestSet);

// Evaluate the trained model using the test set.
var metrics = mlContext.Regression.Evaluate(dataWithPredictions);
SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

// Expected output:
// L1: 0.26
// L2: 0.11
// LossFunction: 0.11
// RMS: 0.33
// RSquared: 0.56
}
}
}
23 changes: 17 additions & 6 deletions src/Microsoft.ML.SamplesUtils/ConsoleUtils.cs
@@ -31,21 +31,32 @@ public static void PrintMetrics(BinaryClassificationMetrics metrics)
public static void PrintMetrics(CalibratedBinaryClassificationMetrics metrics)
{
PrintMetrics(metrics as BinaryClassificationMetrics);
Console.WriteLine($"LogLoss: {metrics.LogLoss:F2}");
Console.WriteLine($"LogLossReduction: {metrics.LogLossReduction:F2}");
Console.WriteLine($"Log Loss: {metrics.LogLoss:F2}");
Console.WriteLine($"Log Loss Reduction: {metrics.LogLossReduction:F2}");
Console.WriteLine($"Entropy: {metrics.Entropy:F2}");
}

/// <summary>
/// Pretty-print MultiClassClassifierMetrics objects.
/// </summary>
/// <param name="metrics"><see cref="MultiClassClassifierMetrics"/> object.</param>
public static void PrintMetrics(MultiClassClassifierMetrics metrics)
{
Console.WriteLine($"Micro Accuracy: {metrics.MicroAccuracy:F2}");
Console.WriteLine($"Macro Accuracy: {metrics.MacroAccuracy:F2}");
Console.WriteLine($"Log Loss: {metrics.LogLoss:F2}");
Console.WriteLine($"Log Loss Reduction: {metrics.LogLossReduction:F2}");
}

/// <summary>
/// Pretty-print RegressionMetrics objects.
/// </summary>
/// <param name="metrics">Regression metrics.</param>
public static void PrintMetrics(RegressionMetrics metrics)
{
Console.WriteLine($"L1: {metrics.MeanAbsoluteError:F2}");
Console.WriteLine($"L2: {metrics.MeanSquaredError:F2}");
Console.WriteLine($"LossFunction: {metrics.LossFunction:F2}");
Console.WriteLine($"RMS: {metrics.RootMeanSquaredError:F2}");
Console.WriteLine($"Mean Absolute Error: {metrics.MeanAbsoluteError:F2}");
Console.WriteLine($"Mean Squared Error: {metrics.MeanSquaredError:F2}");
Console.WriteLine($"Root Mean Squared Error: {metrics.RootMeanSquaredError:F2}");
Console.WriteLine($"RSquared: {metrics.RSquared:F2}");
}

2 changes: 1 addition & 1 deletion src/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs
@@ -677,7 +677,7 @@ public MulticlassClassificationExample()
}

/// <summary>
/// Helper function used to generate random <see cref="GenerateRandomMulticlassClassificationExamples"/>s.
/// Helper function used to generate random <see cref="MulticlassClassificationExample"/> objects.
/// </summary>
/// <param name="count">Number of generated examples.</param>
/// <returns>A list of random examples.</returns>
69 changes: 67 additions & 2 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaBinary.cs
@@ -157,40 +157,80 @@ public abstract class SdcaTrainerBase<TOptions, TTransformer, TModel> : Stochast
// 3. Don't "guess" the iteration to converge. It is very data-set dependent and hard to control. Always check for at least once to ensure convergence.
// 4. Use dual variable updates to infer whether a full iteration of convergence checking is necessary. Convergence checking iteration is time-consuming.

/// <summary>
/// Options for the SDCA-based trainers.
/// </summary>
public abstract class OptionsBase : TrainerInputBaseWithLabel
{
/// <summary>
/// The L2 <a href='tmpurl_regularization'>regularization</a> hyperparameter.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "L2 regularizer constant. By default the l2 constant is automatically inferred based on data set.", NullName = "<Auto>", ShortName = "l2", SortOrder = 1)]
[TGUI(Label = "L2 Regularizer Constant", SuggestedSweeps = "<Auto>,1e-7,1e-6,1e-5,1e-4,1e-3,1e-2")]
[TlcModule.SweepableDiscreteParam("L2Const", new object[] { "<Auto>", 1e-7f, 1e-6f, 1e-5f, 1e-4f, 1e-3f, 1e-2f })]
public float? L2Const;

// REVIEW: make the default positive when we know how to consume a sparse model
/// <summary>
/// The L1 <a href='tmpurl_regularization'>regularization</a> hyperparameter.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "L1 soft threshold (L1/L2). Note that it is easier to control and sweep using the threshold parameter than the raw L1-regularizer constant. By default the l1 threshold is automatically inferred based on data set.", NullName = "<Auto>", ShortName = "l1", SortOrder = 2)]
[TGUI(Label = "L1 Soft Threshold", SuggestedSweeps = "<Auto>,0,0.25,0.5,0.75,1")]
[TlcModule.SweepableDiscreteParam("L1Threshold", new object[] { "<Auto>", 0f, 0.25f, 0.5f, 0.75f, 1f })]
public float? L1Threshold;

/// <summary>
/// The degree of lock-free parallelism.
/// </summary>
/// <value>
/// Defaults to automatic depending on data sparseness. Determinism is not guaranteed.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Degree of lock-free parallelism. Defaults to automatic. Determinism not guaranteed.", NullName = "<Auto>", ShortName = "nt,t,threads", SortOrder = 50)]
[TGUI(Label = "Number of threads", SuggestedSweeps = "<Auto>,1,2,4")]
public int? NumThreads;

/// <summary>
/// The tolerance for the ratio between duality gap and primal loss for convergence checking.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "The tolerance for the ratio between duality gap and primal loss for convergence checking.", ShortName = "tol")]
[TGUI(SuggestedSweeps = "0.001, 0.01, 0.1, 0.2")]
[TlcModule.SweepableDiscreteParam("ConvergenceTolerance", new object[] { 0.001f, 0.01f, 0.1f, 0.2f })]
public float ConvergenceTolerance = 0.1f;

/// <summary>
/// The maximum number of passes to perform over the data.
/// </summary>
/// <value>
/// Set to 1 to simulate online learning. Defaults to automatic.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Maximum number of iterations; set to 1 to simulate online learning. Defaults to automatic.", NullName = "<Auto>", ShortName = "iter")]
[TGUI(Label = "Max number of iterations", SuggestedSweeps = "<Auto>,10,20,100")]
[TlcModule.SweepableDiscreteParam("MaxIterations", new object[] { "<Auto>", 10, 20, 100 })]
public int? MaxIterations;

/// <summary>
/// Determines whether to shuffle data for each training iteration.
/// </summary>
/// <value>
/// <see langword="true" /> to shuffle data for each training iteration; otherwise, <see langword="false" />.
/// Default is <see langword="true" />.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Shuffle data every epoch?", ShortName = "shuf")]
[TlcModule.SweepableDiscreteParam("Shuffle", null, isBool: true)]
public bool Shuffle = true;

/// <summary>
/// Determines the frequency of checking for convergence in terms of number of iterations.
/// </summary>
/// <value>
/// Set to zero or a negative value to disable checking. If <see langword="null"/>, it defaults to <see cref="NumThreads"/>.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Convergence check frequency (in terms of number of iterations). Set as negative or zero for not checking at all. If left blank, it defaults to check after every 'numThreads' iterations.", NullName = "<Auto>", ShortName = "checkFreq")]
public int? CheckFrequency;

/// <summary>
/// The learning rate for adjusting bias from being regularized.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "The learning rate for adjusting bias from being regularized.", ShortName = "blr")]
[TGUI(SuggestedSweeps = "0, 0.01, 0.1, 1")]
[TlcModule.SweepableDiscreteParam("BiasLearningRate", new object[] { 0.0f, 0.01f, 0.1f, 1f })]
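Taken together, the `OptionsBase` hyperparameters documented in this hunk could be set like this. The values are illustrative only, not recommendations; the field names come from the diff above.

```csharp
var options = new SdcaBinaryTrainer.Options
{
    L2Const = 1e-5f,              // L2 regularizer constant; null lets ML.NET infer it from the data.
    L1Threshold = 0.25f,          // L1 soft threshold; null lets ML.NET infer it.
    NumThreads = 2,               // degree of lock-free parallelism; determinism not guaranteed.
    ConvergenceTolerance = 0.01f, // duality-gap / primal-loss ratio used for convergence checking.
    MaxIterations = 20,           // maximum passes over the data; 1 simulates online learning.
    Shuffle = true,               // shuffle data every training iteration.
    CheckFrequency = 2,           // check convergence every 2 iterations; <= 0 disables checking.
    BiasLearningRate = 0.1f,      // learning rate for adjusting the bias from being regularized.
};
```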
@@ -1419,8 +1459,17 @@ public abstract class SdcaBinaryTrainerBase<TModelParameters> :

public override TrainerInfo Info { get; }

/// <summary>
/// Options base class for binary SDCA trainers.
/// </summary>
public class BinaryOptionsBase : OptionsBase
{
/// <summary>
/// The weight to be applied to the positive class. This is useful for training with imbalanced data.
/// </summary>
/// <value>
/// Default value is 1, which means no extra weight.
/// </value>
[Argument(ArgumentType.AtMostOnce, HelpText = "Apply weight to the positive class, for imbalanced data", ShortName = "piw")]
public float PositiveInstanceWeight = 1;

@@ -1517,11 +1566,17 @@ private protected override BinaryPredictionTransformer<TModelParameters> MakeTra
=> new BinaryPredictionTransformer<TModelParameters>(Host, model, trainSchema, FeatureColumn.Name);
}

/// <summary>
/// The <see cref="IEstimator{TTransformer}"/> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method.
/// The trained model is <a href='tmpurl_calib'>calibrated</a> and can produce probability by feeding the output value of the
/// linear function to a <see cref="PlattCalibrator"/>.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
public sealed class SdcaBinaryTrainer :
SdcaBinaryTrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
{
/// <summary>
/// Configuration to training logistic regression using SDCA.
/// Options for the <see cref="SdcaBinaryTrainer"/>.
/// </summary>
public sealed class Options : BinaryOptionsBase
{
@@ -1577,13 +1632,23 @@ private protected override SchemaShape.Column[] ComputeSdcaBinaryClassifierSchem
}
}

/// <summary>
/// The <see cref="IEstimator{TTransformer}"/> for training a binary logistic regression classification model using the stochastic dual coordinate ascent method.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
public sealed class SdcaNonCalibratedBinaryTrainer : SdcaBinaryTrainerBase<LinearBinaryModelParameters>
{
/// <summary>
/// General Configuration to training linear model using SDCA.
/// Options for the <see cref="SdcaNonCalibratedBinaryTrainer"/>.
/// </summary>
public sealed class Options : BinaryOptionsBase
{
/// <summary>
/// The custom <a href="tmpurl_loss">loss</a>.
/// </summary>
/// <value>
/// If unspecified, <see cref="LogLoss"/> will be used.
/// </value>
[Argument(ArgumentType.Multiple, HelpText = "Loss Function", ShortName = "loss", SortOrder = 50)]
public ISupportSdcaClassificationLossFactory LossFunction = new LogLossFactory();
}
17 changes: 14 additions & 3 deletions src/Microsoft.ML.StandardLearners/Standard/SdcaMultiClass.cs
@@ -24,17 +24,28 @@

namespace Microsoft.ML.Trainers
{
// SDCA linear multiclass trainer.
/// <include file='doc.xml' path='doc/members/member[@name="SDCA"]/*' />
/// <summary>
/// The <see cref="IEstimator{TTransformer}"/> for training a multiclass logistic regression classification model using the stochastic dual coordinate ascent method.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
public class SdcaMultiClassTrainer : SdcaTrainerBase<SdcaMultiClassTrainer.Options, MulticlassPredictionTransformer<MulticlassLogisticRegressionModelParameters>, MulticlassLogisticRegressionModelParameters>
{
internal const string LoadNameValue = "SDCAMC";
internal const string UserNameValue = "Fast Linear Multi-class Classification (SA-SDCA)";
internal const string ShortName = "sasdcamc";
internal const string Summary = "The SDCA linear multi-class classification trainer.";

/// <summary>
/// Options for the <see cref="SdcaMultiClassTrainer"/>.
/// </summary>
public sealed class Options : OptionsBase
{
/// <summary>
/// The custom <a href="tmpurl_loss">loss</a>.
/// </summary>
/// <value>
/// If unspecified, <see cref="LogLoss"/> will be used.
/// </value>
[Argument(ArgumentType.Multiple, HelpText = "Loss Function", ShortName = "loss", SortOrder = 50)]
public ISupportSdcaClassificationLossFactory LossFunction = new LogLossFactory();
}
@@ -200,7 +211,7 @@ private protected override void TrainWithoutLock(IProgressChannelProvider progre
var output = labelOutput + labelPrimalUpdate * normSquared - WDot(in features, in weights[iClass], biasReg[iClass] + biasUnreg[iClass]);
var dualUpdate = _loss.DualUpdate(output, 1, dual, invariant, numThreads);

// The successive over-relaxation apporach to adjust the sum of dual variables (biasReg) to zero.
// The successive over-relaxation approach to adjust the sum of dual variables (biasReg) to zero.
// Reference to details: http://stat.rutgers.edu/home/tzhang/papers/ml02_dual.pdf, pp. 16-17.
var adjustment = l1ThresholdZero ? lr * biasReg[iClass] : lr * l1IntermediateBias[iClass];
dualUpdate -= adjustment;
17 changes: 16 additions & 1 deletion src/Microsoft.ML.StandardLearners/Standard/SdcaRegression.cs
Original file line number Diff line number Diff line change
@@ -21,19 +21,34 @@

namespace Microsoft.ML.Trainers
{
/// <include file='doc.xml' path='doc/members/member[@name="SDCA"]/*' />
/// <summary>
/// The <see cref="IEstimator{TTransformer}"/> for training a regression model using the stochastic dual coordinate ascent method.
/// </summary>
/// <include file='doc.xml' path='doc/members/member[@name="SDCA_remarks"]/*' />
public sealed class SdcaRegressionTrainer : SdcaTrainerBase<SdcaRegressionTrainer.Options, RegressionPredictionTransformer<LinearRegressionModelParameters>, LinearRegressionModelParameters>
{
internal const string LoadNameValue = "SDCAR";
internal const string UserNameValue = "Fast Linear Regression (SA-SDCA)";
internal const string ShortName = "sasdcar";
internal const string Summary = "The SDCA linear regression trainer.";

/// <summary>
/// Options for the <see cref="SdcaRegressionTrainer"/>.
/// </summary>
public sealed class Options : OptionsBase
{
/// <summary>
/// A custom <a href="tmpurl_loss">loss</a>.
/// </summary>
/// <value>
        /// Defaults to <see cref="SquaredLoss"/>.
/// </value>
[Argument(ArgumentType.Multiple, HelpText = "Loss Function", ShortName = "loss", SortOrder = 50)]
public ISupportSdcaRegressionLossFactory LossFunction = new SquaredLossFactory();

/// <summary>
/// Create the <see cref="Options"/> object.
/// </summary>
public Options()
{
// Using a higher default tolerance for better RMS.
48 changes: 6 additions & 42 deletions src/Microsoft.ML.StandardLearners/Standard/doc.xml
@@ -1,13 +1,13 @@
<?xml version="1.0" encoding="utf-8"?>
<doc>
<members>

<member name="SDCA">
<summary>
Train an SDCA linear model.
</summary>
<!--
The following text describes the SDCA algorithm details.
    It's used for the remarks section of all SDCA-based trainers (binary, multiclass, regression).
-->
<member name="SDCA_remarks">
Copy link
Member

@sfilipi sfilipi Feb 28, 2019

so shall we keep the docs.xml for reuse, then? #Resolved

Copy link
Author

If we can reduce the usage to 1 by using cref links, we don't need doc.xml. We should use inline documentation.

In this case we have 4 SDCA trainers that all need to explain what SDCA is and we cannot cref them to each other. So I'm keeping doc.xml for it.

In reply to: 261051276 [](ancestors = 261051276)

<remarks>
This classifier is a trainer based on the Stochastic Dual Coordinate Ascent(SDCA) method, a state-of-the-art optimization technique for convex objective functions.
This trainer is based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions.
The algorithm can be scaled for use on large out-of-memory data sets due to a semi-asynchronized implementation that supports multi-threading.
<para>
Convergence is underwritten by periodically enforcing synchronization between primal and dual updates in a separate thread.
@@ -32,42 +32,6 @@
</list>
</remarks>
</member>
<example name="StochasticDualCoordinateAscentBinaryClassifier">
<example>
<code language="csharp">
new StochasticDualCoordinateAscentBinaryClassifier
{
MaxIterations = 100,
NumThreads = 7,
LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction(),
Caching = Microsoft.ML.Models.CachingOptions.Memory
}
</code>
</example>
</example>
<example name="StochasticDualCoordinateAscentClassifier">
<example>
<code language="csharp">
new StochasticDualCoordinateAscentClassifier
{
MaxIterations = 100,
NumThreads = 7,
LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction()
}
</code>
</example>
</example>
<example name="StochasticDualCoordinateAscentRegressor">
<example>
<code language="csharp">
new StochasticDualCoordinateAscentRegressor
{
MaxIterations = 100,
NumThreads = 5
}
</code>
</example>
</example>

</members>
</doc>
92 changes: 58 additions & 34 deletions src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs
@@ -110,17 +110,22 @@ public static SgdNonCalibratedBinaryTrainer StochasticGradientDescentNonCalibrat
}

/// <summary>
/// Predict a target using a linear regression model trained with the SDCA trainer.
/// Predict a target using a linear regression model trained with <see cref="SdcaRegressionTrainer"/>.
/// </summary>
/// <param name="catalog">The regression catalog trainer object.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="l2Const">The L2 <a href='tmpurl_regularization'>regularization</a> hyperparameter.</param>
    /// <param name="l1Threshold">The L1 <a href='tmpurl_regularization'>regularization</a> hyperparameter. Higher values will tend to lead to a sparser model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <param name="loss">The custom loss, if unspecified will be <see cref="SquaredLoss"/>.</param>

    /// <param name="loss">The custom <a href="tmpurl_loss">loss</a>. Defaults to <see cref="SquaredLoss"/> if not specified.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/StochasticDualCoordinateAscent.cs)]
/// ]]></format>
/// </example>
public static SdcaRegressionTrainer StochasticDualCoordinateAscent(this RegressionCatalog.RegressionTrainers catalog,
string labelColumnName = DefaultColumnNames.Label,
string featureColumnName = DefaultColumnNames.Features,
@@ -136,12 +141,18 @@ public static SdcaRegressionTrainer StochasticDualCoordinateAscent(this Regressi
}
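
For context outside the diff, the simpler overload above is typically wired into a pipeline like this — a minimal sketch assuming the v1.0-era `MLContext` API surface; the `HouseData` POCO and the `samples` collection are hypothetical, while the `labelColumnName`, `featureColumnName`, and `maxIterations` parameter names are taken from the signature in this PR:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public class HouseData   // hypothetical data class, for illustration only
{
    public float Size { get; set; }
    public float Label { get; set; }  // the regression target
}

public static class SdcaRegressionSketch
{
    public static void Run(IEnumerable<HouseData> samples)
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Concatenate numeric columns into "Features", then train with SDCA.
        var pipeline = mlContext.Transforms.Concatenate("Features", nameof(HouseData.Size))
            .Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent(
                labelColumnName: "Label",
                featureColumnName: "Features",
                maxIterations: 100));

        var model = pipeline.Fit(data);
    }
}
```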

/// <summary>
/// Predict a target using a linear regression model trained with the SDCA trainer.
/// Predict a target using a linear regression model trained with <see cref="SdcaRegressionTrainer"/> and advanced options.
/// </summary>
/// <param name="catalog">The regression catalog trainer object.</param>
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Trainer options.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/StochasticDualCoordinateAscentWithOptions.cs)]
/// ]]></format>
/// </example>
public static SdcaRegressionTrainer StochasticDualCoordinateAscent(this RegressionCatalog.RegressionTrainers catalog,
SdcaRegressionTrainer.Options options)
SdcaRegressionTrainer.Options options)
{
Contracts.CheckValue(catalog, nameof(catalog));
Contracts.CheckValue(options, nameof(options));
@@ -151,21 +162,19 @@ public static SdcaRegressionTrainer StochasticDualCoordinateAscent(this Regressi
}
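
The options-based overload can be sketched as follows. `LossFunction` and its `SquaredLossFactory` default come from the `Options` class shown earlier in this PR; the `LabelColumnName`, `FeatureColumnName`, and `MaxIterations` member names are assumptions about the inherited options base:

```csharp
// Assumes an existing mlContext (MLContext) and Microsoft.ML.Trainers in scope.
var options = new SdcaRegressionTrainer.Options
{
    LabelColumnName = "Label",      // assumed base-option name
    FeatureColumnName = "Features", // assumed base-option name
    // SquaredLoss is already the default; set it explicitly only to swap in
    // another ISupportSdcaRegressionLossFactory implementation.
    LossFunction = new SquaredLossFactory(),
    MaxIterations = 100
};

var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(options);
```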

/// <summary>
/// Predict a target using a logistic regression model trained with the SDCA trainer.
/// The trained model can produce probability by feeding the output value of the linear
/// function to a <see cref="PlattCalibrator"/>.
/// Predict a target using a linear classification model trained with <see cref="SdcaBinaryTrainer"/>.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="l2Const">The L2 <a href='tmpurl_regularization'>regularization</a> hyperparameter.</param>
        /// <param name="l1Threshold">The L1 <a href='tmpurl_regularization'>regularization</a> hyperparameter. Higher values will tend to lead to a sparser model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/SDCALogisticRegression.cs)]
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscent.cs)]
/// ]]></format>
/// </example>
public static SdcaBinaryTrainer StochasticDualCoordinateAscent(
@@ -183,13 +192,16 @@ public static SdcaBinaryTrainer StochasticDualCoordinateAscent(
}

/// <summary>
/// Predict a target using a logistic regression model trained with the SDCA trainer.
/// The trained model can produce probability via feeding output value of the linear
/// function to a <see cref="PlattCalibrator"/>. Compared with <see cref="StochasticDualCoordinateAscent(BinaryClassificationCatalog.BinaryClassificationTrainers, string, string, string, float?, float?, int?)"/>,
/// this function allows more advanced settings by accepting <see cref="SdcaBinaryTrainer.Options"/>.
/// Predict a target using a linear classification model trained with <see cref="SdcaBinaryTrainer"/> and advanced options.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Trainer options.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscentWithOptions.cs)]
/// ]]></format>
/// </example>
public static SdcaBinaryTrainer StochasticDualCoordinateAscent(
this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
SdcaBinaryTrainer.Options options)
@@ -202,20 +214,20 @@ public static SdcaBinaryTrainer StochasticDualCoordinateAscent(
}
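
A usage sketch for the calibrated binary overload, assuming the v1.0-era `MLContext` API; `dataView` is a hypothetical `IDataView` with a boolean `Label` column and a float-vector `Features` column:

```csharp
var mlContext = new MLContext(seed: 0);

var trainer = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(
    labelColumnName: "Label",
    featureColumnName: "Features");

var model = trainer.Fit(dataView);
var predictions = model.Transform(dataView);

// Being calibrated (via a PlattCalibrator, per the summary this PR replaces),
// the model emits a probability column alongside the raw score.
var metrics = mlContext.BinaryClassification.Evaluate(predictions);
```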

/// <summary>
/// Predict a target using a linear binary classification model trained with the SDCA trainer.
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/>.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="loss">The custom loss. Defaults to log-loss if not specified.</param>
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="loss">The custom <a href="tmpurl_loss">loss</a>. Defaults to <see cref="LogLoss"/> if not specified.</param>
/// <param name="l2Const">The L2 <a href='tmpurl_regularization'>regularization</a> hyperparameter.</param>
        /// <param name="l1Threshold">The L1 <a href='tmpurl_regularization'>regularization</a> hyperparameter. Higher values will tend to lead to a sparser model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/SDCASupportVectorMachine.cs)]
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/BinaryClassification/StochasticDualCoordinateAscentNonCalibrated.cs)]
/// ]]></format>
/// </example>
public static SdcaNonCalibratedBinaryTrainer StochasticDualCoordinateAscentNonCalibrated(
@@ -234,10 +246,10 @@ public static SdcaNonCalibratedBinaryTrainer StochasticDualCoordinateAscentNonCa
}
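
The old sample name `SDCASupportVectorMachine` hints at the main use of the non-calibrated overload: passing a hinge loss yields a linear SVM-style objective. A sketch, assuming the v1.0-era API and an existing `mlContext`:

```csharp
// HingeLoss turns the non-calibrated SDCA trainer into a linear SVM-style
// learner; the resulting model produces raw scores but no probabilities.
var trainer = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscentNonCalibrated(
    labelColumnName: "Label",
    featureColumnName: "Features",
    loss: new HingeLoss());
```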

/// <summary>
/// Predict a target using a linear binary classification model trained with the SDCA trainer.
/// Predict a target using a linear classification model trained with <see cref="SdcaNonCalibratedBinaryTrainer"/> and advanced options.
/// </summary>
/// <param name="catalog">The binary classification catalog trainer object.</param>
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Trainer options.</param>
public static SdcaNonCalibratedBinaryTrainer StochasticDualCoordinateAscentNonCalibrated(
this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,
SdcaNonCalibratedBinaryTrainer.Options options)
@@ -250,18 +262,24 @@ public static SdcaNonCalibratedBinaryTrainer StochasticDualCoordinateAscentNonCa
}

/// <summary>
/// Predict a target using a linear multiclass classification model trained with the SDCA trainer.
/// Predict a target using a linear multiclass classification model trained with <see cref="SdcaMultiClassTrainer"/>.
/// </summary>
/// <param name="catalog">The multiclass classification catalog trainer object.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="featureColumnName">The name of the feature column.</param>
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
/// <param name="loss">The optional custom loss.</param>
/// <param name="l2Const">The L2 regularization hyperparameter.</param>
/// <param name="l1Threshold">The L1 regularization hyperparameter. Higher values will tend to lead to more sparse model.</param>
/// <param name="loss">The custom <a href="tmpurl_loss">loss</a>. Defaults to <see cref="LogLoss"/> if not specified.</param>
/// <param name="l2Const">The L2 <a href='tmpurl_regularization'>regularization</a> hyperparameter.</param>
        /// <param name="l1Threshold">The L1 <a href='tmpurl_regularization'>regularization</a> hyperparameter. Higher values will tend to lead to a sparser model.</param>
/// <param name="maxIterations">The maximum number of passes to perform over the data.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/StochasticDualCoordinateAscent.cs)]
/// ]]></format>
/// </example>
public static SdcaMultiClassTrainer StochasticDualCoordinateAscent(this MulticlassClassificationCatalog.MulticlassClassificationTrainers catalog,
string labelColumnName = DefaultColumnNames.Label,
string labelColumnName = DefaultColumnNames.Label,
string featureColumnName = DefaultColumnNames.Features,
string exampleWeightColumnName = null,
ISupportSdcaClassificationLoss loss = null,
@@ -275,12 +293,18 @@ public static SdcaMultiClassTrainer StochasticDualCoordinateAscent(this Multicla
}
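
A sketch of the multiclass overload, assuming the v1.0-era API; `trainingData` and the feature column names `F1`/`F2` are hypothetical:

```csharp
// Multiclass SDCA expects a key-typed label, so map the label column first.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.Concatenate("Features", "F1", "F2"))
    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(
        labelColumnName: "Label",
        featureColumnName: "Features"));

var model = pipeline.Fit(trainingData);
```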

/// <summary>
/// Predict a target using a linear multiclass classification model trained with the SDCA trainer.
/// Predict a target using a linear multiclass classification model trained with <see cref="SdcaMultiClassTrainer"/> and advanced options.
/// </summary>
/// <param name="catalog">The multiclass classification catalog trainer object.</param>
/// <param name="options">Advanced arguments to the algorithm.</param>
/// <param name="options">Trainer options.</param>
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[SDCA](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/MulticlassClassification/StochasticDualCoordinateAscentWithOptions.cs)]
/// ]]></format>
/// </example>
public static SdcaMultiClassTrainer StochasticDualCoordinateAscent(this MulticlassClassificationCatalog.MulticlassClassificationTrainers catalog,
SdcaMultiClassTrainer.Options options)
SdcaMultiClassTrainer.Options options)
{
Contracts.CheckValue(catalog, nameof(catalog));
Contracts.CheckValue(options, nameof(options));