Added samples & docs for BinaryClassification.StochasticGradientDescent #2688
Commits: c3aebc7, 16d136e, 10a01c1, ef4168d, f2a42eb, 6909bfc
New file (+47 lines):

using Microsoft.ML;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
    public static class StochasticGradientDescent
    {
        // In this example, we use the adult income dataset. The goal is to predict
        // whether a person's income is above $50K, based on demographic information about that person.
        // For more details about this dataset, see https://archive.ics.uci.edu/ml/datasets/adult.
Review comment: Broken link. Or the site is down right now.... #Resolved
Reply: I think they're just temporarily down. This is a prominent link. It's the first Google result for "uci adult". (In reply to: 259450232)

        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
            // as a catalog of available operations and as the source of randomness.
            // Set the seed to a fixed number so that the outputs of this example are deterministic.
            var mlContext = new MLContext(seed: 0);

            // Download and featurize the dataset.
            var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);

            // Hold out 10% of the data for testing.
            var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);

            // Create the training pipeline.
            var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent();

            // Fit the pipeline to the training data.
            var model = pipeline.Fit(trainTestData.TrainSet);

            // Evaluate how the model performs on the test data.
            var dataWithPredictions = model.Transform(trainTestData.TestSet);
            var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
            SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

            // Expected output:
            // Accuracy: 0.85
            // AUC: 0.90
            // F1 Score: 0.67
            // Negative Precision: 0.90
            // Negative Recall: 0.91
            // Positive Precision: 0.68
            // Positive Recall: 0.65
            // LogLoss: 0.48
            // LogLossReduction: 38.31
            // Entropy: 0.78
        }
    }
}
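The expected output above lists several threshold-based metrics. As a rough illustration of how the count-based ones relate to each other, here is a minimal C# sketch that derives them from a confusion matrix of true labels and predicted labels. BinaryMetricsSketch is a hypothetical helper, not part of ML.NET, and it is not how ML.NET's evaluator is implemented. It also leaves out AUC, LogLoss, LogLossReduction and Entropy, which depend on predicted scores or probabilities rather than hard predictions.

```csharp
using System;

// Hypothetical helper: recomputes the count-based metrics from the sample output.
public static class BinaryMetricsSketch
{
    public sealed class Metrics
    {
        public double Accuracy, PositivePrecision, PositiveRecall,
                      NegativePrecision, NegativeRecall, F1Score;
    }

    public static Metrics Compute(bool[] labels, bool[] predictions)
    {
        // Build the confusion matrix counts.
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < labels.Length; i++)
        {
            if (predictions[i] && labels[i]) tp++;
            else if (predictions[i] && !labels[i]) fp++;
            else if (!predictions[i] && !labels[i]) tn++;
            else fn++;
        }

        // Guard against empty denominators (for example, no positive predictions).
        double Ratio(int numerator, int denominator) =>
            denominator == 0 ? 0.0 : (double)numerator / denominator;

        var m = new Metrics
        {
            Accuracy = Ratio(tp + tn, labels.Length),
            PositivePrecision = Ratio(tp, tp + fp),
            PositiveRecall = Ratio(tp, tp + fn),
            NegativePrecision = Ratio(tn, tn + fn),
            NegativeRecall = Ratio(tn, tn + fp),
        };

        // F1 is the harmonic mean of positive precision and positive recall.
        double sum = m.PositivePrecision + m.PositiveRecall;
        m.F1Score = sum == 0 ? 0.0 : 2 * m.PositivePrecision * m.PositiveRecall / sum;
        return m;
    }
}
```

For example, with 2 true positives, 1 false positive, 4 true negatives and 1 false negative, this returns an accuracy of 0.75, a positive precision, positive recall and F1 score of 2/3 each, and a negative precision and negative recall of 0.8 each.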
New file (+59 lines):

using Microsoft.ML;
using Microsoft.ML.Trainers;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
    public static class StochasticGradientDescentWithOptions
    {
        // In this example, we use the adult income dataset. The goal is to predict
        // whether a person's income is above $50K, based on demographic information about that person.
        // For more details about this dataset, see https://archive.ics.uci.edu/ml/datasets/adult.
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for exception tracking and logging,
            // as a catalog of available operations and as the source of randomness.
            // Set the seed to a fixed number so that the outputs of this example are deterministic.
            var mlContext = new MLContext(seed: 0);

            // Download and featurize the dataset.
            var data = SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(mlContext);
Review comment: Please add examples using in-memory data structures if possible and show how to inspect those in-memory examples' predictions. If you search for "InMemory" in VS Text Explorer, you will find a few examples. #Resolved
Reply: Since these are in the API docs on the website, I would actually prefer to keep the data loading part terse. Let's make other samples focus on the data loading aspects, and keep this one spare. (In reply to: 259176526)
Reply: I agree with Rogan. That would be a scenario tutorial rather than an API sample. (In reply to: 259451392)
Reply: So this is API doc for the documentation website? If yes, it makes my feeling even stronger: ideally, to fit this example into their own scenario, users should only need to make minor changes. Having a text loader decreases the flexibility of this example and forces users to go outside Visual Studio, because they need to prepare a text file and make sure that file can be loaded correctly. Thinking about scikit-learn, I don't find many of its examples using text files, and it's super easy to start working with its modules. #Pending
Reply: I'm not following this comment. LoadFeaturizedAdultDataset downloads the dataset and loads it into memory. It's functionally similar to the sklearn datasets module, which is used in many of the sklearn examples. And we have SamplesUtils.DatasetUtils.LoadFeaturizedAdultDataset(). Users are able to copy-paste these samples and run them as-is. Please IM me if you want to discuss this further. :) (In reply to: 259558223)
Reply: I'll be using this same template for all the new API samples. If you think there's a better template, please let me know soon. (In reply to: 259947957)

            // Hold out 10% of the data for testing.
            var trainTestData = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);

            // Define the trainer options.
            var options = new SgdBinaryTrainer.Options()
            {
                // Make the convergence tolerance tighter.
                ConvergenceTolerance = 5e-5,
                // Increase the maximum number of passes over the training data.
                MaxIterations = 30,
                // Give the instances of the positive class slightly more weight.
                PositiveInstanceWeight = 1.2f,
            };

            // Create the training pipeline.
            var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent(options);

            // Fit the pipeline to the training data.
            var model = pipeline.Fit(trainTestData.TrainSet);

            // Evaluate how the model performs on the test data.
            var dataWithPredictions = model.Transform(trainTestData.TestSet);
Review comment: Printing metrics is a bit far from practical use in production, where we create a prediction per example and then make decisions based on those prediction values. #Resolved
Reply: I think the purpose of these samples is to show how you can call a specific trainer and set different options. (In reply to: 259176767)
Reply: Ivan is correct. The API samples have a narrow scope: they showcase how to use a single API. We have tutorials and an end-to-end samples repo that cover the practical cases, which involve using multiple APIs. (In reply to: 259443273)

            var metrics = mlContext.BinaryClassification.Evaluate(dataWithPredictions);
            SamplesUtils.ConsoleUtils.PrintMetrics(metrics);

            // Expected output:
            // Accuracy: 0.85
            // AUC: 0.90
            // F1 Score: 0.67
            // Negative Precision: 0.91
            // Negative Recall: 0.89
            // Positive Precision: 0.65
            // Positive Recall: 0.70
            // LogLoss: 0.48
            // LogLossReduction: 37.52
            // Entropy: 0.78
        }
    }
}
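The options in this sample tune a stochastic gradient descent loop. As a hedged sketch only, assuming plain single-threaded SGD on the logistic loss (this is not ML.NET's SgdBinaryTrainer implementation), here is how a convergence tolerance, a cap on the number of passes and a positive-instance weight might enter such a loop. The SgdSketch and Settings types, the LearningRate and Seed fields, and the exact stopping rule (stop when the average weighted loss changes by less than the tolerance between passes) are all illustrative assumptions.

```csharp
using System;

// Hypothetical from-scratch sketch; names only mirror the Options fields used above.
public static class SgdSketch
{
    public sealed class Settings
    {
        public double ConvergenceTolerance = 1e-4; // stop when the epoch loss barely changes
        public int MaxIterations = 20;             // maximum number of passes over the data
        public float PositiveInstanceWeight = 1.0f; // extra weight on positive examples
        public double LearningRate = 0.1;          // assumed fixed step size
        public int Seed = 0;                       // makes the shuffling deterministic
    }

    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Linear score; the last weight is the bias term.
    public static double Score(double[] w, double[] x)
    {
        double z = w[x.Length];
        for (int k = 0; k < x.Length; k++) z += w[k] * x[k];
        return z;
    }

    public static bool Predict(double[] w, double[] x) => Score(w, x) > 0;

    // Trains logistic regression weights with per-example gradient steps.
    public static double[] Train(double[][] features, bool[] labels, Settings s)
    {
        int dim = features[0].Length;
        var w = new double[dim + 1];
        var rng = new Random(s.Seed);
        var order = new int[features.Length];
        for (int i = 0; i < order.Length; i++) order[i] = i;
        double previousLoss = double.MaxValue;

        for (int epoch = 0; epoch < s.MaxIterations; epoch++)
        {
            // Fisher-Yates shuffle so each pass visits the examples in a new order.
            for (int i = order.Length - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                (order[i], order[j]) = (order[j], order[i]);
            }

            foreach (int i in order)
            {
                double p = Sigmoid(Score(w, features[i]));
                double y = labels[i] ? 1.0 : 0.0;
                double weight = labels[i] ? s.PositiveInstanceWeight : 1.0;
                // Gradient of the weighted log-loss with respect to the score.
                double g = weight * (p - y);
                for (int k = 0; k < dim; k++) w[k] -= s.LearningRate * g * features[i][k];
                w[dim] -= s.LearningRate * g;
            }

            double loss = AverageLoss(w, features, labels, s.PositiveInstanceWeight);
            if (Math.Abs(previousLoss - loss) < s.ConvergenceTolerance) break;
            previousLoss = loss;
        }
        return w;
    }

    static double AverageLoss(double[] w, double[][] features, bool[] labels, float posWeight)
    {
        double total = 0;
        for (int i = 0; i < features.Length; i++)
        {
            // Clamp the probability to avoid log(0).
            double p = Math.Min(Math.Max(Sigmoid(Score(w, features[i])), 1e-12), 1 - 1e-12);
            total += labels[i] ? -posWeight * Math.Log(p) : -Math.Log(1 - p);
        }
        return total / features.Length;
    }
}
```

In this sketch, raising PositiveInstanceWeight scales the gradient of positive examples, which tends to trade some positive precision for positive recall; the sample output above moves in that direction (positive recall 0.65 to 0.70), though ML.NET's trainer may differ in its details.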
Review comment: It's much easier to call StochasticGradientDescent.Example in Program.cs if it's a truncated namespace. #Resolved
Reply: The pattern I've been using is to have the class name be the same as the API name, in this case StochasticGradientDescent. In many cases we have the same trainer in multiple catalogs, so we either have to keep the namespaces distinct, or change the class names here to StochasticGradientDescentBinary or StochasticGradientDescentBinaryClassification. I prefer to use the namespaces and mirror the catalog structure. Makes sense? (In reply to: 259442659)
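To make the namespace argument concrete, here is a minimal self-contained C# sketch: two classes share the name StochasticGradientDescent but live in namespaces that mirror the catalogs, and Program.cs can still call each one with a short name through a using alias. The Regression namespace, the Catalog constants and the stub Example bodies are hypothetical stand-ins for illustration.

```csharp
using System;
// Aliases give Program.cs short names even though both classes are called StochasticGradientDescent.
using BinarySgd = Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification.StochasticGradientDescent;
using RegressionSgd = Microsoft.ML.Samples.Dynamic.Trainers.Regression.StochasticGradientDescent;

namespace Microsoft.ML.Samples.Dynamic.Trainers.BinaryClassification
{
    public static class StochasticGradientDescent
    {
        public const string Catalog = "BinaryClassification";
        public static void Example() => Console.WriteLine($"{Catalog} SGD sample");
    }
}

// Hypothetical sibling sample for another catalog, with the same class name.
namespace Microsoft.ML.Samples.Dynamic.Trainers.Regression
{
    public static class StochasticGradientDescent
    {
        public const string Catalog = "Regression";
        public static void Example() => Console.WriteLine($"{Catalog} SGD sample");
    }
}

namespace Microsoft.ML.Samples.Dynamic
{
    public static class Program
    {
        public static void Main()
        {
            BinarySgd.Example();     // prints "BinaryClassification SGD sample"
            RegressionSgd.Example(); // prints "Regression SGD sample"
            // A namespace-relative call also works from inside Microsoft.ML.Samples.Dynamic.
            Trainers.BinaryClassification.StochasticGradientDescent.Example(); // prints "BinaryClassification SGD sample"
        }
    }
}
```

The alias keeps the call site as short as a truncated namespace would, while the namespaces themselves still mirror the catalog structure.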