Fixing names of trainer estimators #2903

Merged: 10 commits, Mar 12, 2019
10 changes: 5 additions & 5 deletions docs/code/MlNetCookBook.md
@@ -244,7 +244,7 @@ We tried to make `Preview` debugger-friendly: our expectation is that, if you en
Here is the code sample:
```csharp
var estimator = mlContext.Transforms.Categorical.MapValueToKey("Label")
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca())
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

var data = mlContext.Data.LoadFromTextFile(new TextLoader.Column[] {
@@ -355,7 +355,7 @@ var pipeline =
// once so adding a caching step before it is not helpful.
.AppendCacheCheckpoint(mlContext)
// Add the SDCA regression trainer.
-    .Append(mlContext.Regression.Trainers.StochasticDualCoordinateAscent(labelColumnName: "Target", featureColumnName: "FeatureVector"));
+    .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Target", featureColumnName: "FeatureVector"));

// Step three. Fit the pipeline to the training data.
var model = pipeline.Fit(trainData);
@@ -423,7 +423,7 @@ var pipeline =
// Cache data in memory for steps after the cache check point stage.
.AppendCacheCheckpoint(mlContext)
// Use the multi-class SDCA model to predict the label using features.
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent())
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca())
// Apply the inverse conversion from 'PredictedLabel' column back to string value.
.Append(mlContext.Transforms.Conversion.MapKeyToValue(("PredictedLabel", "Data")));

@@ -547,7 +547,7 @@ var pipeline =
// Cache data in memory for steps after the cache check point stage.
.AppendCacheCheckpoint(mlContext)
// Use the multi-class SDCA model to predict the label using features.
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca());

// Train the model.
var trainedModel = pipeline.Fit(trainData);
@@ -822,7 +822,7 @@ var pipeline =
// Notice that unused part in the data may not be cached.
.AppendCacheCheckpoint(mlContext)
// Use the multi-class SDCA model to predict the label using features.
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca());

// Split the data 90:10 into train and test sets, train and evaluate.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);
---
@@ -22,7 +22,7 @@ public static void Example()
var transformPipeline = mlContext.Transforms.Concatenate("Features", "CrimesPerCapita", "PercentResidental",
"PercentNonRetail", "CharlesRiver", "NitricOxides", "RoomsPerDwelling", "PercentPre40s",
"EmploymentDistance", "HighwayDistance", "TaxRate", "TeacherRatio");
-var learner = mlContext.Regression.Trainers.OrdinaryLeastSquares(
+var learner = mlContext.Regression.Trainers.Ols(
labelColumnName: "MedianHomeValue", featureColumnName: "Features");

var transformedData = transformPipeline.Fit(data).Transform(data);
@@ -40,7 +40,7 @@ public static void Example()
// FeatureContributionCalculatingEstimator can be use as an intermediary step in a pipeline.
// The features retained by FeatureContributionCalculatingEstimator will be in the FeatureContribution column.
var pipeline = mlContext.Model.Explainability.FeatureContributionCalculation(model.Model, model.FeatureColumn, numPositiveContributions: 11)
-    .Append(mlContext.Regression.Trainers.OrdinaryLeastSquares(featureColumnName: "FeatureContributions"));
+    .Append(mlContext.Regression.Trainers.Ols(featureColumnName: "FeatureContributions"));
var outData = featureContributionCalculator.Fit(scoredData).Transform(scoredData);

// Let's extract the weights from the linear model to use as a comparison
---
@@ -27,7 +27,7 @@ public static void Example()
.Where(name => name != labelName) // Drop the Label
.ToArray();
var pipeline = mlContext.Transforms.Concatenate("Features", featureNames)
-    .Append(mlContext.Regression.Trainers.GeneralizedAdditiveModels(
+    .Append(mlContext.Regression.Trainers.Gam(
labelColumnName: labelName, featureColumnName: "Features", maximumBinCountPerFeature: 16));
var fitPipeline = pipeline.Fit(data);

---
@@ -20,7 +20,7 @@ public static void Example()
// Then append a linear regression trainer.
var pipeline = mlContext.Transforms.Concatenate("Features", featureNames)
.Append(mlContext.Transforms.Normalize("Features"))
-    .Append(mlContext.Regression.Trainers.OrdinaryLeastSquares(
+    .Append(mlContext.Regression.Trainers.Ols(
labelColumnName: labelName, featureColumnName: "Features"));
var model = pipeline.Fit(data);

---
@@ -29,7 +29,7 @@ public static void Example()
var data = mlContext.Data.LoadFromEnumerable(samples);

// Create an anomaly detector. Its underlying algorithm is randomized PCA.
-var pipeline = mlContext.AnomalyDetection.Trainers.AnalyzeRandomizedPrincipalComponents(featureColumnName: nameof(DataPoint.Features), rank: 1, ensureZeroMean: false);
+var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(featureColumnName: nameof(DataPoint.Features), rank: 1, ensureZeroMean: false);

// Train the anomaly detector.
var model = pipeline.Fit(data);
---
@@ -28,15 +28,15 @@ public static void Example()
// Convert the List<DataPoint> to IDataView, a consumble format to ML.NET functions.
var data = mlContext.Data.LoadFromEnumerable(samples);

-var options = new ML.Trainers.RandomizedPrincipalComponentAnalyzer.Options()
+var options = new ML.Trainers.RandomizedPcaTrainer.Options()
{
FeatureColumnName = nameof(DataPoint.Features),
Rank = 1,
Seed = 10,
};

// Create an anomaly detector. Its underlying algorithm is randomized PCA.
-var pipeline = mlContext.AnomalyDetection.Trainers.AnalyzeRandomizedPrincipalComponents(options);
+var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(options);

// Train the anomaly detector.
var model = pipeline.Fit(data);
---
@@ -30,7 +30,7 @@ public static void Example()
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext)
.Append(mlContext.BinaryClassification.Trainers.
FieldAwareFactorizationMachine(
-        new FieldAwareFactorizationMachineBinaryClassificationTrainer.Options
+        new FieldAwareFactorizationMachineTrainer.Options
{
FeatureColumnName = "Features",
LabelColumnName = "Sentiment",
---
@@ -49,7 +49,7 @@ public static void Example()
// the "Features" column produced by FeaturizeText as the features column.
var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
.AppendCacheCheckpoint(mlContext) // Add a data-cache step within a pipeline.
-    .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(labelColumnName: "Sentiment", featureColumnName: "Features", l2Regularization: 0.001f));
+    .Append(mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(labelColumnName: "Sentiment", featureColumnName: "Features", l2Regularization: 0.001f));

// Step 3: Run Cross-Validation on this pipeline.
var cvResults = mlContext.BinaryClassification.CrossValidate(data, pipeline, labelColumn: "Sentiment");
@@ -60,8 +60,8 @@
// If we wanted to specify more advanced parameters for the algorithm,
// we could do so by tweaking the 'advancedSetting'.
var advancedPipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
-    .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(
-        new SdcaBinaryTrainer.Options {
+    .Append(mlContext.BinaryClassification.Trainers.SdcaCalibrated(
+        new SdcaCalibratedBinaryClassificationTrainer.Options {
LabelColumnName = "Sentiment",
FeatureColumnName = "Features",
ConvergenceTolerance = 0.01f, // The learning rate for adjusting bias from being regularized
---
@@ -40,7 +40,7 @@ public static void Example()

// Step 2: Create a binary classifier. This trainer may produce a logistic regression model.
// We set the "Label" column as the label of the dataset, and the "Features" column as the features column.
-var pipeline = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscentNonCalibrated(
+var pipeline = mlContext.BinaryClassification.Trainers.SdcaNonCalibrated(
labelColumnName: "Label", featureColumnName: "Features", loss: new HingeLoss(), l2Regularization: 0.001f);

// Step 3: Train the pipeline created.
---
@@ -22,7 +22,7 @@ public static void Example()
var trainTestData = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);

// Define the trainer options.
-var options = new SdcaBinaryTrainer.Options()
+var options = new SdcaCalibratedBinaryClassificationTrainer.Options()
{
// Make the convergence tolerance tighter.
ConvergenceTolerance = 0.05f,
@@ -33,7 +33,7 @@
};

// Create data training pipeline.
-var pipeline = mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(options);
+var pipeline = mlContext.BinaryClassification.Trainers.SdcaCalibrated(options);

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);
---
@@ -19,7 +19,7 @@ public static void Example()
var trainTestData = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);

// Create data training pipeline.
-var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent();
+var pipeline = mlContext.BinaryClassification.Trainers.SgdCalibrated();

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);
---
@@ -19,7 +19,7 @@ public static void Example()
var trainTestData = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);

// Create data training pipeline.
-var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescentNonCalibrated();
+var pipeline = mlContext.BinaryClassification.Trainers.SgdNonCalibrated();

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);
---
@@ -22,8 +22,8 @@ public static void Example()

// Create data training pipeline.
var pipeline = mlContext.BinaryClassification
-    .Trainers.StochasticGradientDescentNonCalibrated(
-        new SgdNonCalibratedBinaryTrainer.Options
+    .Trainers.SgdNonCalibrated(
+        new SgdNonCalibratedTrainer.Options
{
InitialLearningRate = 0.01,
NumberOfIterations = 10,
---
@@ -21,7 +21,7 @@ public static void Example()
var trainTestData = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);

// Define the trainer options.
-var options = new SgdBinaryTrainer.Options()
+var options = new SgdCalibratedTrainer.Options()
{
// Make the convergence tolerance tighter.
ConvergenceTolerance = 5e-5,
@@ -32,7 +32,7 @@
};

// Create data training pipeline.
-var pipeline = mlContext.BinaryClassification.Trainers.StochasticGradientDescent(options);
+var pipeline = mlContext.BinaryClassification.Trainers.SgdCalibrated(options);

// Fit this pipeline to the training data.
var model = pipeline.Fit(trainTestData.TrainSet);
---
@@ -19,7 +19,7 @@ public static void Example()
// Leave out 10% of data for testing.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);
// Create data training pipeline.
-var pipeline = mlContext.BinaryClassification.Trainers.SymbolicStochasticGradientDescent(labelColumnName: "IsOver50K", numberOfIterations: 25);
+var pipeline = mlContext.BinaryClassification.Trainers.SymbolicSgd(labelColumnName: "IsOver50K", numberOfIterations: 25);
var model = pipeline.Fit(split.TrainSet);

// Evaluate how the model is doing on the test data.
---
@@ -19,8 +19,8 @@ public static void Example()
// Leave out 10% of data for testing.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.1);
// Create data training pipeline
-var pipeline = mlContext.BinaryClassification.Trainers.SymbolicStochasticGradientDescent(
-    new ML.Trainers.SymbolicStochasticGradientDescentClassificationTrainer.Options()
+var pipeline = mlContext.BinaryClassification.Trainers.SymbolicSgd(
+    new ML.Trainers.SymbolicSgdTrainer.Options()
{
LearningRate = 0.2f,
NumberOfIterations = 10,
---
@@ -29,7 +29,7 @@ public static void Example()
string outputColumnName = "Features";
var pipeline = ml.Transforms.Concatenate(outputColumnName, new[] { "Age", "Parity", "Induced" })
.Append(ml.Clustering.Trainers.KMeans(
-        new KMeansPlusPlusTrainer.Options
+        new KMeansTrainer.Options
{
FeatureColumnName = outputColumnName,
NumberOfClusters = 2,
---
@@ -30,7 +30,7 @@ public static void Example()
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply StochasticDualCoordinateAscent multiclass trainer.
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent());
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca());

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test.
---
@@ -26,7 +26,7 @@ public static void Example()
// CC 1.216908,1.248052,1.391902,0.4326252,1.099942,0.9262842,1.334019,1.08762,0.9468155,0.4811099
// DD 0.7871246,1.053327,0.8971719,1.588544,1.242697,1.362964,0.6303943,0.9810045,0.9431419,1.557455

-var options = new SdcaMultiClassTrainer.Options
+var options = new SdcaMulticlassClassificationTrainer.Options
{
// Add custom loss
LossFunction = new HingeLoss(),
@@ -41,7 +41,7 @@
// Convert the string labels into key types.
mlContext.Transforms.Conversion.MapValueToKey("Label")
// Apply StochasticDualCoordinateAscent multiclass trainer.
-    .Append(mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent(options));
+    .Append(mlContext.MulticlassClassification.Trainers.Sdca(options));

// Split the data into training and test sets. Only training set is used in fitting
// the created pipeline. Metrics are computed on the test.
---
@@ -43,7 +43,7 @@ public static void Example()

// Create the estimator, here we only need OrdinaryLeastSquares trainer
// as data is already processed in a form consumable by the trainer
-var pipeline = mlContext.Regression.Trainers.OrdinaryLeastSquares();
+var pipeline = mlContext.Regression.Trainers.Ols();

var model = pipeline.Fit(split.TrainSet);

---
@@ -44,7 +44,7 @@ public static void Example()

// Create the estimator, here we only need OrdinaryLeastSquares trainer
// as data is already processed in a form consumable by the trainer
-var pipeline = mlContext.Regression.Trainers.OrdinaryLeastSquares(new OrdinaryLeastSquaresRegressionTrainer.Options()
+var pipeline = mlContext.Regression.Trainers.Ols(new OlsTrainer.Options()
{
L2Regularization = 0.1f,
CalculateStatistics = false
---
@@ -22,7 +22,7 @@ public static void Example()
var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.1);

// Train the model.
-var pipeline = mlContext.Regression.Trainers.StochasticDualCoordinateAscent();
+var pipeline = mlContext.Regression.Trainers.Sdca();
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
---
@@ -32,7 +32,7 @@ public static void Example()
};

// Train the model.
-var pipeline = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(options);
+var pipeline = mlContext.Regression.Trainers.Sdca(options);
var model = pipeline.Fit(split.TrainSet);

// Do prediction on the test set.
---
@@ -64,8 +64,8 @@ public Arguments()
// non-default column names. Unfortuantely no method of resolving this temporary strikes me as being any
// less laborious than the proper fix, which is that this "meta" component should itself be a trainer
// estimator, as opposed to a regular trainer.
-var trainerEstimator = new MulticlassLogisticRegression(env, LabelColumnName, FeatureColumnName);
-return TrainerUtils.MapTrainerEstimatorToTrainer<MulticlassLogisticRegression,
+var trainerEstimator = new LogisticRegressionMulticlassClassificationTrainer(env, LabelColumnName, FeatureColumnName);
Review comment (Member), on `LogisticRegressionMulticlassClassificationTrainer`:

This will get people confused because it has both Regression and Multiclass in the name, but I can't think of a good way to deal with it. Would it be OK to just call it Logit? @wschin @TomFinley

Reply (Member, PR author):
I would especially hesitate to call it Logit. Logits have a different interpretation, related to unnormalized log-probabilities.


In reply to: 264516270
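The distinction the reply draws can be shown numerically: in the usual convention, "logits" are the raw, unnormalized log-probability scores a classifier emits, and a softmax maps them onto actual class probabilities. A minimal Python sketch with illustrative values (not ML.NET code):

```python
import math

# Raw model scores ("logits"): unnormalized log-probabilities.
# These can be any real numbers and need not sum to anything meaningful.
logits = [2.0, 1.0, 0.1]

# Softmax normalizes the logits into a probability distribution:
# p_i = exp(z_i) / sum_j exp(z_j)
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

# Each probability is in (0, 1), they sum to 1, and the
# ordering of the logits is preserved.
print(probs)
```

So a trainer named "Logit" would suggest it returns these raw scores, which is why the name was rejected for a model that is fundamentally (multiclass) logistic regression.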

+return TrainerUtils.MapTrainerEstimatorToTrainer<LogisticRegressionMulticlassClassificationTrainer,
MulticlassLogisticRegressionModelParameters, MulticlassLogisticRegressionModelParameters>(env, trainerEstimator);
})
};
---
2 changes: 1 addition & 1 deletion src/Microsoft.ML.FastTree/FastTreeRegression.cs
@@ -391,7 +391,7 @@ internal sealed class ObjectiveImpl : ObjectiveFunctionBase, IStepSearch
{
private readonly float[] _labels;

-public ObjectiveImpl(Dataset trainData, RegressionGamTrainer.Options options) :
+public ObjectiveImpl(Dataset trainData, GamRegressionTrainer.Options options) :
base(
trainData,
options.LearningRate,