adding a dependency to the MlNetMklDeps package #594

Merged 1 commit on Jul 30, 2018
1 change: 1 addition & 0 deletions build/Dependencies.props
@@ -8,5 +8,6 @@
<SystemReflectionEmitLightweightPackageVersion>4.3.0</SystemReflectionEmitLightweightPackageVersion>
<PublishSymbolsPackageVersion>1.0.0-beta-62824-02</PublishSymbolsPackageVersion>
<LightGBMPackageVersion>2.1.2.2</LightGBMPackageVersion>
<MlNetMklDepsPackageVersion>0.0.0.1</MlNetMklDepsPackageVersion>
</PropertyGroup>
</Project>
@@ -11,6 +11,7 @@
<ProjectReference Include="..\Microsoft.ML.CpuMath\Microsoft.ML.CpuMath.csproj" />
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML\Microsoft.ML.csproj" />
<PackageReference Include="MlNetMklDeps" Version="$(MlNetMklDepsPackageVersion)" />
Member

This isn't going to work because the Microsoft.ML package doesn't reference the MlNetMklDeps package (and in my opinion, it should never reference it).

We will need to split OLS (and any other functionality that depends on MKL) out into a separate NuGet package.

In general, if a component takes a dependency on "non-standard" or "large" external dependencies, we can't ship that component in our "Main" NuGet package.

</ItemGroup>
Contributor

Hmmm. I wonder if OLS really belongs in standard learners if it has native dependencies. Well, we'll possibly resolve that later.


</Project>
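
The review comments above suggest moving OLS, and anything else that needs MKL, into its own NuGet package. A minimal sketch of what such a project file might look like follows. The project name `Microsoft.ML.MklLearners` and the target framework are assumptions for illustration and do not appear in this PR; only the `MlNetMklDeps` reference, the `$(MlNetMklDepsPackageVersion)` property, and the `Microsoft.ML.csproj` path come from the diff.

```xml
<!-- Hypothetical file: src/Microsoft.ML.MklLearners/Microsoft.ML.MklLearners.csproj -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed target framework; not taken from this PR -->
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Managed dependency on the main library (same relative path style as the diff) -->
    <ProjectReference Include="..\Microsoft.ML\Microsoft.ML.csproj" />
    <!-- The native MKL dependency lives here, so the main package never references it -->
    <PackageReference Include="MlNetMklDeps" Version="$(MlNetMklDepsPackageVersion)" />
  </ItemGroup>
</Project>
```

Under this layout, consumers who do not need OLS would not pull in the native MKL binaries, which seems to be the concern raised in the review.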
26 changes: 21 additions & 5 deletions src/Microsoft.ML.StandardLearners/Standard/OlsLinearRegression.cs
@@ -28,8 +28,11 @@
"OLS Linear Regression Executor",
OlsLinearRegressionPredictor.LoaderSignature)]

[assembly: LoadableClass(typeof(void), typeof(OlsLinearRegressionTrainer), null, typeof(SignatureEntryPointModule), OlsLinearRegressionTrainer.LoadNameValue)]

namespace Microsoft.ML.Runtime.Learners
{
/// <include file='doc.xml' path='doc/members/member[@name="OLS"]/*' />
public sealed class OlsLinearRegressionTrainer : TrainerBase<OlsLinearRegressionPredictor>
{
public sealed class Arguments : LearnerInputBaseWithWeight
@@ -51,11 +54,6 @@ public sealed class Arguments : LearnerInputBaseWithWeight
public const string ShortName = "ols";
internal const string Summary = "The ordinary least square regression fits the target function as a linear function of the numerical features "
+ "that minimizes the square loss function.";
internal const string Remarks = @"<remarks>
<a href='https://en.wikipedia.org/wiki/Ordinary_least_squares'>Ordinary least squares (OLS)</a> is a parameterized regression method.
It assumes that the conditional mean of the dependent variable follows a linear function of the independent variables.
By minimizing the squares of the difference between observed values and the predictions, the parameters of the regressor can be estimated.
</remarks>";

private readonly Float _l2Weight;
private readonly bool _perParameterSignificance;
@@ -463,6 +461,24 @@ public static void Pptri(Layout layout, UpLo uplo, int n, Double[] ap)
}
}
}

[TlcModule.EntryPoint(Name = "Trainers.OrdinaryLeastSquaresRegressor",
Desc = "Train an OLS regression model.",
UserName = UserNameValue,
ShortName = ShortName,
XmlInclude = new[] { @"<include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/member[@name=""OLS""]/*' />" })]
public static CommonOutputs.RegressionOutput TrainRegression(IHostEnvironment env, Arguments input)
{
Contracts.CheckValue(env, nameof(env));
var host = env.Register("TrainOLS");
host.CheckValue(input, nameof(input));
EntryPointUtils.CheckInputArgs(host, input);

return LearnerEntryPointsUtils.Train<Arguments, CommonOutputs.RegressionOutput>(host, input,
() => new OlsLinearRegressionTrainer(host, input),
() => LearnerEntryPointsUtils.FindColumn(host, input.TrainingData.Schema, input.LabelColumn),
() => LearnerEntryPointsUtils.FindColumn(host, input.TrainingData.Schema, input.WeightColumn));
}
}

/// <summary>
21 changes: 21 additions & 0 deletions src/Microsoft.ML.StandardLearners/Standard/doc.xml
@@ -68,6 +68,27 @@
</code>
</example>
</example>

<member name="OLS">
<summary>
Train an OLS regression model.
</summary>
<remarks>
<a href='https://en.wikipedia.org/wiki/Ordinary_least_squares'>Ordinary least squares (OLS)</a> is a parameterized regression method.
It assumes that the conditional mean of the dependent variable follows a linear function of the independent variables.
The parameters of the regressor can be estimated by minimizing the squares of the difference between observed values and the predictions.
</remarks>
<example>
<code language="csharp">
new OrdinaryLeastSquaresRegressor
{
L2Weight = 0.1f,
PerParameterSignificance = false,
NormalizeFeatures = Microsoft.ML.Models.NormalizeOption.Yes
}
</code>
</example>
</member>

</members>
</doc>
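
The remarks above describe OLS as minimizing squared differences between observations and predictions, and the `L2Weight` argument is documented as an L2 regularization weight. As a sketch, the textbook objective and its closed-form solution are written out below. The symbols ($x_i$, $y_i$, $w$, $b$, $\lambda$, $X$) are introduced here for illustration. How the trainer actually scales $\lambda$ or treats the bias term is not shown in this diff, so this should be read as the standard ridge formulation rather than a description of the implementation.

```latex
% Regularized least-squares objective over n examples; \lambda plays the role of L2Weight
\min_{w,\,b}\; \sum_{i=1}^{n} \left( y_i - w^\top x_i - b \right)^2 + \lambda \lVert w \rVert_2^2

% Closed form for the weights, assuming the bias is handled separately
% (for example by centering the columns of X and the labels y)
\hat{w} = \left( X^\top X + \lambda I \right)^{-1} X^\top y
```

With $\lambda = 0$ this reduces to plain OLS; a small positive value such as the default `1E-06` keeps $X^\top X + \lambda I$ invertible when features are collinear.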
99 changes: 99 additions & 0 deletions src/Microsoft.ML/CSharpApi.cs
@@ -754,6 +754,18 @@ public void Add(Microsoft.ML.Trainers.OnlineGradientDescentRegressor input, Micr
_jsonNodes.Add(Serialize("Trainers.OnlineGradientDescentRegressor", input, output));
}

public Microsoft.ML.Trainers.OrdinaryLeastSquaresRegressor.Output Add(Microsoft.ML.Trainers.OrdinaryLeastSquaresRegressor input)
{
var output = new Microsoft.ML.Trainers.OrdinaryLeastSquaresRegressor.Output();
Add(input, output);
return output;
}

public void Add(Microsoft.ML.Trainers.OrdinaryLeastSquaresRegressor input, Microsoft.ML.Trainers.OrdinaryLeastSquaresRegressor.Output output)
{
_jsonNodes.Add(Serialize("Trainers.OrdinaryLeastSquaresRegressor", input, output));
}

public Microsoft.ML.Trainers.PcaAnomalyDetector.Output Add(Microsoft.ML.Trainers.PcaAnomalyDetector input)
{
var output = new Microsoft.ML.Trainers.PcaAnomalyDetector.Output();
@@ -8824,6 +8836,93 @@ public OnlineGradientDescentRegressorPipelineStep(Output output)
}
}

namespace Trainers
{

/// <include file='../Microsoft.ML.StandardLearners/Standard/doc.xml' path='doc/members/member[@name="OLS"]/*' />
public sealed partial class OrdinaryLeastSquaresRegressor : Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithWeight, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithLabel, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInput, Microsoft.ML.ILearningPipelineItem
{


/// <summary>
/// L2 regularization weight
/// </summary>
[TlcModule.SweepableDiscreteParamAttribute("L2Weight", new object[]{1E-06f, 0.1f, 1f})]
public float L2Weight { get; set; } = 1E-06f;

/// <summary>
/// Whether to calculate per parameter significance statistics
/// </summary>
public bool PerParameterSignificance { get; set; } = true;

/// <summary>
/// Column to use for example weight
/// </summary>
public Microsoft.ML.Runtime.EntryPoints.Optional<string> WeightColumn { get; set; }

/// <summary>
/// Column to use for labels
/// </summary>
public string LabelColumn { get; set; } = "Label";

/// <summary>
/// The data to be used for training
/// </summary>
public Var<Microsoft.ML.Runtime.Data.IDataView> TrainingData { get; set; } = new Var<Microsoft.ML.Runtime.Data.IDataView>();

/// <summary>
/// Column to use for features
/// </summary>
public string FeatureColumn { get; set; } = "Features";

/// <summary>
/// Normalize option for the feature column
/// </summary>
public Microsoft.ML.Models.NormalizeOption NormalizeFeatures { get; set; } = Microsoft.ML.Models.NormalizeOption.Auto;

/// <summary>
/// Whether learner should cache input training data
/// </summary>
public Microsoft.ML.Models.CachingOptions Caching { get; set; } = Microsoft.ML.Models.CachingOptions.Auto;


public sealed class Output : Microsoft.ML.Runtime.EntryPoints.CommonOutputs.IRegressionOutput, Microsoft.ML.Runtime.EntryPoints.CommonOutputs.ITrainerOutput
{
/// <summary>
/// The trained model
/// </summary>
public Var<Microsoft.ML.Runtime.EntryPoints.IPredictorModel> PredictorModel { get; set; } = new Var<Microsoft.ML.Runtime.EntryPoints.IPredictorModel>();

}
public Var<IDataView> GetInputData() => TrainingData;

public ILearningPipelineStep ApplyStep(ILearningPipelineStep previousStep, Experiment experiment)
{
if (previousStep != null)
{
if (!(previousStep is ILearningPipelineDataStep dataStep))
{
throw new InvalidOperationException($"{ nameof(OrdinaryLeastSquaresRegressor)} only supports an { nameof(ILearningPipelineDataStep)} as an input.");
}

TrainingData = dataStep.Data;
}
Output output = experiment.Add(this);
return new OrdinaryLeastSquaresRegressorPipelineStep(output);
}

private class OrdinaryLeastSquaresRegressorPipelineStep : ILearningPipelinePredictorStep
{
public OrdinaryLeastSquaresRegressorPipelineStep(Output output)
{
Model = output.PredictorModel;
}

public Var<IPredictorModel> Model { get; }
}
}
}

namespace Trainers
{

1 change: 1 addition & 0 deletions test/BaselineOutput/Common/EntryPoints/core_ep-list.tsv
@@ -59,6 +59,7 @@ Trainers.LogisticRegressionBinaryClassifier Logistic Regression is a method in s
Trainers.LogisticRegressionClassifier Logistic Regression is a method in statistics used to predict the probability of occurrence of an event and can be used as a classification algorithm. The algorithm predicts the probability of occurrence of an event by fitting data to a logistical function. Microsoft.ML.Runtime.Learners.LogisticRegression TrainMultiClass Microsoft.ML.Runtime.Learners.MulticlassLogisticRegression+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+MulticlassClassificationOutput
Trainers.NaiveBayesClassifier Train a MultiClassNaiveBayesTrainer. Microsoft.ML.Runtime.Learners.MultiClassNaiveBayesTrainer TrainMultiClassNaiveBayesTrainer Microsoft.ML.Runtime.Learners.MultiClassNaiveBayesTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+MulticlassClassificationOutput
Trainers.OnlineGradientDescentRegressor Train a Online gradient descent perceptron. Microsoft.ML.Runtime.Learners.OnlineGradientDescentTrainer TrainRegression Microsoft.ML.Runtime.Learners.OnlineGradientDescentTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RegressionOutput
Trainers.OrdinaryLeastSquaresRegressor Train an OLS regression model. Microsoft.ML.Runtime.Learners.OlsLinearRegressionTrainer TrainRegression Microsoft.ML.Runtime.Learners.OlsLinearRegressionTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RegressionOutput
Trainers.PcaAnomalyDetector Train an PCA Anomaly model. Microsoft.ML.Runtime.PCA.RandomizedPcaTrainer TrainPcaAnomaly Microsoft.ML.Runtime.PCA.RandomizedPcaTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+AnomalyDetectionOutput
Trainers.PoissonRegressor Train an Poisson regression model. Microsoft.ML.Runtime.Learners.PoissonRegression TrainRegression Microsoft.ML.Runtime.Learners.PoissonRegression+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RegressionOutput
Trainers.StochasticDualCoordinateAscentBinaryClassifier Train an SDCA binary model. Microsoft.ML.Runtime.Learners.Sdca TrainBinary Microsoft.ML.Runtime.Learners.LinearClassificationTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+BinaryClassificationOutput
143 changes: 143 additions & 0 deletions test/BaselineOutput/Common/EntryPoints/core_manifest.json
@@ -14050,6 +14050,149 @@
"ITrainerOutput"
]
},
{
"Name": "Trainers.OrdinaryLeastSquaresRegressor",
"Desc": "Train an OLS regression model.",
"FriendlyName": "Ordinary Least Squares (Regression)",
"ShortName": "ols",
"Inputs": [
{
"Name": "TrainingData",
"Type": "DataView",
"Desc": "The data to be used for training",
"Aliases": [
"data"
],
"Required": true,
"SortOrder": 1.0,
"IsNullable": false
},
{
"Name": "FeatureColumn",
"Type": "String",
"Desc": "Column to use for features",
"Aliases": [
"feat"
],
"Required": false,
"SortOrder": 2.0,
"IsNullable": false,
"Default": "Features"
},
{
"Name": "LabelColumn",
"Type": "String",
"Desc": "Column to use for labels",
"Aliases": [
"lab"
],
"Required": false,
"SortOrder": 3.0,
"IsNullable": false,
"Default": "Label"
},
{
"Name": "WeightColumn",
"Type": "String",
"Desc": "Column to use for example weight",
"Aliases": [
"weight"
],
"Required": false,
"SortOrder": 4.0,
"IsNullable": false,
"Default": "Weight"
},
{
"Name": "NormalizeFeatures",
"Type": {
"Kind": "Enum",
"Values": [
"No",
"Warn",
"Auto",
"Yes"
]
},
"Desc": "Normalize option for the feature column",
"Aliases": [
"norm"
],
"Required": false,
"SortOrder": 5.0,
"IsNullable": false,
"Default": "Auto"
},
{
"Name": "Caching",
"Type": {
"Kind": "Enum",
"Values": [
"Auto",
"Memory",
"Disk",
"None"
]
},
"Desc": "Whether learner should cache input training data",
"Aliases": [
"cache"
],
"Required": false,
"SortOrder": 6.0,
"IsNullable": false,
"Default": "Auto"
},
{
"Name": "L2Weight",
"Type": "Float",
"Desc": "L2 regularization weight",
"Aliases": [
"l2"
],
"Required": false,
"SortOrder": 50.0,
"IsNullable": false,
"Default": 1E-06,
"SweepRange": {
"RangeType": "Discrete",
"Values": [
1E-06,
0.1,
1.0
]
}
},
{
"Name": "PerParameterSignificance",
"Type": "Bool",
"Desc": "Whether to calculate per parameter significance statistics",
"Aliases": [
"sig"
],
"Required": false,
"SortOrder": 150.0,
"IsNullable": false,
"Default": true
}
],
"Outputs": [
{
"Name": "PredictorModel",
"Type": "PredictorModel",
"Desc": "The trained model"
}
],
"InputKind": [
"ITrainerInputWithWeight",
"ITrainerInputWithLabel",
"ITrainerInput"
],
"OutputKind": [
"IRegressionOutput",
"ITrainerOutput"
]
},
{
"Name": "Trainers.PcaAnomalyDetector",
"Desc": "Train an PCA Anomaly model.",
33 changes: 33 additions & 0 deletions test/BaselineOutput/SingleDebug/OLS/OLS-CV-wine-out.txt
@@ -0,0 +1,33 @@
maml.exe CV tr=OLS threads=- norm=No dout=%Output% loader=Text{col=Label:R4:11 col=Features:R4:0-10 sep=; header+} data=%Data% seed=1
Not adding a normalizer.
Trainer solving for 12 parameters across 2409 examples
Coefficient of determination R2 = 0.291173667189042, or 0.287920813763543 (adjusted)
Not training a calibrator because it is not needed.
Not adding a normalizer.
Trainer solving for 12 parameters across 2489 examples
Coefficient of determination R2 = 0.280280855195625, or 0.277084686203761 (adjusted)
Not training a calibrator because it is not needed.
L1(avg): 0.586798
L2(avg): 0.573048
RMS(avg): 0.756999
Loss-fn(avg): 0.573048
R Squared: 0.263841
L1(avg): 0.587999
L2(avg): 0.571859
RMS(avg): 0.756214
Loss-fn(avg): 0.571859
R Squared: 0.276072

OVERALL RESULTS
---------------------------------------
L1(avg): 0.587398 (0.0006)
L2(avg): 0.572454 (0.0006)
RMS(avg): 0.756606 (0.0004)
Loss-fn(avg): 0.572454 (0.0006)
R Squared: 0.269956 (0.0061)

---------------------------------------
Physical memory usage(MB): %Number%
Virtual memory usage(MB): %Number%
%DateTime% Time elapsed(s): %Number%
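
The figures in this baseline can be cross-checked by hand. The formulas below are not stated anywhere in the PR; they are the standard definitions, and they appear consistent with the logged numbers when $n$ is the example count and $p = 12$ is the reported parameter count. The symbols $n$, $p$, $m_1$, $m_2$ are introduced here for illustration.

```latex
% Adjusted R^2 for the first fold (n = 2409 examples, p = 12 parameters)
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p}
          = 1 - \left(1 - 0.291174\right)\frac{2408}{2397}
          \approx 0.287921

% Overall L1 across the two folds: mean, and spread as half the absolute difference
% (which equals the population standard deviation for two values)
\frac{m_1 + m_2}{2} = \frac{0.586798 + 0.587999}{2} \approx 0.5874,
\qquad
\frac{\lvert m_1 - m_2 \rvert}{2} = \frac{0.001201}{2} \approx 0.0006
```

The same two-fold calculation reproduces the parenthesized spreads for the other metrics, for example $(0.276072 - 0.263841)/2 \approx 0.0061$ for R Squared.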
