Skip to content

Conversion of Logistic, Multiclass Logistic, and Poisson Regression to estimators #957

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Sep 25, 2018

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Sep 19, 2018

Fixes #956.

Conversion of Logistic, Multiclass Logistic, and Poisson Regression to estimators by deriving the base class LbfgsTrainerBase from TrainerEstimatorBase.

Working on testing the converted classes using TestEstimatorCore.

@artidoro artidoro added the API Issues pertaining the friendly API label Sep 19, 2018
@sfilipi
Copy link
Member

sfilipi commented Sep 20, 2018

    }

add a second constructor taking in the label, feature, weight column. #Closed


Refers to: src/Microsoft.ML.StandardLearners/Standard/PoissonRegression/PoissonRegression.cs:47 in c7a996e. [](commit_id = c7a996e, deletion_comment = False)

@sfilipi
Copy link
Member

sfilipi commented Sep 20, 2018

    }

add the other constructor. #Closed


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/MulticlassLogisticRegression.cs:76 in c7a996e. [](commit_id = c7a996e, deletion_comment = False)

{
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata())),
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata(true))),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata()))
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don;t think we get probability for multi-class #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually Score is the probability vector for the case of multiclass.


In reply to: 219044096 [](ancestors = 219044096)

{
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata())),
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata(true))),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata()))
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only score. #Closed

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 20, 2018

    public LogisticRegression(IHostEnvironment env, Arguments args)

let's add a ctor that has Action<args> #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LogisticRegression.cs:52 in c7a996e. [](commit_id = c7a996e, deletion_comment = False)

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 20, 2018

    public LogisticRegression(IHostEnvironment env, Arguments args)

and the necessary column values


In reply to: 423302557 [](ancestors = 423302557)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LogisticRegression.cs:52 in c7a996e. [](commit_id = c7a996e, deletion_comment = False)

@@ -138,7 +140,7 @@ public abstract class ArgumentsBase : LearnerInputBaseWithWeight
public override TrainerInfo Info => _info;

internal LbfgsTrainerBase(ArgumentsBase args, IHostEnvironment env, string name, bool showTrainingStats = false)
: base(env, name)
: base(Contracts.CheckRef(env, nameof(env)).Register(name), MakeFeatureColumn(args.FeatureColumn), MakeFeatureColumn(args.LabelColumn), MakeWeightColumn(args.WeightColumn))
Copy link
Contributor

@Zruty0 Zruty0 Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.FeatureColumn [](start = 90, length = 18)

I'd pass feature, label and weights separately from args #Resolved

@Zruty0
Copy link
Contributor

Zruty0 commented Sep 20, 2018

tests that contain TestEstimatorCore please #Resolved


namespace Microsoft.ML.Tests
{
public class LbfgsTests : TestDataPipeBase
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class Lbfgs [](start = 11, length = 11)

if you sync to latest, you can make it a TrainerEstimators partial class. #Closed

{
private IDataView GetIrisDataview()
{
var dataPath = GetDataPath("iris.txt");
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"iris.txt") [](start = 39, length = 11)

imo, tend to use the TestDatasets classes #Closed

public void TestEstimatorLogisticRegression()
{
var dataView = GetBreastCancerDataview();
dataView = Env.CreateTransform("Term{col=F1}", dataView);
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Env.CreateTransform("Term{col=F1}", dataView); [](start = 23, length = 46)

should aim at not using maml #Closed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are utility methods to get pipelines for tasks on the LightGBM PR. maybe they'll be useful.


In reply to: 219322724 [](ancestors = 219322724)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll look into changing this then.


In reply to: 219323059 [](ancestors = 219323059,219322724)

string groupIdColumn = null, string weightColumn = null, Action<Arguments> advancedSettings = null)
: base(env, featureColumn, labelColumn, weightColumn, groupIdColumn, advancedSettings)
{
}
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add Host.CheckNonEmpty for label, feature #Closed

return new[]
{
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata())),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false, new SchemaShape(MetadataUtils.GetTrainerOutputMetadata()))
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new SchemaSh [](start = 138, length = 12)

look at what SDCA multiclass is doing for the metadata append the Label metadata here. #Closed

@@ -69,8 +70,14 @@ public sealed class Arguments : ArgumentsBase

protected override int ClassCount => _numClasses;

public MulticlassLogisticRegression(IHostEnvironment env, string featureColumn, string labelColumn,
string groupIdColumn = null, string weightColumn = null, Action<Arguments> advancedSettings = null)
: base(env, featureColumn, labelColumn, weightColumn, groupIdColumn, advancedSettings)
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labelColumn [](start = 39, length = 11)

shape of the label column for multi-class is different than regression, will want to construct it here. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it as it is now and the tests seem to be passing correctly. Am I missing something?


In reply to: 219326153 [](ancestors = 219326153)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the label for multiclass is a key, and item type U4. R4 is probably acceptable, but you need it to be a key.


In reply to: 219349397 [](ancestors = 219349397,219326153)

NumThreads = Args.NumThreads;
DenseOptimizer = Args.DenseOptimizer;
//ShowTrainingStats = showTrainingStats; // TODO
EnforceNonNegativity = Args.EnforceNonNegativity;
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to avoid repetition between the two constructors #Closed

var dataView = GetIrisDataview();
var data = FeatureCombiner.PrepareFeatures(Env, new FeatureCombiner.FeatureCombinerInput() { Data = dataView, Features = new[] { "SepalLength", "SepalWidth", "PetalLength", "PetalWidth" } }).OutputData;

var pipe = new LogisticRegression(Env, "Features", "Label");
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogisticRegression [](start = 27, length = 18)

did you mean multiclassLR? #Closed

public void TestEstimatorPoissonRegression()
{
var dataView = GetGeneratedRegressionDataview();
var pipe = new LogisticRegression(Env, "Features", "Label");
Copy link
Member

@sfilipi sfilipi Sep 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogisticRegression [](start = 27, length = 18)

Possion #Closed

}

internal LbfgsTrainerBase(IHostEnvironment env, TArgs args)
: base(Contracts.CheckRef(env, nameof(env)).Register(RegisterName), MakeFeatureColumn(args.FeatureColumn), MakeFeatureColumn(args.LabelColumn), MakeWeightColumn(args.WeightColumn))
Copy link
Contributor

@zeahmed zeahmed Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MakeFeatureColumn [](start = 119, length = 17)

MakeLabelColumn? #Closed

public void TestEstimatorLogisticRegression()
{
var dataView = GetBreastCancerDataview();
//var args = new LogisticRegression.Arguments();
Copy link
Contributor

@zeahmed zeahmed Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//var args = new LogisticRegression.Arguments(); [](start = 12, length = 48)

please remove commented code. #Closed

InitArguments();
}

internal LbfgsTrainerBase(IHostEnvironment env, TArgs args)
Copy link
Contributor

@zeahmed zeahmed Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LbfgsTrainerBase [](start = 17, length = 16)

I think you can call this constructor from the other one and make all the fields back to readonly #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be a bit hard. The arguments would not be in Args before running the code in the original constructor. And I believe that happens after calling the other constructor.


In reply to: 219355643 [](ancestors = 219355643)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to keep them readonly though.


In reply to: 219593965 [](ancestors = 219593965,219355643)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually for TAgrs, you could write a static method that would return the TArgs and that you could use in this() call as this(...CreateArgs(...)..).


In reply to: 219594177 [](ancestors = 219594177,219593965,219355643)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a good idea! I will look into that.


In reply to: 219653751 [](ancestors = 219653751,219594177,219593965,219355643)

InitArguments();
}

protected void InitArguments()
Copy link
Member

@sfilipi sfilipi Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected [](start = 8, length = 9)

would private suffice? #Closed

UseThreads = Args.UseThreads;
NumThreads = Args.NumThreads;
DenseOptimizer = Args.DenseOptimizer;
EnforceNonNegativity = Args.EnforceNonNegativity;
Copy link
Member

@sfilipi sfilipi Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry just notice that those are gone from the constructor. You still need them; you can remove readonly and set them in a method, call that from both constructors. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are been called by both constructors. I set it up so that the first constructor calls the second using this(...). It was suggested by Zeeshan A.


In reply to: 219994227 [](ancestors = 219994227)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, missed.


In reply to: 220022095 [](ancestors = 220022095,219994227)


/// <summary>
/// Initializes a new instance of <see cref="MulticlassLogisticRegression"/>
/// </summary>
public MulticlassLogisticRegression(IHostEnvironment env, Arguments args)
Copy link
Member

@sfilipi sfilipi Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

is internal enough? #Closed

@@ -27,7 +26,7 @@
namespace Microsoft.ML.Runtime.Learners
{
/// <include file='doc.xml' path='doc/members/member[@name="PoissonRegression"]/*' />
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PoissonRegression [](start = 64, length = 17)

If you look in doc.xml in our code sample we have this:

new PoissonRegressor
          {
            MaxIterations = 100,
            L2Weight = 0.6f
          }

Which is no longer are valid statement. #ByDesign

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it's a much broader issue, and need to be resolved for all converted to estimators classes.


In reply to: 220025467 [](ancestors = 220025467)

@@ -38,7 +37,7 @@ namespace Microsoft.ML.Runtime.Learners
{
/// <include file = 'doc.xml' path='doc/members/member[@name="LBFGS"]/*' />
/// <include file = 'doc.xml' path='docs/members/example[@name="LogisticRegressionClassifier"]/*' />
public sealed class MulticlassLogisticRegression : LbfgsTrainerBase<VBuffer<Float>, MulticlassLogisticRegressionPredictor>
public sealed class MulticlassLogisticRegression : LbfgsTrainerBase<MulticlassLogisticRegression.Arguments, MulticlassPredictionTransformer<MulticlassLogisticRegressionPredictor>, MulticlassLogisticRegressionPredictor>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you break this line? #Closed

{
var opt = base.InitializeOptimizer(ch, cursorFactory, out init, out terminationCriterion);

// MeanImprovementCriterion:
// Terminates when the geometrically-weighted average improvement falls below the tolerance
terminationCriterion = new MeanImprovementCriterion(OptTol, (Float)0.25, MaxIterations);
terminationCriterion = new MeanImprovementCriterion(OptTol, (float)0.25, MaxIterations);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(float)0.25 [](start = 72, length = 11)

0.25f
no reason to cast it. #Closed

{
var cols = new List<SchemaShape.Column>();
foreach (SchemaShape.Column col in MetadataUtils.GetTrainerOutputMetadata())
{
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 12, length = 1)

I think we have tendency to omit braces for one liners. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, why can't you do cols.AddRange(MetadataUtils.GetTrainerOutputMetadata()) ?


In reply to: 220026583 [](ancestors = 220026583)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I think that that works!!


In reply to: 220026684 [](ancestors = 220026684,220026583)

@@ -646,7 +698,7 @@ protected override void SaveCore(ModelSaveContext ctx)
}

// REVIEW: Destroy.
private static int NonZeroCount(ref VBuffer<Float> vector)
private static int NonZeroCount(ref VBuffer<float> vector)
{
if (!vector.IsDense)
return vector.Count;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to your changes, but I think this is wrong.
Also, great review comment for this method! #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's a pretty extreme review comment! haha


In reply to: 220027114 [](ancestors = 220027114)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we file an issue on this?


In reply to: 220044369 [](ancestors = 220044369,220027114)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Sep 25, 2018

        writer.WriteLine("MulticlassLogisticRegression bias and non-zero weights");

can you use nameof(MulticlassLogisticRegression) instead? #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/MulticlassLogisticRegression.cs:790 in 4dcb4ab. [](commit_id = 4dcb4ab, deletion_comment = False)

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@zeahmed zeahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@@ -92,77 +93,89 @@ public abstract class ArgumentsBase : LearnerInputBaseWithWeight
public bool EnforceNonNegativity = false;
}

private const string RegisterName = "LbfgsTraining";
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RegisterName [](start = 29, length = 12)

you can use nameof(LbfgsTrainerBase) in code instead of this constant. #Closed

for (int j = 0; j < initWeights.Length; j++)
initWeights[j] = InitWtsDiameter * (Host.Rand.NextSingle() - (Float)0.5);
init = new VBuffer<Float>(initWeights.Length, initWeights);
initWeights[j] = InitWtsDiameter * (Host.Rand.NextSingle() - (float)0.5);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(float)0.5 [](start = 81, length = 10)

0.5f, no reason for cast. #Closed

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka
Copy link
Contributor

        writer.WriteLine("MulticlassLogisticRegression bias and non-zero weights");

I don't meant to remove "bias and non-zero weights" from output, just replace string constant of class to nameof class.


In reply to: 424166489 [](ancestors = 424166489)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/MulticlassLogisticRegression.cs:790 in 4dcb4ab. [](commit_id = 4dcb4ab, deletion_comment = False)

@artidoro
Copy link
Contributor Author

        writer.WriteLine("MulticlassLogisticRegression bias and non-zero weights");

Sorry I didn't understood right away. I just fixed it.


In reply to: 424420860 [](ancestors = 424420860,424166489)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/MulticlassLogisticRegression.cs:790 in 4dcb4ab. [](commit_id = 4dcb4ab, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Sep 25, 2018

Please merge it with master. #Resolved

@artidoro artidoro merged commit c0545a7 into dotnet:master Sep 25, 2018
@artidoro artidoro deleted the lrandmultilr branch January 5, 2019 00:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants