Skip to content

Scrubbing LogisticRegression learners #2761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 4, 2019

Conversation

abgoswam
Copy link
Member

@abgoswam abgoswam commented Feb 27, 2019

Fixes #2615. Related #2613

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 27, 2019

        public float L2Weight = Defaults.L2Weight;

does L2Regularization sound better for you? #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:28 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 27, 2019

        public int MemorySize = Defaults.MemorySize;

I'm looking on this one, and I think it would be amount of memory be allocated in MB during training.
But in reality it's

/// <summary>
/// Number of previous iterations to remember for estimate of Hessian.
/// </summary>
``` #Resolved

---
Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:45 in 9ea2076. [](commit_id = 9ea2076862712cb9856ecaade204d658cf2061f4, deletion_comment = False)

@@ -47,7 +47,7 @@ public abstract class OptionsBase : LearnerInputBaseWithWeight
[Argument(ArgumentType.AtMostOnce, HelpText = "Maximum iterations.", ShortName = "maxiter")]
[TGUI(Label = "Max Number of Iterations")]
[TlcModule.SweepableLongParamAttribute("MaxIterations", 1, int.MaxValue)]
public int MaxIterations = Defaults.MaxIterations;
public int MaximumIterations = Defaults.MaximumIterations;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaximumIterations [](start = 23, length = 17)

let's stick to NumberOfIterations across all our learners. #Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 260873873 [](ancestors = 260873873)

@@ -71,7 +71,7 @@ public abstract class OptionsBase : LearnerInputBaseWithWeight
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Init weights diameter", ShortName = "initwts", SortOrder = 140)]
[TGUI(Label = "Initial Weights Scale", SuggestedSweeps = "0,0.1,0.5,1")]
[TlcModule.SweepableFloatParamAttribute("InitWtsDiameter", 0.0f, 1.0f, numSteps: 5)]
public float InitWtsDiameter = 0;
public float InitialWeightsDiameter = 0;

// Deprecated
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Deprecated [](start = 12, length = 13)

:)
can you double check it's true? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like its used... one of our tests MulticlassLogisticRegressionOnnxConversionTest even uses it by setting it to false

i have added a comment


In reply to: 260874466 [](ancestors = 260874466)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is it actually used in logisticRegression code?
Across all other learners we just use NumberOfThreads to pass how many threads we want to use.
I just don't see reason why this one should be remain public.


In reply to: 260904491 [](ancestors = 260904491,260874466)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made internal for now.

I see several usage of this in TestEntryPoint.cs . As you pointed, we should look into whether we can remove this completely and only use NumberOfThreads even from EntryPoints


In reply to: 260905721 [](ancestors = 260905721,260904491,260874466)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 27, 2019

        public float L1Weight = Defaults.L1Weight;

L1Regularization ? #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:33 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 27, 2019

        public bool ShowTrainingStats = false;

you change it to ShowTrainingStatistics in LogisticRegression #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/MulticlassLogisticRegression.cs:49 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 27, 2019

    public static LogisticRegression LogisticRegression(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,

I think we trying to have following pattern for trainer names:
{AlgoName}{PredictionKind}Trainer
like above we have
SdcaBinaryTrainer (not a great example tho. it should be StochasticDualCoordinateAscentBinaryTrainer) or LightGBMMulticlassTrainer.

Actually created issue regarding that: #2762 stay tuned!

#Pending


Refers to: src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs:427 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@abgoswam
Copy link
Member Author

abgoswam commented Feb 27, 2019

        public int MemorySize = Defaults.MemorySize;

i will add that as comment ..

in some places its also commented as follows without reference to the Hessian:

/// The number of previous iterations to store

i think its fine to keep the name as MemorySize. what do u think ?


In reply to: 467967467 [](ancestors = 467967467)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:45 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

        public int MemorySize = Defaults.MemorySize;

I would rather change parameter name to reflect what it does than update documentation.


In reply to: 467984224 [](ancestors = 467984224,467967467)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:45 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@@ -71,7 +71,7 @@ public abstract class OptionsBase : LearnerInputBaseWithWeight
[Argument(ArgumentType.LastOccurenceWins, HelpText = "Init weights diameter", ShortName = "initwts", SortOrder = 140)]
[TGUI(Label = "Initial Weights Scale", SuggestedSweeps = "0,0.1,0.5,1")]
[TlcModule.SweepableFloatParamAttribute("InitWtsDiameter", 0.0f, 1.0f, numSteps: 5)]
public float InitWtsDiameter = 0;
public float InitialWeightsDiameter = 0;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitialWeightsDiameter [](start = 25, length = 22)

So when you do this, you need to take old name and put it in ShortName in ArgumentAttribute like this ShortName="initwts,initwtsDiameter"
#Closed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. does it apply to all the other renamings as well ? .
  2. why we adding it to ShortName like that ?

In reply to: 260893400 [](ancestors = 260893400)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes
  2. for cmd back compatibility.

In reply to: 260894964 [](ancestors = 260894964,260893400)

@abgoswam
Copy link
Member Author

        public float L1Weight = Defaults.L1Weight;

fixed


In reply to: 467969539 [](ancestors = 467969539)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:33 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@abgoswam
Copy link
Member Author

        public float L2Weight = Defaults.L2Weight;

fixed


In reply to: 467966407 [](ancestors = 467966407)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:28 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@abgoswam abgoswam changed the title WIP : Scrubbing LogisticRegression learners Scrubbing LogisticRegression learners Feb 28, 2019
@abgoswam
Copy link
Member Author

    public static LogisticRegression LogisticRegression(this BinaryClassificationCatalog.BinaryClassificationTrainers catalog,

marking as pending . maybe outside purview of this PR

we should have separate PR for #2762 once names are finalized


In reply to: 467981004 [](ancestors = 467981004)


Refers to: src/Microsoft.ML.StandardLearners/StandardLearnersCatalog.cs:427 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)

@codecov
Copy link

codecov bot commented Feb 28, 2019

Codecov Report

Merging #2761 into master will increase coverage by 0.03%.
The diff coverage is 84.26%.

@@            Coverage Diff             @@
##           master    #2761      +/-   ##
==========================================
+ Coverage   71.68%   71.72%   +0.03%     
==========================================
  Files         808      809       +1     
  Lines      142402   142491      +89     
  Branches    16113    16116       +3     
==========================================
+ Hits       102076   102195     +119     
+ Misses      35893    35860      -33     
- Partials     4433     4436       +3
Flag Coverage Δ
#Debug 71.72% <84.26%> (+0.03%) ⬆️
#production 67.95% <77.19%> (+0.02%) ⬆️
#test 85.9% <96.87%> (+0.04%) ⬆️
Impacted Files Coverage Δ
...L.Mkl.Components/ComputeLRTrainingStdThroughHal.cs 92.85% <ø> (ø) ⬆️
...rs/Standard/PoissonRegression/PoissonRegression.cs 88.57% <ø> (ø) ⬆️
src/Microsoft.ML.Data/Prediction/Calibrator.cs 72.67% <ø> (ø) ⬆️
...StandardLearners/Standard/LinearModelParameters.cs 60.63% <ø> (ø) ⬆️
src/Microsoft.ML.StaticPipe/LbfgsStatic.cs 53.15% <0%> (ø) ⬆️
....ML.Benchmarks/KMeansAndLogisticRegressionBench.cs 0% <0%> (ø) ⬆️
...soft.ML.Tests/PermutationFeatureImportanceTests.cs 100% <100%> (ø) ⬆️
...est/Microsoft.ML.StaticPipelineTesting/Training.cs 99.25% <100%> (ø) ⬆️
...est/Microsoft.ML.Predictor.Tests/TestPredictors.cs 63.84% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/OnnxConversionTest.cs 97.16% <100%> (ø) ⬆️
... and 19 more

@Ivanidzo4ka Ivanidzo4ka requested review from zeahmed and wschin March 1, 2019 23:36
@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Mar 1, 2019

    protected virtual void GetFeatureContributions(in VBuffer<float> features, ref VBuffer<float> contributions, int top, int bottom, bool normalize)

should be private protected #Closed


Refers to: src/Microsoft.ML.StandardLearners/Standard/LinearModelParameters.cs:265 in 04a4f7a. [](commit_id = 04a4f7a, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

        public int MemorySize = Defaults.MemorySize;

Feels a bit too long.
IterationsToRemember, IterationsHistory, IterationsToKeep?


In reply to: 467985202 [](ancestors = 467985202,467984224,467967467)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:45 in 9ea2076. [](commit_id = 9ea2076, deletion_comment = False)


/// <summary>
/// Run SGD to initialize LR weights, converging to this tolerance.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Run SGD to initialize LR weights, converging to this tolerance",
ShortName = "sgd")]
public float SgdInitializationTolerance = 0;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SgdInitializationTolerance [](start = 25, length = 26)

StochasticGradientDescentInitilaizationTolerance ? #Resolved

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Mar 1, 2019

                    Console.Write(".");

Oh god, It still exist.
this need to be ported to progress channels. #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:337 in 04a4f7a. [](commit_id = 04a4f7a, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

                    Console.Write(".");

Not part of your PR, so please ignore.


In reply to: 468851424 [](ancestors = 468851424)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:337 in 04a4f7a. [](commit_id = 04a4f7a, deletion_comment = False)

MaxIterations = LbfgsTrainerOptions.MaxIterations;
SgdInitializationTolerance = LbfgsTrainerOptions.SgdInitializationTolerance;
Host.CheckUserArg(!LbfgsTrainerOptions.UseThreads || LbfgsTrainerOptions.NumberOfThreads > 0 || LbfgsTrainerOptions.NumberOfThreads == null,
nameof(LbfgsTrainerOptions.NumberOfThreads), "numThreads must be positive (or empty for default)");
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Mar 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numThreads [](start = 60, length = 10)

Must be positive, if provided.

/// If set to <value>true</value>training statistics will be generated at the end of training.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Show statistics of training examples.", ShortName = "stat, ShowTrainingStats", SortOrder = 50)]
public bool ShowTrainingStatistics = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShowTrainingStatistics [](start = 24, length = 22)

Can you double check it's used?
It strange binary LR and MultiLR has different options (binary LR has extra thing ComputeLogisticRegressionStandardDeviation)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see it being used for both binary LR, and multiclass LR

if (ShowTrainingStats)
ProcessPriorDistribution(cursor.Label, cursor.Weight);

and here

if (ShowTrainingStats)
ComputeTrainingStatistics(ch, cursorFactory, loss, numParams);

Binary LR has the additional ComputeLogisticRegressionStandardDeviation as you noted

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abgoswam abgoswam requested a review from artidoro March 4, 2019 18:12
@artidoro
Copy link
Contributor

artidoro commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

Should we rname this without acronyms? #Resolved


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

@abgoswam
Copy link
Member Author

abgoswam commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

There is a separate issue #2762 from trainer names..maybe we can address changes (if any) as part of #2762

the scrubbing PRs are focussed on the items mentioned in #2613


In reply to: 469360727 [](ancestors = 469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

I see "no short names" in the items mentioned in #2613. I thought it was part of the scrubbing tasks?


In reply to: 469362212 [](ancestors = 469362212,469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

[Argument(ArgumentType.AtMostOnce, HelpText = "Show statistics of training examples.", ShortName = "stat", SortOrder = 50)]
public bool ShowTrainingStats = false;
/// <summary>
/// If set to <value>true</value>training statistics will be generated at the end of training.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a space before training?

@abgoswam
Copy link
Member Author

abgoswam commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

please note : "no short names" is for Options , not trainer names

(at least thats what i am following. keeping trainer names for #2762)


In reply to: 469364309 [](ancestors = 469364309,469362212,469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

OK I see!


In reply to: 469376815 [](ancestors = 469376815,469364309,469362212,469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

@artidoro
Copy link
Contributor

artidoro commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

Sorry for this then


In reply to: 469377354 [](ancestors = 469377354,469376815,469364309,469362212,469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abgoswam
Copy link
Member Author

abgoswam commented Mar 4, 2019

public abstract class LbfgsTrainerBase<TOptions, TTransformer, TModel> : TrainerEstimatorBase<TTransformer, TModel>

no worries at all .. thanks for noticing actually :)


In reply to: 469377514 [](ancestors = 469377514,469377354,469376815,469364309,469362212,469360727)


Refers to: src/Microsoft.ML.StandardLearners/Standard/LogisticRegression/LbfgsPredictorBase.cs:18 in 7a7ac78. [](commit_id = 7a7ac78, deletion_comment = False)

@abgoswam abgoswam merged commit 24926ee into dotnet:master Mar 4, 2019
@abgoswam abgoswam deleted the abgoswam/scrub_LR branch March 20, 2019 20:13
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants