Skip to content

Scrubbing FieldAwareFactorizationMachine learner. #2730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 27, 2019

Conversation

zeahmed
Copy link
Contributor

@zeahmed zeahmed commented Feb 26, 2019

This PR partially addressed #2620

In FieldAwareFactorizationMachine related classes, the following tasks were performed.

  • Checking to make sure that unnecessary public methods/properties be internal.
  • Renaming parameters according to standard.
  • Creating/Refactoring samples according to standards.

@@ -78,14 +78,48 @@ public sealed class HousingRegression
/// <summary>
/// Downloads the wikipedia detox dataset from the ML.NET repo.
/// </summary>
public static string DownloadSentimentDataset()
=> Download("https://github.com/raw/dotnet/machinelearning/76cb2cdf5cc8b6c88ca44b8969153836e589df04/test/data/wikipedia-detox-250-line-data.tsv", "sentiment.tsv");
public static (string trainFile, string testFile) DownloadSentimentDataset()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(string trainFile, string testFile) [](start = 22, length = 35)

#2501

Can we not use value tuples? They nice, but are they really necessary?
Imagine we will need to write samples to F#, and we want to reuse this class. Would be quite bad. #Resolved

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 26, 2019

        public int Iters = 5;

Samples are good, but I would prefer you to focus on public surface. #Resolved


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs:51 in 19d25e2. [](commit_id = 19d25e2, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 26, 2019

        public int LatentDim = 20;

brrr #Resolved


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs:55 in 19d25e2. [](commit_id = 19d25e2, deletion_comment = False)

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 26, 2019

        public bool Norm = true;

brr #Resolved


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs:66 in 19d25e2. [](commit_id = 19d25e2, deletion_comment = False)

@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param>
/// <param name="labelColumn">The name of the label column.</param>
/// <param name="weights">The name of the optional weights' column.</param>
/// <param name="weightColumn">The name of the optional weights' column.</param>
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// The name of the optional weights' column. [](start = 7, length = 81)

/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param> #Resolved

{
public static void Example()
{
// Creating the ML.Net IHostEnvironment object, needed for the pipeline.
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Creating the ML.Net IHostEnvironment object, needed for the pipeline. [](start = 12, length = 72)

           // Create a new context for ML.NET operations. It can be used for exception tracking and logging, 
            // as a catalog of available operations and as the source of randomness.

I think this is our default comment. #Resolved

// Let's inspect the model parameters.
var featureCount = modelParams.GetFeatureCount();
var fieldCount = modelParams.GetFieldCount();
var latentDim = modelParams.GetLatentDim();
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetLatentDim [](start = 40, length = 12)

brrr #Resolved

var featureCount = modelParams.GetFeatureCount();
var fieldCount = modelParams.GetFieldCount();
var latentDim = modelParams.GetLatentDim();
var linearWeights = modelParams.GetLinearWeights();
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var linearWeights = modelParams.GetLinearWeights(); [](start = 12, length = 51)

I think right now I can do following:
Evaluate(model.Transform(Data)) -> AUC = X1

var linearWeights = modelParams.GetLinearWeights();
linearWeights[0] =100;
linearWeights[1] =200;

Evaluate(model.Transform(Data)) -> AUC = X2

X1 != X2

Which is awful.

Can you check MatrixFactoraztionPredictor and how it handles arrays?

I also don't understand GetFeatureCoun() functions, why can't I just do modelParams.FeatureCount?
#Resolved

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented Feb 26, 2019

        public float LearningRate = (float)0.1;

you can at least copy description from HelpText to summary for this fields. #Resolved


Refers to: src/Microsoft.ML.StandardLearners/FactorizationMachine/FactorizationMachineTrainer.cs:47 in 19d25e2. [](commit_id = 19d25e2, deletion_comment = False)

// the "Features" column as the features column.
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext)
.Append(mlContext.BinaryClassification.Trainers.
FieldAwareFactorizationMachine(labelColumnName: "Sentiment", featureColumnNames: new[] { "Features" }));
Copy link
Member

@sfilipi sfilipi Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation #Resolved

// the "Features" column as the features column.
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext)
.Append(mlContext.BinaryClassification.Trainers.
FieldAwareFactorizationMachine(
Copy link
Member

@sfilipi sfilipi Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more #Resolved

@@ -26,7 +26,7 @@ namespace Microsoft.ML.FactorizationMachine
{
/*
Train a field-aware factorization machine using ADAGRAD (an advanced stochastic gradient method). See references below
for details. This trainer is essentially faster the one introduced in [2] because of some implementation tricks[3].
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks[3].
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks[3].
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks in [3].
``` #Resolved

@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param>
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param>
/// <param name="featureColumnNames">The names of columns hosting the features. The i-th element stores feature column of the i-th field.</param>
``` #Resolved

@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param>
/// <param name="labelColumn">The name of the label column.</param>
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// <param name="labelColumn">The name of the label column.</param>
/// <param name="labelColumnName">The name of the label column.</param>
``` #Resolved

@codecov
Copy link

codecov bot commented Feb 26, 2019

Codecov Report

Merging #2730 into master will decrease coverage by 0.02%.
The diff coverage is 51.42%.

@@            Coverage Diff             @@
##           master    #2730      +/-   ##
==========================================
- Coverage   71.67%   71.65%   -0.03%     
==========================================
  Files         808      808              
  Lines      142364   142382      +18     
  Branches    16121    16121              
==========================================
- Hits       102043   102023      -20     
- Misses      35879    35921      +42     
+ Partials     4442     4438       -4
Flag Coverage Δ
#Debug 71.65% <51.42%> (-0.03%) ⬇️
#production 67.9% <48.48%> (-0.03%) ⬇️
#test 85.83% <100%> (ø) ⬆️
Impacted Files Coverage Δ
...c/Microsoft.ML.SamplesUtils/SamplesDatasetUtils.cs 23.9% <0%> (-1.3%) ⬇️
...rosoft.ML.Tests/TrainerEstimators/FAFMEstimator.cs 100% <100%> (ø) ⬆️
...actorizationMachine/FactorizationMachineCatalog.cs 50% <50%> (ø) ⬆️
...rosoft.ML.StaticPipe/FactorizationMachineStatic.cs 71.11% <50%> (ø) ⬆️
...e/FieldAwareFactorizationMachineModelParameters.cs 75.59% <63.15%> (+0.88%) ⬆️
...actorizationMachine/FactorizationMachineTrainer.cs 88.18% <90%> (ø) ⬆️
src/Microsoft.ML.Core/Data/ProgressReporter.cs 70.95% <0%> (-6.99%) ⬇️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 82.86% <0%> (+0.2%) ⬆️
...StandardLearners/Standard/LinearModelParameters.cs 60.9% <0%> (+0.26%) ⬆️

Console.WriteLine("The number of fields is: " + fieldCount);
Console.WriteLine("The latent dimension is: " + latentDim);
Console.WriteLine("The linear weights of some of the features are: " +
string.Concat(Enumerable.Range(1, 10).Select(i => $"{linearWeights[i]:F4} ")));
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string.Concats are not aligned. #Resolved

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

[TlcModule.SweepableLongParam(1, 100)]
public int Iters = 5;
public int Iterations = 5;
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumberOfIterations for consistency with other learners. #Resolved

/// <summary>
/// Number of training iterations.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of training iterations", ShortName = "iters", SortOrder = 2)]
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iters [](start = 104, length = 5)

iters,iter #Resolved

/// <param name="weights">The name of the optional weights' column.</param>
/// <param name="featureColumnNames">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param>
/// <param name="labelColumnName">The name of the label column.</param>
/// <param name="weightColumnName">The name of the weight column (optional).</param>
[BestFriend]
internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env,
Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env,
internal FieldAwareFactorizationMachineBinaryClassificationTrainer(IHostEnvironment env,
``` #Resolved

/// <summary>
/// The linear coefficients of the features. It's the symbol `w` in the doc: https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
/// </summary>
public float[] GetLinearWeights() => _linearWeights;
public float[] GetLinearWeights()
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float[] [](start = 15, length = 7)

I would reach Tom and double check, but I think we don't want expose arrays. We want expose immutable collections, either ImmutableArray or IReadOnlyList not sure which we prefer.
If you expose array, that
a) force you to copy data.
b) give user idea what he can change it. (which is not true)
#Resolved

Copy link
Member

@wschin wschin Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tree's public interface, we expose IReadOnlyList for all array attributes. #Resolved

Copy link
Contributor Author

@zeahmed zeahmed Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about ReadOnlyCollection? Any idea @TomFinley? what we are following?


In reply to: 260523116 [](ancestors = 260523116)

Copy link
Contributor Author

@zeahmed zeahmed Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I follow what @wschin suggest. IReadOnlyList #Resolved

@@ -253,11 +253,9 @@ internal void CopyLatentWeightsTo(AlignedArray latentWeights)
/// <summary>
/// The linear coefficients of the features. It's the symbol `w` in the doc: https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf
/// </summary>
public float[] GetLinearWeights()
public IReadOnlyList<float> GetLinearWeights()
Copy link
Member

@wschin wschin Feb 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One line function should look like public IReadOnlyList<float> Get() => _linearWeights. #Resolved

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zeahmed zeahmed merged commit 482bb81 into dotnet:master Feb 27, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants