-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Scrubbing FieldAwareFactorizationMachine learner. #2730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -78,14 +78,48 @@ public sealed class HousingRegression | |||
/// <summary> | |||
/// Downloads the wikipedia detox dataset from the ML.NET repo. | |||
/// </summary> | |||
public static string DownloadSentimentDataset() | |||
=> Download("https://github.com/raw/dotnet/machinelearning/76cb2cdf5cc8b6c88ca44b8969153836e589df04/test/data/wikipedia-detox-250-line-data.tsv", "sentiment.tsv"); | |||
public static (string trainFile, string testFile) DownloadSentimentDataset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(string trainFile, string testFile) [](start = 22, length = 35)
Can we not use value tuples? They nice, but are they really necessary?
Imagine we will need to write samples to F#, and we want to reuse this class. Would be quite bad. #Resolved
@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt | |||
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param> | |||
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param> | |||
/// <param name="labelColumn">The name of the label column.</param> | |||
/// <param name="weights">The name of the optional weights' column.</param> | |||
/// <param name="weightColumn">The name of the optional weights' column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The name of the optional weights' column. [](start = 7, length = 81)
/// <param name="exampleWeightColumnName">The name of the example weight column (optional).</param>
#Resolved
{ | ||
public static void Example() | ||
{ | ||
// Creating the ML.Net IHostEnvironment object, needed for the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Creating the ML.Net IHostEnvironment object, needed for the pipeline. [](start = 12, length = 72)
// Create a new context for ML.NET operations. It can be used for exception tracking and logging,
// as a catalog of available operations and as the source of randomness.
I think this is our default comment. #Resolved
// Let's inspect the model parameters. | ||
var featureCount = modelParams.GetFeatureCount(); | ||
var fieldCount = modelParams.GetFieldCount(); | ||
var latentDim = modelParams.GetLatentDim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetLatentDim [](start = 40, length = 12)
brrr #Resolved
var featureCount = modelParams.GetFeatureCount(); | ||
var fieldCount = modelParams.GetFieldCount(); | ||
var latentDim = modelParams.GetLatentDim(); | ||
var linearWeights = modelParams.GetLinearWeights(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var linearWeights = modelParams.GetLinearWeights(); [](start = 12, length = 51)
I think right now I can do following:
Evaluate(model.Transform(Data)) -> AUC = X1
var linearWeights = modelParams.GetLinearWeights();
linearWeights[0] =100;
linearWeights[1] =200;
Evaluate(model.Transform(Data)) -> AUC = X2
X1 != X2
Which is awful.
Can you check MatrixFactoraztionPredictor
and how it handles arrays?
I also don't understand GetFeatureCoun()
functions, why can't I just do modelParams.FeatureCount
?
#Resolved
// the "Features" column as the features column. | ||
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext) | ||
.Append(mlContext.BinaryClassification.Trainers. | ||
FieldAwareFactorizationMachine(labelColumnName: "Sentiment", featureColumnNames: new[] { "Features" })); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation #Resolved
// the "Features" column as the features column. | ||
var pipeline = new EstimatorChain<ITransformer>().AppendCacheCheckpoint(mlContext) | ||
.Append(mlContext.BinaryClassification.Trainers. | ||
FieldAwareFactorizationMachine( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more #Resolved
@@ -26,7 +26,7 @@ namespace Microsoft.ML.FactorizationMachine | |||
{ | |||
/* | |||
Train a field-aware factorization machine using ADAGRAD (an advanced stochastic gradient method). See references below | |||
for details. This trainer is essentially faster the one introduced in [2] because of some implementation tricks[3]. | |||
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks[3]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks[3]. | |
for details. This trainer is essentially faster than the one introduced in [2] because of some implementation tricks in [3]. | |
``` #Resolved |
@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt | |||
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param> | |||
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param> | |
/// <param name="featureColumnNames">The names of columns hosting the features. The i-th element stores feature column of the i-th field.</param> | |
``` #Resolved |
@@ -156,12 +156,12 @@ internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, Options opt | |||
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param> | |||
/// <param name="featureColumns">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param> | |||
/// <param name="labelColumn">The name of the label column.</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <param name="labelColumn">The name of the label column.</param> | |
/// <param name="labelColumnName">The name of the label column.</param> | |
``` #Resolved |
Codecov Report
@@ Coverage Diff @@
## master #2730 +/- ##
==========================================
- Coverage 71.67% 71.65% -0.03%
==========================================
Files 808 808
Lines 142364 142382 +18
Branches 16121 16121
==========================================
- Hits 102043 102023 -20
- Misses 35879 35921 +42
+ Partials 4442 4438 -4
|
Console.WriteLine("The number of fields is: " + fieldCount); | ||
Console.WriteLine("The latent dimension is: " + latentDim); | ||
Console.WriteLine("The linear weights of some of the features are: " + | ||
string.Concat(Enumerable.Range(1, 10).Select(i => $"{linearWeights[i]:F4} "))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string.Concat
s are not aligned. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[TlcModule.SweepableLongParam(1, 100)] | ||
public int Iters = 5; | ||
public int Iterations = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NumberOfIterations
for consistency with other learners. #Resolved
/// <summary> | ||
/// Number of training iterations. | ||
/// </summary> | ||
[Argument(ArgumentType.AtMostOnce, HelpText = "Number of training iterations", ShortName = "iters", SortOrder = 2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iters [](start = 104, length = 5)
iters,iter #Resolved
/// <param name="weights">The name of the optional weights' column.</param> | ||
/// <param name="featureColumnNames">The name of column hosting the features. The i-th element stores feature column of the i-th field.</param> | ||
/// <param name="labelColumnName">The name of the label column.</param> | ||
/// <param name="weightColumnName">The name of the weight column (optional).</param> | ||
[BestFriend] | ||
internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal FieldAwareFactorizationMachineTrainer(IHostEnvironment env, | |
internal FieldAwareFactorizationMachineBinaryClassificationTrainer(IHostEnvironment env, | |
``` #Resolved |
/// <summary> | ||
/// The linear coefficients of the features. It's the symbol `w` in the doc: https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf | ||
/// </summary> | ||
public float[] GetLinearWeights() => _linearWeights; | ||
public float[] GetLinearWeights() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float[] [](start = 15, length = 7)
I would reach Tom and double check, but I think we don't want expose arrays. We want expose immutable collections, either ImmutableArray
or IReadOnlyList
not sure which we prefer.
If you expose array, that
a) force you to copy data.
b) give user idea what he can change it. (which is not true)
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In tree's public interface, we expose IReadOnlyList
for all array attributes. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about ReadOnlyCollection
? Any idea @TomFinley? what we are following?
In reply to: 260523116 [](ancestors = 260523116)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I follow what @wschin suggest. IReadOnlyList
#Resolved
@@ -253,11 +253,9 @@ internal void CopyLatentWeightsTo(AlignedArray latentWeights) | |||
/// <summary> | |||
/// The linear coefficients of the features. It's the symbol `w` in the doc: https://github.com/wschin/fast-ffm/blob/master/fast-ffm.pdf | |||
/// </summary> | |||
public float[] GetLinearWeights() | |||
public IReadOnlyList<float> GetLinearWeights() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One line function should look like public IReadOnlyList<float> Get() => _linearWeights
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR partially addressed #2620
In FieldAwareFactorizationMachine related classes, the following tasks were performed.
internal
.