Add a sample for one class matrix factorization #3282

Merged 4 commits on Apr 12, 2019

@wschin wschin commented Apr 10, 2019

Fix #1769.

@wschin wschin self-assigned this Apr 10, 2019
Add missing file
codecov bot commented Apr 10, 2019

Codecov Report

Merging #3282 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3282      +/-   ##
==========================================
+ Coverage   72.63%   72.64%   +<.01%     
==========================================
  Files         807      807              
  Lines      145129   145192      +63     
  Branches    16220    16224       +4     
==========================================
+ Hits       105413   105472      +59     
- Misses      35298    35301       +3     
- Partials     4418     4419       +1

| Flag | Coverage Δ |
|---|---|
| #Debug | 72.64% <100%> (ø) ⬆️ |
| #production | 68.16% <ø> (-0.01%) ⬇️ |
| #test | 88.97% <100%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.Recommender/RecommenderCatalog.cs | 70.83% <ø> (ø) ⬆️ |
| ...ests/TrainerEstimators/MatrixFactorizationTests.cs | 97.84% <100%> (+0.43%) ⬆️ |
| ...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs | 79.48% <0%> (-20.52%) ⬇️ |
| src/Microsoft.ML.Maml/MAML.cs | 24.75% <0%> (-1.46%) ⬇️ |
| src/Microsoft.ML.DataView/DataViewType.cs | 86.82% <0%> (ø) ⬆️ |
| src/Microsoft.ML.DataView/VectorType.cs | 89.41% <0%> (ø) ⬆️ |
| ...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs | 74.03% <0%> (+0.33%) ⬆️ |
| src/Microsoft.ML.Transforms/Text/LdaTransform.cs | 89.89% <0%> (+0.62%) ⬆️ |

// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
// the following list contains the games recommended by the trained model. Note that sometime, you may want to exclude training
// data from your predicted results because those games were already purchased.
var topColumns = results.Where(element => element.MatrixRowIndex == 1).OrderByDescending(element => element.Score).Take(2);
@codemzs codemzs Apr 11, 2019

var topColumns = results.Where(element => element.MatrixRowIndex == 1).OrderByDescending(element => element.Score).Take(2);

Can we print the output for these and put in comments? #Resolved
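
The code comment under review notes that training data (games already purchased) may need to be excluded from the recommendations. A minimal sketch of that filtering in plain LINQ, assuming a hypothetical `ScoredElement` type and hand-written scores in place of real model output (neither comes from the PR):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopColumnsSketch
{
    // Hypothetical stand-in for a scored matrix element; not the sample's MatrixElement class.
    public sealed class ScoredElement
    {
        public ScoredElement(uint row, uint column, float score)
        {
            MatrixRowIndex = row;
            MatrixColumnIndex = column;
            Score = score;
        }

        public uint MatrixRowIndex { get; }
        public uint MatrixColumnIndex { get; }
        public float Score { get; }
    }

    // Returns the `count` highest-scored columns for one row,
    // skipping (row, column) pairs that already appear in the training set.
    public static List<ScoredElement> TopColumns(
        IEnumerable<ScoredElement> results,
        IEnumerable<(uint Row, uint Column)> trainingEntries,
        uint rowIndex,
        int count)
    {
        var seen = new HashSet<(uint, uint)>(trainingEntries);
        return results
            .Where(e => e.MatrixRowIndex == rowIndex)
            .Where(e => !seen.Contains((e.MatrixRowIndex, e.MatrixColumnIndex)))
            .OrderByDescending(e => e.Score)
            .Take(count)
            .ToList();
    }
}
```

Whether to filter this way depends on the application; the sample itself only takes the top two scores for a row.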

@@ -87,6 +87,7 @@ internal RecommendationTrainers(RecommendationCatalog catalog)
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[MatrixFactorization](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/MatrixFactorizationWithOptions.cs)]
/// [!code-csharp[MatrixFactorization](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/OneClassMatrixFactorizationWithOptions.cs)]
@codemzs codemzs Apr 11, 2019

You may want to check this won't generate too much content for the user. I had 4 links for time series but after speaking with @natke I reduced to one. #Resolved

Member Author

Matrix factorization works extremely differently with different loss functions. We must have two samples.


@codemzs codemzs left a comment

:shipit:

public static class OneClassMatrixFactorizationWithOptions
{
// This example shows the use of ML.NET's one-class matrix factorization module which implements
// Algorithm 1 in a <a href="https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf">paper</a>.
@rogancarr rogancarr Apr 11, 2019

a

"Algorithm 1 in a paper" isn't very meaningful. Can you give a longer description of what it is? #Resolved

Member Author

Now we tell user it's a coordinate descent method.



var mlContext = new MLContext(seed: 0);

// Get a small in-memory dataset.
GetOneClassMatrix(out List<MatrixElement> data, out List<MatrixElement> testData);
Contributor

List

nit: I would use a less-specific signature than List.
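
One way to act on this nit, sketched with a hypothetical `MatrixElement` shape and made-up data (the sample's actual `GetOneClassMatrix` body is not shown in this thread), is to return read-only collections instead of filling `out List<...>` parameters:

```csharp
using System.Collections.Generic;

public static class DataSketch
{
    // Hypothetical element type for illustration only.
    public sealed class MatrixElement
    {
        public uint MatrixColumnIndex { get; set; }
        public uint MatrixRowIndex { get; set; }
        public float Value { get; set; }
    }

    // Callers receive IReadOnlyList views, so they depend only on
    // enumeration and indexing rather than on List<T> itself.
    public static (IReadOnlyList<MatrixElement> Train, IReadOnlyList<MatrixElement> Test)
        GetOneClassMatrix(uint size)
    {
        var train = new List<MatrixElement>();
        var test = new List<MatrixElement>();
        for (uint i = 0; i < size; i++)
        {
            // Made-up data: diagonal entries are observed positives (value 1).
            train.Add(new MatrixElement { MatrixRowIndex = i, MatrixColumnIndex = i, Value = 1f });

            // One unobserved entry per row goes to the test set.
            test.Add(new MatrixElement
            {
                MatrixRowIndex = i,
                MatrixColumnIndex = (i + 1) % size,
                Value = 0.15f
            });
        }
        return (train, test);
    }
}
```

Since `LoadFromEnumerable` accepts an `IEnumerable<TRow>`, a looser return type like this should still work with the loading call shown in the diff.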

// Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
var dataView = mlContext.Data.LoadFromEnumerable(data);

// Create a matrix factorization trainer which may consume "Value" as the training label, "MatrixColumnIndex" as the
@rogancarr rogancarr Apr 11, 2019

Replace "may consume" with "takes". #Resolved

Member Author

Thanks.




// Create a matrix factorization trainer which may consume "Value" as the training label, "MatrixColumnIndex" as the
// matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract field
// names' in MatrixElement class.
@rogancarr rogancarr Apr 11, 2019

nit: Necessary? #WontFix

Member Author

I'd like to be more explicit. :)



NumberOfThreads = 8,
ApproximationRank = 32,
Alpha = 1,
// The desired of unobserved values.
@rogancarr rogancarr Apr 11, 2019

// The desired of unobserved values

Unclear what this means. #Resolved

Member Author

New description

                // The desired values of matrix elements not specified in the training set.
                // If the training set doesn't tell the value at the u-th row and v-th column,
                // its desired value would be set to 0.15. In other words, this parameter determines
                // the value of all missing matrix elements.
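
To make the role of `C` concrete, here is a simplified sketch of a one-class squared loss in which every unobserved entry is compared against `c`. It assumes `Alpha` acts as a uniform weight on the unobserved entries and omits regularization; the trainer's actual objective may differ in detail, and all names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class OneClassLossSketch
{
    // Simplified loss (an assumption, regularization omitted):
    //   sum over observed (u, v) of (value - prediction)^2
    // + alpha * sum over unobserved (u, v) of (c - prediction)^2
    public static double Loss(
        double[,] predictions,
        IReadOnlyDictionary<(int Row, int Column), double> observed,
        double alpha,
        double c)
    {
        double loss = 0;
        for (int u = 0; u < predictions.GetLength(0); u++)
        {
            for (int v = 0; v < predictions.GetLength(1); v++)
            {
                if (observed.TryGetValue((u, v), out double value))
                {
                    // Entry present in the training set: compare with its value.
                    loss += Math.Pow(value - predictions[u, v], 2);
                }
                else
                {
                    // Missing entry: its desired value is c.
                    loss += alpha * Math.Pow(c - predictions[u, v], 2);
                }
            }
        }
        return loss;
    }
}
```

Under this reading, a model that predicts exactly `c` for every missing element incurs no loss on those elements, which matches the expected value of 0.15 shown for the test entries in the sample's output.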


Alpha = 1,
// The desired of unobserved values.
C = 0.15,
// To enable one-class matrix factorization, the following line is required.
@rogancarr rogancarr Apr 11, 2019

To enable one-class matrix factorization, the following line is required.

Suggested Rephrase: This argument enables one-class matrix factorization. #Resolved

var results = mlContext.Data.CreateEnumerable<MatrixElement>(prediction, false).ToList();
// Feed the test data into the model and then iterate through a few predictions.
foreach (var pred in results.Take(15))
Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex - 1} and column {pred.MatrixColumnIndex - 1} is {pred.Score} and its expected value is {pred.Value}.");
@rogancarr rogancarr Apr 11, 2019

Console.Write

Break this line; I would suggest for such long lines using old-style {0}...{1} string formatting so you can throw the arguments onto new lines. #Resolved
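
For reference, the composite-format style suggested here might look like the following sketch. The `Describe` helper and its parameters are hypothetical and stand in for the fields of a real prediction; the invariant culture is an added assumption to keep the number formatting stable.

```csharp
using System;
using System.Globalization;

public static class FormatSketch
{
    // Same message as the interpolated string, with each argument on its own line.
    public static string Describe(uint rowIndex, uint columnIndex, float score, float value)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "Predicted value at row {0} and column {1} is {2} and its expected value is {3}.",
            rowIndex - 1,
            columnIndex - 1,
            score,
            value);
    }
}
```

The trade-off is the one discussed in this thread: shorter lines, at the cost of reading the placeholders and their arguments separately.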

Member Author

This is long but makes the code readable like a text message.



// Predicted value at row 13 and column 0 is 0.1499254 and its expected value is 0.15.
// Predicted value at row 14 and column 0 is 0.1499074 and its expected value is 0.15.
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.
@rogancarr rogancarr Apr 11, 2019

// Note:

Nice touch. #Resolved

//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.

// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
@rogancarr rogancarr Apr 11, 2019

Two columns with highest predicted score to the 2nd row (indexed by 1).

This sentence doesn't have any context. #Resolved

//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.

// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
Contributor

we

If we wanted a model to recommend video games to a user, we could view...


// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
// the following list contains the games recommended by the trained model. Note that sometime, you may want to exclude training
// data from your predicted results because those games were already purchased.
@rogancarr rogancarr Apr 11, 2019

those

would represent games that were #Resolved

@rogancarr rogancarr left a comment

Looks good! Just a few nits.

@wschin wschin merged commit 89a1fb9 into dotnet:master Apr 12, 2019
@wschin wschin deleted the ocmf-sample branch April 12, 2019 00:01
@ghost ghost locked as resolved and limited conversation to collaborators Mar 22, 2022
Successfully merging this pull request may close these issues.

Need example for one-class matrix factorization
3 participants