Add a sample for one class matrix factorization #3282
Add missing file
Codecov Report
@@ Coverage Diff @@
## master #3282 +/- ##
==========================================
+ Coverage 72.63% 72.64% +<.01%
==========================================
Files 807 807
Lines 145129 145192 +63
Branches 16220 16224 +4
==========================================
+ Hits 105413 105472 +59
- Misses 35298 35301 +3
- Partials 4418 4419 +1
// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
// the following list contains the games recommended by the trained model. Note that sometime, you may want to exclude training
// data from your predicted results because those games were already purchased.
var topColumns = results.Where(element => element.MatrixRowIndex == 1).OrderByDescending(element => element.Score).Take(2);
Can we print the output for these and put in comments? #Resolved
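The code comment under review mentions excluding training data from the recommendations. A minimal sketch of that filtering step, assuming a hypothetical `MatrixElement` stand-in and a hypothetical helper named `RecommendUnseen` (neither is taken from the PR), might look like this:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TopColumnsSketch
{
    // Hypothetical stand-in for the sample's MatrixElement class.
    public sealed class MatrixElement
    {
        public uint MatrixColumnIndex;
        public uint MatrixRowIndex;
        public float Value;
        public float Score;
    }

    // Returns the `count` highest-scored columns for `row`, skipping any
    // column that already appears for that row in the training data.
    public static List<MatrixElement> RecommendUnseen(
        IEnumerable<MatrixElement> results,
        IEnumerable<MatrixElement> training,
        uint row,
        int count)
    {
        // Columns the user (row) already has, e.g. purchased games.
        var seen = new HashSet<uint>(
            training.Where(e => e.MatrixRowIndex == row)
                    .Select(e => e.MatrixColumnIndex));

        return results
            .Where(e => e.MatrixRowIndex == row && !seen.Contains(e.MatrixColumnIndex))
            .OrderByDescending(e => e.Score)
            .Take(count)
            .ToList();
    }
}
```

The helper takes `IEnumerable<T>` inputs so that callers are not tied to `List<T>`.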
@@ -87,6 +87,7 @@ internal RecommendationTrainers(RecommendationCatalog catalog)
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[MatrixFactorization](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/OneClassMatrixFactorizationWithOptions.cs)]
You may want to check this won't generate too much content for the user. I had 4 links for time series but after speaking with @natke I reduced to one. #Resolved
Matrix factorization works extremely differently with different loss functions. We must have two samples.
In reply to: 274225365
public static class OneClassMatrixFactorizationWithOptions
{
// This example shows the use of ML.NET's one-class matrix factorization module which implements
// Algorithm 1 in a <a href="https://www.csie.ntu.edu.tw/~cjlin/papers/one-class-mf/biased-mf-sdm-with-supp.pdf">paper</a>.
"Algorithm 1 in a paper" isn't very meaningful. Can you give a longer description of what it is? #Resolved
var mlContext = new MLContext(seed: 0);

// Get a small in-memory dataset.
GetOneClassMatrix(out List<MatrixElement> data, out List<MatrixElement> testData);
nit: I would use a less-specific signature than List.
// Convert the in-memory matrix into an IDataView so that ML.NET components can consume it.
var dataView = mlContext.Data.LoadFromEnumerable(data);

// Create a matrix factorization trainer which may consume "Value" as the training label, "MatrixColumnIndex" as the
may consume
takes #Resolved
// Create a matrix factorization trainer which may consume "Value" as the training label, "MatrixColumnIndex" as the
// matrix's column index, and "MatrixRowIndex" as the matrix's row index. Here nameof(...) is used to extract field
// names' in MatrixElement class.
nit: Necessary? #WontFix
NumberOfThreads = 8,
ApproximationRank = 32,
Alpha = 1,
// The desired of unobserved values.
Unclear what this means. #Resolved
New description:
// The desired values of matrix elements not specified in the training set.
// If the training set doesn't specify the value at the u-th row and v-th column,
// its desired value would be set to 0.15. In other words, this parameter determines
// the value of all missing matrix elements.
In reply to: 274575901
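For readers following this thread, the roles of `Alpha` and `C` can be sketched with the kind of objective that one-class matrix factorization minimizes. This is an illustration under assumptions, not a statement of the exact ML.NET implementation: it assumes squared loss and a single L2 regularization weight $\lambda$, and the symbols $R$, $p_u$, $q_v$ and $\Omega$ (the set of entries present in the training set) are chosen here rather than taken from the PR.

```latex
\min_{P,\,Q}\;
  \sum_{(u,v)\in\Omega} \left(R_{uv} - p_u^{\top} q_v\right)^2
  \;+\; \alpha \sum_{(u,v)\notin\Omega} \left(c - p_u^{\top} q_v\right)^2
  \;+\; \lambda \left( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \right)
```

Under this reading, `C` ($c$) is the target value for every entry missing from the training set and `Alpha` ($\alpha$) weights how strongly those missing entries count. With `C = 0.15`, predictions for unobserved entries would be pulled toward 0.15, which is consistent with the predicted values of roughly 0.1499 quoted from the sample output in this review.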
Alpha = 1,
// The desired of unobserved values.
C = 0.15,
// To enable one-class matrix factorization, the following line is required.
Suggested Rephrase: This argument enables one-class matrix factorization. #Resolved
var results = mlContext.Data.CreateEnumerable<MatrixElement>(prediction, false).ToList();
// Feed the test data into the model and then iterate through a few predictions.
foreach (var pred in results.Take(15))
Console.WriteLine($"Predicted value at row {pred.MatrixRowIndex - 1} and column {pred.MatrixColumnIndex - 1} is {pred.Score} and its expected value is {pred.Value}.");
Break this line; I would suggest for such long lines using old-style {0}...{1}
string formatting so you can throw the arguments onto new lines. #Resolved
This is long but makes the code readable like a text message.
In reply to: 274576483
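For comparison, the composite-format alternative suggested in this thread could look roughly like the sketch below. The `Prediction` class is a hypothetical stand-in for the sample's `MatrixElement`, and `Describe` is an illustrative helper, not code from the PR; the invariant culture is an added assumption to keep the number formatting stable across machines.

```csharp
using System;
using System.Globalization;

public static class FormatSketch
{
    // Hypothetical stand-in for one prediction row of the sample.
    public sealed class Prediction
    {
        public uint MatrixColumnIndex;
        public uint MatrixRowIndex;
        public float Value;
        public float Score;
    }

    // Composite formatting with {0}...{3} lets each argument sit on its own line.
    public static string Describe(Prediction pred)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "Predicted value at row {0} and column {1} is {2} and its expected value is {3}.",
            pred.MatrixRowIndex - 1,
            pred.MatrixColumnIndex - 1,
            pred.Score,
            pred.Value);
    }

    public static void Main()
    {
        var pred = new Prediction { MatrixRowIndex = 14, MatrixColumnIndex = 1, Score = 0.5f, Value = 0.15f };
        Console.WriteLine(Describe(pred));
    }
}
```

The interpolated string in the PR reads more like a sentence, while this form keeps each source line short; both should produce the same message.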
// Predicted value at row 13 and column 0 is 0.1499254 and its expected value is 0.15.
// Predicted value at row 14 and column 0 is 0.1499074 and its expected value is 0.15.
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.
Nice touch. #Resolved
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.

// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
This sentence doesn't have any context. #Resolved
//
// Note: use the advanced options constructor to set the number of threads to 1 for a deterministic behavior.

// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
If we wanted a model to recommend video games to a user, we could view...
// Two columns with highest predicted score to the 2nd row (indexed by 1). If we view row index as user ID and column as game ID,
// the following list contains the games recommended by the trained model. Note that sometime, you may want to exclude training
// data from your predicted results because those games were already purchased.
those
would represent games that were #Resolved
Looks good! Just a few nits.
Fix #1769.