Usage of Matrix Factorization Trainer for Recommendation #1806
var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization("userIdEncoded", "movieIdEncoded", "rating");
is equivalent to
var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization(matrixColumnIndexColumnName: "userIdEncoded", matrixRowIndexColumnName: "movieIdEncoded", labelColumn: "rating");
according to its signature:
/// <summary>
/// Initializing a new instance of <see cref="MatrixFactorizationTrainer"/>.
/// </summary>
/// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
/// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
/// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
/// <param name="labelColumn">The name of the label column.</param>
/// <param name="advancedSettings">A delegate to apply all the advanced arguments to the algorithm.</param>
public MatrixFactorizationTrainer(IHostEnvironment env,
string matrixColumnIndexColumnName,
string matrixRowIndexColumnName,
string labelColumn = DefaultColumnNames.Label,
Action<Arguments> advancedSettings = null)
: base(env, LoadNameValue)
Instead of a single feature
the 4-by-3 rating matrix being factorized may be
where |
Two questions still: Why are these different: vs and then the order of parameters again for consistency? Other trainers seem to take Label as the first parameter for input, so this instead: var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization
@assafi, for your second question, matrix factorization is a special problem. It's not a standard regression/classification task that maps a feature vector to a label, so having the label as the last argument does not look too bad. Of course, you can submit a PR if this doesn't look good enough to you. It should be a minor change.
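To make the "special problem" point concrete, here is a minimal sketch of how (column index, row index, label) triples describe cells of a rating matrix rather than feature vectors. It does not use ML.NET; `RatingMatrixSketch`, `Build`, and the sample ratings are hypothetical names and made-up data used only for illustration.

```csharp
using System;

public static class RatingMatrixSketch
{
    // Builds a dense rows-by-columns matrix from (column, row, rating) triples.
    // Cells with no observed rating stay NaN. Hypothetical helper, not an ML.NET API.
    public static float[,] Build(int rows, int columns, (int Column, int Row, float Rating)[] entries)
    {
        var matrix = new float[rows, columns];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < columns; c++)
                matrix[r, c] = float.NaN;

        // Each training example fills exactly one cell of the matrix.
        foreach (var entry in entries)
            matrix[entry.Row, entry.Column] = entry.Rating;

        return matrix;
    }

    public static void Main()
    {
        // Made-up observations for a 4-by-3 matrix (4 rows, 3 columns).
        var entries = new[]
        {
            (Column: 0, Row: 0, Rating: 5f),
            (Column: 2, Row: 1, Rating: 3f),
            (Column: 1, Row: 3, Rating: 4f),
        };

        var matrix = Build(4, 3, entries);
        for (int r = 0; r < 4; r++)
        {
            for (int c = 0; c < 3; c++)
                Console.Write(float.IsNaN(matrix[r, c]) ? "  ? " : $"{matrix[r, c],3} ");
            Console.WriteLine();
        }
    }
}
```

Each example carries two indices and one value, which is why the trainer asks for two index columns plus a label instead of a single features column.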
Hi @asthana86, for the question about the differences in the mlContext API calls, please see issue #1770, as it addresses your question. It is because Recommendation is not part of the core NuGet package and is therefore defined as an extension rather than a property of MLContext. For the ordering of label columns in the matrix factorization, we can create an issue on this or use this as the issue for tracking.
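The property-versus-method difference described above can be sketched with stand-in types. This is only an illustration under assumptions: `CoreContext`, `RegressionCatalog`, `RecommendationCatalog`, and `RecommendationExtensions` are hypothetical and are not the real ML.NET classes. It relies on the fact that C#, at least in the versions current when this thread was written, offers extension methods but no extension properties.

```csharp
using System;

// Hypothetical stand-ins; these are NOT the real ML.NET types.
public sealed class RegressionCatalog
{
    public string Name => "regression";
}

public sealed class RecommendationCatalog
{
    public string Name => "recommendation";
}

// Imagine this class ships in the core package, so it can expose
// the catalogs it knows about as ordinary properties.
public sealed class CoreContext
{
    public RegressionCatalog Regression { get; } = new RegressionCatalog();
}

// Imagine this class ships in a separate, optional package. It cannot add
// a property to CoreContext, so it exposes an extension method instead.
public static class RecommendationExtensions
{
    public static RecommendationCatalog Recommendation(this CoreContext context)
        => new RecommendationCatalog();
}

public static class Program
{
    public static void Main()
    {
        var context = new CoreContext();
        Console.WriteLine(context.Regression.Name);       // property: no parentheses
        Console.WriteLine(context.Recommendation().Name); // extension method: parentheses
    }
}
```

The caller sees the same shape as in the thread: a property for the built-in catalog and a method call, with parentheses, for the one that lives in another package.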
I am not sure if #1770 addresses my issue. #1770 is about discoverability; the fix we came up with for that was that one needs to acquire the MatrixFactorization NuGet package for now. I do agree with @GalOshri that it's not the best experience, but even with that NuGet acquisition the usage pattern should remain the same. Given the need to acquire an additional NuGet package, is there no way to have the API be consistent, like the one that follows? It just looks a bit odd from a user perspective: mlcontext.Recommendation.Trainers.MatrixFactorization instead of mlcontext.Recommendation().Trainers.MatrixFactorization. In terms of the other issue, it might be worth creating a list of all learners with the order of their input/output parameters and following it across the board. I have been working on porting these samples to 0.8, and after playing with Regression and Classification, the recommendation ML task is a bit less consistent with the other ML tasks.
I agree with you @asthana86, it is odd from the user's perspective and I think we could do better here. The current implementation has the trainers declared as properties on MLContext. This does not work when the code lives in a different NuGet package (i.e. we can't add a RecommendationContext property on MLContext since we aren't guaranteed the NuGet package is installed). I would like to see if there are other ways this could be made more flexible based on which NuGet packages the user has installed. It would need some investigation, but your main point is that the API needs to be consistent, right? Also, we are in talks with the .NET Core team regarding API changes. I think this is something worth discussing with them. @TomFinley, @Zruty0, @eerhardt I want to separate the issues to get more specific; here are the issues as I understand them: I also like the idea of having consistent parameter usage and confirming whether this is consistent across the board. Looking through our existing issues, there are a number of issues titled "Final Public API *" for learners and transforms. For example, there is #1703. While they do mention the public constructor, they do not mention consistent parameter usage, so I added a note to the issue.
Sounds great. Thanks for capturing this.
Thanks @asthana86. For reference, I filed the two issues. I will keep this issue open for a few days in case there is anything else to discuss.
Closing as we have the other two issues filed.
When using Matrix Factorization Trainer:
When using other trainers:
Is the difference in usage prop vs. method by design? Also there is a difference in the order of parameters being passed. First parameter is Label vs. Features being used.