Closed
Description
When using Matrix Factorization Trainer:
var trainer = mlContext.Recommendation().Trainers.MatrixFactorization
("userIdEncoded", "movieIdEncoded", "rating");
When using other trainers:
var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent
(label: "Label", features: "Features");
Is the difference in usage (property vs. method) by design? The order of the parameters also differs: other trainers take Label as the first parameter, followed by Features.
Activity
wschin commented on Dec 3, 2018
is equivalent to
according to its signature
Instead of a single feature Column and a label Column, matrix factorization requires a row index Column, a column index Column, and a label Column. For example, assume the rating matrix being factorized is 4-by-3, where a ? at row u and column v means that user u never rates v in your training data. Note that I assume those IDs are 0-based indexes.
asthana86 commented on Dec 3, 2018
Two questions still:
Why are these different:
var foo = mlContext.Regression.Trainers;
var foo_bar = mlContext.BinaryClassification.Trainers;
var foo_moo_bar = mlContext.Recommendation().Trainers;
vs
var foo_moo_bar = mlContext.Recommendation.Trainers;
And then, again, the order of parameters, for consistency. Other trainers seem to take Label as the first parameter, so why not this instead?
var trainer = mlContext.Recommendation().Trainers.MatrixFactorization
(labelColumn: "rating", matrixColumnIndexColumnName: "userIdEncoded", matrixRowIndexColumnName: "movieIdEncoded");
wschin commented on Dec 3, 2018
@assafi, for your second question, matrix factorization is a special problem. It is not a standard regression/classification task that maps a feature vector to a label, so having the label as the last argument does not seem unreasonable. Of course, you can submit a PR if this doesn't look good enough to you. It should be a minor change.
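To make the input shape discussed in this thread concrete, here is a minimal sketch of how observed ratings can be viewed as (row index, column index, label) triples that fill a sparse 4-by-3 matrix. The rating values and the `RatingMatrixSketch` class are hypothetical and are not taken from the issue or from ML.NET; the sketch only assumes 0-based indexes, as stated earlier in the thread.

```csharp
using System;

public static class RatingMatrixSketch
{
    public static void Main()
    {
        // Hypothetical observed ratings: (row index u, column index v, label).
        // Indexes are 0-based, matching the assumption made in the thread.
        var observed = new (int Row, int Column, float Label)[]
        {
            (0, 0, 5f),
            (1, 2, 3f),
            (3, 1, 4f),
        };

        // A 4-by-3 matrix; null marks a cell with no rating in the training data.
        var matrix = new float?[4, 3];
        foreach (var (row, column, label) in observed)
        {
            matrix[row, column] = label;
        }

        // Print the matrix, showing "?" for every unobserved (u, v) pair.
        for (int u = 0; u < 4; u++)
        {
            for (int v = 0; v < 3; v++)
            {
                Console.Write(matrix[u, v]?.ToString() ?? "?");
                Console.Write(v < 2 ? " " : Environment.NewLine);
            }
        }
    }
}
```

Each training example carries two index columns and one label column rather than a feature vector, which is why the trainer's parameters do not line up one-to-one with the label/features pair used by other trainers.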
singlis commented on Dec 4, 2018
Hi @asthana86,
For the question about the differences in the mlContext API calls, please see issue #1770, as it addresses your question. The difference exists because Recommendation is not part of the core NuGet package and is therefore defined as an extension method rather than a property of MLContext.
For the ordering of label columns in matrix factorization, we can create a separate issue or use this issue for tracking.
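The property-versus-method difference described above can be sketched as follows. This is an illustration only: `Context`, `RegressionCatalog`, `RecommendationCatalog`, and `ContextExtensions` are hypothetical stand-ins, not the real ML.NET types, and the two namespaces stand in for two separate NuGet packages.

```csharp
using System;
using AddOnPackage;

// Hypothetical stand-in for the core package.
namespace CorePackage
{
    public sealed class RegressionCatalog
    {
        public string Trainers => "regression trainers";
    }

    public sealed class Context
    {
        // The core package owns Context, so it can expose its own
        // catalogs as ordinary properties.
        public RegressionCatalog Regression { get; } = new RegressionCatalog();
    }
}

// Hypothetical stand-in for a separately installed add-on package.
namespace AddOnPackage
{
    using CorePackage;

    public sealed class RecommendationCatalog
    {
        public string Trainers => "recommendation trainers";
    }

    public static class ContextExtensions
    {
        // The add-on cannot add a property to Context, and the C# versions
        // available when this thread was written had no extension properties,
        // so the catalog is reached through an extension method call.
        public static RecommendationCatalog Recommendation(this Context context)
            => new RecommendationCatalog();
    }
}

public static class Program
{
    public static void Main()
    {
        var context = new CorePackage.Context();

        // Property access for the catalog defined in the core package.
        Console.WriteLine(context.Regression.Trainers);

        // Method call (note the parentheses) for the catalog from the add-on.
        Console.WriteLine(context.Recommendation().Trainers);
    }
}
```

Under these assumptions, the parentheses in `Recommendation()` are a consequence of where the code lives rather than a deliberate difference in how the task is meant to be used.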
asthana86 commented on Dec 4, 2018
I am not sure that #1770 addresses my issue. #1770 is about discoverability; the fix we came up with for that was that one needs to acquire the MatrixFactorization NuGet for now. I do agree with @GalOshri that it's not the best experience, but even with that NuGet acquisition the usage pattern should remain the same.
Given the need to acquire an additional NuGet is there no way to have the API be consistent like the one that follows? It just looks a bit odd from a user perspective.
mlcontext.Recommendation.Trainers.MatrixFactorization
instead of
mlcontext.Recommendation().Trainers.MatrixFactorization
As for the other issue, it might be worth creating a list of all learners with the order of their input and output parameters, and then following one convention across all of them. I have been working on exporting these samples to 0.8, and after playing with Regression and Classification, the Recommendation ML task is a bit less consistent with the other ML tasks.
singlis commented on Dec 4, 2018
I agree with you, @asthana86: it is odd from the user's perspective, and I think we could do better here. The current implementation has the trainers declared as properties on MLContext. This does not work when the code lives in a different NuGet package (i.e., we can't add a RecommendationContext property on MLContext since we aren't guaranteed the NuGet is installed).
I would like to see if there are other ways where this could be more flexible based on what nugets are installed by the user. It would need some investigation - but your main point is that the api needs to be consistent, right?
Also, we are in talks with the .NET Core team regarding API changes. I think this is something worth discussing with them. @TomFinley, @Zruty0, @eerhardt
I want to separate the issues to get more specific, here are the issues as I understand:
Issue 1 - The trainer API for MLContext is inconsistent for learner in external nugets
Issue 2 - MatrixFactorization construction parameters are not consistent with other learners.
I also like the idea of having consistent parameter usage and confirming whether it is consistent across the board. Looking through our existing issues, there are a number of issues titled "Final Public API *" for learners and transforms. For example, there is #1703. While they do mention the public constructor, they do not mention consistent parameter usage, so I added a note to the issue.
asthana86 commented on Dec 4, 2018
Sounds great. Thanks for capturing this.
singlis commented on Dec 5, 2018
Thanks @asthana86. For reference, I filed the two issues
#1827 The trainer API for MLContext is inconsistent for learner in external nugets
#1826 - MatrixFactorization construction parameters are not consistent with other learners.
I will keep this issue open for a few days in case there is anything else to discuss.
singlis commented on Dec 7, 2018
Closing as we have the other two issues filed.