Usage of Matrix Factorization Trainer for Recommendation #1806

Closed
asthana86 opened this issue Dec 3, 2018 · 9 comments
Labels
question Further information is requested

@asthana86
Contributor

When using Matrix Factorization Trainer:

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization(
    "userIdEncoded", "movieIdEncoded", "rating");

When using other trainers:

var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent(
    label: "Label", features: "Features");

Is the difference in usage (property vs. method) by design? There is also a difference in the order of the parameters: in other trainers the first parameter is the label, whereas here the index columns come first.

asthana86 added the question label on Dec 3, 2018

@wschin
Member

wschin commented Dec 3, 2018

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization(
    "userIdEncoded", "movieIdEncoded", "rating");

is equivalent to

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization(
    matrixColumnIndexColumnName: "userIdEncoded",
    matrixRowIndexColumnName: "movieIdEncoded",
    labelColumn: "rating");

according to its signature

        /// <summary>
        /// Initializing a new instance of <see cref="MatrixFactorizationTrainer"/>.
        /// </summary>
        /// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
        /// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
        /// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
        /// <param name="labelColumn">The name of the label column.</param>
        /// <param name="advancedSettings">A delegate to apply all the advanced arguments to the algorithm.</param>
        public MatrixFactorizationTrainer(IHostEnvironment env,
            string matrixColumnIndexColumnName,
            string matrixRowIndexColumnName,
            string labelColumn = DefaultColumnNames.Label,
            Action<Arguments> advancedSettings = null)
            : base(env, LoadNameValue)

Instead of a single feature column and a label column, matrix factorization requires a row index column, a column index column, and a label column. For example, assume that

row index column = [0, 3, 1]
column index column = [2, 1, 0]
label column = [7, 7, 8]

the 4-by-3 rating matrix being factorized may be

[? ? 7]
[8 ? ?]
[? ? ?]
[? 7 ?]

where a ? at row u and column v means that user u never rated item v in your training data. Note that I assume those IDs are 0-based indexes.
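To make the mapping from the three columns to the matrix concrete, here is a minimal sketch in plain C# (it does not use ML.NET). The variable names, the nullable array used to mark missing entries, and the fixed 4-by-3 shape are assumptions made for illustration only.

```csharp
using System;

// Toy data copied from the example above (0-based IDs).
int[] rowIndex = { 0, 3, 1 };
int[] columnIndex = { 2, 1, 0 };
float[] label = { 7f, 7f, 8f };

// Assumed matrix shape for this illustration: 4 rows by 3 columns.
const int rowCount = 4;
const int columnCount = 3;

// null marks an unobserved entry, shown as '?' in the matrix above.
var ratings = new float?[rowCount, columnCount];
for (int i = 0; i < label.Length; i++)
{
    ratings[rowIndex[i], columnIndex[i]] = label[i];
}

// Print one matrix row per line.
for (int r = 0; r < rowCount; r++)
{
    var cells = new string[columnCount];
    for (int c = 0; c < columnCount; c++)
    {
        cells[c] = ratings[r, c]?.ToString() ?? "?";
    }
    Console.WriteLine(string.Join(" ", cells));
}
```

Each (row index, column index, label) triple fills exactly one cell, and every cell that no triple touches stays unknown. Those unknown cells are the ones a factorization model would later be asked to predict.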

@asthana86
Contributor Author

Two questions still:

Why are these different:
var foo = mlContext.Regression.Trainers;
var foo_bar = mlContext.BinaryClassification.Trainers;
var foo_moo_bar = mlContext.Recommendation().Trainers;

vs
var foo_moo_bar = mlContext.Recommendation.Trainers

And then there is the order of parameters again, for consistency: other trainers seem to take the label as the first input parameter, so should it be this instead?

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization(
    labelColumn: "rating",
    matrixColumnIndexColumnName: "userIdEncoded",
    matrixRowIndexColumnName: "movieIdEncoded");

@wschin
Member

wschin commented Dec 3, 2018

@assafi, for your second question: matrix factorization is a special problem. It is not a standard regression or classification task that maps a feature vector to a label, so having the label as the last argument does not look too bad. Of course, you can submit a PR if this doesn't look good enough to you. It should be a minor change.
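As a side note on parameter order, C# named arguments already let a caller write the label first regardless of the declared order. The sketch below uses a hypothetical stand-in function, `Describe`, which is not part of ML.NET; it only mirrors the parameter names and order of the constructor quoted earlier in the thread.

```csharp
using System;

// Positional call: arguments bind in declaration order.
string positional = Describe("userIdEncoded", "movieIdEncoded", "rating");

// Named call with the label written first: binds to the same parameters.
string named = Describe(
    labelColumn: "rating",
    matrixColumnIndexColumnName: "userIdEncoded",
    matrixRowIndexColumnName: "movieIdEncoded");

Console.WriteLine(positional == named); // prints True

// Hypothetical stand-in, NOT the ML.NET API: it only echoes how the
// arguments were bound so the two call styles can be compared.
static string Describe(
    string matrixColumnIndexColumnName,
    string matrixRowIndexColumnName,
    string labelColumn = "Label")
{
    return $"column={matrixColumnIndexColumnName}, row={matrixRowIndexColumnName}, label={labelColumn}";
}
```

Reordering the declaration would therefore only affect callers who pass the arguments positionally.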

@singlis
Member

singlis commented Dec 4, 2018

Hi @asthana86,

For the question about the differences in the mlContext API calls, please see issue #1770, as it addresses your question. It is because Recommendation is not part of the core NuGet package and is therefore defined as an extension method rather than a property of MLContext.

For the ordering of label columns in the matrix factorization, we can create a new issue or use this one for tracking.

@asthana86
Contributor Author

I am not sure #1770 addresses my issue. #1770 is about discoverability, and the fix we came up with for that was that one needs to acquire the MatrixFactorization NuGet package for now. I do agree with @GalOshri that it's not the best experience, but even with that NuGet acquisition the usage pattern should remain the same.

Given the need to acquire an additional NuGet package, is there no way to make the API consistent, like the following? It just looks a bit odd from a user's perspective.

mlcontext.Recommendation.Trainers.MatrixFactorization

instead of

mlcontext.Recommendation().Trainers.MatrixFactorization

As for the other issue, it might be worth creating a list of all learners with the order of their input and output parameters, and following one convention across them. I have been working on exporting these samples to 0.8, and after playing with Regression and Classification, the Recommendation ML task is a bit less consistent than the other ML tasks.

@singlis
Member

singlis commented Dec 4, 2018

I agree with you, @asthana86: it is odd from the user's perspective, and I think we could do better here. The current implementation has the trainers declared as properties on MLContext. This does not work when the code lives in a different NuGet package (i.e., we can't add a RecommendationContext property on MLContext, since we aren't guaranteed the NuGet package is installed).
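The constraint can be illustrated with a small, self-contained sketch. All type names here (`ToyContext`, `ToyCatalog`, `ToyRecommendationExtensions`) are hypothetical stand-ins and not ML.NET types. The sketch assumes the add-on ships in a separate assembly that cannot add members to the core type, and that the language offers extension methods but no extension properties, as was the case when this thread was written.

```csharp
using System;

// Usage: the core catalog is a property, the add-on catalog is a method call.
var context = new ToyContext();
Console.WriteLine(context.Regression.Name);        // property on the core type
Console.WriteLine(context.Recommendation().Name);  // extension method from the add-on

// "Core package" (hypothetical stand-in for the type that owns the properties).
public sealed class ToyCatalog
{
    public ToyCatalog(string name) => Name = name;
    public string Name { get; }
}

public sealed class ToyContext
{
    // Built-in task: the core package can expose it as a property.
    public ToyCatalog Regression { get; } = new ToyCatalog("Regression");
}

// "Add-on package" (hypothetical): it cannot add members to ToyContext,
// so it attaches its catalog through an extension method instead.
public static class ToyRecommendationExtensions
{
    public static ToyCatalog Recommendation(this ToyContext context)
        => new ToyCatalog("Recommendation");
}
```

The parentheses in `context.Recommendation()` are the visible cost of this design: the call compiles only when the add-on assembly is referenced, which is what lets the core package stay independent of it.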

I would like to see if there are other ways where this could be more flexible based on what nugets are installed by the user. It would need some investigation - but your main point is that the api needs to be consistent, right?

Also, we are in talks with the .NET Core team regarding API changes. I think this is something worth discussing with them. @TomFinley, @Zruty0, @eerhardt

I want to separate the issues to get more specific. Here are the issues as I understand them:
Issue 1 - The trainer API for MLContext is inconsistent for learners in external NuGet packages.
Issue 2 - MatrixFactorization construction parameters are not consistent with other learners.

I also like the idea of having consistent parameter usage and confirming that it is consistent across the board. Looking through our existing issues, there are a number of issues titled "Final Public API *" for learners and transforms. For example, there is #1703. While they do mention the public constructor, they do not mention consistent parameter usage, so I added a note to the issue.

@asthana86
Contributor Author

Sounds great. Thanks for capturing this.

@singlis
Member

singlis commented Dec 5, 2018

Thanks @asthana86. For reference, I filed the two issues:
#1827 - The trainer API for MLContext is inconsistent for learner in external nugets
#1826 - MatrixFactorization construction parameters are not consistent with other learners.

I will keep this issue open for a few days in case there are any other things to discuss.

@singlis
Member

singlis commented Dec 7, 2018

Closing as we have the other two issues filed.

singlis closed this as completed on Dec 7, 2018
ghost locked as resolved and limited conversation to collaborators on Mar 26, 2022