Usage of Matrix Factorization Trainer for Recommendation  #1806

Closed
@asthana86

Description

When using Matrix Factorization Trainer:

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization
                                                    ("userIdEncoded", "movieIdEncoded", "rating");

When using other trainers:

var trainer = mlContext.Regression.Trainers.StochasticDualCoordinateAscent
                                                      (label: "Label", features: "Features");

Is the difference in usage (property vs. method) by design? The parameter order also differs: other trainers take the label as the first parameter, followed by the features, whereas here the label comes last.

Activity

wschin (Member) commented on Dec 3, 2018

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization
                                                    ("userIdEncoded", "movieIdEncoded", "rating");

is equivalent to

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization
                                                    (matrixColumnIndexColumnName: "userIdEncoded", matrixRowIndexColumnName: "movieIdEncoded", labelColumn: "rating");

according to its signature

        /// <summary>
        /// Initializing a new instance of <see cref="MatrixFactorizationTrainer"/>.
        /// </summary>
        /// <param name="env">The private instance of <see cref="IHostEnvironment"/>.</param>
        /// <param name="matrixColumnIndexColumnName">The name of the column hosting the matrix's column IDs.</param>
        /// <param name="matrixRowIndexColumnName">The name of the column hosting the matrix's row IDs.</param>
        /// <param name="labelColumn">The name of the label column.</param>
        /// <param name="advancedSettings">A delegate to apply all the advanced arguments to the algorithm.</param>
        public MatrixFactorizationTrainer(IHostEnvironment env,
            string matrixColumnIndexColumnName,
            string matrixRowIndexColumnName,
            string labelColumn = DefaultColumnNames.Label,
            Action<Arguments> advancedSettings = null)
            : base(env, LoadNameValue)

Instead of a single feature column and a label column, matrix factorization requires a row index column, a column index column, and a label column. For example, assume that

row index column = [0, 3, 1]
column index column = [2, 1, 0]
label column = [7, 7, 8]

the 4-by-3 rating matrix being factorized may be

[? ? 7]
[8 ? ?]
[? ? ?]
[? 7 ?]

where a ? at row u and column v means that user u has never rated item v in your training data. Note that I assume those IDs are 0-based indexes.
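
To make the mapping from the three columns to the matrix concrete, here is a minimal sketch that rebuilds the 4-by-3 matrix from the example triples. It is plain C# (top-level statements, C# 9 or later) and does not use ML.NET; the variable names and the fixed matrix shape are assumptions made for illustration.

```csharp
using System;

// Training triples copied from the example above (0-based IDs).
int[] rowIndex = { 0, 3, 1 };
int[] columnIndex = { 2, 1, 0 };
float[] label = { 7f, 7f, 8f };

// Assumption: the matrix shape (4 rows, 3 columns) is known up front.
const int rowCount = 4, columnCount = 3;

// null marks an entry that never appears in the training data ("?").
var ratings = new float?[rowCount, columnCount];
for (int i = 0; i < label.Length; i++)
    ratings[rowIndex[i], columnIndex[i]] = label[i];

// Print one bracketed line per matrix row.
for (int r = 0; r < rowCount; r++)
{
    var cells = new string[columnCount];
    for (int c = 0; c < columnCount; c++)
        cells[c] = ratings[r, c]?.ToString() ?? "?";
    Console.WriteLine("[" + string.Join(" ", cells) + "]");
}
```

Each triple fills exactly one cell; every cell that no triple touches stays unknown, and those are the entries matrix factorization tries to predict.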

asthana86 (Contributor, Author) commented on Dec 3, 2018

Two questions still:

Why are these different:
var foo = mlContext.Regression.Trainers;
var foo_bar = mlContext.BinaryClassification.Trainers;
var foo_moo_bar = mlContext.Recommendation().Trainers;

vs
var foo_moo_bar = mlContext.Recommendation.Trainers

The second question is about the order of parameters, again for consistency: other trainers seem to take the label as their first input parameter. Could this be used instead?

var trainer = mlcontext.Recommendation().Trainers.MatrixFactorization
(labelColumn: "rating", matrixColumnIndexColumnName: "userIdEncoded", matrixRowIndexColumnName: "movieIdEncoded");
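
As a side note on the call above: C# binds named arguments by name, not position, so a label-first call is legal as long as every argument is named. The sketch below uses a hypothetical local function that only mirrors the parameter names of the constructor quoted earlier in the thread; it is not the ML.NET method itself.

```csharp
using System;

// Hypothetical stand-in mirroring the quoted constructor's parameter names.
// It only reports how the arguments were bound; it is not the ML.NET API.
static string MatrixFactorization(
    string matrixColumnIndexColumnName,
    string matrixRowIndexColumnName,
    string labelColumn = "Label")
    => $"column={matrixColumnIndexColumnName}, row={matrixRowIndexColumnName}, label={labelColumn}";

// Positional call: arguments bind in declaration order.
string positional = MatrixFactorization("userIdEncoded", "movieIdEncoded", "rating");

// Named call with the label first: binding is by name, so the order is free.
string labelFirst = MatrixFactorization(
    labelColumn: "rating",
    matrixColumnIndexColumnName: "userIdEncoded",
    matrixRowIndexColumnName: "movieIdEncoded");

Console.WriteLine(positional == labelFirst); // True
```

This does not settle which declaration order is the better default; it only shows that callers who prefer label-first can already write it that way with named arguments.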

wschin (Member) commented on Dec 3, 2018

@assafi, for your second question, matrix factorization is a special problem. It is not a standard regression/classification task that maps a feature vector to a label, so having the label as the last argument does not seem unreasonable. Of course, you can submit a PR if this doesn't look good enough to you. It should be a minor change.

singlis (Member) commented on Dec 4, 2018

Hi @asthana86,

For the question about the differences in the mlContext api calls, please see issue #1770 as this addresses your question. It is because Recommendation is not part of the core nuget package and therefore is defined as an extension rather than a property of MLContext.

For the ordering of label columns in matrix factorization, we can create an issue for this or use this issue for tracking.

asthana86 (Contributor, Author) commented on Dec 4, 2018

I am not sure #1770 addresses my issue. #1770 is about discoverability; the fix we came up with for that was that one needs to acquire the MatrixFactorization NuGet for now. I do agree with @GalOshri that it's not the best experience, but even with that NuGet acquisition the usage pattern should remain the same.

Given the need to acquire an additional NuGet is there no way to have the API be consistent like the one that follows? It just looks a bit odd from a user perspective.

mlcontext.Recommendation.Trainers.MatrixFactorization

instead of

mlcontext.Recommendation().Trainers.MatrixFactorization

On the other issue, it might be worth creating a list of all learners with the order of their input and output parameters, and then following one convention across all of them. I have been working on exporting these samples to 0.8, and after playing with Regression and Classification, the Recommendation ML task feels a bit less consistent with the other ML tasks.

singlis (Member) commented on Dec 4, 2018

I agree with you, @asthana86: it is odd from the user's perspective, and I think we could do better here. The current implementation has the trainers declared as properties on MLContext. That approach does not work when the code lives in a different NuGet package (i.e., we can't add a RecommendationContext property on MLContext since we aren't guaranteed the NuGet is installed).
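
The constraint described above can be sketched in a few lines. The C# versions available at the time of this thread let a separate assembly add extension methods to an existing class but not extension properties, which is why the add-on catalog needs parentheses. All type names below (`SketchContext`, `Catalog`, `RecommendationExtensions`) are hypothetical stand-ins, not the real ML.NET types, and the sketch uses top-level statements (C# 9 or later).

```csharp
using System;

var mlContext = new SketchContext();

// Built-in catalog: exposed as a property, so no parentheses.
Console.WriteLine(mlContext.Regression.Name);

// Add-on catalog: reachable only through an extension method, hence the "()".
Console.WriteLine(mlContext.Recommendation().Name);

// --- "Core" package (hypothetical stand-ins) ---
public sealed class Catalog
{
    public Catalog(string name) => Name = name;
    public string Name { get; }
}

public sealed class SketchContext
{
    // A property must be declared on the class itself, inside the core package.
    public Catalog Regression { get; } = new Catalog("Regression");
}

// --- "Add-on" package: it cannot add members to SketchContext,
// but it can contribute an extension method. ---
public static class RecommendationExtensions
{
    public static Catalog Recommendation(this SketchContext context)
        => new Catalog("Recommendation");
}
```

If the add-on package is not referenced, `Recommendation()` does not appear on the context at all, which matches the discoverability concern raised in #1770.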

I would like to see if there are other ways where this could be more flexible based on what nugets are installed by the user. It would need some investigation - but your main point is that the api needs to be consistent, right?

Also, we are in talks with the .NET Core team regarding API changes. I think this is something worth discussing with them. @TomFinley, @Zruty0, @eerhardt

I want to separate the issues to be more specific. Here are the issues as I understand them:
Issue 1 - The trainer API for MLContext is inconsistent for learner in external nugets
Issue 2 - MatrixFactorization construction parameters are not consistent with other learners.

I also like the idea of having consistent parameter usage and confirming that it is consistent across the board. Looking through our existing issues, there are a number of issues titled "Final Public API *" for learners and transforms, for example #1703. While they do mention public constructors, they do not mention consistent parameter usage, so I added a note to the issue.

asthana86 (Contributor, Author) commented on Dec 4, 2018

Sounds great. Thanks for capturing this.

singlis (Member) commented on Dec 5, 2018

Thanks @asthana86. For reference, I filed the two issues:
#1827 - The trainer API for MLContext is inconsistent for learner in external nugets
#1826 - MatrixFactorization construction parameters are not consistent with other learners.

I will keep this issue open for a few days in case there is anything else to discuss.

singlis (Member) commented on Dec 7, 2018

Closing as we have the other two issues filed.

ghost locked as resolved and limited conversation to collaborators on Mar 26, 2022

Labels: question (Further information is requested)
Participants: @wschin, @asthana86, @singlis