API reference - Finalize the template for trainer samples #3160

Closed
shmoradims opened this issue Apr 1, 2019 · 8 comments
@shmoradims

shmoradims commented Apr 1, 2019

Our current template for trainer API reference samples is as follows:

  1. Create in-memory random training data (as discussed in "Text loader v.s in-memory data structure in API reference samples" #2726, we're avoiding complex datasets and the text loader).
  2. Create a pipeline with just the trainer (i.e. focusing only on the API that this sample is about, without getting into a complex featurization pipeline).
  3. Fit the trainer.
  4. Generate 5 predictions and output results.
  5. Evaluate with in-memory test data and output metrics.
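
For illustration, the five steps could look roughly like the sketch below. It is not one of the finalized samples: it assumes the ML.NET 1.x API surface (`MLContext`, `LoadFromEnumerable`, `Regression.Trainers.Sdca`, `Regression.Evaluate`), picks the SDCA regression trainer arbitrarily, and uses hypothetical names (`TrainerSampleSketch`, `DataPoint`, `Prediction`, `GenerateRandomDataPoints`) and made-up data sizes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TrainerSampleSketch
{
    public static void Example()
    {
        // A fixed seed keeps the output reproducible across runs.
        var mlContext = new MLContext(seed: 0);

        // 1. Create in-memory random training data.
        var trainingData = mlContext.Data.LoadFromEnumerable(
            GenerateRandomDataPoints(1000));

        // 2. Create a pipeline with just the trainer (no featurization).
        var pipeline = mlContext.Regression.Trainers.Sdca(
            labelColumnName: nameof(DataPoint.Label),
            featureColumnName: nameof(DataPoint.Features));

        // 3. Fit the trainer.
        var model = pipeline.Fit(trainingData);

        // 4. Generate 5 predictions and output results.
        var testData = mlContext.Data.LoadFromEnumerable(
            GenerateRandomDataPoints(500, seed: 123));
        var transformedTestData = model.Transform(testData);
        var predictions = mlContext.Data.CreateEnumerable<Prediction>(
            transformedTestData, reuseRowObject: false);
        foreach (var p in predictions.Take(5))
            Console.WriteLine($"Label: {p.Label:F3}, Prediction: {p.Score:F3}");

        // 5. Evaluate with in-memory test data and output metrics.
        var metrics = mlContext.Regression.Evaluate(transformedTestData);
        Console.WriteLine($"Mean absolute error: {metrics.MeanAbsoluteError:F2}");
        Console.WriteLine($"R-squared: {metrics.RSquared:F2}");
    }

    // Data structures and helper classes go below Example().

    // Hypothetical generator: features are the label plus uniform noise,
    // so the trainer has a signal to learn.
    private static IEnumerable<DataPoint> GenerateRandomDataPoints(
        int count, int seed = 0)
    {
        var random = new Random(seed);
        for (var i = 0; i < count; i++)
        {
            var label = (float)random.NextDouble();
            yield return new DataPoint
            {
                Label = label,
                Features = Enumerable.Repeat(label, 10)
                    .Select(x => x + (float)random.NextDouble())
                    .ToArray()
            };
        }
    }

    private class DataPoint
    {
        public float Label { get; set; }

        [VectorType(10)]
        public float[] Features { get; set; }
    }

    private class Prediction
    {
        public float Label { get; set; }
        public float Score { get; set; }
    }
}
```

Running this requires a reference to the Microsoft.ML NuGet package. Trainers for other tasks would swap the catalog (for example `BinaryClassification` instead of `Regression`) and the label type, but the five steps stay the same.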

Below are some examples:

Other samples will be added using this template. If you want to make any changes, please mention them here so that we can finalize the template.

/cc @shauheen @glebuk

@wschin
Member

wschin commented Apr 2, 2019

What's the desired position of the in-memory data structures: before Example() or after Example()? In addition, we should add reference papers to the doc string of Example() whenever finding them takes less than 30 minutes. That doc string should also include a brief introduction to Example().

@shmoradims
Author

I added the position to the template:
// Add the data structures and helper classes below Example()

For reference papers, please add a snippet so that I know what you mean.

@sfilipi
Member

sfilipi commented Apr 3, 2019

I think we should also add reading the model parameters. There have been requests for it.

@shmoradims
Author

Are model parameters available for all trainers or just some? Do they vary by ML task or by trainer family (linear, tree, etc.)? Could you please post a snippet here?

@wschin wschin closed this as completed Apr 4, 2019
@wschin wschin reopened this Apr 4, 2019
@wschin
Member

wschin commented Apr 4, 2019

Here are some reference papers for matrix factorization in ML.NET:

My motivation is to make ML.NET a place where entry-level ML users can grow into senior-level scientists.

@wschin
Member

wschin commented Apr 4, 2019

Showing model parameters would be too much for a trainer's first example. However, inspecting models is definitely important, so I'd suggest having at least two example files per trainer.

@shmoradims
Author

@wschin could you please provide snippets for the second example you want?

@wschin
Member

wschin commented Apr 9, 2019

Sure. I got one from a test.

    /// <summary>
    /// Introspective Training: Linear model parameters may be inspected.
    /// </summary>
    [Fact]
    public void InspectLinearModelParameters()
    {
        // ...
        // Prepare a linear model and data by reusing code in the first example.
        // ...

        // Train the model.
        var model = pipeline.Fit(data);

        // Extract the linear model from the pipeline. This works because the
        // pipeline contains only the trainer, so the fitted model exposes it directly.
        var linearModel = model.Model;

        // Get the model bias and weights.
        var bias = linearModel.Bias;
        var weights = linearModel.Weights;

        // Print the coefficient of each feature and show how the score is computed from bias and weights.
        // ...
    }

The key differences between this example and the first one are that we

  • teach the user to reproduce the prediction rule implemented inside that class, and
  • show some useful properties of the trained model.

In this example, the need for reference papers is even stronger because, in general, we cannot cover every mathematical detail in code. If papers are added, we also need to make sure the symbols in code are linked to the corresponding symbols in the papers; for example,

// This is the linear coefficient vector denoted by \boldsymbol{w} in reference [1]
var weights = linearModel.Weights;
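
To make "reproduce the prediction rule" concrete, here is a minimal sketch of the score computation for a plain linear model, score = bias + w·x. It deliberately avoids ML.NET so it runs on its own: `LinearScoreSketch` and `Score` are hypothetical names, and the bias, weights, and feature values are made up to stand in for `linearModel.Bias`, `linearModel.Weights`, and one row of input features.

```csharp
using System;
using System.Collections.Generic;

public static class LinearScoreSketch
{
    // Computes bias + sum_i weights[i] * features[i], i.e. the inner product
    // of the coefficient vector with the feature vector, shifted by the bias.
    public static float Score(
        float bias,
        IReadOnlyList<float> weights,
        IReadOnlyList<float> features)
    {
        if (weights.Count != features.Count)
            throw new ArgumentException(
                "weights and features must have the same length.");

        var score = bias;
        for (var i = 0; i < weights.Count; i++)
            score += weights[i] * features[i];
        return score;
    }

    public static void Main()
    {
        // Hypothetical values; in a real sample these would come from the
        // trained model and from a test data point.
        var bias = 0.5f;
        var weights = new[] { 2.0f, -1.0f, 0.25f };
        var features = new[] { 1.0f, 3.0f, 4.0f };

        // Print the coefficient of each feature.
        for (var i = 0; i < weights.Length; i++)
            Console.WriteLine($"Feature {i}: coefficient = {weights[i]}");

        // Show the score computed from bias and weights.
        Console.WriteLine($"Score = {Score(bias, weights, features)}");
    }
}
```

In a second example file, the value computed this way could be printed next to the Score column produced by the trained model, so that readers can see the two agree.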

@wschin wschin closed this as completed Jul 2, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022

3 participants