API reference - Finalize the template for trainer samples #3160

shmoradims · 2019-04-01T22:02:04Z

Our current template for trainer API reference samples is as follows:

Create in-memory random training data (as discussed in "Text loader v.s in-memory data structure in API reference samples" #2726, we're avoiding complex datasets and the text loader).
Create a pipeline with just the trainer (i.e., focusing only on the API that this sample is about, without getting into a complex featurization pipeline).
Fit the trainer.
Generate 5 predictions and output results.
Evaluate with in-memory test data and output metrics.

Below are some examples:

Binary classification: FastTree, FastTreeWithOptions
Regression: PoissonRegression, PoissonRegressionWithOptions
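For readers who want to see the five steps in one place, here is a rough sketch in the spirit of the FastTree sample. It assumes the Microsoft.ML and Microsoft.ML.FastTree NuGet packages; the names `TrainerTemplateSketch`, `DataPoint`, `Prediction`, and `GenerateRandomDataPoints`, the data sizes, and the seeds are illustrative choices, not part of the agreed template.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class TrainerTemplateSketch
{
    // Requires the Microsoft.ML.FastTree NuGet package in addition to Microsoft.ML.
    public static void Example()
    {
        var mlContext = new MLContext(seed: 0);

        // Step 1: create in-memory random training data.
        var trainingData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(1000));

        // Step 2: create a pipeline with just the trainer.
        var pipeline = mlContext.BinaryClassification.Trainers.FastTree();

        // Step 3: fit the trainer.
        var model = pipeline.Fit(trainingData);

        // Step 4: generate predictions on in-memory test data and print the first 5.
        var testData = mlContext.Data.LoadFromEnumerable(GenerateRandomDataPoints(500, seed: 123));
        var transformedTestData = model.Transform(testData);
        var predictions = mlContext.Data
            .CreateEnumerable<Prediction>(transformedTestData, reuseRowObject: false)
            .ToList();
        foreach (var p in predictions.Take(5))
            Console.WriteLine($"Label: {p.Label}, Prediction: {p.PredictedLabel}");

        // Step 5: evaluate with the test data and output metrics.
        var metrics = mlContext.BinaryClassification.Evaluate(transformedTestData);
        Console.WriteLine($"Accuracy: {metrics.Accuracy:F2}");
        Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F2}");
        Console.WriteLine($"F1 Score: {metrics.F1Score:F2}");
    }

    // Data structures and helper methods go below Example().

    // Illustrative generator: features are shifted slightly for one class
    // so that the label is learnable from random data.
    private static IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
    {
        var random = new Random(seed);
        for (int i = 0; i < count; i++)
        {
            bool label = random.NextDouble() > 0.5;
            var features = new float[50];
            for (int j = 0; j < features.Length; j++)
                features[j] = (float)random.NextDouble() + (label ? 0f : 0.03f);
            yield return new DataPoint { Label = label, Features = features };
        }
    }

    // Input row: a label and a fixed-size feature vector.
    public class DataPoint
    {
        public bool Label { get; set; }

        [VectorType(50)]
        public float[] Features { get; set; }
    }

    // Output row: the original label next to the model's predicted label.
    public class Prediction
    {
        public bool Label { get; set; }
        public bool PredictedLabel { get; set; }
    }
}
```

Swapping in another trainer should mostly mean changing the single line in step 2, plus the label type and metrics for non-binary tasks.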

Other samples will be added using this template. If you want to make any changes please mention them here so that we can finalize the sample.

/cc @shauheen @glebuk

wschin · 2019-04-02T17:22:13Z

What's the desired position of in-memory data structures? Before Example() or after Example()? In addition, we should add reference papers as the doc string of Example() whenever finding them costs less than 30 mins. That doc string needs to include a brief introduction to Example().

shmoradims · 2019-04-02T22:41:52Z

I added the position to the template:
// Add the data structures and helper classes below Example()

For reference papers, please add a snippet so that I know what you mean.

sfilipi · 2019-04-03T19:51:31Z

I think we should also add reading the model parameters. There have been requests for it.

shmoradims · 2019-04-04T17:37:00Z

Are model parameters available for all trainers or just some? Do they change per ML task or per trainer family (linear, tree, etc.)? Could you please post a snippet here?

wschin · 2019-04-04T22:39:11Z

Here are some reference papers for matrix factorization in ML.NET:

machinelearning/src/Microsoft.ML.Recommender/MatrixFactorizationTrainer.cs

Line 69 in b8a70ac

/// <list type = 'bullet'>

My motivation is to make ML.NET a place where entry-level ML users can grow into senior-level scientists.

wschin · 2019-04-04T22:41:03Z

Showing model parameters would be too much for a trainer's first example. However, inspecting models is definitely important, so I'd suggest having at least two example files per trainer.

shmoradims · 2019-04-06T00:59:34Z

@wschin could you please provide snippets for the second example you want?

wschin · 2019-04-09T15:54:25Z

Sure. I got one from a test.

        /// <summary>
        /// Introspective Training: Linear model parameters may be inspected.
        /// </summary>
        [Fact]
        public void InspectLinearModelParameters()
        {
            // ...
            // Prepare a linear model and data by reusing code in the first example.
            // ...

            // Train the model.
            var model = pipeline.Fit(data);

            // Extract the linear model from the pipeline.
            var linearModel = model.Model;

            // Get the model bias and weights.
            var bias = linearModel.Bias;
            var weights = linearModel.Weights;

            // Print the coefficient of each feature and show how the score is computed from the bias and weights.
            // ...
        }

The key differences between this example and the first example are that we

teach the user to reproduce the prediction rule implemented inside that class, and
show some useful properties of the trained model.

In this example, the need for paper references is even stronger, because in general we cannot cover every mathematical detail in code. If papers are added, we also need to make sure the symbols in the code are linked to the corresponding symbols in the papers; for example,

// This is the linear coefficient vector denoted by \boldsymbol{w} in reference [1]
var weights = linearModel.Weights;
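To make "reproduce the prediction rule" concrete, here is a minimal sketch of the usual linear scoring rule, score = bias + sum of weights[i] * features[i]. It is plain C# with no ML.NET dependency. The class name `LinearScoreSketch` and the numeric values are hypothetical stand-ins for `linearModel.Bias`, `linearModel.Weights`, and one input row; a real second sample would compare this value with the score produced by the trained model.

```csharp
using System;

public static class LinearScoreSketch
{
    // Computes score = bias + sum_i weights[i] * features[i].
    public static float Score(float bias, float[] weights, float[] features)
    {
        if (weights.Length != features.Length)
            throw new ArgumentException("weights and features must have the same length.");

        float score = bias;
        for (int i = 0; i < weights.Length; i++)
            score += weights[i] * features[i];
        return score;
    }

    public static void Main()
    {
        // Hypothetical values standing in for linearModel.Bias and linearModel.Weights.
        float bias = 0.5f;
        float[] weights = { 2.0f, -1.0f, 0.25f };

        // Hypothetical feature vector for a single example.
        float[] features = { 1.0f, 3.0f, 4.0f };

        // Print the coefficient of each feature.
        for (int i = 0; i < weights.Length; i++)
            Console.WriteLine($"Feature {i}: coefficient = {weights[i]}");

        // Show the score computed from the bias and weights.
        Console.WriteLine($"Score = {Score(bias, weights, features)}");
    }
}
```

With these made-up numbers the score works out to 0.5 + 2 - 3 + 1 = 0.5. For a binary classifier, the raw score would typically still go through a threshold or calibrator before becoming a predicted label, which is something the second sample could also show.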

wschin closed this as completed Apr 4, 2019

wschin reopened this Apr 4, 2019

shmoradims mentioned this issue Jun 7, 2019

Catalog documentation is not in sync with the code #3126

Closed

wschin closed this as completed Jul 2, 2019

ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
