Document the possible output column names in scored DataView #10136

CESARDELATORRE · 2019-01-24T19:51:11Z

Right now, these column/field names are effectively "secret names" that you simply have to know.
I don't think they are documented "end-to-end" on docs.microsoft.com, nor in the How-to / Cookbook.

We should have the following info available:

The purpose of the output columns in a scored IDataView depends on the specific learning task being used:

Regression

Label: Original regression value of the example.
Score: Predicted regression value.

Binary Classification

Label: Original Label of the example.
Score: Raw score from the learner (e.g., the value before a sigmoid function is applied to obtain a probability).
Probability: Probability that the example belongs to the positive class.
PredictedLabel: Predicted class.
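
To make the relationship between these three columns concrete, here is a minimal sketch in plain Python. It is not ML.NET code: the function name `score_to_prediction`, the plain logistic sigmoid, and the 0.5 threshold are all assumptions chosen for illustration, and a real trainer may calibrate scores differently.

```python
import math


def score_to_prediction(score, threshold=0.5):
    """Map a raw score to (probability, predicted_label).

    Assumption: a plain logistic sigmoid stands in for whatever
    calibration the learner really uses, and 0.5 is the cutoff.
    """
    probability = 1.0 / (1.0 + math.exp(-score))
    predicted_label = probability > threshold
    return probability, predicted_label


# Hypothetical raw score for one example.
probability, predicted_label = score_to_prediction(2.0)
print(f"Score=2.0 Probability={probability:.4f} PredictedLabel={predicted_label}")
```

A positive raw score maps to a probability above 0.5 and a negative one to a probability below it, which is why Score and Probability always agree on the predicted class in this sketch.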

Multi-class Classification

Label: Original Label of the example.
Score: An array whose length equals the number of classes; it contains the probability for each class.
PredictedLabel: Predicted class.
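
As a rough illustration of how the Score array and PredictedLabel relate, the sketch below picks the class with the highest score. It is plain Python rather than ML.NET, and the helper `predicted_class`, the class names, and the rule "highest score wins" are assumptions made for the example.

```python
def predicted_class(score, class_names):
    """Return the class whose entry in the score array is largest.

    Assumption: score[i] is the probability of class_names[i].
    """
    if len(score) != len(class_names):
        raise ValueError("score must have one entry per class")
    best_index = max(range(len(score)), key=lambda i: score[i])
    return class_names[best_index]


# Hypothetical three-class example.
classes = ["setosa", "versicolor", "virginica"]
score = [0.1, 0.7, 0.2]
label = predicted_class(score, classes)
print(label)
```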

Clustering

Label: Original cluster Id of the example.
Score: An array whose length equals the number of clusters; it contains the squared distance from the example to each cluster centroid.
PredictedLabel: Predicted cluster Id.
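
The clustering columns can be sketched the same way. The plain-Python example below computes one squared Euclidean distance per centroid and takes the nearest one as the prediction. The helpers `cluster_scores` and `predicted_cluster` are hypothetical, the use of Euclidean distance is an assumption, and the returned value is a zero-based index into the centroid list, which may not match the cluster Ids a real trainer emits.

```python
def cluster_scores(point, centroids):
    """Squared Euclidean distance from point to every centroid."""
    return [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]


def predicted_cluster(scores):
    """Zero-based index of the nearest centroid (smallest distance)."""
    return min(range(len(scores)), key=lambda i: scores[i])


# Hypothetical 2-D centroids and one example point.
centroids = [(0.0, 0.0), (3.0, 4.0)]
scores = cluster_scores((1.0, 1.0), centroids)
nearest = predicted_cluster(scores)
print(scores, nearest)
```

Unlike the classification tasks, the smallest entry of Score wins here, because the values are distances rather than probabilities.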

This is related to the following issue requesting that info and raised at the ML.NET repo:
dotnet/machinelearning#376

This additional issue might also be related:
#5640

mairaw · 2019-02-04T22:12:41Z

@JRAlexander @luisquintanilla FYI - I just tagged it with the ML.NET Guide

shmoradims · 2019-05-20T23:17:54Z

I believe this issue has already been addressed as part of the 1.0 API reference documentation. All trainers now have a sub-section in their remarks called "Input and output columns", where the types and definitions of the input/output columns are explained. E.g.: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.averagedperceptrontrainer?view=ml-dotnet#input-and-output-columns

JRAlexander · 2019-05-21T00:22:29Z

Thanks, @shmoradims! I agree and am therefore closing.

dotnet-bot added the ⌚ Not Triaged label Jan 24, 2019

mairaw added the 📚 Area - ML.NET Guide label Feb 4, 2019

JRAlexander added this to the Backlog milestone Feb 4, 2019

JRAlexander removed the ⌚ Not Triaged label Feb 4, 2019

luisquintanilla self-assigned this Apr 30, 2019

JRAlexander closed this as completed May 21, 2019

mairaw removed this from the Backlog milestone Oct 16, 2019

BillWagner added dotnet-ml/svc and removed 📚 Area - ML.NET Guide labels Feb 9, 2021

dotnet-bot added the ⌚ Not Triaged label Feb 9, 2021
