Document the possible output column names in scored DataView #10136

Closed
CESARDELATORRE opened this issue Jan 24, 2019 · 3 comments

@CESARDELATORRE
Contributor

Right now, these column/field names are effectively "secret names" that you simply have to know.
I don't think they are documented end-to-end on docs.microsoft.com, nor in the How-to / Cookbook.

We should have the following info available:

The purpose of the output columns in scored IDataView depends on the specific learning task being used:

Regression

  • Label: Original regression value of the example.
  • Score: Predicted regression value.

Binary Classification

  • Label: Original Label of the example.
  • Score: Raw score from the learner (e.g. value before applying sigmoid function to get probability).
  • Probability: Probability that the example belongs to the positive class.
  • PredictedLabel: Predicted class.
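
To make the relationship between these three columns concrete, here is a minimal sketch in plain Python. It is not ML.NET code: the function name `score_to_outputs` and the 0.5 threshold are made up for illustration, and it assumes a plain sigmoid with no learned calibration parameters, which a real trainer may not use.

```python
import math


def score_to_outputs(score, threshold=0.5):
    """Derive illustrative Probability and PredictedLabel values from a raw Score.

    Assumption: Probability is a plain sigmoid of Score. A real calibrator
    may apply learned slope/offset parameters instead.
    """
    # Numerically stable sigmoid: avoids overflow for large negative scores.
    if score >= 0:
        probability = 1.0 / (1.0 + math.exp(-score))
    else:
        e = math.exp(score)
        probability = e / (1.0 + e)

    return {
        "Score": score,
        "Probability": probability,
        # Hypothetical decision rule: positive class when above the threshold.
        "PredictedLabel": probability > threshold,
    }


row = score_to_outputs(2.0)
```

Under these assumptions a positive Score gives a Probability above 0.5 and a `True` PredictedLabel, and a negative Score gives the opposite.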

Multi-class Classification

  • Label: Original Label of the example.
  • Score: An array whose length equals the number of classes; it contains the probability for each class.
  • PredictedLabel: Predicted class.
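
As a rough illustration of how PredictedLabel relates to the Score array, the sketch below picks the index of the largest score. This is plain Python, not ML.NET code; the helper name `predicted_class` is hypothetical, and the 0-based index is an assumption made for illustration (a real model maps the winning slot back to the original label values).

```python
def predicted_class(scores):
    """Return the index of the highest per-class score (ties go to the first)."""
    if not scores:
        raise ValueError("scores must contain one entry per class")
    return max(range(len(scores)), key=lambda i: scores[i])


# One score per class; the class at index 1 has the highest score.
example_scores = [0.1, 0.7, 0.2]
winner = predicted_class(example_scores)
```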

Clustering

  • Label: Original cluster Id of the example.
  • Score: An array whose length equals the number of clusters; it contains the squared distance from each cluster centroid.
  • PredictedLabel: Predicted cluster Id.
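
The clustering columns can be sketched the same way: one squared distance per centroid as the Score, and the closest centroid as the PredictedLabel. Again this is plain Python rather than ML.NET code; the function names are hypothetical, squared Euclidean distance is assumed, and cluster Ids are 0-based here purely for illustration.

```python
def squared_distances(point, centroids):
    """Squared Euclidean distance from one point to every centroid."""
    return [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]


def closest_cluster(point, centroids):
    """Return (predicted cluster index, score array) for one example."""
    scores = squared_distances(point, centroids)
    predicted = min(range(len(scores)), key=lambda i: scores[i])
    return predicted, scores


# Two hypothetical centroids in 2-D and one example point.
centroids = [[0.0, 0.0], [3.0, 4.0]]
cluster_id, score = closest_cluster([0.0, 1.0], centroids)
```

Note that for clustering a smaller score means a better match, which is the opposite of the classification cases.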

This is related to the following issue requesting that info and raised at the ML.NET repo:
dotnet/machinelearning#376

This additional issue might also be related:
#5640

@mairaw
Contributor

mairaw commented Feb 4, 2019

@JRAlexander @luisquintanilla FYI - I just tagged it with the ML.NET Guide

@JRAlexander added this to the Backlog milestone Feb 4, 2019
@JRAlexander removed the ⌚ Not Triaged label Feb 4, 2019
@luisquintanilla self-assigned this Apr 30, 2019
@shmoradims

I believe this issue is already addressed as part of 1.0 API reference documentation. Now all trainers have a sub-section in their remarks called input/output columns, where the types and definition of the input/output columns are clearly explained. E.g.: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.averagedperceptrontrainer?view=ml-dotnet#input-and-output-columns

@JRAlexander
Contributor

Thanks, @shmoradims! I agree and am therefore closing.

@mairaw removed this from the Backlog milestone Oct 16, 2019
@dotnet-bot added the ⌚ Not Triaged label Feb 9, 2021