
Feature Importance with ML.NET #599


Closed
WladdGorshenin opened this issue Jul 30, 2018 · 6 comments
Labels: enhancement (New feature or request), question (Further information is requested)


@WladdGorshenin commented Jul 30, 2018

Dear ML.NET team and community members,

I'm so excited about ML.NET. It helps me easily integrate ML capabilities into C# projects.
But as an evolving project, it still lacks documentation and code examples, so I'd like to ask the following questions.

My current project requires not only predictions but also the reasoning behind them. I tried this approach with decision trees in Python/Sklearn and it worked as a proof of concept. Now I'm going to implement the same approach with ML.NET, and I'd like to know:

  • what is the best way to derive feature importance out of a trained tree/forest?
  • what is the best way to implement a method similar to DecisionTreeClassifier.decision_path with ML.NET?

justinormont added the question label Jul 30, 2018
@Zruty0 (Contributor) commented Aug 9, 2018

@WladdGorshenin ,

  1. After we train a tree ensemble model, the trainer produces a 'model summary', which includes aggregated per-feature gains. These have proven to be a useful proxy for 'feature importance'.

Currently it's a bit of a chore to extract the summary post-training, but it's definitely possible.
You can take a look at a complete example, where we inspect the topology of the tree among other things:
https://github.com/dotnet/machinelearning/pull/653/files#diff-d36b6bf4d2fcf5366387069ff79b95a5

treePredictor.GetSummaryInKeyValuePairs() is a method that you can call to extract the per-feature aggregated gains.
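To illustrate the idea behind gain-based importance, here is a minimal, framework-free Python sketch. It assumes each tree has already been reduced to a list of (feature_index, split_gain) pairs, one per split node; that representation and the helper names aggregate_split_gains and rank_features are hypothetical and not ML.NET types. The values ML.NET actually reports from GetSummaryInKeyValuePairs() may be computed or normalized differently.

```python
def aggregate_split_gains(trees, num_features, normalize=True):
    """Sum the gain of every split node per feature, across all trees.

    Hypothetical input: each tree is a list of (feature_index, gain) pairs.
    """
    totals = [0.0] * num_features
    for tree in trees:
        for feature_index, gain in tree:
            totals[feature_index] += gain
    if normalize:
        # Rescale so the gains sum to 1, which makes features easy to compare.
        grand_total = sum(totals)
        if grand_total > 0:
            totals = [g / grand_total for g in totals]
    return totals


def rank_features(names, gains):
    """Pair feature names with their gains, most important first."""
    return sorted(zip(names, gains), key=lambda pair: pair[1], reverse=True)


# Two toy trees: tree 1 splits on feature 0 (gain 3.0) and feature 2 (gain 1.0),
# tree 2 splits on feature 0 (gain 2.0) and feature 1 (gain 2.0).
trees = [[(0, 3.0), (2, 1.0)], [(0, 2.0), (1, 2.0)]]
gains = aggregate_split_gains(trees, num_features=3)
print(rank_features(["age", "income", "zip"], gains))
# [('age', 0.625), ('income', 0.25), ('zip', 0.125)]
```

Features that are split on often and with large gains rise to the top; a feature the ensemble never splits on gets zero, even if it is correlated with the label.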

In addition to this artifact of training, we are also planning to enable some more 'explainability' features, namely:

  • permutation feature importance: a model-agnostic analysis tool that tries to assess which features the model is most sensitive to
  • per-example feature gains: for any given example and model (tree ensemble or linear), we can give the signed 'feature impact' of each feature on that example's score. Note that this analysis is per-example, whereas the above applies to the dataset as a whole.
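The two techniques above can be sketched in plain Python. This is an illustration of the general ideas, not ML.NET's implementation: the model is any callable from a feature row to a score, mean squared error is an assumed evaluation metric, and the helper names are hypothetical. Only the simple linear case of per-example contributions (weight times feature value) is shown; tree-ensemble contributions are computed differently.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((model(row) - t) ** 2 for row, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, num_features, repeats=5, seed=0):
    """Average increase in MSE when a single feature column is shuffled.

    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(num_features):
        increases = []
        for _ in range(repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = []
            for row, value in zip(rows, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            increases.append(mean_squared_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / repeats)
    return importances


def linear_contributions(weights, row):
    """Signed per-example contributions of a linear model: w_j * x_j.

    The prediction equals bias + sum(contributions).
    """
    return [w * x for w, x in zip(weights, row)]


# Toy linear model that only uses feature 0: y = 2 * x0 + 0 * x1 + 1.
weights, bias = [2.0, 0.0], 1.0


def model(row):
    return bias + sum(w * x for w, x in zip(weights, row))


rows = [[float(i), float(i % 3)] for i in range(10)]
targets = [model(row) for row in rows]

print(permutation_importance(model, rows, targets, num_features=2))
print(linear_contributions(weights, [3.0, 4.0]))  # [6.0, 0.0]
```

Shuffling feature 1 leaves every prediction unchanged, so its importance is exactly zero, while shuffling feature 0 increases the error. The first function answers "which features matter across the dataset"; the second answers "why did this one example get this score".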

These features are still waiting to be ported to ML.NET, and @GalOshri would like to know how much value you would place on them.

@WladdGorshenin (Author)

Hi @Zruty0, thank you for answering the first point; I'm working on it. Could you please give me a hint on the second point (what is the best way to implement a method similar to DecisionTreeClassifier.decision_path with ML.NET)?
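The thread does not answer this second point. As a rough sketch: if you can extract each tree's split features, thresholds and child indices from the trained model (along the lines of the tree topology inspection in the PR linked above), a decision path is simply the walk from the root to a leaf. The parallel-array layout below mirrors scikit-learn's tree_ attribute and is an assumption, not an ML.NET type; decision_path and explain_path are hypothetical helpers. Note that scikit-learn's decision_path returns a sparse node-indicator matrix, whereas this returns the equivalent list of node ids for a single example.

```python
def decision_path(row, children_left, children_right, feature, threshold):
    """Return the node ids visited from the root (node 0) down to a leaf."""
    node = 0
    path = [node]
    while children_left[node] != -1:  # -1 marks a leaf, as in scikit-learn
        if row[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
        path.append(node)
    return path


def explain_path(row, path, feature, threshold, names):
    """Describe each split on the path as a readable condition."""
    conditions = []
    for node in path[:-1]:  # the last node is the leaf, which has no split
        f = feature[node]
        op = "<=" if row[f] <= threshold[node] else ">"
        conditions.append(f"{names[f]} {op} {threshold[node]}")
    return conditions


# Toy tree: node 0 splits on x0 at 5.0; node 2 splits on x1 at 2.0; 1, 3, 4 are leaves.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [0, -1, 1, -1, -1]
threshold = [5.0, 0.0, 2.0, 0.0, 0.0]

row = [7.0, 1.0]
path = decision_path(row, children_left, children_right, feature, threshold)
print(path)  # [0, 2, 3]
print(explain_path(row, path, feature, threshold, ["x0", "x1"]))  # ['x0 > 5.0', 'x1 <= 2.0']
```

For a forest, you would run the same walk over every tree and collect one path per tree.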

@klausmh (Contributor) commented Sep 4, 2018

+1 for adding permutation feature importance and per-example feature gains to ML.NET. That is something we would need.

@Zruty0 (Contributor) commented Nov 5, 2018

@GalOshri , could you please consolidate all the requests for feature importance in one issue, and close the others?

@GalOshri (Contributor) commented Nov 5, 2018

@Zruty0 I looked through some of the explainability issues and will close some of the duplicates, but others are worth keeping open for more open-ended discussion.

This issue refers to specific components that need to be moved to ML.NET (permutation feature importance and per-example feature gains). Can we try to schedule this for 0.8?

@shauheen (Contributor) commented Dec 6, 2018

Closing this, as we shipped this functionality in 0.8. Feel free to reopen if it is still not completely addressed.
