Skip to content

Make FastTree/LightGBM learned model suitable for public consumption #1960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
TomFinley opened this issue Dec 23, 2018 · 0 comments · Fixed by #2243
Closed

Make FastTree/LightGBM learned model suitable for public consumption #1960

TomFinley opened this issue Dec 23, 2018 · 0 comments · Fixed by #2243
Assignees
Labels
API Issues pertaining the friendly API

Comments

@TomFinley
Copy link
Contributor

TomFinley commented Dec 23, 2018

Related to #1901. Regarding FastTree/LightGBM we may want to refine the public prediction/learned model surface a bit more.

The structures exposed here, including TreeEnsemble but especially RegressionTree, are dual use structures in the sense that they are used for both training and prediction. This means it is irretreivably mutable (as it must be during training) even during prediction. It is polymorphic in its structure, in the sense that various structures are active and populated during training than are used during prediction -- worse, in some cases the same structures are used, but have distinct meanings. (E.g., the feature indices during training are distinct from those used during prediction, since if FastTree determines it can only use a subset of features, it will during training not even consider the unusable features to exist, until it is done with training and maps the structure back to the original feature space so it is usable by ML.NET.) There are other serious but (in context) comparatively minor notes, like not being either sealed or abstract, disobeying more .NET naming and implementation guidelines than I care to enumerate, etc.

Yet, we cannot avoid exposing some structure, since of course people want to inspect the trees they have learnt. And, we use this structure to represent the trained ensembles of both the FastTree and LightGBM learner, and, if we ever clean up our XGBoost wrapper so it can be open sourced (or, perhaps even, someone implements a fresh wrapper for us), maybe it will wind up using exactly that same structure. This will also serve them better since this class's structure is, for the same reasons enumerated above, incomprehensible, since about half the members on it are things people shouldn't use. (Since they are used as explained above exclusively during training.)

What should be done instead is the following: during training there is an internal class, in fact, probably more or less the same class that exists now, with a separate immutable class that is exposed as the model during training, created out of instances of this class. This also implies a separate training/prediction structure of what is currently called TreeEnsemble, and of course the quantile trees will have to be fashioned in some way. Much of the code supporting prediction and the standard ML.NET interfaces would move into that set of immutable classes.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Dec 23, 2018
@wschin wschin self-assigned this Jan 18, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants