Related to #1901. Regarding FastTree/LightGBM, we may want to refine the public prediction/learned model surface a bit more.
The structures exposed here, including `TreeEnsemble` but especially `RegressionTree`, are dual-use structures, in the sense that they are used for both training and prediction. This means they are irretrievably mutable (as they must be during training) even during prediction. They are polymorphic in structure, in the sense that different structures are active and populated during training than during prediction. Worse, in some cases the same structures are used but have distinct meanings. (E.g., the feature indices during training are distinct from those used during prediction: if FastTree determines it can only use a subset of features, then during training it will not even consider the unusable features to exist, until training is done and it maps the structure back to the original feature space so it is usable by ML.NET.) There are other serious but (in context) comparatively minor issues, like the classes being neither sealed nor abstract, and disobeying more .NET naming and implementation guidelines than I care to enumerate.
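To make the shape of the problem concrete, here is a hypothetical, heavily simplified sketch of such a dual-use tree. The member names are illustrative only and do not mirror the actual FastTree code:

```csharp
// Hypothetical, simplified sketch of the dual-use shape; member names are
// illustrative and do not mirror the actual FastTree implementation.
public class RegressionTree // neither sealed nor abstract
{
    // Mutated freely by the trainer, yet still writable on a trained model.
    public int[] SplitFeatures;      // during training: indices into the
                                     // trainer's compacted feature set;
                                     // during prediction: indices into the
                                     // original ML.NET feature space
    public double[] SplitThresholds;
    public double[] LeafValues;

    // Training-only bookkeeping that remains visible (and meaningless)
    // on the trained model.
    public double[] SplitGains;
    public int[] DocumentCountPerLeaf;

    // The trainer calls this once, at the end of training, to translate the
    // compacted feature indices back to the original feature space. Calling
    // it a second time would silently corrupt the tree.
    public void RemapFeatures(int[] compactedToOriginal)
    {
        for (int i = 0; i < SplitFeatures.Length; i++)
            SplitFeatures[i] = compactedToOriginal[SplitFeatures[i]];
    }
}
```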
Yet, we cannot avoid exposing some structure, since of course people want to inspect the trees they have learnt. And we use this structure to represent the trained ensembles of both the FastTree and LightGBM learners, and, if we ever clean up our XGBoost wrapper so it can be open sourced (or, perhaps, someone implements a fresh wrapper for us), maybe it will wind up using exactly that same structure. This will also serve users better, since this class's structure is, for the reasons enumerated above, incomprehensible: about half of the members on it are things people shouldn't use, as they are, per the above, used exclusively during training.
What should be done instead is the following: during training there is an internal class (in fact, probably more or less the same class that exists now), with a separate immutable class, created out of instances of this internal class, that is exposed as the model. This also implies separate training and prediction structures for what is currently called `TreeEnsemble`, and of course the quantile trees will have to be accommodated in some fashion. Much of the code supporting prediction and the standard ML.NET interfaces would move into that set of immutable classes. A rough sketch of that split follows.
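As a rough sketch of what that split might look like (all names here are hypothetical, not a committed API):

```csharp
using System.Collections.Generic;

// Hypothetical sketch of the proposed split; names are illustrative only.
internal sealed class TrainingRegressionTree
{
    // Mutable, compacted-feature-space representation used only while the
    // trainer is growing the tree.
    public int[] SplitFeatures;
    public double[] SplitThresholds;
    public double[] LeafValues;
    public double[] SplitGains; // training-only bookkeeping

    // Produces the immutable, original-feature-space tree exposed to users.
    public RegressionTree ToImmutable(int[] compactedToOriginal)
    {
        var features = new int[SplitFeatures.Length];
        for (int i = 0; i < features.Length; i++)
            features[i] = compactedToOriginal[SplitFeatures[i]];
        return new RegressionTree(
            features,
            (double[])SplitThresholds.Clone(),
            (double[])LeafValues.Clone());
    }
}

// Immutable prediction-time tree: only what users need to inspect or score.
public sealed class RegressionTree
{
    private readonly int[] _splitFeatures;
    private readonly double[] _splitThresholds;
    private readonly double[] _leafValues;

    internal RegressionTree(
        int[] splitFeatures, double[] splitThresholds, double[] leafValues)
    {
        _splitFeatures = splitFeatures;
        _splitThresholds = splitThresholds;
        _leafValues = leafValues;
    }

    public IReadOnlyList<int> SplitFeatures => _splitFeatures;
    public IReadOnlyList<double> SplitThresholds => _splitThresholds;
    public IReadOnlyList<double> LeafValues => _leafValues;
}
```

The point of the split is that the mutable class never escapes the assembly, so the public surface carries only what prediction and inspection actually need, in the original feature space.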