
FastTree: Instantiate feature map for disk transpose and make Generalized Additive Models predictor resilient when feature map is not available. #123


Closed
codemzs opened this issue May 11, 2018 · 3 comments
Labels: bug (Something isn't working)

@codemzs
Member

codemzs commented May 11, 2018

During FastTree (gradient-boosted decision tree) training, we drop features that offer little to no value, such as features with a zero instance count or features without enough instances per unique feature value. As a result, the feature count in the training set can be less than or equal to the feature count in the user's input feature vector, so we use a feature map internally to map training-set features back to the input features.
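
To make the mapping concrete, here is a minimal sketch (all names and counts are illustrative, not the actual FastTree internals): the map records, for each feature kept in training, the index of the corresponding feature in the user's input vector.

```csharp
using System;
using System.Collections.Generic;

class FeatureMapSketch
{
    static void Main()
    {
        // Per-input-feature instance counts observed during data preparation
        // (hypothetical numbers; features 0 and 3 have no instances and get dropped).
        int[] instanceCounts = { 0, 5, 120, 0, 42 };

        // trainingFeatureIndex -> inputFeatureIndex
        var featureMap = new List<int>();
        for (int inputIndex = 0; inputIndex < instanceCounts.Length; inputIndex++)
        {
            if (instanceCounts[inputIndex] > 0) // keep only features with data
                featureMap.Add(inputIndex);
        }

        // Prints "1, 2, 4": training feature 0 is input feature 1, and so on.
        Console.WriteLine(string.Join(", ", featureMap));
    }
}
```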

Issue #1:
If no features are dropped or filtered during training, no feature map is created. FastTree handles a null feature map, but the Generalized Additive Model (GAM) predictor does not.
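
A rough sketch of what the resilience could look like, assuming a null map simply means the identity mapping (the helper and class names are hypothetical, not the actual GAM code):

```csharp
using System;

class NullMapResilienceSketch
{
    // Hypothetical helper: translate a training-set feature index back to the
    // user's input feature index; a null map means nothing was dropped, so the
    // mapping is the identity.
    static int ToInputFeatureIndex(int trainingFeatureIndex, int[] featureMap)
        => featureMap == null ? trainingFeatureIndex : featureMap[trainingFeatureIndex];

    static void Main()
    {
        int[] map = { 1, 2, 4 };                          // some features were dropped
        Console.WriteLine(ToInputFeatureIndex(0, map));   // 1
        Console.WriteLine(ToInputFeatureIndex(0, null));  // 0: identity, no crash
    }
}
```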

Issue #1.1:
Before training starts, FastTree runs a data preparation step that transposes the dataset and eliminates examples with missing feature values. The transpose can be done in memory or on disk (recommended for larger datasets). In the disk transpose, the code was neither filtering out features that should have been excluded from training nor creating a feature map when one should have been created. As a result, a null feature map was passed to the GAM predictor, which was not resilient to it.
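
Sketched below is the general shape of the fix, under the assumption that the disk path should mirror the in-memory path: apply the same feature filter and instantiate the feature map, so downstream consumers such as the GAM predictor never see an unexpectedly null map from this path. All names and the threshold parameter are illustrative:

```csharp
using System;
using System.Collections.Generic;

class DiskTransposeSketch
{
    // Illustrative only: the disk-transpose path applies the same filtering
    // rule as the in-memory path and instantiates the map, instead of passing
    // every column through with a null map.
    static int[] BuildFeatureMap(int[] instanceCounts, int minDocsPerFeature)
    {
        var kept = new List<int>();
        for (int col = 0; col < instanceCounts.Length; col++)
        {
            if (instanceCounts[col] >= minDocsPerFeature)
                kept.Add(col); // keep: enough instances to be useful in training
        }
        return kept.ToArray(); // trainingFeatureIndex -> inputFeatureIndex
    }

    static void Main()
    {
        int[] counts = { 0, 5, 120, 0, 42 };
        Console.WriteLine(string.Join(", ", BuildFeatureMap(counts, 1))); // 1, 2, 4
    }
}
```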

@markusweimer
Member

Can you explain more? The title makes it sound like two separate issues to me.

@codemzs
Member Author

codemzs commented May 11, 2018

@markusweimer: During FastTree (gradient-boosted decision tree) training, we drop features that offer little to no value, such as features with a zero instance count or features without enough instances per unique feature value. As a result, the feature count in the training set can be less than or equal to the feature count in the user's input feature vector, so we use a feature map internally to map training-set features back to the input features.

Issue #1:
If no features are dropped or filtered during training, no feature map is created. FastTree handles a null feature map, but the Generalized Additive Model (GAM) predictor does not.

Issue #1.1:
Before training starts, FastTree runs a data preparation step that transposes the dataset and eliminates examples with missing feature values. The transpose can be done in memory or on disk (recommended for larger datasets). In the disk transpose, the code was neither filtering out features that should have been excluded from training nor creating a feature map when one should have been created. As a result, a null feature map was passed to the GAM predictor, which was not resilient to it.

You are right, they are two issues, but they are also related.

@shauheen
Contributor

Closed by #122.

@ghost locked as resolved and limited conversation to collaborators on Mar 30, 2022