Multiclass LightGBM bug #3878
Comments
I've just taken a quick look at this. To do so, I modified the Multiclass LightGbm sample and added code to fit the pipeline again with a dataview that has labels 1-4 (whereas the dataview originally used to fit the pipeline had labels 1-3). As a result, fitting the pipeline again reproduces what I believe is the issue @yaeldekel is describing, right? If this is the issue, wouldn't it be solved by simply moving the initialization of _tlcNumClass out of this if statement, so that it is always initialized when fitting the pipeline?
What does calling If it's not LightGBM's task=refit, can we expose it? This may serve the needs of AutoML, which is looking for streamable trees. This would let us fit the tree structure on ~10 to 100GB of data, then stream the whole dataset (TBs) to refit the leaf node values. There's a similar option by using TreeFeat + linear model.
The estimators are intended to be stateless, so calling
…trainer. (#4608)
* Reset _numberOfClassesIncludingNan every time the trainer is fitted.
* Renamed some variables and added comments to make the code more legible
* Other minor changes in LightGBM classes
The LightGBM trainer has two non-readonly fields called _numClass and _tlcNumClass. The second one is used to determine the number of predictors in the OVA predictor. However, the value of _tlcNumClass is only updated once, so if Fit is called again on the same estimator, it might give the wrong number of classes.
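To make the failure mode concrete, here is a minimal Python sketch. It is not ML.NET code and does not reproduce the real trainer: the class names StaleTrainer and ResettingTrainer, the field _num_classes, and the string "predictors" are all hypothetical stand-ins. It only assumes what the description above states, namely that a class count cached on the estimator is set once and then reused on later fits.

```python
class StaleTrainer:
    """Toy model of the bug: the class count is cached on the estimator."""

    def __init__(self):
        # Hypothetical analogue of _tlcNumClass; None means "not set yet".
        self._num_classes = None

    def fit(self, labels):
        # Bug: the count is computed only on the first call to fit().
        if self._num_classes is None:
            self._num_classes = len(set(labels))
        # One placeholder predictor per class, as in a one-versus-all setup.
        return [f"predictor_{i}" for i in range(self._num_classes)]


class ResettingTrainer:
    """Toy model of the fix: the class count is recomputed on every fit."""

    def fit(self, labels):
        # No state survives between calls, so each fit sees its own labels.
        num_classes = len(set(labels))
        return [f"predictor_{i}" for i in range(num_classes)]


if __name__ == "__main__":
    first, second = [1, 2, 3], [1, 2, 3, 4]

    stale = StaleTrainer()
    print(len(stale.fit(first)), len(stale.fit(second)))

    fixed = ResettingTrainer()
    print(len(fixed.fit(first)), len(fixed.fit(second)))
```

In this sketch the stale trainer still builds three predictors when refitted on four labels, while the resetting trainer builds four. Recomputing the count inside fit, rather than guarding it with a one-time check, is the same idea as the reset described in the linked commit.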