Multiclass LightGBM bug #3878

yaeldekel · 2019-06-18T19:34:03Z

LightGBM trainer has two non-readonly fields called _numClass and _tlcNumClass. The second one is used to determine the number of predictors in the OVA predictor. However, the value of _tlcNumClass is only updated once, so if Fit is called again on the same estimator, it might give the wrong number of classes.
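To make the failure mode concrete, here is a minimal sketch in Python rather than the trainer's own C#. `ToyMulticlassTrainer`, its `fit` method, and the `None` check are all hypothetical stand-ins: the thread does not show the actual condition that guards the update of `_tlcNumClass`, so the guard below is an assumption chosen only to reproduce the "updated once" behavior.

```python
class ToyMulticlassTrainer:
    """Hypothetical stand-in for the trainer; this is not the ML.NET API."""

    def __init__(self):
        # Mirrors the role of _tlcNumClass: mutable state kept on the estimator.
        self._tlc_num_class = None

    def fit(self, labels):
        # Assumed guard: the class count is computed only on the first call,
        # so a later call with different labels reuses the stale value.
        if self._tlc_num_class is None:
            self._tlc_num_class = len(set(labels))
        # One binary predictor per class, as in a one-versus-all model.
        return [f"predictor_{i}" for i in range(self._tlc_num_class)]


trainer = ToyMulticlassTrainer()
first = trainer.fit([1, 2, 3, 1, 2, 3])   # three distinct labels
second = trainer.fit([1, 2, 3, 4, 1, 2])  # four distinct labels
print(len(first), len(second))
```

In this sketch the second call still builds three predictors even though the data has four classes, which is the symptom the issue describes.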

antoniovs1029 · 2019-12-11T00:37:16Z

I've just taken a quick look at this. I modified the Multiclass LightGbm sample and added code to fit the pipeline a second time on a dataview with labels 1-4 (the original dataview, used for the first fit, had labels 1-3). On the second fit, _tlcNumClass was still 3 (instead of 4), and when printing the metrics only 3 labels were taken into account; the last one was ignored.

I believe this is the issue @yaeldekel is describing, right?

If this is the issue, wouldn't it be solved by simply moving the initialization of _tlcNumClass out of the if statement that currently guards it, so that it is always initialized when fitting the pipeline?
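A rough sketch of that idea, again in Python with hypothetical names (`ToyMulticlassTrainerFixed` and `fit` are illustrative, not the ML.NET classes): the class count is assigned unconditionally at the start of every fit, so nothing carries over from a previous call.

```python
class ToyMulticlassTrainerFixed:
    """Hypothetical stand-in; shows the count being reset on every fit."""

    def __init__(self):
        self._tlc_num_class = None

    def fit(self, labels):
        # No guard: the class count is recomputed from the current data
        # each time, so a second fit cannot see the first fit's value.
        self._tlc_num_class = len(set(labels))
        return [f"predictor_{i}" for i in range(self._tlc_num_class)]


fixed = ToyMulticlassTrainerFixed()
fit_three = fixed.fit([1, 2, 3, 1, 2, 3])
fit_four = fixed.fit([1, 2, 3, 4, 1, 2])
print(len(fit_three), len(fit_four))
```

Whether the real field can simply be moved, or whether other code depends on the guard, is something only the trainer source can confirm.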

justinormont · 2019-12-11T06:18:16Z

What does calling Fit() twice do?

If it's not LightGBM's task=refit, can we expose it?

This may serve the needs of AutoML, which is looking for streamable trees. This would let us fit the tree structure on ~10 to 100GB of data, then stream the whole dataset (TBs) to refit the leaf node values. There's a similar option by using TreeFeat + linear model.
/cc @daholste

yaeldekel · 2019-12-11T18:24:29Z

The estimators are intended to be stateless, so calling Fit() twice should produce exactly the same result as defining two estimators and calling Fit() once on each of them (except, perhaps, for any randomness used during training).
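The stateless contract can be phrased as a check: refitting a reused estimator should match fitting a fresh one on the same data. The sketch below is a Python illustration under assumptions; `ToyEstimator`, its fixed `seed`, and `refit_matches_fresh` are hypothetical names, and the fixed seed is there only to remove the training randomness mentioned above.

```python
import random


class ToyEstimator:
    """Hypothetical estimator that keeps no training state between fits."""

    def __init__(self, seed=0):
        self._seed = seed  # configuration only, never modified by fit

    def fit(self, labels):
        # A fresh generator per call keeps the result reproducible.
        rng = random.Random(self._seed)
        # The "model" is one pseudo-random weight per distinct class.
        return {label: rng.random() for label in sorted(set(labels))}


def refit_matches_fresh(make_estimator, first_data, second_data):
    """Return True if a second fit equals a fit on a brand-new estimator."""
    reused = make_estimator()
    reused.fit(first_data)               # first fit, result discarded
    refit = reused.fit(second_data)      # second fit on the same object
    fresh = make_estimator().fit(second_data)
    return refit == fresh


print(refit_matches_fresh(ToyEstimator, [1, 2, 3], [1, 2, 3, 4]))
```

A trainer that caches the class count from its first fit would fail a check of this shape, because its refit model would cover fewer classes than the fresh one.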
Regarding LightGBM refit, is it capable of doing something that cannot be done using TreeFeat + linear model? If the answer is yes, could you open a new issue for it?

…trainer. (#4608)
* Reset _numberOfClassesIncludingNan every time the trainer is fitted.
* Renamed some variables and added comments to make the code more legible
* Other minor changes in LightGBM classes

yaeldekel added the P0 label (priority of the issue for triage purposes: IMPORTANT, needs to be fixed right away) Jun 18, 2019

wschin self-assigned this Jun 27, 2019

antoniovs1029 self-assigned this Dec 11, 2019

antoniovs1029 mentioned this issue Dec 31, 2019

Fixes #3878. About calling Fit more than once on Multiclass LightGBM trainer. #4608

Merged

antoniovs1029 closed this as completed in #4608 Jan 7, 2020

artemiusgreat mentioned this issue Mar 5, 2020

Dynamic number of features for the trainer / schema #4903

Closed

ghost locked as resolved and limited conversation to collaborators Mar 21, 2022
