Fixes #3878. About calling Fit more than once on Multiclass LightGBM trainer. #4608
Codecov Report
@@ Coverage Diff @@
## master #4608 +/- ##
=========================================
Coverage ? 75.64%
=========================================
Files ? 938
Lines ? 168669
Branches ? 18210
=========================================
Hits ? 127584
Misses ? 36056
Partials ? 5029
This is one of the trainer fields that are not Refers to: src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs:289 in 1aaa77f.
Unrelated: this is Refers to: test/Microsoft.ML.Tests/TrainerEstimators/TreeEstimators.cs:495 in 1aaa77f.
…ining and GbmOptions
It seems to me the only reason In reply to: 570035614. Refers to: src/Microsoft.ML.LightGbm/LightGbmTrainerBase.cs:291 in 1aaa77f.
//MYTODO: Include more initializations, of TrainedEnsemble, for example?
//For example:
//TrainedEnsemble = null;
I think these comments can be safely deleted. #Resolved
Fixes #3878 by initializing the values of `_tlcNumClass` and `_numClass` every time `Fit()` is called on a Multiclass LightGBM trainer. This is done under the assumption that trainers should behave as if they were stateless, so every call to `Fit()` should re-initialize those values.

As discussed offline with @yaeldekel, `_tlcNumClass` and `_numClass` hold the same value if there were no NaN labels in the dataset used for training. If there were NaN labels, then `_tlcNumClass = _numClass - 1`. Maintaining both fields is necessary because here NaN labels are replaced with a new class; then, here, when training the `WrappedLightGbmTraining`, it is as if NaN labels were an extra class; but then here, when creating the predictors, only `_tlcNumClass` predictors are created.

So, for example, if I have a dataset with 3 classes (0-2), but some rows have NaN values in their labels, then the NaN values get converted to "3", `_numClass` is equal to "4", and `WrappedLightGbmTraining` trains as if there were truly 4 classes. However, `_tlcNumClass` is equal to "3", and when creating the predictors only 3 predictors are created (one for each of the original classes, ignoring the "fake NaN class").

The above wasn't documented in the code, but after doing some tests it seems that this is how it is supposed to behave, so I added some comments explaining it, as @yaeldekel and I agree that the above isn't really clear from reading the code directly, and the names `_tlcNumClass` and `_numClass` are somewhat obscure.

I also added a test.
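
To make the relationship between the two counters concrete, here is a minimal, self-contained sketch. It is not the ML.NET implementation: `ToyMulticlassTrainer`, `RemapLabels`, and the assumption that real labels are the consecutive values 0..K-1 are all hypothetical and exist only for illustration. It shows the two behaviors described above: both fields are reset at the start of every `Fit()` call, and NaN labels are counted as one extra class in `_numClass` but not in `_tlcNumClass`.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the trainer; NOT the actual ML.NET code.
public sealed class ToyMulticlassTrainer
{
    // Classes seen by the underlying training, including the extra "NaN class" if present.
    private int _numClass;
    // Real classes only; this is how many predictors would be created.
    private int _tlcNumClass;

    public int NumClass => _numClass;
    public int TlcNumClass => _tlcNumClass;

    public void Fit(float[] labels)
    {
        // Re-initialize on every call so values from a previous Fit() cannot leak in.
        _numClass = 0;
        _tlcNumClass = 0;

        bool hasNaN = labels.Any(l => float.IsNaN(l));
        float[] real = labels.Where(l => !float.IsNaN(l)).ToArray();

        // Assumption: real labels are the consecutive values 0..K-1.
        _tlcNumClass = real.Length == 0 ? 0 : (int)real.Max() + 1;
        _numClass = hasNaN ? _tlcNumClass + 1 : _tlcNumClass;
    }

    // NaN labels are mapped to a new class index equal to the number of real classes.
    public float[] RemapLabels(float[] labels) =>
        labels.Select(l => float.IsNaN(l) ? (float)_tlcNumClass : l).ToArray();
}
```

With labels `{0, 1, 2, NaN}`, this sketch ends up with `NumClass == 4` and `TlcNumClass == 3`, matching the example in the description. Calling `Fit()` again on `{0, 1}` yields 2 for both counters, because nothing is carried over from the first call.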