Description
This is an important feature; however, there are several different ways one could decide to stop training early.
At the moment training runs to a fixed number of iterations (100, if I'm not mistaken). Although that number is nice, it would be better to provide an early stopping rule; providing a rule would, in my opinion, override/enhance the NumberOfIterations property.
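For context, this is roughly how those knobs are configured today on the LightGBM binary trainer (a minimal sketch; the column names and values are placeholders, and it assumes the Microsoft.ML and Microsoft.ML.LightGbm packages):

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext();

// Today the stopping behavior is fixed up front via option values;
// there is no per-iteration hook once Fit() starts.
var options = new LightGbmBinaryTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    NumberOfIterations = 100,  // fixed iteration budget
    EarlyStoppingRound = 20,   // stop if the validation metric does not improve for this many rounds
    LearningRate = 0.1,
    NumberOfLeaves = 31
};

var trainer = mlContext.BinaryClassification.Trainers.LightGbm(options);
```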
Some simple generic rules would help (see the sketch after this list):
- Minimum improvement
- MaxDuration (TimeSpan)
- Number of Iterations
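A hypothetical shape for such rules, purely for illustration (none of these types exist in ML.NET; the names are made up):

```csharp
using System;

// Hypothetical sketch only; not an existing ML.NET type.
public sealed class EarlyStoppingCriteria
{
    // Stop when the per-iteration gain in the tracked metric falls below this value.
    public double MinimumImprovement { get; set; } = 1e-4;

    // Stop once this wall-clock budget is exhausted.
    public TimeSpan? MaxDuration { get; set; }

    // Hard cap on iterations, equivalent to NumberOfIterations today.
    public int? MaximumNumberOfIterations { get; set; }
}
```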
Proposed delegate:
- Iteration number => (get)
- Maximum iterations => (get)
- Previous score/improvement => (get)
- Current score/improvement => (get)
- Auto-configured [Trainer].Options-specific values related to fitting
- StopNow => (get/set)
- GetMetrics(validationData)
- GetCurrentModel()
This would allow us to stop training early if the iterations/time spent are not improving the model as expected.
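A rough sketch of what such a per-iteration callback context could look like (purely hypothetical; no such interface or delegate exists in ML.NET, and every name below is made up to mirror the list above):

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical context handed to a user callback once per iteration; not an ML.NET API.
public interface ITrainingIterationContext
{
    int IterationNumber { get; }      // current iteration
    int MaximumIterations { get; }    // configured iteration budget
    double PreviousScore { get; }     // score/improvement at the previous iteration
    double CurrentScore { get; }      // score/improvement at the current iteration

    // Auto-configured [Trainer].Options values related to fitting (e.g. LearningRate).
    IReadOnlyDictionary<string, double> EffectiveOptions { get; }

    // Set to true to request early termination of training.
    bool StopNow { get; set; }

    // Evaluate the partially trained model on held-out data.
    IReadOnlyDictionary<string, double> GetMetrics(IDataView validationData);

    // Snapshot of the model as trained so far.
    ITransformer GetCurrentModel();
}

// The trainer would invoke a user-supplied delegate of this shape each iteration.
public delegate void TrainingIterationCallback(ITrainingIterationContext context);
```

With something along these lines, the delegate could set StopNow based on any combination of the simple rules above.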
Some use cases:
- Speed up training and improve quality by making better use of trainer options:
  - Understand the auto-discovered properties and how they are adjusted by the trainer, so that one can better understand what to specify in [Trainer].Options (e.g. LearningRate, NumberOfLeaves).
- Reporting:
  - Allows generating charts showing training progress (real-time and as a log); see the sketch after this list.
  - Allows correlating machine resources with training progress in reporting/logging.
- In multi-class:
  - Additionally train a specific class (to address under-/over-fitting).
  - Store "sub models" for specific classes.
- Generative adversarial networks (GAN):
  - Hook for joining [N] networks together, allowing them to improve each other more efficiently.
- Adversarial machine learning (spam filters, vulnerability testing, etc.):
  - Hook for additional training on specific exploits, and/or making/saving specific models for a specific set of exploits/classes.
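For the reporting use case in particular, such a callback could be used roughly like this (again a sketch only, reusing the hypothetical TrainingIterationCallback/ITrainingIterationContext types from above):

```csharp
using System;
using System.IO;

// Sketch: stream per-iteration progress to a CSV for real-time charting and logging.
// Relies on the hypothetical callback types sketched earlier.
TrainingIterationCallback reportProgress = context =>
{
    var improvement = context.CurrentScore - context.PreviousScore;

    File.AppendAllText("training-progress.csv",
        $"{DateTime.UtcNow:o},{context.IterationNumber},{context.CurrentScore},{improvement}{Environment.NewLine}");

    // Example stopping rule: bail out once the improvement has stalled.
    if (context.IterationNumber > 10 && Math.Abs(improvement) < 1e-4)
    {
        context.StopNow = true;
    }
};
```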
The above list is not a complete set of use cases, just some that come to mind that would greatly improve the usability of the framework.
I know of no way, at the moment, that this can be done with the current framework without a massive waste of resources and time. Our in-house training framework has this capability, and some of my models train for days on a big server, so iterating/poking around (and waiting) with option values is not really an option.
When playing with data the size of the Iris sample this feature might sound silly, as training finishes before one can sip a cup of coffee; production development is a bit different.
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: 1ee559b0-04b6-5280-ed68-6291d1e2f7cf
- Version Independent ID: abe34fa0-63f1-7501-165b-20b55190dc0b
- Content: LightGbmTrainerBase<TOptions,TOutput,TTransformer,TModel>.OptionsBase.EarlyStoppingRound Field (Microsoft.ML.Trainers.LightGbm)
- Content Source: [dotnet/xml/Microsoft.ML.Trainers.LightGbm/LightGbmTrainerBase`4+OptionsBase.xml](https://github.com/dotnet/ml-api-docs/blob/live/dotnet/xml/Microsoft.ML.Trainers.LightGbm/LightGbmTrainerBase`4+OptionsBase.xml)
- Product: dotnet-ml-api
- GitHub Login: @sfilipi
- Microsoft Alias: johalex