Issue training #3800

Closed
woanware opened this issue May 31, 2019 · 5 comments
Labels: AutoML.NET (Automating various steps of the machine learning process), bug (Something isn't working), command-line (Issues pertaining to the command-line interface)

Comments

@woanware

I tried creating a simple dataset and running the training like so:

dotnet .\mlnet.dll auto-train --task binary-classification --dataset "logons.csv" --label-column-index 0 --has-header true --max-exploration-time 10

Here is an example of the data set which is reduced from my original, but shows the format:

Valid	 Data
0	 09:00
0	 09:01
0	 09:02
0	 09:03
0	 09:04
0	 09:05
0	 09:06
0	 09:07
1	 12:08
0	 09:09
0	 09:10
0	 09:00
0	 09:01
0	 09:02
0	 09:03
0	 09:04
0	 09:05
0	 09:06
0	 09:07
1	 13:08
0	 09:09
0	 09:10
0	 09:00
0	 09:01
0	 09:02
0	 09:03
0	 09:04
0	 09:05
0	 09:06
0	 09:07
1	 14:08
0	 09:09
0	 09:10

Every time I try and run the command I get the following error:

Exception occured while exploring pipelines:
Training failed with the exception: 
System.ArgumentOutOfRangeException: AUC is not definied 
when there is no positive class in the data
Parameter name: PosSample

I originally tried it via VS2019 with the latest version of ML.NET, but that failed, so I tried running the binary directly.

@justinormont
Contributor

justinormont commented May 31, 2019

This is likely an instance of a cross-validation fold failing. With so few positive samples, some folds end up containing only the negative class, and AUC cannot be computed for those folds.

This is being fixed in #3794
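To illustrate the failure mode, here is a minimal Python sketch using the label column from the sample above (33 rows, 3 positives). The fold count of 5 and the contiguous, unshuffled split are assumptions made for illustration; they are not taken from how ML.NET actually partitions the data. The point holds for any split: with fewer positives than folds, at least one fold has no positive row.

```python
def contiguous_folds(labels, k):
    """Split labels into k contiguous folds of near-equal size."""
    base, extra = divmod(len(labels), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more row
        folds.append(labels[start:start + size])
        start += size
    return folds

# Label column of the sample: each block of 11 rows has a single positive.
labels = ([0] * 8 + [1] + [0] * 2) * 3

# k = 5 is a hypothetical fold count chosen for this sketch.
folds = contiguous_folds(labels, 5)
positives_per_fold = [sum(fold) for fold in folds]
folds_without_positives = [i for i, p in enumerate(positives_per_fold) if p == 0]

print(positives_per_fold)
print(folds_without_positives)
```

Any fold listed in `folds_without_positives` would have no positive examples to evaluate against, which matches the condition named in the exception.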

@woanware
Author

woanware commented May 31, 2019

My original file had over 200 training rows, which is similar in size to the Wikipedia training set?

@CESARDELATORRE
Contributor

I'll transfer this issue to the ML.NET repo since it is related to the framework, not the samples, ok?

CESARDELATORRE transferred this issue from dotnet/machinelearning-samples May 31, 2019
@woanware
Author

I have now altered my test data to have a 30+% split of positive results, and the training works. Thanks!

justinormont added the AutoML.NET, bug, and command-line labels May 31, 2019
@justinormont
Contributor

@woanware: You may want to set a weight column too, which will preserve the original true/false ratio.

Upsampling your positive class (or downsampling your negative class) changes the ratio of true/false that your trainer sees. This will cause the model to predict true more often than it occurs in your original dataset. If that is unwanted, you can use a weight column to down-weight your positive class.
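As a rough sketch of how such a weight could be derived (in Python, independent of ML.NET): give negative rows a weight of 1.0 and choose the positive weight so that the weighted positive/negative ratio equals the original ratio. The function name and the counts below are hypothetical and chosen only for illustration.

```python
def positive_class_weight(orig_pos, orig_neg, new_pos, new_neg):
    """Weight for positive rows so that the weighted positive/negative
    ratio matches the original ratio (negative rows keep weight 1.0)."""
    if min(orig_pos, orig_neg, new_pos, new_neg) <= 0:
        raise ValueError("all class counts must be positive")
    original_ratio = orig_pos / orig_neg
    resampled_ratio = new_pos / new_neg
    return original_ratio / resampled_ratio

# Hypothetical counts: 3 positives upsampled 5x to 15, negatives unchanged at 30.
weight = positive_class_weight(orig_pos=3, orig_neg=30, new_pos=15, new_neg=30)

# Sanity check: the weighted ratio should equal the original 3/30.
weighted_ratio = (15 * weight) / (30 * 1.0)
print(weight, weighted_ratio)
```

The resulting value would be written into an extra column for every positive row (and 1.0 for every negative row), and that column would then be designated as the weight column for training.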

Also, if you're upsampling, ensure you split your dataset first, then upsample only the training portion. Otherwise, duplicated rows will appear in both the training and test sets, so your metrics will no longer be representative. This is a form of data leakage.
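The ordering can be sketched in Python as follows. This is a minimal illustration, not ML.NET code: the helper name, the 30% test fraction, and the upsampling factor are all assumptions. Each row is tagged with its original index so that overlap between the two sets can be checked.

```python
import random

def split_then_upsample(rows, test_fraction, factor, seed=0):
    """Split rows into train/test first, then repeat positive rows
    in the training set only. Rows are (label, value) tuples."""
    rng = random.Random(seed)
    indexed = list(enumerate(rows))  # keep original row ids
    rng.shuffle(indexed)
    n_test = int(len(indexed) * test_fraction)
    test, train = indexed[:n_test], indexed[n_test:]
    positives = [item for item in train if item[1][0] == 1]
    # Each positive training row appears `factor` times in total.
    train_upsampled = train + positives * (factor - 1)
    return train_upsampled, test

# Hypothetical rows shaped like the sample: (Valid, Data).
rows = [(0, "09:%02d" % m) for m in range(30)] + [(1, "12:08"), (1, "13:08"), (1, "14:08")]

train, test = split_then_upsample(rows, test_fraction=0.3, factor=5)
train_ids = {row_id for row_id, _ in train}
test_ids = {row_id for row_id, _ in test}
print(len(train), len(test), train_ids & test_ids)
```

Because the duplication happens after the split, no original row id ends up in both sets, so the test metrics are computed only on rows the trainer has not seen.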
