Issue training #3800
This is likely an instance of a cross-validation fold failing: with too few samples, a fold can end up without both classes present. This is being fixed in #3794
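The failure mode described above can be sketched in plain Python (this is an illustration of the concept, not ML.NET internals): with a rare positive class, naive K-fold splitting can leave some folds with only one class, which breaks per-fold binary-classification metrics such as AUC.

```python
import random

# 100 rows with a 4% positive class -- rare enough that a plain K-fold
# split is unlikely to place a positive in every fold.
random.seed(0)
labels = [1] * 4 + [0] * 96
random.shuffle(labels)

k = 10
folds = [labels[i::k] for i in range(k)]  # simple round-robin K-fold
single_class = sum(1 for f in folds if len(set(f)) < 2)
print(f"folds containing only one class: {single_class} of {k}")

# Stratified folding distributes each class across folds separately, so
# every fold gets both classes whenever the counts allow it. Here only
# 4 positives exist, so even stratification can cover at most 4 of the
# 10 folds -- one reason very small classes still need special handling.
pos = [i for i, y in enumerate(labels) if y == 1]
neg = [i for i, y in enumerate(labels) if y == 0]
strat = [pos[i::k] + neg[i::k] for i in range(k)]
covered = sum(1 for f in strat if any(labels[j] == 1 for j in f))
print(f"stratified folds containing a positive: {covered} of {k}")
```

With 4 positives spread over 10 folds of 10 rows each, at least 6 folds contain only negatives, which is exactly the condition that makes a cross-validation fold fail.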
My original file had over 200 training lines, which is similar to the Wikipedia training set?
I'll transfer this issue to the ML.NET repo since it is related to the framework, not the samples, ok?
I have now altered my test data to have a 30+% split of positive results, and the training works. Thanks!
@woanware: You may want to set a weight column too, which will preserve the original true/false ratio. Upsampling your positive class (or downsampling your negative class) changes the ratio of true/false that your trainer sees, which will cause the model to predict the positive class more often than it actually occurs. Also, if you're upsampling, ensure you split your dataset first, then upsample. Otherwise duplicate rows will be seen again in the test set, so your metrics are no longer representative, which is a form of data leakage.
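The split-before-upsample point can be demonstrated with a small plain-Python sketch (the row layout and helper name here are illustrative, not from the issue): upsampling before the split lets duplicated rows land on both sides, so the test set is no longer unseen data.

```python
# Rows are (id, label) pairs; 10% positive class.
rows = [(i, 1 if i < 10 else 0) for i in range(100)]

def upsample_positives(data, factor):
    # Duplicate every positive row until it appears `factor` times.
    positives = [r for r in data if r[1] == 1]
    return data + positives * (factor - 1)

# Wrong order: upsample first, then carve off the last 20% as a test set.
# (No shuffle, to keep the outcome deterministic; shuffling only changes
# which duplicates leak across the split, not whether they do.)
upsampled = upsample_positives(rows, 5)   # 100 rows + 40 duplicates
cut = int(len(upsampled) * 0.8)           # 112 train / 28 test
train, test = upsampled[:cut], upsampled[cut:]
train_ids = {r[0] for r in train}
leaked = sum(1 for r in test if r[0] in train_ids)
print(f"test rows already seen in training: {leaked} of {len(test)}")

# Right order: split first, then upsample only the training portion.
train2 = upsample_positives(rows[:80], 5)
test2 = rows[80:]
train2_ids = {r[0] for r in train2}
leaked2 = sum(1 for r in test2 if r[0] in train2_ids)
print(f"test rows already seen in training: {leaked2} of {len(test2)}")
```

In the wrong ordering every test row is a duplicate of a training row; splitting first guarantees the test rows were never shown to the trainer.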
I have tried creating a simple dataset and performing the training like so:
Here is an example of the dataset, reduced from my original but showing the format:
Every time I try to run the command I get the following error:
I originally tried it via VS2019 and the latest version of ML.NET, but that failed, so I tried it using the binary directly.