Issue Classification Scenario Fails After Multiple Refinements #661

Closed
nate-george-buck opened this issue Apr 8, 2020 · 5 comments
Labels: Bug (Something isn't working), Reported by: Customer

Comments

@nate-george-buck

System Information (please complete the following information):

  • Model Builder Version: 16.0.2003.302
  • Visual Studio 2019 Enterprise

Describe the bug
Using the Issue Classification Scenario:
After a failed attempt to Train, I revisited the Data screen and changed some of the columns. I then attempted another Train, which failed again. I changed more of the column settings and attempted to Train a third time. This time an error was thrown, as shown in the image below.

To Reproduce
Steps to reproduce the behaviour:

  1. Go to 'Scenario' and select 'Issue Classification'
  2. Select a file with multiple text columns using ';' as the delimiter.
  3. Select a 'Label' and as many 'Feature' columns as reasonable.
  4. Click 'Train'
  5. Input 180 as the 'Time to train'
  6. Click 'Start training'
  7. If no exception is thrown, edit 'Feature' columns and repeat steps 4-6.
  8. Exception is thrown.

Expected behaviour
I expected to be able to repeat steps 3-6 as many times as necessary to complete a successful training session.

Screenshots
image

Additional context
As I'm using version control, I reverted my version and tried again. I also tried restarting VS. Now every time I attempt to train, the exception is thrown. I would attach the data source file, but it contains sensitive database schema information.

@nate-george-buck
Author

More Additional Context:
The first attempt to train produced this exception:
image

@LittleLittleCloud LittleLittleCloud added this to the April 2020 milestone Apr 8, 2020
@LittleLittleCloud LittleLittleCloud added the Bug (Something isn't working) and Reported by: Customer labels Apr 8, 2020
@LittleLittleCloud
Contributor

Looks like a bug to me, as issue classification should be able to accept all data types as features.
Would it be possible for you to provide part of your dataset? You can clear the schema information if you have security concerns.

@nate-george-buck
Author

Thanks for your quick response to this issue! Here is a copy of the file with all the characters scrambled. The instances of characters are the same, so it should generate the same results. When I encountered the exception, it was using the 'Purpose' column as the Label. Using a different column ('Returns') did not produce an exception, although leaving it running through the night did not result in any predictions.
Scrambled.txt
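
For anyone who needs to share a similarly sensitive file, here is a minimal sketch of one way to scramble it so that each character's frequency pattern is preserved. It is an assumption about what "scrambled" means here (a consistent one-to-one letter substitution), not necessarily how Scrambled.txt was produced. The function names `build_mapping` and `scramble` are hypothetical. Delimiters, digits, and line breaks are left untouched so the column layout stays the same.

```python
import random
import string


def build_mapping(seed=0):
    """Build a one-to-one substitution table over ASCII letters.

    Assumption: only letters are scrambled; ';', digits, whitespace and
    newlines pass through unchanged so the file structure is preserved.
    """
    rng = random.Random(seed)
    letters = list(string.ascii_letters)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return str.maketrans("".join(letters), "".join(shuffled))


def scramble(text, mapping):
    """Apply the substitution; equal input characters map to equal outputs."""
    return text.translate(mapping)


if __name__ == "__main__":
    table = build_mapping(seed=42)
    print(scramble("Purpose;Returns\nsome text;other text", table))
```

Because the substitution is a permutation, two rows that shared a label before scrambling still share one afterwards.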

@LittleLittleCloud
Contributor

LittleLittleCloud commented Apr 21, 2020

Your Scrambled.txt has newline characters inside string fields, which can't be loaded in ModelBuilder because of an existing bug: dotnet/machinelearning#4464. And since the data is all text-based, ModelBuilder can't detect its delimiter either, which is probably a bug in Prose.

After I replaced the /n char with /newline, replaced all ',' with ' ', and filtered out every line whose number of ';' is not 6 (the number of ';' in the header), the dataset seems to run in ModelBuilder without error. But it takes forever to give a result, which may be related to your Scrambled dataset: it is all text-based and some of the text is really long. Transforming text fields can be really time-consuming; for details, take a look at issue #596. After I reduced the Scrambled data from 2700+ lines to 100 lines, I could get a result within 100 s.
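
The cleanup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact script that was used: the function name `clean_rows` and its parameters are hypothetical, and the sketch skips the /n replacement step because a raw file does not show which newlines sit inside a field. Instead it relies on the delimiter-count filter to drop rows that an embedded newline has split into fragments.

```python
def clean_rows(raw_text, delimiter=";", max_rows=None):
    """Return the header plus the data rows that survive the cleanup.

    Steps (a sketch of the procedure described in the comment):
      1. Replace every ',' with ' '.
      2. Count the delimiters in the header line.
      3. Keep only rows with the same delimiter count as the header;
         fragments of rows broken by embedded newlines are dropped.
      4. Optionally stop after max_rows data rows (e.g. 100).
    """
    lines = raw_text.replace(",", " ").splitlines()
    if not lines:
        return []
    header = lines[0]
    expected = header.count(delimiter)
    kept = [header]
    for line in lines[1:]:
        if max_rows is not None and len(kept) - 1 >= max_rows:
            break
        if line.count(delimiter) == expected:
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = "a;b;c\n1;2,x;3\nbroken;row\n4;5;6\n"
    for row in clean_rows(sample, max_rows=100):
        print(row)
```

Passing `max_rows=100` mirrors the reduction from 2700+ lines to 100 lines; the header's own delimiter count is used rather than a hard-coded 6 so the same sketch works on other files.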

Did you also change column types when scrambling the dataset? What is your system's language? This issue seems really weird; could you provide more info?

@LittleLittleCloud
Contributor

Closing this issue due to no response. If you're still seeing a problem, feel free to reopen it. Thanks for reporting!

2 participants