Skip to content

Enabling Tests that consume Zbaseline files. #78

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Enabling Tests that consume Zbaseline files. #78

wants to merge 1 commit into from

Conversation

Anipik
Copy link
Contributor

@Anipik Anipik commented May 8, 2018

.txt, -rp.txt and -out.txt are copied from the TLC project and values are edited in their last decimal places to match the results.

.ini files are copied from the output folder as i was not able to find them in TLC project.

Some Datasets are also added to the repo which were required by these tests.

The remaining tests are not enabled because they fail due to some other error and not because zbaselines or datasets are not present.

Update :-
The repo already contains zbaselines for the test being enabled

cc @eerhardt @danmosemsft @codemzs

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented May 8, 2018

rp.txt, ini files, txt files with scored instances are ok, but -out.txt created through running maml.exe on windows with paths to files been replaced by regex to something like %Data% or %Output%.
In current moment our regex works only for windows file system, and even if you port all this files, they wouldn't work on Linux and Mac. Tbh, I'm not even sure we want to keep current testing infrastructure and I don't see point of bringing it artifacts to github. #Closed

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of bringing this files? And your PR says, "Enabling Tests". I don't see any changes in test files, are they coming in next iteration?

@Anipik
Copy link
Contributor Author

Anipik commented May 8, 2018

@Ivanidzo4ka

What's the point of bringing this files? And your PR says, "Enabling Tests". I don't see any changes in test files, are they coming in next iteration?

This change enables some tests. You can see that tests being enabled here
https://github.com/Anipik/machinelearning/blame/tests/test/Microsoft.ML.Predictor.Tests/TestPredictors.cs

In current moment our regex works only for windows file system

Me and Zeeshan added the regex which does this for Unix and mac too.
https://github.com/Anipik/machinelearning/blob/tests/test/Microsoft.ML.TestFramework/BaseTestBaseline.cs#L57

even if you port all this files, they wouldn't work on Linux and Mac

The tests run fine on Linux and mac. We changed the behaviour in how we used to compare the files.
we added tolerance in matching the numbers.
https://github.com/Anipik/machinelearning/blob/tests/test/Microsoft.ML.TestFramework/BaseTestBaseline.cs#L601 #Closed

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented May 8, 2018

This change enables some tests. You can see that tests being enabled here
I guess codeflow just cut files from PR after certain amount, because all I can see in it is ZBaseline changes (last one is in FastRankRegression)
Me and Zeeshan added the regex which does this for Unix and mac too.
cool.

Any chance you can break you PR into few PR with reasonable amount of files? Codeflow is not working, and I can't find test file in github web UI.


In reply to: 387482606 [](ancestors = 387482606)

@Anipik
Copy link
Contributor Author

Anipik commented May 8, 2018

Any chance you can break you PR into few PR with reasonable amount of files? Codeflow is not working, and I can't find test file github web UI.

I have a better suggestion, i can break this commit into smaller commits. Then you will be able to see the these commits in the github web UI. Will that be Okay ? #Closed

@Ivanidzo4ka
Copy link
Contributor

As long it's something manageable, I'm fine with that. 471 files is hard to review.


In reply to: 387499988 [](ancestors = 387499988)

@shauheen shauheen requested review from codemzs and zeahmed May 8, 2018 18:45
@Ivanidzo4ka
Copy link
Contributor

age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, label(IsOver50K)

AFAIK we need confirmation from CELA regarding redistributing this dataset, and all others. At least I remember mails to CELA department with questions can we redistribute breast-cancer, iris, and other dataset we currently have in in test folder.


Refers to: test/data/adult.test:1 in 690443f. [](commit_id = 690443f, deletion_comment = False)

@Anipik
Copy link
Contributor Author

Anipik commented May 8, 2018

@Ivanidzo4ka ASFAIK the datasets already in the repo are CELA approved. (We made sure that before making it public)
Adult.train and adult.test are already in the repo in Samples\UCI making it public
@eerhardt told me that the housing one is not cela approved but the breast cancer is cela approved.
I will find out about the status of Autosample dataset

@Ivanidzo4ka
Copy link
Contributor

Ivanidzo4ka commented May 8, 2018

Adult.train and adult.test are already in the repo in Samples\UCI making it public
Is there are any reason to have two similar datasets in two different directories? Or you delete them in last commit, sorry it's hard to tell in 30 different commits (especially if codeflow breaks after certain amount of files which is few last commits, is where are anything good other than codeflow/github web ui?).

breast-cancer-weights-quarter is modification of public dataset, housing is same (it's a UCI dataset in which we change column orders), so I'm not sure is it modification of data makes it not cela approved or something else.

I have PR to replace housing with wine dataset which we download from UCI (PR #17) (I have hopes to polish it soon) so maybe you can change housing to wine as well (although it will require major baseline modification.

Sorry for be nagging cat, only excuse I can come up with, is unfamiliarity with all this OSS process.


In reply to: 387530660 [](ancestors = 387530660)

@danmoseley danmoseley requested a review from shauheen May 9, 2018 10:30
@Anipik
Copy link
Contributor Author

Anipik commented May 9, 2018

@Ivanidzo4ka i will remove the adult.train and adult.test in the next commit. Then i will currently disable the tests currently using the housing.txt. (Can turn them on later after your PR has been merged)
Whom should i ask about breast-cancer-weights-quater and autosample being clea approved or not ?

@Ivanidzo4ka
Copy link
Contributor

Whom should i ask about breast-cancer-weights-quater and autosample
Probably CELA team, @shauheen should knew contacts. Breast cancer is derivative of original breast cancer file. Auto-sample looks like derivative of https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data


In reply to: 387782207 [](ancestors = 387782207)

@Anipik
Copy link
Contributor Author

Anipik commented May 14, 2018

@dotnet-bot test this please

@TomFinley
Copy link
Contributor

Hi @Anipik , thanks much for writing this. Migrating and re-enabling baseline tests is important work.

So there are some things I find somewhat confusing about this, and perhaps you can help me out. The title to this PR is "Enabling Tests that consume Zbaseline files". Yet I see here, if I look at the PR, changes to ZBaselines (that is, the files we compare the output against), deletion of Samples/UCI/readme.md (which in its state wasn't terribly useful anyway), but I see no changes to any files in the unit test.

So: I might have expected that if a test is being enabled, there would be some change in those files. Is it not so? Or do I misunderstand the purpose of this PR?

Perhaps also a linked issue would be helpful, to understand the wider context.

@Anipik
Copy link
Contributor Author

Anipik commented May 14, 2018

@TomFinley the first commit enables the test c8315d8
the enabled tests are in testpredictors.cs file

xadupre pushed a commit to xadupre/machinelearning that referenced this pull request May 18, 2018
@Anipik
Copy link
Contributor Author

Anipik commented May 24, 2018

@shauheen @Ivanidzo4ka I am enabling one test at a time. I will create multiple small PRs for it. I will update the description

@Anipik Anipik deleted the tests branch May 28, 2018 16:46
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants