Skip to content

Fix random state not being used for sampling configurations #1329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 13, 2021

Conversation

eddiebergman
Copy link
Contributor

@eddiebergman eddiebergman commented Nov 29, 2021

This PR:

  • Keeps random_state in tests as None so that further bad configurations can still surface, all be it randomly.
  • Forwards on the random_state to Automl._create_search_space so the non-test code is deterministic.
  • Add's documentation to the classification pipeline tests.

Copy link
Contributor

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good, but is there a reason you only touched the classification pipeline test and not the regression ones, too?

@eddiebergman
Copy link
Contributor Author

I labelled it with "PR: In Progress", I will do the regression ones as well, also I have to take it back out of our randomized test where we randomly select sample configurations too.

I'll send you a review request when it's ready

@eddiebergman
Copy link
Contributor Author

So I ended up removing random_state from the tests again, this will slowly allow invalid configurations to bubble up and we deal with them while we can. This PR now only fixes the AutoML._create_search_space and adds some documentation to the tests.

@codecov
Copy link

codecov bot commented Dec 1, 2021

Codecov Report

Merging #1329 (da17589) into development (3761f9b) will increase coverage by 0.40%.
The diff coverage is 100.00%.

Impacted file tree graph

@@               Coverage Diff               @@
##           development    #1329      +/-   ##
===============================================
+ Coverage        88.05%   88.46%   +0.40%     
===============================================
  Files              140      140              
  Lines            11163    11811     +648     
===============================================
+ Hits              9830    10449     +619     
- Misses            1333     1362      +29     
Impacted Files Coverage Δ
autosklearn/util/pipeline.py 100.00% <100.00%> (+7.50%) ⬆️
...ine/components/classification/gradient_boosting.py 93.04% <0.00%> (-0.87%) ⬇️
...preprocessing/imputation/categorical_imputation.py 96.29% <0.00%> (-0.77%) ⬇️
...ning/optimizers/metalearn_optimizer/metalearner.py 96.34% <0.00%> (+0.23%) ⬆️
autosklearn/estimators.py 93.93% <0.00%> (+0.51%) ⬆️
autosklearn/pipeline/components/base.py 79.63% <0.00%> (+0.59%) ⬆️
...pipeline/components/regression/gaussian_process.py 97.91% <0.00%> (+0.61%) ⬆️
autosklearn/evaluation/abstract_evaluator.py 93.05% <0.00%> (+0.77%) ⬆️
...ata_preprocessing/categorical_encoding/encoding.py 97.36% <0.00%> (+0.93%) ⬆️
autosklearn/smbo.py 89.14% <0.00%> (+1.14%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3761f9b...da17589. Read the comment docs.

@eddiebergman eddiebergman requested a review from mfeurer December 2, 2021 05:19
classifier = SimpleClassificationPipeline(
random_state=1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now inconsistent. The random_state is not dropped in a few unit tests above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because this test relies an an accuracy score, it needs to be fixed as not all configurations will get 96%. The other tests are not specifically for metrics.

@eddiebergman eddiebergman merged commit 88ad023 into development Dec 13, 2021
@mfeurer mfeurer deleted the fix_pipelines_not_getting_random_state branch December 14, 2021 08:36
@eddiebergman eddiebergman mentioned this pull request Jan 24, 2022
eddiebergman added a commit that referenced this pull request Jan 25, 2022
* Added random state to classifiers

* Added some doc strings

* Removed random_state again

* flake'd

* Fix some test issues

* Re-added seed to test

* Updated test doc for unknown test

* flake'd
@eddiebergman eddiebergman mentioned this pull request Jan 25, 2022
eddiebergman added a commit that referenced this pull request Aug 18, 2022
* Added random state to classifiers

* Added some doc strings

* Removed random_state again

* flake'd

* Fix some test issues

* Re-added seed to test

* Updated test doc for unknown test

* flake'd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants