Fixing hps remain active & meta hp configuration old #1489
Conversation
Fixing the issue that metalearning tries to use every hp defined in the csv files. Also fixing the hps remain active bug.
Codecov Report
@@            Coverage Diff             @@
##           development    #1489      +/-   ##
===============================================
+ Coverage      83.83%     84.29%     +0.45%
===============================================
  Files            153        154         +1
  Lines          11694      11741        +47
  Branches        2047       2044         -3
===============================================
+ Hits            9804       9897        +93
+ Misses          1339       1298        -41
+ Partials         551        546         -5
Hey @Louquinze, can you give a quick summary of how you did this and how to review it? It's a lot of files to check.
autosklearn/pipeline/base.py
Outdated
@@ -34,6 +34,7 @@ class BasePipeline(Pipeline):

     def __init__(
         self,
+        feat_type=None,
         config=None,
Add types here, since there are no types for the other attributes?
Yes, please. We need to start somewhere :)
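For illustration, a hedged sketch of what the annotated signature could look like. The exact types are assumptions (auto-sklearn's feat_type maps column names or indices to strings like "numerical"/"categorical"):

from typing import Dict, Optional, Union

from ConfigSpace.configuration_space import Configuration

class BasePipeline:  # stand-in for the real class
    def __init__(
        self,
        # assumed shape: column name/index -> feature-type string
        feat_type: Optional[Dict[Union[str, int], str]] = None,
        config: Optional[Configuration] = None,
    ) -> None:
        self.feat_type = feat_type
        self.config = config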
@@ -94,7 +95,7 @@ def get_properties(dataset_properties=None):
     }

     @staticmethod
-    def get_hyperparameter_search_space(dataset_properties=None):
+    def get_hyperparameter_search_space(feat_type=None, dataset_properties=None):
Add annotations here? dataset_properties also has no annotations.
Yes, please. You could also add the annotation for dataset_properties. They should be importable, too.
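A hedged sketch of the annotated static method; the alias FEAT_TYPE_TYPE is a hypothetical importable name for the feat_type mapping:

from typing import Any, Dict, Optional, Union

from ConfigSpace.configuration_space import ConfigurationSpace

# hypothetical importable alias for the feat_type mapping
FEAT_TYPE_TYPE = Dict[Union[str, int], str]

class SomeComponent:  # stand-in for the real component
    @staticmethod
    def get_hyperparameter_search_space(
        feat_type: Optional[FEAT_TYPE_TYPE] = None,
        dataset_properties: Optional[Dict[str, Any]] = None,
    ) -> ConfigurationSpace:
        return ConfigurationSpace()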
Some minor comments here and there, but otherwise this looks good. It seems like a lot of models to modify, but it makes sense.

Can we test that unnecessary hyperparameters are not included? I.e., if you use Automl::fit(configuration_space_only=True), it should be quick enough, and we can use the resulting configuration space to determine whether the changes actually take place as intended.
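For illustration, a hedged sketch of such a test. The flag name only_return_configuration_space and the AutoMLClassifier setup are assumptions; substitute whatever the fit API actually exposes:

import numpy as np

from autosklearn.automl import AutoMLClassifier  # assumed entry point

def test_unused_hyperparameters_are_excluded():
    # purely numerical toy data
    X = np.random.rand(20, 4)
    y = np.random.randint(0, 2, size=20)

    automl = AutoMLClassifier(time_left_for_this_task=30)
    # assumed flag: build and return the config space without training
    cs = automl.fit(X, y, only_return_configuration_space=True)

    # with numerical-only features, hyperparameters that belong to
    # categorical-only preprocessing should not appear in the space
    names = cs.get_hyperparameter_names()
    assert not any("category_coalescence" in name for name in names)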
autosklearn/experimental/askl2.py
Outdated
_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
}
Minor optimization: can we compute scenario.cs.get_hyperparameter_names() before the dict comprehension? The function will get called for every key in member.
def f():
    print("hello")
    return ["a", "b", "c"]

x = [key for key in "abcdefghi" if key in f()]
# hello
# hello
# hello
# ...
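A hedged sketch of the hoisted version, in the context of the snippet above; using a set also makes each membership check O(1):

# compute the names once, outside the comprehension
hp_names = set(scenario.cs.get_hyperparameter_names())
_member = {key: member[key] for key in member if key in hp_names}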
autosklearn/experimental/askl2.py
Outdated
_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
}
Same here
if not value or hp_name == "idx":
    continue

if hp_name not in self.cs.get_hyperparameter_names():
Same here: move it outside the loop. I checked the code, and calling the function does some processing on every call.
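Again as a hedged sketch; the surrounding loop is assumed:

# computed once, before the loop
hp_names = set(self.cs.get_hyperparameter_names())
for hp_name, value in config.items():  # assumed iteration
    if not value or hp_name == "idx":
        continue
    if hp_name not in hp_names:
        ...  # handle the stale hyperparameter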
try:
    # columns = [str(col) for col in columns]
    pass
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}\n"
        f"Exception: {e}"
    )
I think this whole try/except won't trigger, since there's nothing to try.
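If the commented-out conversion was the intended body, a hedged sketch of a version that can actually raise (expected and columns are assumed to be in scope):

try:
    columns = [str(col) for col in columns]  # real work inside the try
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}"
    ) from e  # chain the original exception instead of interpolating it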
autosklearn/pipeline/components/feature_preprocessing/nystroem_sampler.py
test/fixtures/ensembles.py
Outdated
-models = [MyDummyRegressor(config=1, random_state=seed) for _ in range(5)]
+models = [
+    MyDummyRegressor(
+        feat_type={i: "numerical" for i in range(4)},
I feel like this is going to fail if the X data doesn't match this description. Could we make feat_type an optional argument that defaults to None and just does {i: "numerical" for i in range(X.shape[1])}?
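A hedged sketch of that fixture change; the helper name and signature are illustrative:

def make_dummy_regressors(X, n_models=5, feat_type=None, seed=1):
    # default: treat every column of X as numerical
    if feat_type is None:
        feat_type = {i: "numerical" for i in range(X.shape[1])}
    return [
        MyDummyRegressor(feat_type=feat_type, config=1, random_state=seed)
        for _ in range(n_models)
    ]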
test/fixtures/ensembles.py
Outdated
models = [
    MyDummyClassifier(
        feat_type={i: "numerical" for i in range(4)},
        config=1,
        random_state=seed,
    )
    for _ in range(5)
]
See comment above
X, y = data_maker(random_state=0)
estimator = estimator_class(
    feat_type={i: "numerical" for i in range(X.shape[0])}, config=1, random_state=0
)
I guess X.shape[1] for the columns?
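I.e., a sketch of the fix:

estimator = estimator_class(
    feat_type={i: "numerical" for i in range(X.shape[1])},  # columns, not rows
    config=1,
    random_state=0,
)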
But does the automl fit function provide a "config_space_only" argument?
And it is still an issue that the …
Regarding 1: see auto-sklearn/autosklearn/automl.py, lines 532 to 544 at 9002fca.
Second point, on early stopping: I checked the latest tests for this PR that completed and reported something, and it seems there are a lot of errors still around. I imagine that as soon as these are cleaned up in full, the early stopping one goes away. If you manage to clean up the other errors and it's still there, then I'll take a proper deep look. Whether it's related to …
I forgot to pass the feat_type in.
Gonna benchmark this now :)
fixing the issue that metalearning tries to use every hp defined in the csv files.
Also fixing the hps remain active bug.