
Fixing hps remain active & meta hp configuration old #1489


Closed
wants to merge 40 commits

Conversation

Louquinze
Collaborator

Fixing the issue that meta-learning tries to use every hyperparameter (HP) defined in the CSV files.

Also fixing the "hps remain active" bug.

@codecov

codecov bot commented Jun 1, 2022

Codecov Report

Merging #1489 (414b28f) into development (0ae2463) will increase coverage by 0.45%.
The diff coverage is 94.78%.

❗ Current head 414b28f differs from the pull request's most recent head 859163e. Consider uploading reports for commit 859163e to get more accurate results.

@@               Coverage Diff               @@
##           development    #1489      +/-   ##
===============================================
+ Coverage        83.83%   84.29%   +0.45%     
===============================================
  Files              153      154       +1     
  Lines            11694    11741      +47     
  Branches          2047     2044       -3     
===============================================
+ Hits              9804     9897      +93     
+ Misses            1339     1298      -41     
+ Partials           551      546       -5     

Impacted file tree graph

Louquinze added 5 commits June 1, 2022 13:51
@eddiebergman eddiebergman linked an issue Jun 10, 2022 that may be closed by this pull request
@eddiebergman eddiebergman added this to the V0.15 milestone Jun 10, 2022
@eddiebergman eddiebergman added the maintenance Internal maintenance label Jun 10, 2022
@eddiebergman
Contributor

Hey @Louquinze,

Can you give a quick summary of how you did this and how to review it? It's a lot of files to check.

@@ -34,6 +34,7 @@ class BasePipeline(Pipeline):

def __init__(
    self,
    feat_type=None,
    config=None,
Collaborator Author

Should I add types here, since there are no types for the other attributes?

Contributor

Yes, please. We need to start somewhere :)

@@ -94,7 +95,7 @@ def get_properties(dataset_properties=None):
}

@staticmethod
def get_hyperparameter_search_space(dataset_properties=None):
def get_hyperparameter_search_space(feat_type=None, dataset_properties=None):
Collaborator Author

Add annotations here? dataset_properties also has no annotations.

Contributor

Yes, please. You could also add the annotation for dataset_properties. They should be importable, too.

@Louquinze
Collaborator Author

  1. automl.py: pass the datamanager instead of datamanager.info, since the datamanager also stores the feat_type of the model
  2. all classifiers / regressors get a new feat_type argument, which maps each column to its data type
  3. askl2.py: ignore HPs in the meta configuration that do not exist in the current search space (see the sketch after this list)
  4. aslib_simple.py: ignore HPs in the meta configuration that do not exist in the current search space
  5. meta_base.py: pass the config space to AlgorithmSelectionProblem to exclude HPs that are in the meta config but not in the current search space
  6. feat_type.py: self._transformer holds the column transformers, which can be categorical, numerical or text. Including them all by default results in recursively building far too large a search space. Therefore, pass the feat_type argument and only select the transformers needed for the specific task.
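A minimal sketch of the filtering idea in points 3 and 4 (meta_config and config_space are placeholder names, not the actual variables in askl2.py / aslib_simple.py):

valid_hp_names = set(config_space.get_hyperparameter_names())
filtered_meta_config = {
    hp_name: value
    for hp_name, value in meta_config.items()
    # drop HPs that exist in the meta configuration but not in the current search space
    if hp_name in valid_hp_names
}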

Contributor

@eddiebergman eddiebergman left a comment

Some minor comments here and there, but otherwise it looks good. It seems like there were a lot of models to modify, but it makes sense.

Can we test that unnecessary hyperparameters are not included? I.e. if you use Automl::fit(configuration_space_only=True), it should be quick enough and we can use the configuration space from that to determine if the changes are actually taking place as intended.

_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
Contributor

Minor optimization: can we compute scenario.cs.get_hyperparameter_names() before the dict comprehension? The function will get called for every key in member.

def f():
    print("hello")
    return ["a", "b", "c"]

x = [key for key in "abcdefghi" if key in f()]
# hello
# hello
# hello
# ...
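A minimal sketch of the suggested change, reusing the names from the diff above (hoist the call so it runs once instead of once per key):

hp_names = set(scenario.cs.get_hyperparameter_names())  # computed once
_member = {key: member[key] for key in member if key in hp_names}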

_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
Contributor

Same here

if not value or hp_name == "idx":
    continue

if hp_name not in self.cs.get_hyperparameter_names():
Contributor

Same here: move it outside the loop. I checked the code and calling the function does some processing.
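A hedged sketch of the same idea for the loop case; the loop over configuration.items() and the surrounding variable names are assumptions based on the diff above:

hp_names = set(self.cs.get_hyperparameter_names())  # compute once, before the loop
for hp_name, value in configuration.items():  # hypothetical loop from the surrounding code
    if not value or hp_name == "idx":
        continue
    if hp_name not in hp_names:
        continue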

Comment on lines 157 to 164
try:
    # columns = [str(col) for col in columns]
    pass
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}\n"
        f"Exception: {e}"
Contributor

I think this whole try/except won't trigger since there's nothing to try.
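For illustration only (not the PR's code): the except branch can only trigger if the conversion is actually attempted inside the try, e.g. by restoring the commented-out line:

try:
    columns = [str(col) for col in columns]  # actually attempt the conversion
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}\n"
        f"Exception: {e}"
    )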

models = [MyDummyRegressor(config=1, random_state=seed) for _ in range(5)]
models = [
    MyDummyRegressor(
        feat_type={i: "numerical" for i in range(4)},
Contributor

I feel like this is going to fail if the X data doesn't match this description. Could we make feat_type an optional argument that defaults to None and just does {i: "numerical" for i in range(X.shape[1])}?
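A hedged sketch of that suggestion; the class below is a simplified stand-in for the MyDummyRegressor used in the tests, not its actual implementation:

class MyDummyRegressor:
    # Simplified illustration: feat_type defaults to None and is derived from X later.
    def __init__(self, feat_type=None, config=None, random_state=None):
        self.feat_type = feat_type
        self.config = config
        self.random_state = random_state

    def fit(self, X, y):
        if self.feat_type is None:
            # default: treat every column of X as numerical
            self.feat_type = {i: "numerical" for i in range(X.shape[1])}
        return self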

Comment on lines 46 to 53
models = [
    MyDummyClassifier(
        feat_type={i: "numerical" for i in range(4)},
        config=1,
        random_state=seed,
    )
    for _ in range(5)
]
Contributor

@eddiebergman eddiebergman Jun 16, 2022

See comment above

X, y = data_maker(random_state=0)
estimator = estimator_class(
    feat_type={i: "numerical" for i in range(X.shape[0])}, config=1, random_state=0
Contributor

I guess X.shape[1] for the columns?

@Louquinze
Collaborator Author

Some minor comments here and there, but otherwise it looks good. It seems like there were a lot of models to modify, but it makes sense.

Can we test that unnecessary hyperparameters are not included? I.e. if you use Automl::fit(configuration_space_only=True), it should be quick enough and we can use the configuration space from that to determine if the changes are actually taking place as intended.

But does the automl fit function provide a "config_space_only" argument?

@Louquinze
Collaborator Author

It is still an issue that the early-stopping test fails, but all other tests are working. So I think this might happen because a feat_type is not correctly set somewhere in the test itself?

@eddiebergman
Contributor

eddiebergman commented Jun 16, 2022

Regarding 1:

def fit(
    self,
    X: SUPPORTED_FEAT_TYPES,
    y: SUPPORTED_TARGET_TYPES,
    task: Optional[int] = None,
    X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
    y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
    feat_type: Optional[list[str]] = None,
    dataset_name: Optional[str] = None,
    only_return_configuration_space: bool = False,
    load_models: bool = True,
    is_classification: bool = False,
):
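A hedged usage sketch based on this signature and the earlier comment; automl, X, y and the checked hyperparameter name are placeholders:

# Return only the configuration space instead of running the full fit.
cs = automl.fit(X, y, only_return_configuration_space=True)

# Then check that hyperparameters which should have been dropped are absent.
assert "some_unwanted_hp" not in cs.get_hyperparameter_names()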

On the second point about early stopping: I checked the latest tests for this PR that completed and reported something, and it seems there are still a lot of errors around. I imagine that as soon as these are fully cleaned up, the early-stopping one goes away. If you manage to clean up the other errors and it's still there, then I'll take a proper deep look.

Whether it's related to feat_type or not, I don't really know, but it very much follows the same construction as all the other test_automl tests. There is no explicit feat_type passed, but neither do many of the other automl tests pass one. My best advice is to look at that test and compare it to ones that pass.

@Louquinze
Collaborator Author

Louquinze commented Jun 16, 2022


test/test_automl/test_early_stopping.py is now working on my local version :) I think we should wait for the tests on GitHub to pass, but I think I am finally done.

I forgot to pass the feat_type in base.py:223

@eddiebergman
Contributor

gonna benchmark this now :)

@Louquinze Louquinze changed the title Fixing hps remain active & meta hp configuration Fixing hps remain active & meta hp configuration old Jul 3, 2022
@Louquinze Louquinze closed this Jul 3, 2022
Labels
maintenance (Internal maintenance)

Successfully merging this pull request may close these issues.

Text preprocessing V2 TODOs
3 participants