
Fixing hps remain active & meta hp configuration old #1489


Closed
wants to merge 40 commits

Conversation

Louquinze
Collaborator

Fixing the issue that meta-learning tries to use every hyperparameter (HP) defined in the CSV files.

Also fixing the "hps remain active" bug.

@codecov

codecov bot commented Jun 1, 2022

Codecov Report

Merging #1489 (414b28f) into development (0ae2463) will increase coverage by 0.45%.
The diff coverage is 94.78%.

❗ Current head 414b28f differs from the pull request's most recent head 859163e. Consider uploading reports for commit 859163e to get more accurate results.

@@               Coverage Diff               @@
##           development    #1489      +/-   ##
===============================================
+ Coverage        83.83%   84.29%   +0.45%     
===============================================
  Files              153      154       +1     
  Lines            11694    11741      +47     
  Branches          2047     2044       -3     
===============================================
+ Hits              9804     9897      +93     
+ Misses            1339     1298      -41     
+ Partials           551      546       -5     

Impacted file tree graph

Louquinze added 5 commits June 1, 2022 13:51
@eddiebergman eddiebergman linked an issue Jun 10, 2022 that may be closed by this pull request
@eddiebergman eddiebergman added this to the V0.15 milestone Jun 10, 2022
@eddiebergman eddiebergman added the maintenance Internal maintenance label Jun 10, 2022
@eddiebergman
Contributor

Hey @Louquinze,

Can you give a quick summary of how you did this and how to review it? It's a lot of files to check.

@@ -34,6 +34,7 @@ class BasePipeline(Pipeline):

def __init__(
    self,
    feat_type=None,
    config=None,
Collaborator Author

Should I add types here, since there are no types for the other attributes?

Contributor

Yes, please. We need to start somewhere :)

@@ -94,7 +95,7 @@ def get_properties(dataset_properties=None):
}

@staticmethod
def get_hyperparameter_search_space(dataset_properties=None):
def get_hyperparameter_search_space(feat_type=None, dataset_properties=None):
Collaborator Author

Add annotations here? dataset_properties also has no annotations.

Contributor

Yes, please. You could also add the annotation for dataset_properties. They should be importable, too.

@Louquinze
Collaborator Author

  1. automl.py: pass the datamanager instead of datamanager.info, since the datamanager also stores the feat_type of the model
  2. all classifiers / regressors get a new feat_type argument, which maps each column to its data type
  3. askl2.py: ignore HPs in the meta configuration that do not exist in the current search space (see the sketch after this list)
  4. aslib_simple.py: ignore HPs in the meta configuration that do not exist in the current search space
  5. meta_base.py: pass the config space to AlgorithmSelectionProblem to exclude HPs that are in the meta config but not in the current search space
  6. feat_type.py: self._transformer holds the column transformers, which can be categorical, numerical or text. Including them all by default results in recursively building far too large a search space. Therefore, pass the feat_type argument and only select the transformers needed for the specific task.
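A minimal sketch of the filtering idea in points 3 and 4 (meta_config and config_space are placeholder names, not the actual variables in askl2.py / aslib_simple.py):

valid_hp_names = set(config_space.get_hyperparameter_names())
filtered_meta_config = {
    hp_name: value
    for hp_name, value in meta_config.items()
    # drop HPs that exist in the meta configuration but not in the current search space
    if hp_name in valid_hp_names
}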

Contributor

@eddiebergman eddiebergman left a comment

Some minor comments here and there, but otherwise it looks good. It seems like there were a lot of models to modify, but it makes sense.

Can we test that unnecessary hyperparameters are not included? I.e. if you use Automl::fit(configuration_space_only=True), it should be quick enough and we can use the configuration space from that to determine if the changes are actually taking place as intended.

_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
Contributor

Minor optimization: can we compute scenario.cs.get_hyperparameter_names() before the dict comprehension? The function will get called for every key in member.

def f():
    print("hello")
    return ["a", "b", "c"]

x = [key for key in "abcdefghi" if key in f()]
# hello
# hello
# hello
# ...
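A minimal sketch of the suggested change, reusing the names from the diff above (hoist the call so it runs once instead of once per key):

hp_names = set(scenario.cs.get_hyperparameter_names())  # computed once
_member = {key: member[key] for key in member if key in hp_names}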

_member = {
    key: member[key]
    for key in member
    if key in scenario.cs.get_hyperparameter_names()
Contributor

Same here

if not value or hp_name == "idx":
    continue

if hp_name not in self.cs.get_hyperparameter_names():
Contributor

Same here: move it outside the loop. I checked the code and calling the function does some processing.
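A hedged sketch of the same idea for the loop case; the loop over configuration.items() and the surrounding variable names are assumptions based on the diff above:

hp_names = set(self.cs.get_hyperparameter_names())  # compute once, before the loop
for hp_name, value in configuration.items():  # hypothetical loop from the surrounding code
    if not value or hp_name == "idx":
        continue
    if hp_name not in hp_names:
        continue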

Comment on lines 157 to 164
try:
    # columns = [str(col) for col in columns]
    pass
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}\n"
        f"Exception: {e}"
Contributor

I think this whole try/except won't trigger since there's nothing to try.
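For illustration only (not the PR's code): the except branch can only trigger if the conversion is actually attempted inside the try, e.g. by restoring the commented-out line:

try:
    columns = [str(col) for col in columns]  # actually attempt the conversion
except Exception as e:
    raise ValueError(
        f"Train data has columns={expected} yet the"
        f" feat_types are feat={columns}\n"
        f"Exception: {e}"
    )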

models = [MyDummyRegressor(config=1, random_state=seed) for _ in range(5)]
models = [
    MyDummyRegressor(
        feat_type={i: "numerical" for i in range(4)},
Contributor

I feel like this is going to fail if the X data doesn't match this description. Could we make feat_type an optional argument that defaults to None and just does {i: "numerical" for i in range(X.shape[1])}?
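A hedged sketch of that suggestion; the class below is a simplified stand-in for the MyDummyRegressor used in the tests, not its actual implementation:

class MyDummyRegressor:
    # Simplified illustration: feat_type defaults to None and is derived from X later.
    def __init__(self, feat_type=None, config=None, random_state=None):
        self.feat_type = feat_type
        self.config = config
        self.random_state = random_state

    def fit(self, X, y):
        if self.feat_type is None:
            # default: treat every column of X as numerical
            self.feat_type = {i: "numerical" for i in range(X.shape[1])}
        return self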

Comment on lines 46 to 53
models = [
    MyDummyClassifier(
        feat_type={i: "numerical" for i in range(4)},
        config=1,
        random_state=seed,
    )
    for _ in range(5)
]
Contributor

@eddiebergman eddiebergman Jun 16, 2022

See comment above

X, y = data_maker(random_state=0)
estimator = estimator_class(
    feat_type={i: "numerical" for i in range(X.shape[0])}, config=1, random_state=0
Contributor

I guess X.shape[1] for the columns?

@Louquinze
Collaborator Author

Some minor comments here and there, but otherwise it looks good. It seems like there were a lot of models to modify, but it makes sense.

Can we test that unnecessary hyperparameters are not included? I.e. if you use Automl::fit(configuration_space_only=True), it should be quick enough and we can use the configuration space from that to determine if the changes are actually taking place as intended.

But does the automl fit function provide a "config_space_only" argument?

@Louquinze
Collaborator Author

It is still an issue that the early-stopping test fails, but all other tests are working. So I think this might happen because a feat_type is not correctly set somewhere in the test itself?

@eddiebergman
Contributor

eddiebergman commented Jun 16, 2022

Regarding 1:

def fit(
    self,
    X: SUPPORTED_FEAT_TYPES,
    y: SUPPORTED_TARGET_TYPES,
    task: Optional[int] = None,
    X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
    y_test: Optional[SUPPORTED_TARGET_TYPES] = None,
    feat_type: Optional[list[str]] = None,
    dataset_name: Optional[str] = None,
    only_return_configuration_space: bool = False,
    load_models: bool = True,
    is_classification: bool = False,
):
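A hedged usage sketch based on this signature and the earlier comment; automl, X, y and the checked hyperparameter name are placeholders:

# Return only the configuration space instead of running the full fit.
cs = automl.fit(X, y, only_return_configuration_space=True)

# Then check that hyperparameters which should have been dropped are absent.
assert "some_unwanted_hp" not in cs.get_hyperparameter_names()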

On the second point about early stopping: I checked the latest tests for this PR that completed and reported something, and it seems there are still a lot of errors around. I imagine that as soon as these are fully cleaned up, the early-stopping one goes away. If you manage to clean up the other errors and it's still there, then I'll take a proper deep look.

Whether it's related to feat_type or not, I don't really know, but it very much follows the same construction as all the other test_automl tests. There is no explicit feat_type passed, but neither do many of the other automl tests pass one. My best advice is to look at that test and compare it to ones that pass.

@Louquinze
Collaborator Author

Louquinze commented Jun 16, 2022


test/test_automl/test_early_stopping.py is now working on my local version :) I think we should wait for the tests on GitHub to pass, but I think I am finally done.

I forgot to pass the feat_type in base.py:223

@eddiebergman
Contributor

gonna benchmark this now :)

@Louquinze Louquinze changed the title Fixing hps remain active & meta hp configuration Fixing hps remain active & meta hp configuration old Jul 3, 2022
@Louquinze Louquinze closed this Jul 3, 2022
Labels
maintenance (Internal maintenance)

Successfully merging this pull request may close these issues.

Text preprocessing V2 TODOs
3 participants