Is there a way to retrieve all trained models during the trial, not just the ones in the best model? #1376

DariaTkachova · 2022-01-21T17:42:12Z

Short Question Description

We need an alternative method to sklearn_ensemble.get_models_with_weights() to retrieve all trained models.
Please also kindly confirm whether the auto-sklearn ensemble contains all the trained models.

Extra context

We found that the function get_models_with_weights() retrieves trained models with weights above zero, but would like to retrieve all models generated by auto-sklearn during a trial run.

def get_models_with_weights(self, models: BasePipeline) -> List[Tuple[float, BasePipeline]]:
    output = []
    for i, weight in enumerate(self.weights_):
        if weight > 0.0:  # TODO: find a way around this
            identifier = self.identifiers_[i]
            model = models[identifier]
            output.append((weight, model))

    output.sort(reverse=True, key=lambda t: t[0])

    return output
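
To see which candidates received a weight of zero without needing the fitted pipelines, one option might be to pair the identifiers with their weights directly. This is only a sketch: it assumes `weights_` and `identifiers_` are parallel sequences, as the snippet above suggests. `identifiers_with_weights` is a hypothetical helper, and the sample values are made up. It can only list what the ensemble object itself tracks.

```python
from typing import Hashable, List, Sequence, Tuple


def identifiers_with_weights(
    weights: Sequence[float], identifiers: Sequence[Hashable]
) -> List[Tuple[float, Hashable]]:
    """Pair each identifier with its weight, keeping zero-weight entries."""
    pairs = list(zip(weights, identifiers))
    # Highest weight first, mirroring the sort in get_models_with_weights().
    pairs.sort(reverse=True, key=lambda t: t[0])
    return pairs


# Made-up values standing in for ensemble.weights_ and ensemble.identifiers_.
weights = [0.0, 0.6, 0.4]
identifiers = [(42, 6, 0.0), (42, 29, 0.0), (42, 51, 0.0)]
print(identifiers_with_weights(weights, identifiers))
```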

The new show_models() function, which returns a dictionary of the models in the ensemble (see "Changes show_models() function to return a dictionary of models in ensemble" #1321), returns an estimator rather than the actual model, as the code snippet below shows:
```
 model_type, autosklearn_wrapped_model = model.steps[-1]
 model_dict['sklearn_model'] = autosklearn_wrapped_model.choice.estimator
```

System Details

OS: Mac BigSur
Using a Docker container environment
Python: 3.6.2
Auto-sklearn version: 0.13.0

eddiebergman · 2022-01-22T21:44:25Z

Hi @DariaTkachova,

This is not very intuitive, but I am currently working on a reworked backend that should make all these model-based queries much simpler in the future.

For now, I would refer you to issue #1206, specifically this reply. This is how you can load any of the particular models that were trained.

Is there any particular reason you need the Autosklearn wrapper versions of the estimators and not the sklearn estimators in show_models()?

P.S. Python 3.6 has reached its end of life, so future versions will stop explicitly supporting it. They may, however, still work for a while.

Best,
Eddie

DariaTkachova · 2022-01-26T14:46:15Z

Hi @eddiebergman

Thank you for getting back to me.
I have tried your suggested solution, but I’m only able to retrieve the models in the ensemble.

Are the other Autosklearn generated models being saved and how can they be retrieved/accessed?

From our understanding, autosklearn works as follows:

1. Fit different models (let's say 100 are successfully fitted during the time allocated).
2. Ensemble a couple of fitted models to create the best model (let's say 14 are picked in the best model).
3. Return the ensembled "best model".

With the approach you suggest, we were only able to retrieve the 14, and not the other 86. Is there a way to retrieve the ones that were fitted but not selected for the ensemble?

Example code:

wanted_model_id = 29
all_models_generated = []  # collects the pipeline(s) matching wanted_model_id

for (seed, model_id, budget), model in sk.automl.automl_.models_.items():
    print("This is model id: {}".format(model_id))
    if model_id == wanted_model_id:
        all_models_generated.append(model)
        print(model)

Output:

This is model id: 6
This is model id: 5
This is model id: 51
This is model id: 24
This is model id: 60
This is model id: 102
This is model id: 2
This is model id: 63
This is model id: 11
This is model id: 21
This is model id: 91
This is model id: 93
This is model id: 39
This is model id: 42

Re: Is there any particular reason you need the Autosklearn wrapper versions of the estimators and not the sklearn estimators in show_models()?

We do prefer to use the sklearn estimators / models as opposed to the autosklearn wrapped models; however, we need to be able to pre-process data the same way autosklearn does. Do you perhaps have an example of how this may be done using show_models()?

eddiebergman · 2022-02-01T14:25:59Z

Hi @DariaTkachova, apologies for the delayed reply.

Regarding accessing models that were not included in the ensemble: this is currently not intuitive to do and is unlikely to bring you much better results. Furthermore, the backend has been rewritten and will likely have a much different API by the end of this month, to facilitate exactly these kinds of requests. However, if it is urgent:

Models can be loaded with the function clf.automl_._backend.load_model_by_seed_and_id_and_budget(seed, id, budget); the identifiers it needs can be obtained in a way similar to this line.
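
As a rough sketch of calling that function in a loop, assuming you already have the (seed, id, budget) identifiers: `load_models` is a hypothetical helper, not part of auto-sklearn, and only `load_model_by_seed_and_id_and_budget` comes from this thread. Since `_backend` is a private attribute, this may well break in later versions.

```python
from typing import Any, Dict, Iterable, Tuple

Identifier = Tuple[int, int, float]  # (seed, model id, budget)


def load_models(backend: Any, identifiers: Iterable[Identifier]) -> Dict[Identifier, Any]:
    """Load one pipeline per (seed, id, budget) identifier.

    `backend` is assumed to be `clf.automl_._backend`; this helper is
    hypothetical and simply forwards each identifier to the backend call.
    """
    loaded = {}
    for seed, model_id, budget in identifiers:
        loaded[(seed, model_id, budget)] = backend.load_model_by_seed_and_id_and_budget(
            seed, model_id, budget
        )
    return loaded
```

With a fitted classifier this would presumably be called as `load_models(clf.automl_._backend, identifiers)`.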

We do some minor transformations before data is passed to the pipeline in InputValidator::transform but these are mainly surrounding dataframes, sparse matrices and validation, not actual transformations.

If you follow the code through refit to fit_with_budget you can see that the models we load don't have anything extra attached to them for transforming data, other than what's contained in the models themselves.

DariaTkachova · 2022-02-07T08:25:13Z

Hi @eddiebergman

Thank you kindly for the feedback.
We have implemented the solution below because we are on autosklearn version 0.13.0. We will patch our implementation once version 0.15.0 is available.

import re

seed = 42
for model_details in clf.automl_._backend.list_all_models(seed):
    budget = re.search(r"(?<=\_).+?(?=\/)", model_details)
    budget = budget.group()
    budget = float(budget[budget.index("_") + 1 :])
    model_id = re.search(r"(?<=\_).+?(?=\_)", model_details)
    model_id = int(model_id.group())
    backend = clf.automl_._backend
    model = backend.load_model_by_seed_and_id_and_budget(seed, model_id, budget)
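
The regular expressions above match the first underscore anywhere in the path, so a temporary directory whose name contains an underscore could confuse them. A less fragile way to pull out the identifiers might be to parse only the run directory name. This sketch assumes each path has a parent directory named `<seed>_<model_id>_<budget>`, which is inferred from the regular expressions above; `parse_run_identifiers` and the example path are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Tuple


def parse_run_identifiers(model_path: str) -> Tuple[int, int, float]:
    """Extract (seed, model_id, budget) from a model file path.

    Assumes the parent directory is named "<seed>_<model_id>_<budget>",
    e.g. ".../runs/42_29_0.0/42.29.0.0.model". That layout is an assumption.
    """
    run_dir = PurePosixPath(model_path).parent.name  # e.g. "42_29_0.0"
    seed, model_id, budget = run_dir.split("_")
    return int(seed), int(model_id), float(budget)


# Hypothetical path; the temporary directory name contains underscores on purpose.
example = "/tmp/autosklearn_tmp_abc/.auto-sklearn/runs/42_29_0.0/42.29.0.0.model"
print(parse_run_identifiers(example))
```

The resulting tuple can then be passed to load_model_by_seed_and_id_and_budget as before.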

Kind regards
Daria

eddiebergman · 2022-02-07T09:29:50Z

Hi @DariaTkachova,

Glad to hear you got it to work. This snippet will prove helpful for anyone else who has similar questions :)

Best,
Eddie

DariaTkachova changed the title ~~Is there a way to retrieve all the models trained by auto-sklearn including models which have a weight of zero?~~ Is there a way to retrieve all trained models during the trial, not just the ones in the best model? Jan 22, 2022

mfeurer mentioned this issue Jan 24, 2022

convert to scikit learn code. #388

Open

eddiebergman added Feedback-Required question labels Jan 24, 2022

eddiebergman closed this as completed May 13, 2022
