Is there a way to retrieve all trained models during the trial, not just the ones in the best model? #1376

Closed
DariaTkachova opened this issue Jan 21, 2022 · 5 comments
DariaTkachova commented Jan 21, 2022

Short Question Description

We need an alternative method to sklearn_ensemble.get_models_with_weights() to retrieve all trained models.
Please also kindly confirm whether the auto-sklearn ensemble contains all the trained models.

Extra context

  • We found that the function get_models_with_weights() retrieves trained models with weights above zero, but would like to retrieve all models generated by auto-sklearn during a trial run.

     def get_models_with_weights(self, models: BasePipeline) -> List[Tuple[float, BasePipeline]]:
         output = []
         for i, weight in enumerate(self.weights_):
             if weight > 0.0:  # TODO: find a way around this
                 identifier = self.identifiers_[i]
                 model = models[identifier]
                 output.append((weight, model))

         output.sort(reverse=True, key=lambda t: t[0])

         return output
    
  • The new show_models() function, which as described in "Changes show_models() function to return a dictionary of models in ensemble" (#1321) returns a dictionary of the models in the ensemble, returns an estimator rather than the actual model, as in the code snippet below:

     model_type, autosklearn_wrapped_model = model.steps[-1]
     model_dict['sklearn_model'] = autosklearn_wrapped_model.choice.estimator
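
To make the first bullet concrete, here is a minimal sketch of the same loop without the `weight > 0.0` filter. `ToyEnsemble` and `get_all_models_with_weights` are hypothetical names used only for illustration and are not part of auto-sklearn. The sketch can only return models listed in `identifiers_`, so it does not settle whether the ensemble knows about every trained model.

```python
from typing import Dict, Hashable, List, Tuple


class ToyEnsemble:
    """Hypothetical stand-in for the ensemble object; not auto-sklearn code."""

    def __init__(self, weights: List[float], identifiers: List[Hashable]) -> None:
        self.weights_ = weights
        self.identifiers_ = identifiers

    def get_all_models_with_weights(
        self, models: Dict[Hashable, object]
    ) -> List[Tuple[float, object]]:
        # Same loop as get_models_with_weights, minus the `weight > 0.0` filter,
        # so zero-weight entries are kept.
        output = [
            (weight, models[self.identifiers_[i]])
            for i, weight in enumerate(self.weights_)
        ]
        # Highest weight first; zero-weight models end up last.
        output.sort(reverse=True, key=lambda t: t[0])
        return output


# Made-up identifiers of the form (seed, model_id, budget) and placeholder models.
ensemble = ToyEnsemble(
    weights=[0.0, 0.7, 0.3],
    identifiers=[(1, 2, 0.0), (1, 5, 0.0), (1, 9, 0.0)],
)
models = {(1, 2, 0.0): "model-2", (1, 5, 0.0): "model-5", (1, 9, 0.0): "model-9"}
result = ensemble.get_all_models_with_weights(models)
print(result)
```

With these made-up inputs the zero-weight entry is returned last instead of being dropped.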
    

System Details

OS: Mac BigSur
Using a Docker container environment
Python: 3.6.2
Auto-sklearn version: 0.13.0

@DariaTkachova DariaTkachova changed the title Is there a way to retrieve all the models trained by auto-sklearn including models which have a weight of zero? Is there a way to retrieve all trained models during the trial, not just the ones in the best model? Jan 22, 2022
eddiebergman (Contributor) commented Jan 22, 2022

Hi @DariaTkachova,

This is not very intuitive but I am currently working on a reworked backend that should make all these model based queries much simpler in the future.

For now, I would refer you to issue #1206, specifically this reply. This is how you can load any of the particular models that were trained.

Is there any particular reason you need the Autosklearn wrapper versions of the estimators and not the sklearn estimators in show_models()?

P.S. Python 3.6 has reached its end of life, so future versions will stop explicitly supporting it; they may, however, still work for a while.

Best,
Eddie

DariaTkachova (Author) commented

Hi @eddiebergman

Thank you for getting back to me.
I have tried your suggested solution, but I’m only able to retrieve the models in the ensemble.

Are the other Autosklearn generated models being saved and how can they be retrieved/accessed?

From our understanding, autosklearn works as follows:

  1. Fit different models (let's say 100 are successfully fitted during the time allocated).
  2. Ensemble a couple of fitted models to create the best model (let's say 14 are picked in the best model).
  3. Return the ensembled "best model".

With the approach you suggest, we were only able to retrieve the 14, and not the other 86. Is there a way to retrieve the models that were fitted but not selected for the ensemble?

Example code:

wanted_model_id = 29
all_models_generated = []  # collects every model whose id matches

for (seed, model_id, budget), model in sk.automl.automl_.models_.items():
    print("This is model id: {}".format(model_id))
    if model_id == wanted_model_id:
        all_models_generated.append(model)
        print(model)

Output:

This is model id: 6
This is model id: 5
This is model id: 51
This is model id: 24
This is model id: 60
This is model id: 102
This is model id: 2
This is model id: 63
This is model id: 11
This is model id: 21
This is model id: 91
This is model id: 93
This is model id: 39
This is model id: 42

Re: Is there any particular reason you need the Autosklearn wrapper versions of the estimators and not the sklearn estimators in show_models()?

We do prefer to use the sklearn estimators / models as opposed to the autosklearn wrapped models, however we need to be able to pre-process data as is done by autosklearn. Do you perhaps have an example of how this may be done using show_models()?

eddiebergman (Contributor) commented

Hi @DariaTkachova, apologies for the delayed reply.

Regarding accessing models that were not included in the ensemble, this is currently not intuitive to do and is unlikely to bring you much better results. Furthermore, the backend has been rewritten and will likely have a much different API by the end of this month, to facilitate exactly these kinds of requests. However, if urgent:

We do some minor transformations before data is passed to the pipeline in InputValidator::transform, but these mainly concern dataframes, sparse matrices and validation, not actual transformations.

If you follow the code through refit to fit_with_budget you can see that the models we load don't have anything extra attached to them for transforming data, other than what's contained in the models themselves.

DariaTkachova (Author) commented Feb 7, 2022

Hi @eddiebergman

Thank you kindly for the feedback.
We have implemented the solution below because we are on autosklearn version 0.13.0. We will patch our implementation once version 0.15.0 is available.

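import re

seed = 42
backend = clf.automl_._backend
all_models = []
for model_details in backend.list_all_models(seed):
    # Assumes the first "_" in the path belongs to the run directory name.
    # Text between the first "_" and the next "/"; the budget follows its first "_".
    budget = re.search(r"(?<=\_).+?(?=\/)", model_details)
    budget = budget.group()
    budget = float(budget[budget.index("_") + 1 :])
    # Text between the first two "_" characters is the model id.
    model_id = re.search(r"(?<=\_).+?(?=\_)", model_details)
    model_id = int(model_id.group())
    model = backend.load_model_by_seed_and_id_and_budget(seed, model_id, budget)
    all_models.append(model)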
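
The lookaround regexes above match from the first underscore anywhere in the path, so an underscore in a parent directory name would throw them off. A more explicit parse could look like the sketch below. It assumes each model file sits in a run directory named `<seed>_<model_id>_<budget>`, which is what the snippet above implies. `parse_run_dir` and the example path are hypothetical and not part of auto-sklearn.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# Assumed layout: <...>/<seed>_<model_id>_<budget>/<file>.model
RUN_DIR_PATTERN = re.compile(r"^(\d+)_(\d+)_(\d+(?:\.\d+)?)$")


def parse_run_dir(model_path: str) -> Tuple[int, int, float]:
    """Extract (seed, model_id, budget) from the model file's parent directory."""
    run_dir = PurePosixPath(model_path).parent.name
    match = RUN_DIR_PATTERN.match(run_dir)
    if match is None:
        raise ValueError(f"Unexpected run directory name: {run_dir!r}")
    seed, model_id, budget = match.groups()
    return int(seed), int(model_id), float(budget)


# Hypothetical path, chosen so that a parent directory contains an underscore.
example_path = "/tmp/autosklearn_tmp/.auto-sklearn/runs/42_29_0.0/42.29.0.0.model"
parsed = parse_run_dir(example_path)
print(parsed)
```

The returned seed, model id and budget could then be passed to `load_model_by_seed_and_id_and_budget` as in the loop above.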

Kind regards
Daria

eddiebergman (Contributor) commented

Hi @DariaTkachova,

Glad to hear you got it to work. This snippet will prove helpful for anyone else who has similar questions :)

Best,
Eddie
