Identify the best model #1206

Closed
kieran199 opened this issue Aug 5, 2021 · 14 comments

@kieran199

Hello there,

I've been through all the examples, and it's not entirely clear to me how to identify the best model.

If I print the leaderboard and select the model I am interested in, how do I then find the following for that model? I have the model ID from the leaderboard - where do I use it?

  • Model type
  • Hyper parameters used
  • Any pre-processing steps that auto-sklearn used

Thanks a lot for the help in advance

@eddiebergman
Contributor

eddiebergman commented Aug 5, 2021

Hi @kieran199,

We're currently working on changing the external API to make it more user-friendly, and we agree it's not so easy at the moment.

The current solution to get the model is as follows:

import sklearn.model_selection
from sklearn import datasets
from autosklearn.classification import AutoSklearnClassifier

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

clf = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
clf.fit(X_train, y_train)

wanted_model_id = ...  # the model id you picked from the leaderboard
wanted_model = None

# Models are keyed internally by (seed, model_id, budget)
for (seed, model_id, budget), model in clf.automl_.models_.items():
    if model_id == wanted_model_id:
        wanted_model = model

From there you can query the resulting sklearn Pipeline further to get the information you need.
There are also some more parameters to leaderboard() that give some information on the model type and the preprocessing used; see the sketch below.
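
For example, a minimal sketch (it assumes wanted_model behaves like a standard sklearn Pipeline, as in the snippet above, and that your auto-sklearn version supports the detailed leaderboard flag):

# Hyperparameters of the selected pipeline, via the standard sklearn API
print(wanted_model.get_params())

# The individual steps of the pipeline (preprocessing and the final estimator)
for name, step in wanted_model.steps:
    print(name, step)

# A more detailed leaderboard, with extra columns about each model
print(clf.leaderboard(detailed=True))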

The issue at the moment is that the internal keys used to identify models consist of (seed, model_id, budget), which is more than an end user should really need to know about. In your case, as in many others, you only really want to use the model_id.

Rest assured, we will bring some nicer public API changes for accessing the internals of auto-sklearn, but for now this is the best solution I can offer you.

@kieran199
Author

kieran199 commented Aug 5, 2021

Thanks very much for your reply. I am getting:

AttributeError: 'SimpleClassificationPipeline' object has no attribute 'automl_'

when using clf.automl_.models_.items()

Do you know why this may be?

Also - I note that when I print the model, I get the output below. If I decided (for whatever reason) I wanted to change the model selected from gaussian_nb to something else on the leaderboard, how would I do that?

SimpleClassificationPipeline({
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'gaussian_nb',
  'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'encoding',
  'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
  'data_preprocessing:numerical_transformer:imputation:strategy': 'median',
  'data_preprocessing:numerical_transformer:rescaling:__choice__': 'quantile_transformer',
  'feature_preprocessor:__choice__': 'select_rates_classification',
  'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.009151554238227241,
  'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:n_quantiles': 1726,
  'data_preprocessing:numerical_transformer:rescaling:quantile_transformer:output_distribution': 'normal',
  'feature_preprocessor:select_rates_classification:alpha': 0.027868485240680432,
  'feature_preprocessor:select_rates_classification:score_func': 'chi2',
  'feature_preprocessor:select_rates_classification:mode': 'fpr'},
dataset_properties={
  'task': 1,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'classification',
  'signed': False})

@kieran199
Author

Also - one more question :) - normally, I would pickle a model and call it in production when new data arrives.

Will pickling this model also pickle the preprocessing steps?

@eddiebergman
Contributor

> Thanks very much for your reply. I am getting:
>
> AttributeError: 'SimpleClassificationPipeline' object has no attribute 'automl_'
>
> when using clf.automl_.models_.items()
>
> Do you know why this may be?

It sounds like you pickled part of the whole object? If so, can you provide the code for how you did that? The above example works in an IPython session for me.

> Also - one more question :) - normally, I would pickle a model and call it in production when new data arrives.
>
> Will pickling this model also pickle the preprocessing steps?

The Pipeline object you get at the end consists of all the steps used in the process, so yes, this includes the preprocessing steps; see the sketch below.
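
For example, a minimal sketch using the standard pickle module (the filename automl.pkl is just an illustration; clf and X_test come from the earlier snippet):

import pickle

# Dump the fitted object (ensemble, pipelines and their preprocessing) to disk
with open('automl.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Later, e.g. in production: load it back and predict on new data
with open('automl.pkl', 'rb') as f:
    loaded_clf = pickle.load(f)

predictions = loaded_clf.predict(X_test)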

> Also - I note that when I print the model, I get the output below. If I decided (for whatever reason) I wanted to change the model selected from gaussian_nb to something else on the leaderboard, how would I do that?

> SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'gaussian_nb', ...})

You would have to load a separate model from the leaderboard. The whole pipeline was built around gaussian_nb, including hyperparameters that are not valid for other model types, so no, there is no meaningful way to drop in a different model type as a replacement. If you want to fit a specific pipeline, you can copy the configuration manually into sklearn and train the pipeline that way; a rough sketch of that is below.
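
For instance, a rough approximation of the printed configuration in plain scikit-learn (a sketch only, not auto-sklearn's internal components; it ignores balancing and categorical handling, reuses X_train and y_train from the earlier snippet, and substitutes f_classif for chi2 because chi2 requires non-negative inputs, which the quantile transformer's normal output would violate):

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import QuantileTransformer
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.naive_bayes import GaussianNB

pipeline = Pipeline([
    # 'imputation:strategy': 'median'
    ('imputation', SimpleImputer(strategy='median')),
    # 'rescaling:__choice__': 'quantile_transformer' with n_quantiles=1726, output_distribution='normal'
    ('rescaling', QuantileTransformer(n_quantiles=1726, output_distribution='normal')),
    # 'select_rates_classification' with mode='fpr' and alpha below (score_func swapped as noted above)
    ('feature_selection', SelectFpr(score_func=f_classif, alpha=0.027868485240680432)),
    # 'classifier:__choice__': 'gaussian_nb'
    ('classifier', GaussianNB()),
])

pipeline.fit(X_train, y_train)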

@kieran199
Author

Hi there, I haven't pickled anything yet - all I've done so far is the code below. Running your script immediately afterwards gives me that error.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import accuracy_score
import autosklearn.classification

# features and output are my own dataset, defined earlier (not shown)
train_features, test_features, train_labels, test_labels = train_test_split(features, output, test_size=0.2, random_state=42)
model = autosklearn.classification.AutoSklearnClassifier()
model.fit(train_features, train_labels)
predictions = model.predict(test_features)

@eddiebergman
Contributor

eddiebergman commented Aug 5, 2021

I also used the variable model in my snippet (inside the for loop); you'll have to change that in either my snippet or yours.

@kieran199
Author

Yeah, I had changed that already to x :(

@eddiebergman
Contributor

I don't know what to tell you: the snippet above works, so I would imagine there is an error in how you copied, pasted, and renamed variables. This kind of thing falls outside the scope of the help we can provide, but if you post the full code you are using I'm happy to have a look, and then close the issue if that answers all your questions.

@kieran199
Author

Ah OK, I see. I just re-ran it and it worked (I am not sure why).

I promise this is the last question :) :)

Is there a way to return a single model as the output (the most accurate one), rather than a dictionary of many models?

@eddiebergman
Contributor

eddiebergman commented Aug 5, 2021

You can use the leaderboard to identify the most accurate model:

clf.leaderboard(
    ensemble_only=False,  # Also include models that did not make it into the final ensemble
    sort_by='cost'        # The loss on the validation set
)

You can identify the best model by the rank column; a small sketch is below.
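
A minimal sketch (it assumes leaderboard() returns a pandas DataFrame indexed by model id with a rank column, and reuses the lookup loop from the earlier snippet):

lb = clf.leaderboard(sort_by='cost')

# The row with rank 1 has the lowest validation cost; its index is the model id
best_model_id = lb[lb['rank'] == 1].index[0]

# Look the corresponding pipeline up in the internal model store
best_model = None
for (seed, model_id, budget), model in clf.automl_.models_.items():
    if model_id == best_model_id:
        best_model = model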

Again, I highly recommend reading the leaderboard API documentation to know what kind of information you can extract.

@kieran199
Author

And the number 1 model in the leaderboard is always the one selected?

So if I then pickled the result of the code below, would it be rank 1 of the leaderboard?

model = autosklearn.classification.AutoSklearnClassifier()
model.fit(train_features, train_labels)

@eddiebergman
Contributor

eddiebergman commented Aug 5, 2021

Auto-sklearn selects an ensemble of models, not a single model. This isn't clear from the initial documentation that users come across, so I'll take a note to update that!

Every model shown in leaderboard() is in the ensemble, and ensemble_weight indicates how strongly that model is weighted within the ensemble; see the sketch below.
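
A minimal sketch (it assumes the leaderboard DataFrame exposes rank, ensemble_weight, type and cost columns, and reuses clf and X_test from earlier):

lb = clf.leaderboard()

# Each row is one member of the final ensemble; ensemble_weight is its weight in the vote
print(lb[['rank', 'ensemble_weight', 'type', 'cost']])

# Predictions always come from the weighted ensemble, not from any single pipeline
predictions = clf.predict(X_test)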

@kieran199
Author

Oh I see - that's interesting. So it will use a combination of all the models it produces, with a different weight assigned to each.

That makes sense - I hadn't picked that up from the documentation. Is there a high-level overview which covers how it works? The manual didn't help in this regard.

@eddiebergman
Contributor

Noted, we'll try to make that clearer for users in the future.

For now the best comprehensive overview is given by the two papers associated with auto-sklearn, which may be a bit dense:

  1. https://papers.nips.cc/paper/2015/file/11d0e6287202fced83f79975ec59a3a6-Paper.pdf
  2. https://arxiv.org/pdf/2007.04074.pdf
