Identify the best model #1206
Comments
Hi @kieran199,

We're currently working on changing the external API to make it more user-friendly, and we agree it's not so easy at the moment. The current solution to get the model is as follows:

    import sklearn.model_selection
    from sklearn import datasets
    from autosklearn.classification import AutoSklearnClassifier

    X, y = datasets.load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

    clf = AutoSklearnClassifier(time_left_for_this_task=120, per_run_time_limit=30)
    clf.fit(X_train, y_train)

    wanted_model_id = ...  # the model ID shown in the leaderboard
    wanted_model = None
    for (seed, model_id, budget), model in clf.automl_.models_.items():
        if model_id == wanted_model_id:
            wanted_model = model

From there you can query the sklearn.Pipeline further to get the information you need. The issue at the moment is that the internal keys that identify models are tuples, (seed, model_id, budget) in the loop above, rather than the model ID alone.

Rest assured we will bring some nicer public API changes to access the internals of autosklearn, but for now this is the best solution I can offer you.
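
The lookup loop above can be wrapped in a small helper. This is only a minimal sketch and not part of auto-sklearn: `find_model_by_id` is a hypothetical name, and the only assumption is that the mapping's keys are `(seed, model_id, budget)` tuples, as unpacked in the snippet above. A stand-in dictionary replaces `clf.automl_.models_` so the example runs without fitting anything.

```python
from typing import Any, Mapping, Optional, Tuple

# (seed, model_id, budget), in the order unpacked by the loop above
ModelKey = Tuple[int, int, float]


def find_model_by_id(models: Mapping[ModelKey, Any], wanted_model_id: int) -> Optional[Any]:
    """Return the first model whose key carries wanted_model_id, or None if absent."""
    for (_seed, model_id, _budget), model in models.items():
        if model_id == wanted_model_id:
            return model
    return None


# Stand-in for clf.automl_.models_; the string values are placeholders for fitted pipelines.
fake_models = {
    (1, 2, 0.0): "pipeline for model 2",
    (1, 7, 0.0): "pipeline for model 7",
}

print(find_model_by_id(fake_models, 7))
print(find_model_by_id(fake_models, 99))
```

With a fitted classifier, the equivalent call would be `find_model_by_id(clf.automl_.models_, wanted_model_id)`.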
Thanks very much for your reply. I am getting: AttributeError: 'SimpleClassificationPipeline' object has no attribute 'automl_' when using clf.automl_.models_.items(). Do you know why this may be?

Also, I note that when I print the model, I get the below. If I decided (for whatever reason) I wanted to change the model selected from gaussian_nb to something else on the leaderboard, how would I do that?

Also, one more question :) Normally, I would pickle a model and call it in production when new data arrives. Will pickling this model also pickle the preprocessing steps?
It sounds like you pickled part of the whole object? If so, can you provide code for how you did that? The above example works in an
The
You would have to load a separate model from the leaderboard. The whole pipeline was built around using gaussian_nb, including hyperparameters that are not valid for other model types, so there is no meaningful way to drop in a different model type. If you want to fit a specific pipeline, you can copy the configuration manually in
Hi there, I haven't pickled anything yet - all I've done so far is the below. Running your script immediately after gives me that error

I also used the variable

Yeah, I had changed that already to x :(
I don't know what to tell you. The snippet above works, so I would imagine there is an error in how you copied, pasted, and renamed variables. This kind of falls outside the scope of the help we can provide, but if you post the full code you are using I'm happy to have a look and then close the issue if that answers all your questions.
Ah OK I see, I just re-ran it and it worked (I am not sure why). I promise this is the last question :) :) Is there a way to return one single model as the output, the most accurate one, rather than a dictionary of many models?
You can use the leaderboard to identify the most accurate model:

    clf.leaderboard(
        ensemble_only=False,  # Also include models that were not selected for the final ensemble
        sort_by='cost',  # The loss on the validation set
    )

You can identify the best model by the

Again, I highly recommend reading the leaderboard API to know what kind of information you can extract.
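
Picking the lowest-cost entry can be sketched in plain Python. This is a hedged illustration rather than auto-sklearn's own API: `best_model_id` is a hypothetical helper, the costs are made-up numbers, and it assumes you have already turned the leaderboard into a mapping from model ID to cost (for instance, if the leaderboard is a pandas DataFrame indexed by model ID, something like `clf.leaderboard(ensemble_only=False)["cost"].to_dict()`).

```python
from typing import Mapping


def best_model_id(costs: Mapping[int, float]) -> int:
    """Return the model ID with the lowest cost (lower validation loss is better)."""
    if not costs:
        raise ValueError("no models to choose from")
    return min(costs, key=lambda model_id: costs[model_id])


# Hypothetical costs keyed by model ID, made up for illustration only.
example_costs = {2: 0.035, 7: 0.021, 11: 0.049}

print(best_model_id(example_costs))
```

The returned ID is the value you would then use as `wanted_model_id` when looking up the pipeline.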
And the number 1 model in the leaderboard is always the one selected? So if I then pickled the result of the below, it would be rank 1 of the leaderboard?

    model = autosklearn.classification.AutoSklearnClassifier()
Autosklearn selects an ensemble of models, not a single model. This isn't clear from the initial documentation that users come across, so I'll make a note to update that! Every model shown in
Oh I see, that's interesting. So it will use a combination of all the models it produces, with a different weight assigned to each. That makes sense, I hadn't picked that up from the documentation. Is there a high-level overview which would cover how it works? The manual didn't help in this regard.
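
To make the idea of a weighted combination concrete, here is a minimal sketch of averaging class probabilities from several models using per-model weights. It only illustrates the general technique described above; it does not reproduce how auto-sklearn builds or evaluates its ensemble, and the function name, probabilities, and weights are all made up for the example.

```python
from typing import List, Sequence


def weighted_ensemble_proba(
    member_probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model class probabilities for one sample into a weighted average."""
    if not member_probas or len(member_probas) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_classes = len(member_probas[0])
    # For each class, sum weight * probability over the models, then normalise.
    return [
        sum(w * proba[c] for w, proba in zip(weights, member_probas)) / total
        for c in range(n_classes)
    ]


# Three hypothetical models predicting two classes for a single sample.
probas = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
weights = [0.5, 0.3, 0.2]

combined = weighted_ensemble_proba(probas, weights)
print(combined)
```

A model with weight zero contributes nothing to the result, which is why the top-ranked model on its own is not the same thing as the ensemble's prediction.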
Noted, we'll try to make that clearer for code users in the future. For now, the most comprehensive overview is the two papers associated with autosklearn, which may be a bit dense.
Hello there,
I've been through all the examples & it's not entirely clear to me how I identify the best model.
If I print the leaderboard & select the model I am interested in, how do I then find the below for that model? I have the model ID in the leaderboard - where do I use it?
Thanks a lot for the help in advance