Obtain feature list after ensemble classification #719


Closed
rcuocolo opened this issue Aug 30, 2019 · 5 comments



rcuocolo commented Aug 30, 2019

I ran auto-sklearn and obtained a 3-model ensemble for classifying my data. For reporting, and to better understand the process, I would like to know which features were selected for the classification.
I already tried the code in #524, but was not able to obtain the feature names (from the column headers of my data set).

This is the code I am currently employing for the classification:

import pandas as pd
import sklearn.model_selection
import sklearn.metrics
import autosklearn.classification
from sklearn import preprocessing

X = pd.read_csv('df.csv', index_col='A')
le = preprocessing.LabelEncoder()
for column_name in X.columns:
    if X[column_name].dtype == object:
        X[column_name] = le.fit_transform(X[column_name])
y = X.Infiltration
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=3, test_size=0.2)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,
    per_run_time_limit=60,
    ensemble_size=3,
    ensemble_nbest=50,
    resampling_strategy='cv',
    resampling_strategy_arguments={'folds': 10},
    )

automl.fit(X_train.copy(), y_train.copy(), dataset_name='test')
automl.refit(X_train.copy(), y_train.copy())

predictions = automl.predict(X_test)
probabilities = automl.predict_proba(X_test)

What can I add to obtain the desired output?

mfeurer (Contributor) commented Jan 7, 2020

Could you please post a fully reproducible example in which you apply the code from issue #524 and it fails?

Also, you might have to cast your data to a NumPy array before passing it to Auto-sklearn.
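For illustration, the suggested cast could look like this (a minimal sketch; the DataFrame and its column names are made up to stand in for the CSV used above):

```python
import numpy as np
import pandas as pd

# Hypothetical data frame standing in for the CSV loaded in the question
df = pd.DataFrame({"feat_a": [1.0, 2.0, 3.0], "feat_b": [4.0, 5.0, 6.0]})

# Cast to a plain NumPy array before handing the data to Auto-sklearn
X = df.to_numpy()

# The array loses the column labels, so keep them separately if you want
# to map importances or selector scores back to feature names later
feature_names = list(df.columns)
```

Keeping `feature_names` alongside the array is what lets you recover "which features were selected" afterwards, since the fitted pipeline only sees column indices.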

domainoverflow commented Feb 2, 2020

Hi @mfeurer, thanks in advance for your valuable time.
I am having the same problem: I am trying to get the feature_importances_ or coef_, but I can't. The only difference is that I am using the Regressor rather than the Classifier. I get an R2 score and the predictions work well, but I need to access the feature importances. I have tried many approaches from this thread and from #524.
I would be grateful if you could point me in the right direction.

import autosklearn.classification
import autosklearn.regression
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import pandas as pd
import sys
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

url = '/home/path_to/dataset.csv'
full_data = pd.read_csv(url)
full_data[['feature1','feature2','feature3','target_y_feature']]
y = full_data["target_y_feature"]
X = full_data.drop(["target_y_feature"], axis=1)
#y = y.to_numpy()
#X = X.to_numpy()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.3, random_state=42)
 
automl = autosklearn.regression.AutoSklearnRegressor(ensemble_size=1,time_left_for_this_task=220,per_run_time_limit=60,initial_configurations_via_metalearning=0)
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)
#print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))
predictions=y_hat
print(automl.show_models())
print("R2 score:", sklearn.metrics.r2_score(y_test, predictions))
print("ground")
print(y_test)
print("predictions") 
print(predictions)


for weight, model in automl.get_models_with_weights():
    # Obtain the step of the underlying scikit-learn pipeline
    print(model.steps[-2])
    # Obtain the scores of the current feature selector
    print(model.steps[-2][-1].choice.preprocessor.scores_)
    # Obtain the percentile configured by Auto-sklearn
    print(model.steps[-2][-1].choice.preprocessor.percentile)
 

#automl.get_models_with_weights()

But I get AttributeError: 'int' object has no attribute 'scores_'
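That AttributeError typically means the pipeline in question has no fitted feature selector in that slot (with 'no_preprocessing', as in the show_models() output below, the slot can hold a plain value instead of a fitted object). One way around it is a defensive lookup; this is a generic sketch (the helper name is made up, and it only mirrors the `step.choice.preprocessor.scores_` attribute chain used above):

```python
def safe_scores(step):
    """Return the feature-selector scores for a pipeline step, or None.

    Follows the step.choice.preprocessor.scores_ chain used above, but
    returns None instead of raising when any link in the chain is missing
    (e.g. when the pipeline was fitted with 'no_preprocessing').
    """
    choice = getattr(step, "choice", None)
    preprocessor = getattr(choice, "preprocessor", None)
    return getattr(preprocessor, "scores_", None)
```

With this guard, the loop over `automl.get_models_with_weights()` can simply skip members whose result is None rather than crashing on the first pipeline without a selector.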

I also tried following the example from @teresaconc

pipeline = list(automl._automl.models.values())[0]
print(pipeline)

but get

AttributeError: 'list' object has no attribute 'models'

whereas if I do

pipeline = list(automl._automl)
print(pipeline)

I get

RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their init (no varargs). <class 'autosklearn.automl.AutoMLRegressor'> with constructor (self, *args, **kwargs) doesn't follow this convention.

For ensemble_size = 1 I have the following:

[(1.000000, SimpleRegressionPipeline({'categorical_encoding:__choice__': 'one_hot_encoding', 'imputation:strategy': 'mean', 'preprocessor:__choice__': 'no_preprocessing', 'regressor:__choice__': 'random_forest', 'rescaling:__choice__': 'standardize', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'regressor:random_forest:bootstrap': 'True', 'regressor:random_forest:criterion': 'mse', 'regressor:random_forest:max_depth': 'None', 'regressor:random_forest:max_features': 1.0, 'regressor:random_forest:max_leaf_nodes': 'None', 'regressor:random_forest:min_impurity_decrease': 0.0, 'regressor:random_forest:min_samples_leaf': 1, 'regressor:random_forest:min_samples_split': 2, 'regressor:random_forest:min_weight_fraction_leaf': 0.0, 'regressor:random_forest:n_estimators': 100, 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.01},
dataset_properties={
  'task': 4,
  'sparse': False,
  'multilabel': False,
  'multiclass': False,
  'target_type': 'regression',
  'signed': False})),
]

I also tried casting the pandas DataFrame to a NumPy array (commented out above in the code), but with the same outcome.

I would be grateful if you could point me to accessing the coefficients / feature importance.

Thank you.
PS: another way of asking this would be: how could I get the feature_importances_ from the regression example auto-sklearn/examples/example_regression.py? Thanks for your time.
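Since the single ensemble member in the show_models() output above is a random forest, its fitted final step should expose feature_importances_ just like a plain scikit-learn RandomForestRegressor. A standalone sketch of reading importances from such a model (synthetic data and made-up feature names, to keep it runnable without auto-sklearn):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# The target depends mostly on the first column, so that feature
# should dominate the learned importances
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + 0.01 * rng.rand(200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# One importance value per input column; the values sum to 1.0
importances = rf.feature_importances_
for name, imp in zip(["feat_a", "feat_b", "feat_c"], importances):
    print(name, round(imp, 3))
```

The same attribute lives on the final estimator inside each auto-sklearn pipeline; the work is in navigating the pipeline wrappers to reach it, which is what the pipeline-components example linked later in this thread demonstrates.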

@akshayparanjape

During prediction, I get an error, "DataFrame object has no attribute 'dtype'", when passing a pandas DataFrame as input. A pandas DataFrame has no attribute 'dtype', only 'dtypes'.
Can you let me know if this is a bug, or whether I am doing something wrong?
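The attribute distinction can be checked directly (a minimal sketch with a throwaway frame):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# A DataFrame exposes per-column dtypes, not a single dtype
print(df.dtypes)             # one dtype per column
print(df["a"].dtype)         # an individual column (a Series) does have .dtype
print(hasattr(df, "dtype"))  # the attribute the error message refers to
```

So the error arises when code written for a Series (or NumPy array) receives a whole DataFrame, which is consistent with the maintainer's reply below that pandas input was not yet supported on master.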

mfeurer (Contributor) commented Aug 4, 2020

We now have an example showing how to obtain information from the trained pipelines: https://automl.github.io/auto-sklearn/development/examples/example_get_pipeline_components.html

This is currently in the development branch only but will be available in the next release.

@akshayparanjape we did not test pandas support in the master branch. Please use NumPy arrays there. We will support pandas DataFrames in the next release.

mfeurer (Contributor) commented Sep 2, 2020

Hi everyone, the previously mentioned example is now available in the master branch and in the main documentation: https://automl.github.io/auto-sklearn/master/examples/40_advanced/example_get_pipeline_components.html#sphx-glr-examples-40-advanced-example-get-pipeline-components-py

Please reopen if this issue is still of interest to you and you need help adapting it to a specific model.
