Update scikit learn 1.2 #1611

eddiebergman · 2022-11-15T13:52:08Z

This PR attempts to cleanly just update scikit-learn to 1.2 which necessitates updating to Python 3.8. This means we are locked out of Colab from any newer versions as Google Colab only support python 3.7

Will be a live PR, going through the changelogs for 1.0.2 and changelogs for 1.1.3

Supposedly relevant Changelog entries

API Change The option for using the squared error via loss and criterion parameters was made more consistent. The preferred way is by setting the value to "squared_error". Old option names are still valid, produce the same models, but are deprecated and will be removed in version 1.2. #19310 by Christian Lorentzen.

For ensemble.ExtraTreesRegressor, criterion="mse" is deprecated, use "squared_error" instead which is now the default.

For ensemble.GradientBoostingRegressor, loss="ls" is deprecated, use "squared_error" instead which is now the default.

For ensemble.RandomForestRegressor, criterion="mse" is deprecated, use "squared_error" instead which is now the default.

For ensemble.HistGradientBoostingRegressor, loss="least_squares" is deprecated, use "squared_error" instead which is now the default.

For linear_model.RANSACRegressor, loss="squared_loss" is deprecated, use "squared_error" instead.

For linear_model.SGDRegressor, loss="squared_loss" is deprecated, use "squared_error" instead which is now the default.

For tree.DecisionTreeRegressor, criterion="mse" is deprecated, use "squared_error" instead which is now the default.

For tree.ExtraTreeRegressor, criterion="mse" is deprecated, use "squared_error" instead which is now the default.
API Change The option for using the absolute error via loss and criterion parameters was made more consistent. The preferred way is by setting the value to "absolute_error". Old option names are still valid, produce the same models, but are deprecated and will be removed in version 1.2. #19733 by Christian Lorentzen.

For ensemble.ExtraTreesRegressor, criterion="mae" is deprecated, use "absolute_error" instead.

For ensemble.GradientBoostingRegressor, loss="lad" is deprecated, use "absolute_error" instead.

For ensemble.RandomForestRegressor, criterion="mae" is deprecated, use "absolute_error" instead.

For ensemble.HistGradientBoostingRegressor, loss="least_absolute_deviation" is deprecated, use "absolute_error" instead.

For linear_model.RANSACRegressor, loss="absolute_loss" is deprecated, use "absolute_error" instead which is now the default.

For tree.DecisionTreeRegressor, criterion="mae" is deprecated, use "absolute_error" instead.

For tree.ExtraTreeRegressor, criterion="mae" is deprecated, use "absolute_error" instead.
API Change np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. #20165 by Thomas Fan.
API Change get_feature_names_out has been added to the transformer API to get the names of the output features. get_feature_names has in turn been deprecated. #18444 by Thomas Fan.
API Change All estimators store feature_names_in_ when fitted on pandas Dataframes. These feature names are compared to names seen in non-fit methods, e.g. transform and will raise a FutureWarning if they are not consistent. These FutureWarning s will become ValueError s in 1.2. #18010 by Thomas Fan.
API Change Deprecates the following keys in cv_results_: 'mean_score', 'std_score', and 'split(k)_score' in favor of 'mean_test_score' 'std_test_score', and 'split(k)_test_score'. #20583 by Thomas Fan.
API Change Deprecates datasets.load_boston in 1.0 and it will be removed in 1.2. Alternative code snippets to load similar datasets are provided. Please report to the docstring of the function for details. #20729 by Guillaume Lemaitre.
API Change Rename variable names in KernelPCA to improve readability. lambdas_ and alphas_ are renamed to eigenvalues_ and eigenvectors_, respectively. lambdas_ and alphas_ are deprecated and will be removed in 1.2. #19908 by Kei Ishikawa.
API Change Attribute n_features_in_ in dummy.DummyRegressor and dummy.DummyRegressor is deprecated and will be removed in 1.2. #20960 by Thomas Fan.
Fix Fixed the range of the argument max_samples to be (0.0, 1.0] in ensemble.RandomForestClassifier, ensemble.RandomForestRegressor, where max_samples=1.0 is interpreted as using all n_samples for bootstrapping. #20159 by @murata-yu.
API Change Removes tol=None option in ensemble.HistGradientBoostingClassifier and ensemble.HistGradientBoostingRegressor. Please use tol=0 for the same behavior. #19296 by Thomas Fan.
Fix Raise a warning in feature_extraction.text.CountVectorizer with lowercase=True when there are vocabulary entries with uppercase characters to avoid silent misses in the resulting feature vectors. #19401 by Zito Relova
API Change Raises an error in feature_selection.VarianceThreshold when the variance threshold is negative. #20207 by Tomohiro Endo
API Change Deprecates grid_scores_ in favor of split scores in cv_results_ in feature_selection.RFECV. grid_scores_ will be removed in version 1.2. #20161 by Shuhei Kayawari and @arka204.
Enhancement Add max_samples parameter in inspection.permutation_importance. It enables to draw a subset of the samples to compute the permutation importance. This is useful to keep the method tractable when evaluating feature importance on large datasets. #20431 by Oliver Pfaffel.
Feature Added sample_weight parameter to linear_model.LassoCV and linear_model.ElasticNetCV. #16449 by Christian Lorentzen.
Feature Added new solver lbfgs (available with solver="lbfgs") and positive argument to linear_model.Ridge. When positive is set to True, forces the coefficients to be positive (only supported by lbfgs). #20231 by Toshihiro Nakae.
Enhancement fit method preserves dtype for numpy.float32 in linear_model.Lars, linear_model.LassoLars, linear_model.LassoLars, linear_model.LarsCV and linear_model.LassoLarsCV. #20155 by Takeshi Oura.
API Change : The parameter normalize of linear_model.LinearRegression is deprecated and will be removed in 1.2. Motivation for this deprecation: normalize parameter did not take any effect if fit_intercept was set to False and therefore was deemed confusing. The behavior of the deprecated LinearModel(normalize=True) can be reproduced with a Pipeline with LinearModel (where LinearModel is LinearRegression, Ridge, RidgeClassifier, RidgeCV or RidgeClassifierCV) as follows: make_pipeline(StandardScaler(with_mean=False), LinearModel()). The normalize parameter in LinearRegression was deprecated in #17743 by Maria Telenczuk and Alexandre Gramfort. Same for Ridge, RidgeClassifier, RidgeCV, and RidgeClassifierCV, in: #17772 by Maria Telenczuk and Alexandre Gramfort. Same for BayesianRidge, ARDRegression in: #17746 by Maria Telenczuk. Same for Lasso, LassoCV, ElasticNet, ElasticNetCV, MultiTaskLasso, MultiTaskLassoCV, MultiTaskElasticNet, MultiTaskElasticNetCV, in: #17785 by Maria Telenczuk and Alexandre Gramfort.
API Change Keyword validation has moved from init and set_params to fit for the following estimators conforming to scikit-learn’s conventions: SGDClassifier, SGDRegressor, SGDOneClassSVM, PassiveAggressiveClassifier, and PassiveAggressiveRegressor. #20683 by Guillaume Lemaitre.
Enhancement The model_selection.BaseShuffleSplit base class is now public. #20056 by @pabloduque0.
API Change The attribute sigma_ is now deprecated in naive_bayes.GaussianNB and will be removed in 1.2. Use var_ instead. #18842 by Hong Shao Yang.
Fix The preprocessing.StandardScaler.inverse_transform method now raises error when the input data is 1D. #19752 by Zhehao Liu.
Fix The fit method of preprocessing.OrdinalEncoder will not raise error when handle_unknown='ignore' and unknown categories are given to fit. #19906 by Zhehao Liu.
API Change The n_input_features_ attribute of preprocessing.PolynomialFeatures is deprecated in favor of n_features_in_ and will be removed in 1.2. #20240 by Jérémie du Boisberranger.
API Change The n_features_ attribute of tree.DecisionTreeClassifier, tree.DecisionTreeRegressor, tree.ExtraTreeClassifier and tree.ExtraTreeRegressor is deprecated in favor of n_features_in_ and will be removed in 1.2. #20272 by Jérémie du Boisberranger.
Enhancement utils.validation.check_is_fitted now uses sklearn_is_fitted if available, instead of checking for attributes ending with an underscore. This also makes pipeline.Pipeline and preprocessing.FunctionTransformer pass check_is_fitted(estimator). #20657 by Adrin Jalali.
Fix Support for np.matrix is deprecated in check_array in 1.0 and will raise a TypeError in 1.2. #20165 by Thomas Fan.
Fix impute.SimpleImputer uses the dtype seen in fit for transform when the dtype is object. #22063 by Thomas Fan.
Enhancement Added an extension in doc/conf.py to automatically generate the list of estimators that handle NaN values. #23198 by Lise Kleiber, Zhehao Liu and Chiara Marmo.
Efficiency cluster.KMeans now defaults to algorithm="lloyd" instead of algorithm="auto", which was equivalent to algorithm="elkan". Lloyd’s algorithm and Elkan’s algorithm converge to the same solution, up to numerical rounding errors, but in general Lloyd’s algorithm uses much less memory, and it is often faster.
API Change The option for using the log loss, aka binomial or multinomial deviance, via the loss parameters was made more consistent. The preferred way is by setting the value to "log_loss". Old option names are still valid and produce the same models, but are deprecated and will be removed in version 1.3.

For ensemble.GradientBoostingClassifier, the loss parameter name “deviance” is deprecated in favor of the new name “log_loss”, which is now the default. #23036 by Christian Lorentzen.

For ensemble.HistGradientBoostingClassifier, the loss parameter names “auto”, “binary_crossentropy” and “categorical_crossentropy” are deprecated in favor of the new name “log_loss”, which is now the default. #23040 by Christian Lorentzen.

For linear_model.SGDClassifier, the loss parameter name “log” is deprecated in favor of the new name “log_loss”. #23046 by Christian Lorentzen.
Major Feature Added additional option loss="quantile" to ensemble.HistGradientBoostingRegressor for modelling quantiles. The quantile level can be specified with the new parameter quantile. #21800 and #20567 by Christian Lorentzen.
Enhancement ensemble.RandomForestClassifier and ensemble.ExtraTreesClassifier have the new criterion="log_loss", which is equivalent to criterion="entropy". #23047 by Christian Lorentzen.
Enhancement Adds get_feature_names_out to ensemble.VotingClassifier, ensemble.VotingRegressor, ensemble.StackingClassifier, and ensemble.StackingRegressor. #22695 and #22697 by Thomas Fan.
API Change Changed the default of max_features to 1.0 for ensemble.RandomForestRegressor and to "sqrt" for ensemble.RandomForestClassifier. Note that these give the same fit results as before, but are much easier to understand. The old default value "auto" has been deprecated and will be removed in version 1.3. The same changes are also applied for ensemble.ExtraTreesRegressor and ensemble.ExtraTreesClassifier. #20803 by Brian Sun.
Fix predict and sample_y methods of gaussian_process.GaussianProcessRegressor now return arrays of the correct shape in single-target and multi-target cases, and for both normalize_y=False and normalize_y=True. #22199 by Guillaume Lemaitre, Aidar Shakerimoff and Tenavi Nakamura-Zimmerer.
Enhancement impute.SimpleImputer now warns with feature names when features which are skipped due to the lack of any observed values in the training set. #21617 by Christian Ritter.
Enhancement Added support for pd.NA in impute.SimpleImputer. #21114 by Ying Xiong.
Enhancement Adds get_feature_names_out to impute.SimpleImputer, impute.KNNImputer, impute.IterativeImputer, and impute.MissingIndicator. #21078 by Thomas Fan.
API Change The verbose parameter was deprecated for impute.SimpleImputer. A warning will always be raised upon the removal of empty columns. #21448 by Oleh Kozynets and Christian Ritter.
Feature preprocessing.OneHotEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. #16018 by Thomas Fan.
Enhancement Adds encoded_missing_value to preprocessing.OrdinalEncoder to configure the encoded value for missing data. #21988 by Thomas Fan.
Enhancement svm.OneClassSVM, svm.NuSVC, svm.NuSVR, svm.SVC and svm.SVR now expose n_iter_, the number of iterations of the libsvm optimization routine. #21408 by Juan Martín Loyola.
Enhancement tree.DecisionTreeClassifier and tree.ExtraTreeClassifier have the new criterion="log_loss", which is equivalent to criterion="entropy". #23047 by Christian Lorentzen.
API Change Changed the default value of max_features to 1.0 for tree.ExtraTreeRegressor and to "sqrt" for tree.ExtraTreeClassifier, which will not change the fit result. The original default value "auto" has been deprecated and will be removed in version 1.3. Setting max_features to "auto" is also deprecated for tree.DecisionTreeClassifier and tree.DecisionTreeRegressor. #22476 by Zhehao Liu.

API Change Deprecates datasets.load_boston in 1.0 and it will be removed in 1.2. Alternative code snippets to load similar datasets are provided. Please report to the docstring of the function for details. #20729 by Guillaume Lemaitre.

Write separate issue to update this. It will break a lot of our test fixtures.

ensemble.HistGradientBoostingRegressor for modelling quantiles. The quantile level can be specified with the new parameter quantile. FEA add quantile HGBT scikit-learn/scikit-learn#21800 and [MRG] Common Private Loss Module with tempita scikit-learn/scikit-learn#20567 by Christian Lorentzen.

Write separate issue to investigate this

Feature preprocessing.OneHotEncoder now supports grouping infrequent categories into a single feature. Grouping infrequent categories is enabled by specifying how to select infrequent categories with min_frequency or max_categories. ENH Adds infrequent categories to OneHotEncoder scikit-learn/scikit-learn#16018 by Thomas Fan.

Write seperate issue to investigate this

Enhancement Adds encoded_missing_value to preprocessing.OrdinalEncoder to configure the encoded value for missing data. ENH Adds encoded_missing_value to OrdinalEncoder scikit-learn/scikit-learn#21988 by Thomas Fan.

Seperate issue

Enhancement svm.OneClassSVM, svm.NuSVC, svm.NuSVR, svm.SVC and svm.SVR now expose n_iter_, the number of iterations of the libsvm optimization routine. [MRG] Expose n_iter_ to BaseLibSVM scikit-learn/scikit-learn#21408 by Juan Martín Loyola.

Could be made to be iterative fit methods? Seperate issue

Next steps

Update test score fixtures
Update meta-learning most likely
Update docs, specifically that we deprecate Python 3.7 and using newer versions of Autosklearn will no longer work in Google Colab while they are fixed to Python 3.7

(scikit-learn/scikit-learn#19310)

scikit-learn/scikit-learn#19310

codecov · 2022-11-15T15:33:53Z

Codecov Report

Merging #1611 (7aa518e) into development (1abd1f9) will decrease coverage by 0.11%.
The diff coverage is 76.92%.

Additional details and impacted files

@@               Coverage Diff               @@
##           development    #1611      +/-   ##
===============================================
- Coverage        83.42%   83.32%   -0.11%     
===============================================
  Files              156      156              
  Lines            11927    12298     +371     
  Branches          1896     2033     +137     
===============================================
+ Hits              9950    10247     +297     
- Misses            1412     1453      +41     
- Partials           565      598      +33

ecederstrand · 2023-03-30T09:23:05Z

I'm interested in this change. Is anything holding back this PR?

forzagreen · 2023-06-07T14:38:47Z

Any updates on this ?

AmirAlavi · 2023-06-14T14:21:15Z

@eddiebergman can we help with this? Are the below two steps all that remains?

Update test score fixtures

Update meta-learning most likely

eddiebergman added 20 commits November 15, 2022 14:12

build(scikit-learn): Update to 1.1

4e137bf

build(python)!: Update to python 3.8

d9a4a3c

fix(requirements): Parser now recognized <=

dcf0ee1

fix(scikit-learn): Upperbound dependacny to 1.2

fa5548b

chore: update criterion

b379b22

(scikit-learn/scikit-learn#19310)

fix(test): sk_mod use correct estimator

9fc4a21

chore(space): RandomForestRegressor criterion

db947ab

scikit-learn/scikit-learn#19310

chore(space): loss HistGradientBoostingRegressor

247fa6d

scikit-learn/scikit-learn#19310

chore(space): Loss SGDRegressor

490c6aa

scikit-learn/scikit-learn#19310

chore(space): DecisionTreeRegressor

68d489b

scikit-learn/scikit-learn#19310

chore(space): ExtraTreesRegressor

277ddef

scikit-learn/scikit-learn#19310

chore(space): scikit-learn/scikit-learn#19733

bf477f1

chore(KernelPCA): scikit-learn/scikit-learn#19908

f9e517a

chore(ARDRegression): scikit-learn/scikit-learn#17746

b572463

refactor(BaseShuffleSplit): scikit-learn/scikit-learn#20056

6141073

chore(StandardScaler): scikit-learn/scikit-learn#19752

00fc2ab

chore(Steps): scikit-learn/scikit-learn#20657

165afe5

chore(space): scikit-learn/scikit-learn#23046

a82b510

doc(space): scikit-learn/scikit-learn#23047

62bb689

chore(RandomForestclassifier): scikit-learn/scikit-learn#20803

d5ee744

doc(space): scikit-learn/scikit-learn#23047

0510520

eddiebergman added the maintenance Internal maintenance label Nov 15, 2022

eddiebergman added 2 commits November 15, 2022 17:03

doc(Python): Add note on Python3.7 dropping

b199abb

Merge branch 'development' into update-scikit-learn-1.2

02870f4

Merge branch 'development' into update-scikit-learn-1.2

f4bb897

eddiebergman mentioned this pull request Nov 15, 2022

Update SMAC, ConfigSpace, Pynisher #1618

Closed

doc: Cleanup script

7aa518e

eddiebergman linked an issue Nov 29, 2022 that may be closed by this pull request

Cannot import classes from SMAC3 - "cannot import name <name> from smac" automl/SMAC3#891

Closed

github-actions bot added the Stale label Jan 22, 2023

github-actions bot closed this Jan 31, 2023

aron-bram reopened this Feb 14, 2023

github-actions bot removed the Stale label Feb 15, 2023

github-actions bot added the Stale label May 30, 2023

github-actions bot closed this Jun 6, 2023

AmirAlavi mentioned this pull request Jul 19, 2023

[Question] Will this project be maintained and updated? #1676

Closed

eddiebergman mentioned this pull request Jul 21, 2023

What's in store for Auto-Sklearn? -- From the Developers #1677

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update scikit learn 1.2 #1611

Update scikit learn 1.2 #1611

Uh oh!

eddiebergman commented Nov 15, 2022 •

edited

Loading

Uh oh!

codecov bot commented Nov 15, 2022 •

edited

Loading

Uh oh!

ecederstrand commented Mar 30, 2023

Uh oh!

forzagreen commented Jun 7, 2023

Uh oh!

AmirAlavi commented Jun 14, 2023

Uh oh!

Uh oh!

Update scikit learn 1.2 #1611

Update scikit learn 1.2 #1611

Uh oh!

Conversation

eddiebergman commented Nov 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Next steps

Uh oh!

codecov bot commented Nov 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ecederstrand commented Mar 30, 2023

Uh oh!

forzagreen commented Jun 7, 2023

Uh oh!

AmirAlavi commented Jun 14, 2023

Uh oh!

Uh oh!

eddiebergman commented Nov 15, 2022 •

edited

Loading

codecov bot commented Nov 15, 2022 •

edited

Loading