Closed
Labels: Enhancement, Moderate (anything that requires some knowledge of conventions and best practices), help wanted
Description
fetch_openml currently rejects STRING-valued attributes and ordinal-encodes all NOMINAL attributes, in order to return an array or sparse matrix of floats by default.
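A minimal sketch of the float-only behaviour described above, for a dense dataset only. It assumes a liac-arff-style decoded layout, where `attributes` is a list of `(name, type)` pairs and a NOMINAL attribute's type is the list of its declared values. The helper name `encode_like_current` is hypothetical and does not reproduce the actual `fetch_openml` code.

```python
import numpy as np


def encode_like_current(attributes, rows):
    """Hypothetical sketch: mimic the float-only output described in the issue.

    attributes: list of (name, type) pairs; a list-valued type means NOMINAL
    (assumed liac-arff-style layout). rows: list of row value lists.
    """
    # STRING attributes cannot be represented as floats, so they are rejected.
    for name, typ in attributes:
        if isinstance(typ, str) and typ.upper() == "STRING":
            raise ValueError(f"STRING attributes are not supported: {name!r}")

    X = np.empty((len(rows), len(attributes)), dtype=np.float64)
    for j, (_, typ) in enumerate(attributes):
        for i, row in enumerate(rows):
            value = row[j]
            if value is None:
                # Missing values become NaN in the float array.
                X[i, j] = np.nan
            elif isinstance(typ, list):
                # Ordinal encoding: position of the value in the declared list.
                X[i, j] = typ.index(value)
            else:
                X[i, j] = float(value)
    return X
```

Note that the category labels are lost here: the caller only sees `0.0`, `1.0`, and so on, which is the information a DataFrame return type would preserve.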
We should have a parameter that instead returns a DataFrame of features as the 'data' entry in the returned Bunch. By default, this would keep nominal attributes as pd.Categorical and string attributes as object columns. Column names would be taken from the ARFF attribute names / OpenML metadata. We might also set the DataFrame's index from the attribute that OpenML marks with is_row_identifier.
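A minimal sketch of the proposed conversion, using the same assumed liac-arff-style `(name, type)` attribute layout as above. The function name `arff_to_frame` and its `row_id` argument (standing in for the attribute flagged with is_row_identifier) are hypothetical illustrations, not an existing scikit-learn API.

```python
import pandas as pd


def arff_to_frame(attributes, rows, row_id=None):
    """Hypothetical sketch of the proposed DataFrame output.

    attributes: list of (name, type) pairs; a list-valued type means NOMINAL.
    row_id: name of the column to use as the index (e.g. the attribute OpenML
    flags with is_row_identifier), or None to keep a default RangeIndex.
    """
    # Column names come straight from the ARFF attribute names.
    names = [name for name, _ in attributes]
    df = pd.DataFrame(rows, columns=names)
    for name, typ in attributes:
        if isinstance(typ, list):
            # NOMINAL: keep the declared categories, in their declared order.
            df[name] = pd.Categorical(df[name], categories=typ)
        elif typ.upper() == "STRING":
            # STRING: keep values as Python objects instead of rejecting them.
            df[name] = df[name].astype(object)
        else:
            # NUMERIC / REAL / INTEGER: coerce to a numeric dtype.
            df[name] = pd.to_numeric(df[name])
    if row_id is not None:
        df = df.set_index(row_id)
    return df
```

Keeping `categories=typ` rather than letting pandas infer them preserves categories that happen not to occur in the data, which matters if the frame is later one-hot encoded.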
See #10733 for the general issue of an API for returning DataFrames in sklearn.datasets.