fetch_openml: Add an option which returns a DataFrame #11818

@jnothman

Description

fetch_openml currently rejects STRING-valued attributes and ordinal-encodes all NOMINAL attributes, in order to return an array or sparse matrix of floats by default.
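To make the current behaviour concrete, here is a minimal sketch of ordinal-encoding nominal attributes into a float array. It is not scikit-learn's actual implementation: the `attributes` list, the `rows` data and the `ordinal_encode` helper are hypothetical names invented for illustration, and the ARFF-like metadata layout is an assumption.

```python
import numpy as np

# Hypothetical ARFF-like metadata: (name, type). A list of strings stands
# for a NOMINAL attribute and enumerates its allowed values (assumption).
attributes = [
    ("age", "NUMERIC"),
    ("colour", ["red", "green", "blue"]),
]
rows = [
    [31.0, "green"],
    [45.0, "red"],
    [27.0, "blue"],
]


def ordinal_encode(rows, attributes):
    """Return a float array, replacing each nominal value by its index."""
    out = np.empty((len(rows), len(attributes)), dtype=np.float64)
    for j, (name, kind) in enumerate(attributes):
        if kind == "STRING":
            # Mirrors the rejection of STRING-valued attributes described above.
            raise ValueError(f"STRING attribute {name!r} is not supported")
        for i, row in enumerate(rows):
            if isinstance(kind, list):
                # Nominal: store the position of the value in the category list.
                out[i, j] = kind.index(row[j])
            else:
                out[i, j] = float(row[j])
    return out


X = ordinal_encode(rows, attributes)
```

In this sketch the category labels are lost in `X`: only their integer positions survive, stored as floats.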

We should have a parameter that instead returns a DataFrame of features as the 'data' entry in the returned Bunch. This would (by default) keep nominals as pd.Categorical and strings as objects. Columns would have names determined from the ARFF attribute names / OpenML metadata. Perhaps we would also set the DataFrame's index from the attribute marked is_row_identifier in OpenML.
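The proposal could look roughly like the following sketch, which builds a DataFrame from ARFF-like metadata. Everything here is an assumption made for illustration: the `to_frame` helper, the `row_id_attribute` argument and the sample data are hypothetical and are not part of `fetch_openml`.

```python
import pandas as pd

# Hypothetical ARFF-like metadata: (name, type). A list of strings stands
# for a NOMINAL attribute and enumerates its allowed values (assumption).
attributes = [
    ("id", "NUMERIC"),
    ("age", "NUMERIC"),
    ("colour", ["red", "green", "blue"]),
    ("note", "STRING"),
]
rows = [
    [101, 31.0, "green", "first visit"],
    [102, 45.0, "red", "follow-up"],
    [103, 27.0, "blue", "first visit"],
]


def to_frame(rows, attributes, row_id_attribute=None):
    """Build a DataFrame that keeps nominals categorical and strings as objects."""
    names = [name for name, _ in attributes]
    frame = pd.DataFrame(rows, columns=names)
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal: keep the labels and the declared category order.
            frame[name] = pd.Categorical(frame[name], categories=kind)
        elif kind == "STRING":
            frame[name] = frame[name].astype(object)
    if row_id_attribute is not None:
        # Use the row-identifier attribute as the index instead of a feature.
        frame = frame.set_index(row_id_attribute)
    return frame


# "id" plays the role of the attribute flagged as the row identifier.
df = to_frame(rows, attributes, row_id_attribute="id")
```

Unlike an ordinal-encoded float array, `df["colour"]` keeps both its labels and its declared categories, and `df["note"]` is preserved rather than rejected.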

See #10733 for the general issue of an API for returning DataFrames in sklearn.datasets.

Labels: Enhancement; Moderate (anything that requires some knowledge of conventions and best practices); help wanted
