Undo accidental commit

mfeurer · eddiebergman · commit 56e6ac06eabc · 2022-08-18T20:08:49.000+02:00
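
The FAQ text restored by this diff says that numerical DataFrame columns are passed through, categorical/boolean columns are encoded, and any other column type (Object or Timeseries) raises an error. A minimal sketch of preparing a DataFrame along those lines is given below, using pandas only. The column names and the `unsupported` pre-check are hypothetical illustrations, not part of *auto-sklearn*.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
X_train = pd.DataFrame(
    {
        "age": [22, 35, 58, 41],
        "color": ["red", "blue", "red", "green"],
        "member": [True, False, True, True],
    }
)

# Cast the string column to the pandas categorical dtype so that it is
# treated as categorical instead of being left as an object/string column.
X_train["color"] = X_train["color"].astype("category")

# Illustrative pre-check (an assumption, not an auto-sklearn API): collect
# columns that are neither numeric, boolean nor categorical.
unsupported = [
    name
    for name, dtype in X_train.dtypes.items()
    if not (
        pd.api.types.is_numeric_dtype(dtype)
        or pd.api.types.is_bool_dtype(dtype)
        or isinstance(dtype, pd.CategoricalDtype)
    )
]
```

If `unsupported` is non-empty, those columns would presumably need to be converted before calling `fit`.
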
diff --git a/doc/faq.rst b/doc/faq.rst
@@ -31,30 +31,26 @@ General
     Optionally, you can measure the ability of this fitted model to generalize to unseen data by
     providing an optional testing pair (X_test/Y_test). For further details, please refer to the
     Example :ref:`sphx_glr_examples_40_advanced_example_pandas_train_test.py`.
+    Supported formats for these training and testing pairs are: np.ndarray,
+    pd.DataFrame, scipy.sparse.csr_matrix and Python lists.
 
-    Regarding the features, there are multiple things to consider:
+    If your data contains categorical values (in the features or targets), *auto-sklearn* will automatically encode your
+    data using a `sklearn.preprocessing.LabelEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html>`_
+    for unidimensional data and a `sklearn.preprocessing.OrdinalEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html>`_
+    for multidimensional data.
+
+    Regarding the features, there are two methods to guide *auto-sklearn* to properly encode categorical columns:
 
     * Providing a X_train/X_test numpy array with the optional flag feat_type. For further details, you
       can check the Example :ref:`sphx_glr_examples_40_advanced_example_feature_types.py`.
-    * You can provide a pandas DataFrame with properly formatted columns. If a column has numerical
-      dtype, *auto-sklearn* will not encode it and it will be passed directly to scikit-learn. *auto-sklearn*
-      supports both categorical or string as column type. Please ensure that you are using the correct
-      dtype for your task. By default *auto-sklearn* treats object and string columns as strings and
-      encodes the data using `sklearn.feature_extraction.text.CountVectorizer <https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_
-    * If your data contains categorical values (in the features or targets), ensure that you explicitly label them as categorical.
-      data labeled as categorical is encoded by using a `sklearn.preprocessing.LabelEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html>`_
-      for unidimensional data and a `sklearn.preprodcessing.OrdinalEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html>`_ for multidimensional data.
-    * For further details on how to properly encode your data, you can check the Pandas Example
-      `Working with categorical data <https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`_). If you are working with time series, it is recommended that you follow this approach
+    * You can provide a pandas DataFrame, with properly formatted columns. If a column has numerical
+      dtype, *auto-sklearn* will not encode it and it will be passed directly to scikit-learn. If the
+      column has a categorical/boolean class, it will be encoded. If the column is of any other type
+      (Object or Timeseries), an error will be raised. For further details on how to properly encode
+      your data, you can check the Pandas Example
+      `Working with categorical data <https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html>`_.
+      If you are working with time series, it is recommended that you follow this approach
       `Working with time data <https://stats.stackexchange.com/questions/311494/>`_.
-    * If you prefer not using the string option at all you can disable this option. In this case
-      objects, strings and categorical columns are encoded as categorical.
-
-    .. code:: python
-
-        import autosklearn.classification
-        automl = autosklearn.classification.AutoSklearnClassifier(allow_string_features=False)
-        automl.fit(X_train, y_train)
 
     Regarding the targets (y_train/y_test), if the task involves a classification problem, such features will be
     automatically encoded. It is recommended to provide both y_train and y_test during fit, so that a common encoding
diff --git a/doc/manual.rst b/doc/manual.rst
@@ -317,12 +317,14 @@ Other
     Optionally, you can measure the ability of this fitted model to generalize to unseen data by
     providing an optional testing pair (X_test/Y_test). For further details, please refer to the
     Example :ref:`sphx_glr_examples_40_advanced_example_pandas_train_test.py`.
+    Supported formats for these training and testing pairs are: np.ndarray,
+    pd.DataFrame, scipy.sparse.csr_matrix and Python lists.
 
     Regarding the features, there are multiple things to consider:
 
     * Providing a X_train/X_test numpy array with the optional flag feat_type. For further details, you
       can check the Example :ref:`sphx_glr_examples_40_advanced_example_feature_types.py`.
-    * You can provide a pandas DataFrame with properly formatted columns. If a column has numerical
+    * You can provide a pandas DataFrame, with properly formatted columns. If a column has numerical
       dtype, *auto-sklearn* will not encode it and it will be passed directly to scikit-learn. *auto-sklearn*
       supports both categorical or string as column type. Please ensure that you are using the correct
       dtype for your task. By default *auto-sklearn* treats object and string columns as strings and