ENH: Restore original convert_objects and add _convert #11173

Merged
merged 3 commits into from
Oct 2, 2015
33 changes: 9 additions & 24 deletions doc/source/basics.rst
@@ -1711,36 +1711,26 @@ then the more *general* one will be used as the result of the operation.
object conversion
~~~~~~~~~~~~~~~~~

.. note::

   The syntax of :meth:`~DataFrame.convert_objects` changed in 0.17.0. See
   :ref:`API changes <whatsnew_0170.api_breaking.convert_objects>`
   for more details.

:meth:`~DataFrame.convert_objects` is a method that converts columns from
the ``object`` dtype to datetimes, timedeltas or floats. For example, to
attempt conversion of object data that are *number like*, e.g. could be a
string that represents a number, pass ``numeric=True``. By default, this will
attempt a soft conversion and so will only succeed if the entire column is
convertible. To force the conversion, add the keyword argument ``coerce=True``.
This will force strings and number-like objects to be numbers if
possible, and other values will be set to ``np.nan``.
:meth:`~DataFrame.convert_objects` is a method to try to force conversion of types from the ``object`` dtype to other types.
To force conversion of specific types that are *number like*, e.g. could be a string that represents a number,
pass ``convert_numeric=True``. This will force strings and numbers alike to be numbers if possible, otherwise
they will be set to ``np.nan``.

.. ipython:: python

   df3['D'] = '1.'
   df3['E'] = '1'
   df3.convert_objects(numeric=True).dtypes
   df3.convert_objects(convert_numeric=True).dtypes

   # same, but specific dtype conversion
   df3['D'] = df3['D'].astype('float16')
   df3['E'] = df3['E'].astype('int32')
   df3.dtypes

To force conversion to ``datetime64[ns]``, pass ``datetime=True`` and ``coerce=True``.
To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
This will convert any datetime-like object to dates, forcing other values to ``NaT``.
This might be useful if you are reading in data which is mostly dates,
but occasionally contains non-dates that you wish to represent as missing.
but occasionally has non-dates intermixed that you want to represent as missing.

.. ipython:: python

@@ -1749,15 +1739,10 @@ but occasionally contains non-dates that you wish to represent as missing.
                  'foo', 1.0, 1, pd.Timestamp('20010104'),
                  '20010105'], dtype='O')
   s
   s.convert_objects(datetime=True, coerce=True)
   s.convert_objects(convert_dates='coerce')

Without passing ``coerce=True``, :meth:`~DataFrame.convert_objects` will attempt
*soft* conversion of any *object* dtypes, meaning that if all
In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all
the objects in a Series are of the same type, the Series will have that dtype.
Note that setting ``coerce=True`` does not *convert* arbitrary types to either
``datetime64[ns]`` or ``timedelta64[ns]``. For example, a series containing string
dates will not be converted to a series of datetimes. To convert between types,
see :ref:`converting to timestamps <timeseries.converting>`.
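
The difference between *soft* and *coerced* numeric conversion described in this section can be illustrated without pandas. The following is a minimal pure-Python sketch of the semantics only, not the pandas implementation; the helper names `to_number`, `soft_numeric` and `coerce_numeric` are hypothetical.

```python
import math


def to_number(value):
    """Return value as a float, or None if it is not number-like."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def soft_numeric(values):
    """Soft conversion: succeed only if every element is number-like."""
    converted = [to_number(v) for v in values]
    if any(c is None for c in converted):
        # At least one element failed, so the input is returned unchanged.
        return list(values)
    return converted


def coerce_numeric(values):
    """Coerced conversion: non-number-like elements become NaN."""
    converted = []
    for v in values:
        number = to_number(v)
        converted.append(math.nan if number is None else number)
    return converted


all_numbers = soft_numeric(['1.', '1'])       # every element converts
mixed_soft = soft_numeric(['apple', '4.2'])   # left as strings
mixed_coerced = coerce_numeric(['apple', '4.2'])
```

Here `mixed_soft` keeps its original strings because one element is not number-like, while `mixed_coerced` replaces that element with NaN and converts the rest.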

gotchas
~~~~~~~
68 changes: 3 additions & 65 deletions doc/source/whatsnew/v0.17.0.txt
@@ -640,71 +640,6 @@ New Behavior:
Timestamp.now()
Timestamp.now() + offsets.DateOffset(years=1)

.. _whatsnew_0170.api_breaking.convert_objects:

Changes to convert_objects
^^^^^^^^^^^^^^^^^^^^^^^^^^

``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)

===================== =============
Previous              Replacement
===================== =============
``convert_dates``     ``datetime``
``convert_numeric``   ``numeric``
``convert_timedelta`` ``timedelta``
===================== =============

Coercing types with ``DataFrame.convert_objects`` is now implemented using the
keyword argument ``coerce=True``. Previously types were coerced by setting a
keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.

.. ipython:: python

   df = pd.DataFrame({'i': ['1','2'],
                      'f': ['apple', '4.2'],
                      's': ['apple','banana']})
   df

The old usage of ``DataFrame.convert_objects`` used ``'coerce'`` along with the
type.

.. code-block:: python

   In [2]: df.convert_objects(convert_numeric='coerce')

Now the ``coerce`` keyword must be explicitly used.

.. ipython:: python

   df.convert_objects(numeric=True, coerce=True)

In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
numeric types when there were no values convertible to a numeric type. This returns
the original DataFrame with no conversion.

.. code-block:: python

   In [1]: df = pd.DataFrame({'s': ['a','b']})
   In [2]: df.convert_objects(convert_numeric='coerce')
   Out[2]:
      s
   0  a
   1  b

The new behavior will convert all non-number-like strings to ``NaN``
when ``coerce=True`` is passed explicitly.

.. ipython:: python

   df = pd.DataFrame({'s': ['a','b']})
   df.convert_objects(numeric=True, coerce=True)

In earlier versions of pandas, the default behavior was to try and convert
datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
to do nothing, and so it is necessary to pass at least one conversion target
in the method call.

Changes to Index Comparisons
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -992,6 +927,7 @@ Deprecations
- ``Series.is_time_series`` deprecated in favor of ``Series.index.is_all_dates`` (:issue:`11135`)
- Legacy offsets (like ``'A@JAN'``) listed in :ref:`here <timeseries.legacyaliases>` are deprecated (note that these have been aliases since 0.8.0), (:issue:`10878`)
- ``WidePanel`` deprecated in favor of ``Panel``, ``LongPanel`` in favor of ``DataFrame`` (note these have been aliases since < 0.11.0), (:issue:`10892`)
- ``DataFrame.convert_objects`` has been deprecated in favor of the type-specific functions ``pd.to_datetime``, ``pd.to_timedelta`` and ``pd.to_numeric`` (:issue:`11133`).

.. _whatsnew_0170.prior_deprecations:

@@ -1187,3 +1123,5 @@ Bug Fixes
- Bug in ``DataFrame`` construction from nested ``dict`` with ``timedelta`` keys (:issue:`11129`)
- Bug in ``.fillna`` that may raise ``TypeError`` when data contains datetime dtype (:issue:`7095`, :issue:`11153`)
- Bug in ``.groupby`` when number of keys to group by is same as length of index (:issue:`11185`)
- Bug in ``convert_objects`` where converted values might not be returned if all null and ``coerce`` (:issue:`9589`)
- Bug in ``convert_objects`` where ``copy`` keyword was not respected (:issue:`9589`)
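
As a rough illustration of the type-specific route that the deprecation note points to, the sketch below coerces one column per target type with ``pd.to_numeric``, ``pd.to_datetime`` and ``pd.to_timedelta``, each with ``errors='coerce'``. The frame, its column names and its values are made up for this example, and it assumes a pandas version in which all three functions accept ``errors='coerce'``.

```python
import pandas as pd

# Hypothetical object-dtype data: each column has one unconvertible value.
raw = pd.DataFrame({'n': ['1', '4.2', 'apple'],
                    'd': ['2001-01-04', '2001-01-05', 'foo'],
                    't': ['1 days', '2 days', 'foo']})

# One explicit conversion per column; values that cannot be parsed
# become NaN (numeric) or NaT (datetime, timedelta).
converted = pd.DataFrame({
    'n': pd.to_numeric(raw['n'], errors='coerce'),
    'd': pd.to_datetime(raw['d'], errors='coerce'),
    't': pd.to_timedelta(raw['t'], errors='coerce'),
})

print(converted.dtypes)
```

Unlike a single ``convert_objects`` call, each column's target type is stated explicitly, so there is no ambiguity about which conversion was attempted.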
1 change: 1 addition & 0 deletions pandas/__init__.py
@@ -52,6 +52,7 @@
from pandas.tools.pivot import pivot_table, crosstab
from pandas.tools.plotting import scatter_matrix, plot_params
from pandas.tools.tile import cut, qcut
from pandas.tools.util import to_numeric
from pandas.core.reshape import melt
from pandas.util.print_versions import show_versions
import pandas.util.testing
65 changes: 0 additions & 65 deletions pandas/core/common.py
@@ -1858,71 +1858,6 @@ def _maybe_box_datetimelike(value):
_values_from_object = lib.values_from_object


def _possibly_convert_objects(values,
                              datetime=True,
                              numeric=True,
                              timedelta=True,
                              coerce=False,
                              copy=True):
    """ if we have an object dtype, try to coerce dates and/or numbers """

    conversion_count = sum((datetime, numeric, timedelta))
    if conversion_count == 0:
        import warnings
        warnings.warn('Must explicitly pass type for conversion. Defaulting to '
                      'pre-0.17 behavior where datetime=True, numeric=True, '
                      'timedelta=True and coerce=False', DeprecationWarning)
        datetime = numeric = timedelta = True
        coerce = False

    if isinstance(values, (list, tuple)):
        # List or scalar
        values = np.array(values, dtype=np.object_)
    elif not hasattr(values, 'dtype'):
        values = np.array([values], dtype=np.object_)
    elif not is_object_dtype(values.dtype):
        # If not object, do not attempt conversion
        values = values.copy() if copy else values
        return values

    # If 1 flag is coerce, ensure 2 others are False
    if coerce:
        if conversion_count > 1:
            raise ValueError("Only one of 'datetime', 'numeric' or "
                             "'timedelta' can be True when coerce=True.")

        # Immediate return if coerce
        if datetime:
            return pd.to_datetime(values, errors='coerce', box=False)
        elif timedelta:
            return pd.to_timedelta(values, errors='coerce', box=False)
        elif numeric:
            return lib.maybe_convert_numeric(values, set(), coerce_numeric=True)

    # Soft conversions
    if datetime:
        values = lib.maybe_convert_objects(values,
                                           convert_datetime=datetime)

    if timedelta and is_object_dtype(values.dtype):
        # Object check to ensure only run if previous did not convert
        values = lib.maybe_convert_objects(values,
                                           convert_timedelta=timedelta)

    if numeric and is_object_dtype(values.dtype):
        try:
            converted = lib.maybe_convert_numeric(values,
                                                  set(),
                                                  coerce_numeric=True)
            # If all NaNs, then do not-alter
            values = converted if not isnull(converted).all() else values
            values = values.copy() if copy else values
        except:
            pass

    return values


def _possibly_castable(arr):
# return False to force a non-fastpath

132 changes: 132 additions & 0 deletions pandas/core/convert.py
@@ -0,0 +1,132 @@
"""
Functions for converting object to other types
"""

import numpy as np

import pandas as pd
from pandas.core.common import (_possibly_cast_to_datetime, is_object_dtype,
isnull)
import pandas.lib as lib

# TODO: Remove in 0.18 or 2017, whichever is sooner
def _possibly_convert_objects(values, convert_dates=True,
                              convert_numeric=True,
                              convert_timedeltas=True,
                              copy=True):
    """ if we have an object dtype, try to coerce dates and/or numbers """

    # if we have passed in a list or scalar
    if isinstance(values, (list, tuple)):
        values = np.array(values, dtype=np.object_)
    if not hasattr(values, 'dtype'):
        values = np.array([values], dtype=np.object_)

    # convert dates
    if convert_dates and values.dtype == np.object_:

        # we take an aggressive stance and convert to datetime64[ns]
        if convert_dates == 'coerce':
            new_values = _possibly_cast_to_datetime(
                values, 'M8[ns]', errors='coerce')

            # if we are all nans then leave me alone
            if not isnull(new_values).all():
                values = new_values

        else:
            values = lib.maybe_convert_objects(
                values, convert_datetime=convert_dates)

    # convert timedeltas
    if convert_timedeltas and values.dtype == np.object_:

        if convert_timedeltas == 'coerce':
            from pandas.tseries.timedeltas import to_timedelta
            new_values = to_timedelta(values, coerce=True)

            # if we are all nans then leave me alone
            if not isnull(new_values).all():
                values = new_values

        else:
            values = lib.maybe_convert_objects(
                values, convert_timedelta=convert_timedeltas)

    # convert to numeric
    if values.dtype == np.object_:
        if convert_numeric:
            try:
                new_values = lib.maybe_convert_numeric(
                    values, set(), coerce_numeric=True)

                # if we are all nans then leave me alone
                if not isnull(new_values).all():
                    values = new_values

            except:
                pass
        else:
            # soft-conversion
            values = lib.maybe_convert_objects(values)

    values = values.copy() if copy else values

    return values
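
The restored function above repeats one guard for each target type: coerce, then keep the original values if the coerced result is entirely null ("if we are all nans then leave me alone"). A pure-Python sketch of that guard for the numeric case might look as follows; `legacy_coerce_numeric` is a hypothetical name, and plain lists stand in for the NumPy arrays the real code works on.

```python
import math


def legacy_coerce_numeric(values):
    """Coerce to floats, but back out if nothing was convertible."""
    coerced = []
    for v in values:
        try:
            coerced.append(float(v))
        except (TypeError, ValueError):
            # Unconvertible elements are coerced to NaN.
            coerced.append(math.nan)

    # Guard: an all-NaN result means no element was usable, so the
    # original values are returned untouched.
    if all(math.isnan(c) for c in coerced):
        return list(values)
    return coerced


nothing_convertible = legacy_coerce_numeric(['a', 'b'])
partly_convertible = legacy_coerce_numeric(['apple', '4.2'])
```

With this guard, `['a', 'b']` comes back unchanged, which matches the pre-0.17 behavior described in the removed whatsnew text, while `['apple', '4.2']` is coerced element by element.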


def _soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
                          coerce=False, copy=True):
    """ if we have an object dtype, try to coerce dates and/or numbers """

    conversion_count = sum((datetime, numeric, timedelta))
    if conversion_count == 0:
        raise ValueError('At least one of datetime, numeric or timedelta must '
                         'be True.')
    elif conversion_count > 1 and coerce:
        raise ValueError("Only one of 'datetime', 'numeric' or "
                         "'timedelta' can be True when coerce=True.")

    if isinstance(values, (list, tuple)):
        # List or scalar
        values = np.array(values, dtype=np.object_)
    elif not hasattr(values, 'dtype'):
        values = np.array([values], dtype=np.object_)
    elif not is_object_dtype(values.dtype):
        # If not object, do not attempt conversion
        values = values.copy() if copy else values
        return values

    # If 1 flag is coerce, ensure 2 others are False
    if coerce:
        # Immediate return if coerce
        if datetime:
            return pd.to_datetime(values, errors='coerce', box=False)
        elif timedelta:
            return pd.to_timedelta(values, errors='coerce', box=False)
        elif numeric:
            return pd.to_numeric(values, errors='coerce')

    # Soft conversions
    if datetime:
        values = lib.maybe_convert_objects(values,
                                           convert_datetime=datetime)

    if timedelta and is_object_dtype(values.dtype):
        # Object check to ensure only run if previous did not convert
        values = lib.maybe_convert_objects(values,
                                           convert_timedelta=timedelta)

    if numeric and is_object_dtype(values.dtype):
        try:
            converted = lib.maybe_convert_numeric(values,
                                                  set(),
                                                  coerce_numeric=True)
            # If all NaNs, then do not-alter
            values = converted if not isnull(converted).all() else values
            values = values.copy() if copy else values
        except:
            pass

    return values
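
The flag validation at the top of `_soft_convert_objects` can be exercised in isolation. The sketch below copies only that check into a standalone function; `check_conversion_flags` and `describe` are hypothetical names used for illustration and do not exist in pandas.

```python
def check_conversion_flags(datetime=True, numeric=True, timedelta=True,
                           coerce=False):
    """Validate a flag combination; return the number of requested targets."""
    # bool is a subclass of int, so summing the flags counts the True ones.
    conversion_count = sum((datetime, numeric, timedelta))
    if conversion_count == 0:
        raise ValueError('At least one of datetime, numeric or timedelta '
                         'must be True.')
    if conversion_count > 1 and coerce:
        raise ValueError("Only one of 'datetime', 'numeric' or 'timedelta' "
                         "can be True when coerce=True.")
    return conversion_count


def describe(**flags):
    """Return 'ok' or the error message for a flag combination."""
    try:
        check_conversion_flags(**flags)
    except ValueError as err:
        return str(err)
    return 'ok'


print(describe())                                  # soft, all targets
print(describe(coerce=True))                       # coerce with defaults
print(describe(datetime=True, numeric=False,
               timedelta=False, coerce=True))      # coerce, single target
```

Because all three targets default to `True`, passing `coerce=True` alone is rejected; a coercing caller has to switch the other two targets off explicitly.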
11 changes: 4 additions & 7 deletions pandas/core/frame.py
@@ -3543,9 +3543,8 @@ def combine(self, other, func, fill_value=None, overwrite=True):
# convert_objects just in case
return self._constructor(result,
index=new_index,
columns=new_columns).convert_objects(
datetime=True,
copy=False)
columns=new_columns)._convert(datetime=True,
copy=False)

def combine_first(self, other):
"""
@@ -4026,9 +4025,7 @@ def _apply_standard(self, func, axis, ignore_failures=False, reduce=True):

if axis == 1:
result = result.T
result = result.convert_objects(datetime=True,
timedelta=True,
copy=False)
result = result._convert(datetime=True, timedelta=True, copy=False)

else:

@@ -4158,7 +4155,7 @@ def append(self, other, ignore_index=False, verify_integrity=False):
other = DataFrame(other.values.reshape((1, len(other))),
index=index,
columns=combined_columns)
other = other.convert_objects(datetime=True, timedelta=True)
other = other._convert(datetime=True, timedelta=True)

if not self.columns.equals(combined_columns):
self = self.reindex(columns=combined_columns)
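
One plausible reading of these `frame.py` hunks is that internal callers move to the private `_convert` so that library code does not go through the deprecated public method. The toy class below sketches that split. It is not pandas code: `ToyFrame` is hypothetical, the conversion itself is elided, and the use of `FutureWarning` is an assumption, since the warning emitted by `convert_objects` is not shown in this part of the diff.

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; none of this is pandas code."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def _convert(self, datetime=False, numeric=False, timedelta=False,
                 copy=True):
        # Private path for internal callers: never warns.
        # The conversion is elided; only the copy semantics are modelled.
        return ToyFrame(self.columns) if copy else self

    def convert_objects(self, copy=True):
        # Public, deprecated path: warn the caller (warning class assumed).
        warnings.warn("convert_objects is deprecated (toy message)",
                      FutureWarning, stacklevel=2)
        return ToyFrame(self.columns) if copy else self


frame = ToyFrame({'a': ['1', '2']})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame._convert(datetime=True, timedelta=True, copy=False)
    internal_warnings = len(caught)
    frame.convert_objects()
    public_warnings = len(caught) - internal_warnings
```

In this toy model only the public call records a warning, which is the property the switch to `_convert` would preserve for users who never call `convert_objects` themselves.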