BUG: to_datetime when called with a unit and coerce is buggy #13033

Closed
26 changes: 24 additions & 2 deletions doc/source/whatsnew/v0.18.1.txt
@@ -428,6 +428,30 @@ In addition to this error change, several others have been made as well:
- ``pd.read_csv()`` no longer allows a combination of strings and integers for the ``usecols`` parameter (:issue:`12678`)


.. _whatsnew_0181.api.to_datetime:

``to_datetime`` error changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Bugs in ``pd.to_datetime()`` when passing a ``unit`` with convertible entries and ``errors='coerce'``, or non-convertible entries with ``errors='ignore'``, have been fixed (:issue:`11758`)

Previous behaviour:

.. code-block:: python

   In [27]: pd.to_datetime(1420043460, unit='s', errors='coerce')
   Out[27]: NaT

   In [28]: pd.to_datetime(11111111, unit='D', errors='ignore')
   OverflowError: Python int too large to convert to C long

New behaviour:

.. ipython:: python

   pd.to_datetime(1420043460, unit='s', errors='coerce')
   pd.to_datetime(11111111, unit='D', errors='ignore')

Review comment (Member):

You show the changes for errors='ignore' and errors='coerce', but I think the default behaviour (errors='raise') also changed, from OverflowError to ValueError? It may be good to add this case to the examples as well.
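
To make the three ``errors`` modes raised in this comment concrete, here is a minimal standard-library sketch. It is not pandas code: ``convert_with_unit`` and ``UNIT_TO_TIMEDELTA_KWARG`` are hypothetical names, ``None`` stands in for ``NaT``, and the valid range is that of ``datetime.datetime`` (years 1 to 9999), which is wider than the range pandas supports. It only illustrates the idea of translating an internal ``OverflowError`` into a ``ValueError`` for ``errors='raise'``.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Hypothetical mapping from unit strings to timedelta keyword names.
UNIT_TO_TIMEDELTA_KWARG = {
    "D": "days",
    "s": "seconds",
    "ms": "milliseconds",
    "us": "microseconds",
}


def convert_with_unit(value, unit="s", errors="raise"):
    """Convert a count of `unit`s since the Unix epoch to a datetime.

    Illustrative stand-in only; None plays the role of NaT.
    """
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError("errors must be 'raise', 'coerce' or 'ignore'")
    try:
        return EPOCH + timedelta(**{UNIT_TO_TIMEDELTA_KWARG[unit]: value})
    except OverflowError as exc:
        if errors == "coerce":
            return None   # out-of-range input becomes the missing marker
        if errors == "ignore":
            return value  # out-of-range input is handed back unchanged
        # errors == 'raise': surface a ValueError instead of OverflowError
        raise ValueError(
            f"cannot convert input {value!r} with the unit {unit!r}"
        ) from exc
```

In this sketch a convertible value is converted under every mode, and only the out-of-range value is treated differently depending on ``errors``.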


.. _whatsnew_0181.api.other:

Other API changes
@@ -444,7 +468,6 @@ Other API changes
- ``pd.concat(ignore_index=True)`` now uses ``RangeIndex`` as default (:issue:`12695`)
- ``pd.merge()`` and ``DataFrame.join()`` will show a ``UserWarning`` when merging/joining a single- with a multi-leveled dataframe (:issue:`9455`, :issue:`12219`)


.. _whatsnew_0181.deprecations:

Deprecations
@@ -514,7 +537,6 @@ Bug Fixes
- Bug in aligning a ``Series`` with a ``DataFrame`` (:issue:`13037`)



- Bug in consistency of ``.name`` on ``.groupby(..).apply(..)`` cases (:issue:`12363`)


2 changes: 1 addition & 1 deletion pandas/io/json.py
@@ -400,7 +400,7 @@ def _try_convert_to_date(self, data):
             try:
                 new_data = to_datetime(new_data, errors='raise',
                                        unit=date_unit)
-            except OverflowError:
+            except ValueError:
                 continue
             except:
                 break
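
The hunk above changes which exception means "this unit does not fit, try the next one". A rough standard-library sketch of that control flow follows. Everything in it is an assumption made for illustration: ``stub_to_datetime`` is a stand-in for ``to_datetime(..., errors='raise', unit=...)``, and ``try_convert_to_date`` and its default candidate units are hypothetical, not the code in ``pandas/io/json.py``.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
UNIT_TO_TIMEDELTA_KWARG = {"s": "seconds", "ms": "milliseconds", "us": "microseconds"}


def stub_to_datetime(value, unit):
    """Stand-in converter: raises ValueError when the result is out of range."""
    try:
        return EPOCH + timedelta(**{UNIT_TO_TIMEDELTA_KWARG[unit]: value})
    except OverflowError as exc:
        raise ValueError(f"{value!r} is out of range for unit {unit!r}") from exc


def try_convert_to_date(value, units=("s", "ms", "us")):
    """Try each candidate unit in turn; return (result, success_flag)."""
    for unit in units:
        try:
            converted = stub_to_datetime(value, unit)
        except ValueError:
            continue  # out of range for this unit: try the next one
        except Exception:
            break     # any other failure: stop trying and give up
        return converted, True
    return value, False
```

With this shape, a caller that still caught ``OverflowError`` would no longer reach the ``continue`` branch once the converter raises ``ValueError``, which is presumably why the ``except`` clause had to change together with ``to_datetime``.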
39 changes: 39 additions & 0 deletions pandas/tseries/tests/test_timeseries.py
@@ -4151,6 +4151,7 @@ def test_basics_nanos(self):
        self.assertEqual(stamp.nanosecond, 500)

    def test_unit(self):

        def check(val, unit=None, h=1, s=1, us=0):
            stamp = Timestamp(val, unit=unit)
            self.assertEqual(stamp.year, 2000)
@@ -4217,6 +4218,44 @@ def check(val, unit=None, h=1, s=1, us=0):
        result = Timestamp('NaT')
        self.assertIs(result, NaT)

    def test_unit_errors(self):
        # GH 11758
        # test proper behavior with errors

        with self.assertRaises(ValueError):
            to_datetime([1], unit='D', format='%Y%m%d')

        values = [11111111, 1, 1.0, tslib.iNaT, pd.NaT, np.nan,
                  'NaT', '']
        result = to_datetime(values, unit='D', errors='ignore')
        expected = Index([11111111, Timestamp('1970-01-02'),
                          Timestamp('1970-01-02'), pd.NaT,
                          pd.NaT, pd.NaT, pd.NaT, pd.NaT],
                         dtype=object)
        tm.assert_index_equal(result, expected)

        result = to_datetime(values, unit='D', errors='coerce')
        expected = DatetimeIndex(['NaT', '1970-01-02', '1970-01-02',
                                  'NaT', 'NaT', 'NaT', 'NaT', 'NaT'])
        tm.assert_index_equal(result, expected)

        with self.assertRaises(ValueError):
            to_datetime(values, unit='D', errors='raise')

        values = [1420043460000, tslib.iNaT, pd.NaT, np.nan, 'NaT']

        result = to_datetime(values, errors='ignore', unit='s')
        expected = Index([1420043460000, pd.NaT, pd.NaT,
                          pd.NaT, pd.NaT], dtype=object)
        tm.assert_index_equal(result, expected)

        result = to_datetime(values, errors='coerce', unit='s')
        expected = DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT', 'NaT'])
        tm.assert_index_equal(result, expected)

        with self.assertRaises(ValueError):
            to_datetime(values, errors='raise', unit='s')

    def test_roundtrip(self):

        # test value to string and back conversions
17 changes: 12 additions & 5 deletions pandas/tseries/tools.py
@@ -170,7 +170,7 @@ def _guess_datetime_format_for_array(arr, **kwargs):
                  mapping={True: 'coerce', False: 'raise'})
 def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
                 utc=None, box=True, format=None, exact=True, coerce=None,
-                unit='ns', infer_datetime_format=False):
+                unit=None, infer_datetime_format=False):
     """
     Convert argument to datetime.

@@ -293,7 +293,7 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

 def _to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
                  utc=None, box=True, format=None, exact=True,
-                 unit='ns', freq=None, infer_datetime_format=False):
+                 unit=None, freq=None, infer_datetime_format=False):
     """
     Same as to_datetime, but accept freq for
     DatetimeIndex internal construction
@@ -323,9 +323,17 @@ def _convert_listlike(arg, box, format, name=None):
                 arg = arg.tz_convert(None).tz_localize('UTC')
             return arg

-        elif format is None and com.is_integer_dtype(arg) and unit == 'ns':
-            result = arg.astype('datetime64[ns]')
+        elif unit is not None:
+            if format is not None:
+                raise ValueError("cannot specify both format and unit")
+            arg = getattr(arg, 'values', arg)
+            result = tslib.array_with_unit_to_datetime(arg, unit,
+                                                       errors=errors)
             if box:
+                if errors == 'ignore':
+                    from pandas import Index
+                    return Index(result, dtype=object)
+
                 return DatetimeIndex(result, tz='utc' if utc else None,
                                      name=name)
             return result
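
The new ``unit`` branch does two things that the tests in this PR exercise: it rejects ``format`` combined with ``unit``, and with ``errors='ignore'`` it returns an object container that mixes converted and unconverted entries. The following is a simplified standard-library sketch of that element-wise behaviour. It does not reproduce ``tslib.array_with_unit_to_datetime``: ``convert_listlike_with_unit`` is a hypothetical name, plain lists replace ``Index``/``DatetimeIndex``, ``None`` stands in for ``NaT``, and the range check is that of ``datetime.datetime``.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
UNIT_TO_TIMEDELTA_KWARG = {"D": "days", "s": "seconds"}


def convert_listlike_with_unit(values, unit, fmt=None, errors="raise"):
    """Element-wise conversion sketch for the `unit` branch (hypothetical)."""
    if fmt is not None:
        # Mirrors the mutual-exclusion check added in the hunk above.
        raise ValueError("cannot specify both format and unit")

    result = []
    for value in values:
        if value is None:  # missing input stays missing
            result.append(None)
            continue
        try:
            delta = timedelta(**{UNIT_TO_TIMEDELTA_KWARG[unit]: value})
            result.append(EPOCH + delta)
        except OverflowError as exc:
            if errors == "raise":
                raise ValueError(
                    f"cannot convert input {value!r} with the unit {unit!r}"
                ) from exc
            # 'ignore' keeps the original value, 'coerce' marks it missing
            result.append(value if errors == "ignore" else None)
    return result
```

Because ``errors='ignore'`` can leave raw integers next to converted timestamps, the result can no longer be a homogeneous datetime container, which matches the ``Index(result, dtype=object)`` return in the hunk.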
@@ -387,7 +395,6 @@ def _convert_listlike(arg, box, format, name=None):
                     dayfirst=dayfirst,
                     yearfirst=yearfirst,
                     freq=freq,
-                    unit=unit,
                     require_iso8601=require_iso8601
                 )
