COMPAT: should an empty string match a format (or just be NaT) #13044

Closed
jreback opened this issue Apr 30, 2016 · 3 comments · Fixed by #37835
Labels: good first issue, Needs Tests

jreback commented Apr 30, 2016

after #13033

I left this alone, but should an empty string just be a NaT (like we do elsewhere), even when a format is present?

In [1]: td = pd.Series(['May 04', 'Jun 02', ''], index=[1, 2, 3])

In [2]: td
Out[2]: 
1    May 04
2    Jun 02
3          
dtype: object

In [3]: pd.to_datetime(td, format='%b %y', errors='coerce')
Out[3]: 
1   2004-05-01
2   2002-06-01
3          NaT
dtype: datetime64[ns]

In [4]: pd.to_datetime(td, format='%b %y', errors='raise')
ValueError: time data '' does not match format '%b %y' (match)
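
To make the question concrete, here is a minimal plain-Python sketch of the semantics being proposed: an empty string is treated as missing before any format matching happens, while every other value must still match the format. The helper name `parse_or_missing` is hypothetical, and `None` stands in for NaT; this is not how pandas implements `to_datetime`.

```python
from datetime import datetime
from typing import Optional


def parse_or_missing(value: str, fmt: str) -> Optional[datetime]:
    """Hypothetical helper: '' means missing; anything else must match fmt."""
    if value == "":
        return None  # stand-in for NaT in this sketch
    # Non-empty values that do not match fmt still raise ValueError.
    return datetime.strptime(value, fmt)


values = ["May 04", "Jun 02", ""]
parsed = [parse_or_missing(v, "%b %y") for v in values]
```

Without the early return, `datetime.strptime('', '%b %y')` raises a `ValueError` much like the one shown above.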

Related: when `unit` is given, we don't treat an empty string as NaT either unless errors='coerce' is passed (and the error message is odd).

In [1]: pd.to_datetime([1, ''], unit='s', errors='coerce')
Out[1]: DatetimeIndex(['1970-01-01 00:00:01', 'NaT'], dtype='datetime64[ns]', freq=None)

In [2]: pd.to_datetime([1, ''], unit='s')
ValueError: invalid literal for long() with base 10: ''
jreback added the Datetime, Missing-data, and Compat labels Apr 30, 2016
jreback added this to the 0.18.2 milestone Apr 30, 2016

jreback commented Apr 30, 2016

@sinhrks

jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016
jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@atulagrwl

This seems to be fixed now. I see the same output from to_datetime with errors='coerce' and without any errors param:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.23.0'

In [3]: pd.to_datetime([1, ''], unit='s', errors='coerce')
Out[3]: DatetimeIndex(['1970-01-01 00:00:01', 'NaT'], dtype='datetime64[ns]', freq=None)

In [4]: pd.to_datetime([1, ''], unit='s')
Out[4]: DatetimeIndex(['1970-01-01 00:00:01', 'NaT'], dtype='datetime64[ns]', freq=None)

@jreback
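
Since the issue is labelled Needs Tests, the behaviour reported in this thread could be pinned down with a small regression check along these lines. This is only a sketch: the function names are made up for illustration, and it asserts only what the outputs above show (the `format` case under errors='coerce', and the `unit` case with and without errors='coerce'). What the `format` case does under errors='raise' is deliberately left unasserted, because the thread does not settle it.

```python
import pandas as pd


def check_empty_string_with_format():
    # Hypothetical check: '' becomes NaT when a format is given and errors='coerce'.
    td = pd.Series(["May 04", "Jun 02", ""], index=[1, 2, 3])
    result = pd.to_datetime(td, format="%b %y", errors="coerce")
    assert result.isna().tolist() == [False, False, True]
    assert result.loc[1] == pd.Timestamp("2004-05-01")
    return result


def check_empty_string_with_unit():
    # Hypothetical check: '' becomes NaT with unit='s', with or without errors='coerce'.
    results = []
    for kwargs in ({"errors": "coerce"}, {}):
        result = pd.to_datetime([1, ""], unit="s", **kwargs)
        assert result.isna().tolist() == [False, True]
        assert result[0] == pd.Timestamp("1970-01-01 00:00:01")
        results.append(result)
    return results


format_result = check_empty_string_with_format()
unit_results = check_empty_string_with_unit()
```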

mroeschke added the good first issue and Needs Tests labels and removed the Compat, Missing-data, and Datetime labels Mar 31, 2020
fgebhart added a commit to fgebhart/pandas that referenced this issue Nov 14, 2020
jreback modified the milestones: Contributions Welcome, 1.2 Nov 14, 2020
@fgebhart

take
