
where() corrupts tz-aware datetime column data #15701

Closed
haroldfox opened this issue Mar 16, 2017 · 5 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation Timezones Timezone data dtype
Milestone

Comments

@haroldfox

Code Sample, a copy-pastable example if possible

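import pandas as pd

# two tz-aware (UTC) timestamps; the second one has a 10 ms fractional part
s = pd.Series([pd.Timestamp(t) for t in ['2016-12-31 12:00:04+00:00', '2016-12-31 12:00:04.010000+00:00']])
p = pd.Series([False, True])
s.where(p)  # keep values where p is True, fill the rest with the default other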

Problem description

I receive:

0 NaT
1 2016-12-31 12:00:04.009999872+00:00

This seems similar to #14872, which I also ran into.

It works as expected if other is passed explicitly: s.where(p, other=pd.NaT)
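A small check along the lines of the report, comparing the default fill with the explicit other=pd.NaT workaround. It compares the exact integer nanosecond counts (Timestamp.value) rather than printed strings. This is a sketch: on the reported version (0.19.2) the default-fill comparison would be expected to come out inexact, while on pandas versions that include the fix both results should match the original value.

```python
import pandas as pd

s = pd.Series([pd.Timestamp(t) for t in ['2016-12-31 12:00:04+00:00',
                                         '2016-12-31 12:00:04.010000+00:00']])
p = pd.Series([False, True])

default_fill = s.where(p)                 # relies on the default other
explicit_fill = s.where(p, other=pd.NaT)  # workaround from the report

# Compare exact nanosecond counts instead of the printed representation.
for name, result in [('default', default_fill), ('explicit NaT', explicit_fill)]:
    exact = result[1].value == s[1].value
    print(f'{name}: masked={pd.isna(result[0])}, exact={exact}')
```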

Expected Output

I would expect:

0 NaT
1 2016-12-31 12:00:04.010000+00:00

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None

@chrisaycock
Contributor

Interesting...

In [10]: s[1]
Out[10]: Timestamp('2016-12-31 12:00:04.010000+0000', tz='UTC')

In [11]: s.where(p)[1]
Out[11]: Timestamp('2016-12-31 12:00:04.009999872+0000', tz='UTC')

@jreback
Contributor

jreback commented Mar 16, 2017

These probably pass through a float conversion; they should be handled better in the Block.
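The float-conversion explanation matches the exact value in the report. A datetime64[ns] column stores nanoseconds since the epoch as int64, and near 1.48e18 adjacent float64 values are 256 ns apart, so the nanosecond count cannot survive a float round trip. This stdlib-only sketch computes that count with integer arithmetic and round-trips it through a Python float (an IEEE 754 double); it illustrates the precision loss and does not reproduce pandas internals.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ts = datetime(2016, 12, 31, 12, 0, 4, 10000, tzinfo=timezone.utc)  # 12:00:04.010000 UTC

# Exact nanoseconds since the epoch, using integer arithmetic only.
ns = (ts - EPOCH) // timedelta(microseconds=1) * 1000

# Round-trip through float64, as a float-converting code path would.
roundtrip = int(float(ns))

print(ns)                   # 1483185604010000000
print(roundtrip)            # 1483185604009999872 (the .009999872 seen in the report)
print(ns - roundtrip)       # 128
print(math.ulp(float(ns)))  # 256.0: spacing between adjacent float64 values here
```

The original value sits exactly halfway between two representable doubles, and round-half-to-even picks the lower one, which is why the result ends in ...9872 rather than ...10000128.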

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation Timezones Timezone data dtype Difficulty Intermediate labels Mar 16, 2017
@jreback jreback added this to the Next Major Release milestone Mar 16, 2017
@jreback
Contributor

jreback commented Mar 16, 2017

https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals.py#L2443

This needs an or isnull(other) condition, and I think that will work. It's keeping the default nan around, which forces a conversion to float, rather than using the correct iNaT (integer) sentinel.
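The difference between filling with nan and filling with an integer sentinel can be sketched with plain NumPy. This is not the pandas Block code; it assumes the column's values are available as an int64 array of nanoseconds and uses np.where as a stand-in for the masking step. INAT here is a local constant set to the minimum int64, the value pandas uses to represent NaT (pandas calls it iNaT internally).

```python
import numpy as np

# Nanosecond epoch values (int64), as a datetime64[ns] column stores them.
# 1483185604010000000 ns is 2016-12-31 12:00:04.010000 UTC.
values = np.array([1483185604000000000, 1483185604010000000], dtype=np.int64)
mask = np.array([False, True])

# Filling with NaN: NumPy promotes int64 with a float NaN to float64,
# and float64 cannot represent every int64 of this magnitude exactly.
filled_nan = np.where(mask, values, np.nan)

# Filling with an integer sentinel keeps the int64 dtype, so nothing is lost.
INAT = np.int64(np.iinfo(np.int64).min)  # local stand-in for pandas' iNaT
filled_sentinel = np.where(mask, values, INAT)

print(filled_nan.dtype, int(filled_nan[1]))            # float64 1483185604009999872
print(filled_sentinel.dtype, int(filled_sentinel[1]))  # int64 1483185604010000000
```

Checking isnull(other) first and swapping the nan for the integer sentinel keeps the whole operation in int64, which is the idea behind the suggested change.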

@jreback
Contributor

jreback commented Mar 16, 2017

If anyone wants to do a PR, please do!

@chrisaycock
Contributor

@jreback I can take a stab at this tonight.

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 17, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
…andas-dev#15701)

closes pandas-dev#15701

Author: Christopher C. Aycock <[email protected]>

Closes pandas-dev#15711 from chrisaycock/GH15701 and squashes the following commits:

b77f5ed [Christopher C. Aycock] BUG: TZ-aware Series.where() appropriately handles default other=nan (pandas-dev#15701)
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017

3 participants