BUG: concat of tz-aware with missing #16230


Closed
watercrossing opened this issue May 4, 2017 · 6 comments
Labels
Bug · Datetime (Datetime data dtype) · Duplicate Report (Duplicate issue or pull request) · Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) · Timezones (Timezone data dtype)

Comments

@watercrossing
Contributor

watercrossing commented May 4, 2017

Code Sample

I am not sure this is the simplest way to produce this error.

import pandas as pd
import pytz
from datetime import datetime

ldn = pytz.timezone("Europe/London")
df = pd.DataFrame(data={"times": [ldn.localize(datetime(2017, 5, 4, 11, 18)),
                                  ldn.localize(datetime(2017, 5, 4, 13, 20)),
                                  ldn.localize(datetime(2017, 3, 4, 11, 18)),
                                  ldn.localize(datetime(2017, 3, 4, 13, 20))],
                        "toGroupBy": ["a", "a", "b", "b"]})

def timeoffset(df):
    col = df.times
    if df.toGroupBy.iloc[0] == "b":
        forward = [None for i in range(len(col))]
    else:
        forward = [None if i == len(col) - 1 else col[i + 1] for i in range(len(col))]
    return pd.DataFrame(data={"forward": forward})

gb = df.groupby("toGroupBy")
gb.apply(timeoffset)

Problem description

The last line above raises the following traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-40778c60c413> in <module>()
----> 1 gb.apply(timeoffset)

/home/me/git/pandas/pandas/core/groupby.py in apply(self, func, *args, **kwargs)
    714         # ignore SettingWithCopy here in case the user mutates
    715         with option_context('mode.chained_assignment', None):
--> 716             return self._python_apply_general(f)
    717 
    718     def _python_apply_general(self, f):

/home/me/git/pandas/pandas/core/groupby.py in _python_apply_general(self, f)
    723             keys,
    724             values,
--> 725             not_indexed_same=mutated or self.mutated)
    726 
    727     def _iterate_slices(self):

/home/me/git/pandas/pandas/core/groupby.py in _wrap_applied_output(self, keys, values, not_indexed_same)
   3524         elif isinstance(v, DataFrame):
   3525             return self._concat_objects(keys, values,
-> 3526                                         not_indexed_same=not_indexed_same)
   3527         elif self.grouper.groupings is not None:
   3528             if len(self.grouper.groupings) > 1:

/home/me/git/pandas/pandas/core/groupby.py in _concat_objects(self, keys, values, not_indexed_same)
    913 
    914                 result = concat(values, axis=self.axis, keys=group_keys,
--> 915                                 levels=group_levels, names=group_names)
    916             else:
    917 

/home/me/git/pandas/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    205                        verify_integrity=verify_integrity,
    206                        copy=copy)
--> 207     return op.get_result()
    208 
    209 

/home/me/git/pandas/pandas/core/reshape/concat.py in get_result(self)
    405             new_data = concatenate_block_managers(
    406                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 407                 copy=self.copy)
    408             if not self.copy:
    409                 new_data._consolidate_inplace()

/home/me/git/pandas/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

/home/me/git/pandas/pandas/core/internals.py in <listcomp>(.0)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

/home/me/git/pandas/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

/home/me/git/pandas/pandas/core/internals.py in <listcomp>(.0)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

/home/me/git/pandas/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   5210                     pass
   5211                 else:
-> 5212                     missing_arr = np.empty(self.shape, dtype=empty_dtype)
   5213                     missing_arr.fill(fill_value)
   5214                     return missing_arr

TypeError: data type not understood

Expected Output

                              forward
toGroupBy                            
a         0 2017-05-04 13:20:00+01:00
          1                       NaT
b         0                       NaT
          1                       NaT

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_GB.utf8
LANG: en_GB.utf8
LOCALE: en_GB.UTF-8

pandas: 0.20.0rc1+48.gae70ece
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.2
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 4, 2017

Your example is not copy-pastable:

In `ldn.localize(datetime(2017, 5, 4, 11, 18))`, `ldn` is not defined.

@watercrossing
Contributor Author

Sorry, I forgot to paste one line. Now it should be copy-pastable.

@jreback
Contributor

jreback commented May 4, 2017

What exactly are you trying to do?

Using groupby in this way is very odd.

@watercrossing
Contributor Author

watercrossing commented May 4, 2017

Well, this is just a reduced minimal example.
I am analysing some web log files, with each line having a userID as one of its columns. For each user, I want to extract a set of sessions, i.e. from a login event to a logout event. So I group by userID, and then apply to each group a method that finds the start and end times and returns a DataFrame of them.

The method works fine unless there is a user without any complete session, i.e. the applied method returns a column containing only None. Actually, this example is even more compact:

import pandas as pd
import pytz
from datetime import datetime

ldn = pytz.timezone("Europe/London")
df = pd.DataFrame(data={"times": [ldn.localize(datetime(2017, 5, 4, 11, 18)),
                                  ldn.localize(datetime(2017, 5, 4, 13, 20)),
                                  ldn.localize(datetime(2017, 3, 4, 11, 18))],
                        "userID": [1, 1, 2]})

def timeoffset(df):
    col = df.times
    forward = [None if i == len(col) - 1 else col[i + 1] for i in range(len(col))]  # This is a simplification
    return pd.DataFrame(data={"forward": forward})

gb = df.groupby("userID")
gb.apply(timeoffset)

It seems to me quite a natural pattern: group by, apply to each group, get a DataFrame back for each group, and combine them into one big list of sessions.
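For what it's worth, on affected versions the all-None object column can be avoided entirely: `shift(-1)` within the groupby preserves the tz-aware dtype and fills the gaps with NaT, so no all-missing object block is ever created during the concat. A sketch of that workaround, with the example frame rebuilt via `pd.to_datetime`/`tz_localize` for self-containment (the column and group names match the example above):

```python
import pandas as pd

# same data as the minimal example above, built without pytz
df = pd.DataFrame({
    "times": pd.to_datetime(
        ["2017-05-04 11:18", "2017-05-04 13:20", "2017-03-04 11:18"]
    ).tz_localize("Europe/London"),
    "userID": [1, 1, 2],
})

# shift(-1) within each group keeps the datetime64[ns, Europe/London] dtype
# and fills the last row of each group with NaT instead of None
df["forward"] = df.groupby("userID")["times"].shift(-1)
```

This computes the same "next event time per user" directly on the tz-aware column, so the concat path that trips the bug is never exercised.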

@jreback
Contributor

jreback commented May 4, 2017

pd.concat([pd.DataFrame({'A': [pd.Timestamp('2017-05-04 13:20:00+01:00')]}),
           pd.DataFrame({'A': [None]})])

(also with pd.NaT) raises

This repros, and some more cases:

In [11]: pd.concat([pd.Series([pd.Timestamp('2017-05-04 13:20:00+01:00')]), pd.Series([None])])
Out[11]: 
0    2017-05-04 13:20:00+01:00
0                         None
dtype: object

In [12]: pd.concat([pd.Series([pd.Timestamp('2017-05-04 13:20:00+01:00')]), pd.Series([pd.NaT])])
Out[12]: 
0    2017-05-04 13:20:00+01:00
0                          NaT
dtype: object

These look like some untested cases.

A pull request to fix is welcome!
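Until the fix lands, one workaround for the plain concat case is to give the all-missing frame the matching tz-aware dtype up front, so concat never has to upcast an empty block; the dtype string form is an assumption that holds on pandas versions accepting `'datetime64[ns, Europe/London]'`:

```python
import pandas as pd

tz = "datetime64[ns, Europe/London]"
a = pd.DataFrame({"A": pd.Series([pd.Timestamp("2017-05-04 13:20", tz="Europe/London")])})
# cast the missing values to the same tz-aware dtype instead of leaving them object
b = pd.DataFrame({"A": pd.Series([pd.NaT], dtype=tz)})

# both blocks now share one dtype, so no empty-block upcast is needed
out = pd.concat([a, b], ignore_index=True)
```

With matching dtypes on both sides, `out["A"]` stays tz-aware and the missing entry comes through as NaT rather than triggering the `np.empty(..., dtype=empty_dtype)` path in the traceback above.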

@jreback jreback added Difficulty Intermediate Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype and removed Difficulty Intermediate labels May 4, 2017
@jreback jreback added this to the 0.21.0 milestone May 4, 2017
@jreback jreback changed the title Cannot create empty DatetimeTZDtype BUG: concat of tz-aware with missing May 4, 2017
@jreback jreback added Bug Timezones Timezone data dtype labels Jun 13, 2017
@jreback jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017
@jreback
Contributor

jreback commented Nov 25, 2017

dupe of #12396

@jreback jreback closed this as completed Nov 25, 2017
@jreback jreback added the Duplicate Report Duplicate issue or pull request label Nov 25, 2017