Apply method bug with NaT type and dictionaries #16308

Closed
AbhinavanT opened this issue May 9, 2017 · 2 comments
Labels
Duplicate Report Duplicate issue or pull request

Comments

@AbhinavanT

Code Sample

I've come across a peculiar case that occurs when two conditions hold:

  1. The dataframe contains NaT values (I've tried NoneType and that seems to work just fine)
  2. The applied function returns a dict

import pandas as pd

sample = pd.DataFrame({'date': [pd.NaT, pd.NaT, pd.NaT, pd.NaT], 'period': [1,1,1,1], 'parent_id': ['a', 'b', 'c', 'd']})
sample.apply(lambda x: {'parent_user_id': x.parent_id}, axis=1, reduce=True)

Problem description

The output should be a Series in which each element is a dictionary. Instead, apply returns a DataFrame of NaNs.

Expected Output

Out[40]: 
0    {'parent_user_id': 'a'}
1    {'parent_user_id': 'b'}
2    {'parent_user_id': 'c'}
3    {'parent_user_id': 'd'}

Actual Output

Out[46]: 
   date  parent_id  period
0   NaN        NaN     NaN
1   NaN        NaN     NaN
2   NaN        NaN     NaN
3   NaN        NaN     NaN
@TomAugspurger
Contributor

TomAugspurger commented May 9, 2017

This has nothing to do with NaT; the same thing happens after you fill the values:

In [15]: sample.fillna(dict(date=pd.Timestamp('2017'))).apply(lambda x: {'parent_user_id': x.parent_id}, axis=1, reduce=True)
    ...:
    ...:
Out[15]:
   date  parent_id  period
0   NaN        NaN     NaN
1   NaN        NaN     NaN
2   NaN        NaN     NaN
3   NaN        NaN     NaN

NaT and None may have behaved differently if using None forced an object dtype.
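To check the dtype point, here is a minimal sketch comparing a column built from NaT with one built from None. The frame names `nat_frame` and `none_frame` are made up for illustration, and the sketch only shows the column dtypes; it does not demonstrate how apply reacts to them.

```python
import pandas as pd

# Hypothetical frames used only to compare how the column dtype is inferred.
nat_frame = pd.DataFrame({"date": [pd.NaT] * 4})
none_frame = pd.DataFrame({"date": [None] * 4})

# A column of NaT is inferred as a datetime column,
# while a column of None falls back to object dtype.
print(nat_frame["date"].dtype)
print(none_frame["date"].dtype)
```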

This is more about the output shape inference that apply does. You'll be much better off avoiding .apply(..., axis=1) and just doing things directly:

In [20]: pd.Series([{'parent_user_id': x.parent_id} for x in sample.itertuples()])
Out[20]:
0    {'parent_user_id': 'a'}
1    {'parent_user_id': 'b'}
2    {'parent_user_id': 'c'}
3    {'parent_user_id': 'd'}
dtype: object
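A variation on the same idea, as a sketch rather than a recommended fix: since only parent_id is needed, the dictionaries can be built from that single column, passing the frame's index so the result stays aligned with `sample` even when the index is not the default range. The name `result` is an assumption for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "date": [pd.NaT, pd.NaT, pd.NaT, pd.NaT],
    "period": [1, 1, 1, 1],
    "parent_id": ["a", "b", "c", "d"],
})

# Build one dict per row from the single column that is needed,
# and reuse the frame's index so the Series lines up with the frame.
result = pd.Series(
    [{"parent_user_id": value} for value in sample["parent_id"]],
    index=sample.index,
)
print(result)
```

This sidesteps apply's output shape inference entirely, because the Series is constructed from a plain list of dictionaries.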

@TomAugspurger
Contributor

This falls under #15628

TomAugspurger added the Duplicate Report label May 10, 2017
TomAugspurger added this to the No action milestone May 10, 2017