Problem with DataFrame.diff() when using groupby getting "unexpected keyword argument 'axis'" due to built-in wrapper #17345

AdamHede · 2017-08-26T08:52:50Z

Code Sample, a copy-pastable example if possible

import pandas as pd

data = pd.read_stata("myfile.dta")
data = data.set_index(['country', 'year'])
data_delta = data.groupby('country').diff()  # first differences within each country

Problem description

Hi everyone! My first bug report :)

I'm having some problems with the .diff() method. At first I thought I was just being an idiot, but now I'm fairly confident I've isolated the bug.

Note, when I run this manually line-by-line it works fine, but I depend on this being inside a function (because I remove some columns before doing the differences and then reinstate them in a highly repetitive fashion).

For a long time I was on pandas 0.18.x and was using the following command fine:

data = data.groupby('country').diff().shift(-1)

But after upgrading to pandas 0.20.1, the behavior of diff seems to have changed: it now takes a periods argument, which is very useful to me! The problem is that I get an error every time I use it. The traceback looks like this:

Traceback (most recent call last):
  File "/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-5e1d634b8803>", line 1, in <module>
    dat = feature_expand(data_everything, lags=2, lag_y=True, delta=True)
  File "<ipython-input-3-184a59b406db>", line 126, in feature_expand
    data_delta = data_delta.diff()
  File "<string>", line 21, in diff
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 612, in wrapper
    *args, **kwargs)
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 3481, in _aggregate_item_by_item
    raise errors
TypeError: diff() got an unexpected keyword argument 'axis'

Following the traceback, I find a wrapper function in groupby.py, _GroupBy._make_wrapper().wrapper, which says it does some "trickery for aggregation functions that need an axis" and seems to add the axis keyword argument by itself. This has probably been useful behaviour previously, but now it breaks .diff(), as it doesn't take an axis argument anymore.
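
To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not pandas's actual code: `series_like_diff` and `make_wrapper` are hypothetical names, and the only assumption is that a wrapper injects an `axis` keyword into a function that does not accept one.

```python
def series_like_diff(values, periods=1):
    """Hypothetical stand-in for a diff that accepts no `axis` keyword."""
    return [
        None if i < periods else values[i] - values[i - periods]
        for i in range(len(values))
    ]


def make_wrapper(func, default_axis=0):
    """Hypothetical wrapper that always injects an `axis` keyword."""
    def wrapper(*args, **kwargs):
        kwargs_with_axis = dict(kwargs)
        # Add axis only if the caller did not pass one explicitly.
        kwargs_with_axis.setdefault("axis", default_axis)
        return func(*args, **kwargs_with_axis)
    return wrapper


wrapped_diff = make_wrapper(series_like_diff)

try:
    wrapped_diff([1, 3, 6])
except TypeError as exc:
    # The wrapped function rejects the injected keyword.
    message = str(exc)
    print(message)
```

Calling `series_like_diff([1, 3, 6])` directly works; only the wrapped call fails, which matches the shape of the TypeError in the traceback.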

I hope someone has time to help me and the community with this.

Cheers :)

Expected Output

A dataframe of country-level first differences.
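
As a rough sketch of what that output could look like, the example below uses made-up data; the frame and the `gdp` column are assumptions for illustration and do not come from the original .dta file.

```python
import pandas as pd

# Hypothetical mock data standing in for the Stata file.
data = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "B"],
        "year": [2000, 2001, 2002, 2000, 2001, 2002],
        "gdp": [1.0, 3.0, 6.0, 10.0, 15.0, 21.0],
    }
).set_index(["country", "year"])

# First differences within each country; the first year of each
# country has no predecessor, so its difference is NaN.
data_delta = data.groupby(level="country").diff()
print(data_delta)
```

The result keeps the (country, year) index, and differences never cross a country boundary.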

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 2.9.2
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None


chris-b1 · 2017-08-26T10:14:11Z

Could you please make this a runnable example? i.e., mock up something for data that reproduces your problem, thanks.

AdamHede · 2017-08-26T11:18:23Z

Yes! Here is an extract of the script I'm writing. I'm working with publicly available data (Varieties of Democracy).

In the attached .zip you'll find a python script with a single function. I load the data and then I try to process it. If you need any more information, please let me know :)

github_example.zip

pratapvardhan · 2017-08-26T15:45:30Z

Could you change the dtype of the _merge and merge2 columns from int8 to int32? With that change it works for me. I suspect this has to do with dtypes. Could you confirm?

pratapvardhan · 2017-08-26T15:50:46Z

I suspect it's related to #14773.

AdamHede · 2017-08-27T09:54:10Z

Hi Everyone!

I tried changing the dtype of all int8 columns to int32, and now it works! It's a little strange, but a workable solution for me. Is there anything I can do to help fix the bug from here?

Thank you so much for your help
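
For anyone landing here with the same error, the workaround can be sketched roughly as follows. The frame below is mock data, and the `flag` and `value` column names are made up for illustration; the only idea taken from the thread is upcasting int8 columns to int32 before calling diff.

```python
import numpy as np
import pandas as pd

# Hypothetical mock frame with one int8 column.
data = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "flag": np.array([1, 3, 2, 5], dtype="int8"),
        "value": [1.0, 2.5, 4.0, 4.5],
    }
).set_index(["country", "year"])

# Upcast every int8 column to int32 before differencing.
int8_cols = data.select_dtypes(include=["int8"]).columns
data = data.astype({col: "int32" for col in int8_cols})

# First differences within each country.
data_delta = data.groupby(level="country").diff()
print(data_delta)
```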

jreback · 2017-08-28T10:39:34Z

closing as duplicate

gfyoung added the Groupby and Regression (functionality that used to work in a prior pandas version) labels Aug 26, 2017

jreback closed this as completed Aug 28, 2017
