Problem with DataFrame.diff() when using groupby getting "unexpected keyword argument 'axis'" due to built-in wrapper #17345

Closed
AdamHede opened this issue Aug 26, 2017 · 6 comments
Labels
Groupby Regression Functionality that used to work in a prior pandas version

Comments

@AdamHede

AdamHede commented Aug 26, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

data = pd.read_stata("myfile.dta")
data = data.set_index(['country', 'year'])
data_delta = data.groupby('country').diff()

Problem description

Hi everyone! My first bug report :)

I'm having some problems with the .diff() method, and first thought I was just being an idiot, but now I'm fairly confident I've isolated the bug.

Note, when I run this manually line-by-line it works fine, but I depend on this being inside a function (because I remove some columns before doing the differences and then reinstate them in a highly repetitive fashion).

For a long time I was on pandas 0.18.x and was using the following command fine:

data = data.groupby('country').diff().shift(-1)

But after upgrading to pandas 0.20.1, the behavior of diff seems to have changed, and it now takes a periods argument, which is very useful to me! The problem is that I now get an error every time I use it. The traceback looks like this:

Traceback (most recent call last):
  File "/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-5e1d634b8803>", line 1, in <module>
    dat = feature_expand(data_everything, lags=2, lag_y=True, delta=True)
  File "<ipython-input-3-184a59b406db>", line 126, in feature_expand
    data_delta = data_delta.diff()
  File "<string>", line 21, in diff
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 612, in wrapper
    *args, **kwargs)
  File "/Users/myname/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 3481, in _aggregate_item_by_item
    raise errors
TypeError: diff() got an unexpected keyword argument 'axis'

Following the traceback, I found a wrapper function in groupby.py, _GroupBy._make_wrapper().wrapper, which says it does some "trickery for aggregation functions that need an axis" and seems to add the axis keyword argument by itself. This was probably useful behaviour previously, but now it breaks .diff(), as it doesn't take an axis argument anymore.
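To illustrate the mechanism I think is at play, here is a minimal sketch in plain Python. It is not the pandas source: `make_wrapper`, `default_axis`, and the toy `diff` function are hypothetical names made up for this example. It only shows how a wrapper that injects an `axis` keyword produces this kind of TypeError when the wrapped function does not accept one.

```python
import functools


def make_wrapper(func, default_axis=0):
    """Hypothetical stand-in for a wrapper that injects an ``axis`` keyword."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Copy the caller's kwargs and add ``axis`` if it was not supplied.
        kwargs_with_axis = dict(kwargs)
        kwargs_with_axis.setdefault("axis", default_axis)
        return func(*args, **kwargs_with_axis)

    return wrapper


def diff(values, periods=1):
    """Toy difference over a list; note that it accepts no ``axis`` argument."""
    return [b - a for a, b in zip(values, values[periods:])]


wrapped = make_wrapper(diff)

message = None
try:
    wrapped([1, 4, 9])
except TypeError as exc:
    # The injected keyword is rejected by the wrapped function.
    message = str(exc)

print(message)
```

Calling `diff([1, 4, 9])` directly works, while the wrapped call fails because of the keyword the caller never passed.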

I hope someone has time to help me and the community with this.

Cheers :)

Expected Output

A dataframe of country-level first differences.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 2.9.2
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.24.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: 0.7.9.None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

@chris-b1
Contributor

Could you please make this a runnable example? i.e., mock up something for data that reproduces your problem, thanks.

@AdamHede
Author

Yes! Here is an extract of the script I'm writing. I'm working with publicly available data (Varieties of Democracy).

In the attached .zip you'll find a Python script with a single function. I load the data and then I try to process it. If you need any more information, please let me know :)

github_example.zip

@gfyoung added the Groupby and Regression (functionality that used to work in a prior pandas version) labels Aug 26, 2017
@pratapvardhan
Contributor

Could you change the dtype of the _merge and merge2 columns from int8 to int32? It works for me. I suspect this has to do with dtypes. Could you confirm?

@pratapvardhan
Contributor

I suspect it's related to #14773.

@AdamHede
Author

Hi Everyone!

I tried changing the dtype of all int8 columns to int32, and now it works! It's a little strange, but a workable solution for me. Is there anything I can do to help fix the bug from here?

Thank you so much for your help.
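For anyone landing here later, this is roughly what the workaround looks like. It is a minimal sketch on made-up data: the frame and the `flag` and `gdp` columns are hypothetical stand-ins for my real dataset, and I have only assumed that casting every int8 column to int32 before the grouped diff is enough.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real country/year panel.
df = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2000, 2001, 2000, 2001],
        "flag": np.array([1, 3, 2, 5], dtype="int8"),
        "gdp": [1.0, 1.5, 2.0, 2.75],
    }
).set_index(["country", "year"])

# Workaround from this thread: upcast every int8 column to int32 first.
int8_cols = df.select_dtypes(include=["int8"]).columns
df = df.astype({col: "int32" for col in int8_cols})

# Country-level first differences, grouping on the index level.
delta = df.groupby(level="country").diff()
print(delta)
```

The first row of each country comes out as NaN, since there is no earlier year to subtract within that group.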

@jreback
Contributor

jreback commented Aug 28, 2017

closing as duplicate

@jreback closed this as completed Aug 28, 2017