
BUG: Getting "got an unexpected keyword argument 'engine_kwargs'" when doing groupby in 2.1.0 #55006


Closed
0x26res opened this issue Sep 5, 2023 · 1 comment · Fixed by #55042
Labels: Apply, Bug, Groupby, Regression


0x26res commented Sep 5, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import sys

import pandas as pd


df = pd.concat(
    [
        pd.Series([], dtype="datetime64[ns]", name="key_1"),
        pd.Series([], dtype=float, name="value_1"),
        pd.Series([], dtype=float, name="value_2"),
        pd.Series([], dtype=float, name="value_2"),  # duplicated column name (intentional)
    ],
    axis=1,
)

df.to_markdown(sys.stdout)

df.groupby(["key_1"]).aggregate({"value_1": "sum"})

Issue Description

I'm doing a groupby followed by aggregate with a dictionary argument. My DataFrame has duplicated column names, but none of the operations I'm using refer to the duplicated columns.

I get this error:

File "JetBrains/PyCharm2023.1/scratches/scratch_223.py", line 18, in <module>
    df.groupby(["key_1"]).aggregate({"value_1": "sum"})
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/groupby/generic.py", line 1442, in aggregate
    result = op.agg()
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 175, in agg
    return self.agg_dict_like()
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 406, in agg_dict_like
    return self.agg_or_apply_dict_like(op_name="agg")
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1390, in agg_or_apply_dict_like
    result_index, result_data = self.compute_dict_like(
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 463, in compute_dict_like
    key_data = [
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 464, in <listcomp>
    getattr(selected_obj._ixs(indice, axis=1), op_name)(how, **kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/series.py", line 4606, in aggregate
    result = op.agg()
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 1204, in agg
    result = super().agg()
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 172, in agg
    return self.apply_str()
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 580, in apply_str
    return self._apply_str(obj, func, *self.args, **self.kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/apply.py", line 663, in _apply_str
    return f(*args, **kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/series.py", line 6205, in sum
    return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/generic.py", line 12055, in sum
    return self._min_count_stat_function(
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/core/generic.py", line 12020, in _min_count_stat_function
    nv.validate_func(name, (), kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/compat/numpy/function.py", line 416, in validate_func
    return validation_func(args, kwargs)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/compat/numpy/function.py", line 88, in __call__
    validate_args_and_kwargs(
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/util/_validators.py", line 223, in validate_args_and_kwargs
    validate_kwargs(fname, kwargs, compat_args)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/util/_validators.py", line 164, in validate_kwargs
    _check_for_invalid_keys(fname, kwargs, compat_args)
  File "base_error_forward_featurevenv/lib/python3.10/site-packages/pandas/util/_validators.py", line 138, in _check_for_invalid_keys
    raise TypeError(f"{fname}() got an unexpected keyword argument '{bad_arg}'")
TypeError: sum() got an unexpected keyword argument 'engine_kwargs'
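The last frames of the traceback show the failure mode: the Series-level sum validates its extra keyword arguments against a fixed allow-list (pandas/util/_validators.py, _check_for_invalid_keys) and raises on anything it does not recognize. The sketch below is a simplified, hypothetical model of that behavior, not pandas' code; the names ALLOWED_SUM_KWARGS, check_for_invalid_keys, series_sum and agg_column are illustrative only. It shows why forwarding a groupby-level option such as engine_kwargs to a Series reduction produces exactly this error.

```python
# Simplified, hypothetical model of the last frames of the traceback.
# None of these names are pandas internals; they only mimic the behavior.

# Keyword arguments the modeled reduction is willing to accept.
ALLOWED_SUM_KWARGS = {"axis", "skipna", "numeric_only", "min_count"}


def check_for_invalid_keys(fname, kwargs, allowed):
    # Report the first unknown keyword (sorted, so the result is deterministic),
    # using the same message format as the TypeError in the traceback.
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        raise TypeError(f"{fname}() got an unexpected keyword argument '{unknown[0]}'")


def series_sum(values, **kwargs):
    # A Series-level reduction: validates kwargs before doing any work.
    check_for_invalid_keys("sum", kwargs, ALLOWED_SUM_KWARGS)
    return sum(values)


def agg_column(values, how, **kwargs):
    # Looks up the reduction by name and forwards *all* kwargs to it,
    # including options that only make sense at the groupby level.
    reductions = {"sum": series_sum}
    return reductions[how](values, **kwargs)


print(agg_column([1.0, 2.0], "sum"))  # 3.0

try:
    agg_column([1.0, 2.0], "sum", engine_kwargs=None)
except TypeError as exc:
    print(exc)  # sum() got an unexpected keyword argument 'engine_kwargs'
```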
  • This only happens with duplicated columns.
  • Digging into the code, I thought it could be related to this change: c126eeb, but I'm not 100% sure.
  • This did not happen in 2.0.3; it only happens now in 2.1.0.
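Since the error only shows up with duplicated column labels, a possible interim workaround (a sketch, not the upstream fix, and assuming the repeated columns are not needed for the aggregation) is to find the repeated labels with Index.duplicated and keep only the first column for each label before grouping. The frame below reuses the column layout from the report but adds two rows so the sum is visible; the timestamp and values are made up for illustration.

```python
import pandas as pd

# Same column layout as the report, including the duplicated "value_2" label,
# but with two rows sharing one key so the aggregated value is non-trivial.
ts = pd.Timestamp("2023-09-05")
df = pd.DataFrame(
    [[ts, 1.0, 2.0, 3.0], [ts, 4.0, 5.0, 6.0]],
    columns=["key_1", "value_1", "value_2", "value_2"],
)

# Labels that occur more than once (the first occurrence is not flagged).
duplicated_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicated_labels)  # ['value_2']

# Keep the first column for each label, then run the same aggregation.
deduped = df.loc[:, ~df.columns.duplicated()]
result = deduped.groupby(["key_1"]).aggregate({"value_1": "sum"})
print(result)
```

Dropping columns is only safe here because the aggregation never touches value_2; if the duplicated columns carry different data you need, renaming them is the safer choice.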

Expected Behavior

This should not raise an exception.

Installed Versions

INSTALLED VERSIONS

commit : ba1cccd
python : 3.10.11.final.0
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Jul 5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.1.0
numpy : 1.25.1
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.1.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : 2.9.5
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : 0.10.0
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.6.0
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.6.0
scipy : 1.10.1
sqlalchemy : 2.0.9
tables : None
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.2.0
pyqt5 : None

0x26res added the Bug and Needs Triage labels Sep 5, 2023
phofl added this to the 2.1.1 milestone Sep 5, 2023
rhshadrach self-assigned this Sep 5, 2023
rhshadrach (Member) commented Sep 6, 2023

Result of a git-bisect:

commit 76e02e459e84801377f5021dca859f01f4c7dcb2
Author: Thomas Li <[email protected]>
Date:   Fri Jun 2 14:01:51 2023 -0700

    ENH: Groupby agg support multiple funcs numba (#53486)

cc @lithomas1 - I'm already working on a PR for this.

The groupby code checks maybe_use_numba(engine) to decide whether to add engine/engine_kwargs to kwargs so that they are passed through. I think we can do the same here.

Edit: Actually, I think it's safer to pass them through only when they are in self.kwargs.

This was wrong too; see the linked PR.

rhshadrach added the Groupby, Regression, and Apply labels and removed the Needs Triage label Sep 7, 2023