
QST: Pandas agg(list) behavior changed since version 1.3.0 #42727


Closed
buhtz opened this issue Jul 26, 2021 · 2 comments · Fixed by #42762
Labels
Apply Apply, Aggregate, Transform, Map Bug Regression Functionality that used to work in a prior pandas version

Comments


buhtz commented Jul 26, 2021

This question (I am not sure whether it is a bug) is also posted on StackOverflow:
https://stackoverflow.com/q/68526846/4865723
I was asked there to open an issue about it here.

The result of .agg(list, axis=1) has changed since pandas version 1.3.0. The goal of my question is to understand what changed and why, and of course how to solve it.

#!/usr/bin/env python3
import pandas as pd
import numpy as np

print(pd.__version__)
df = pd.DataFrame(
    {
        'PERSON': ['Maya', 'Maya', 'Jim', 'Jim'],
        'DAY': ['2016-01-14', '2016-01-14', '2016-02-21', '2016-02-21'],
        'FOO': [12, 12, 9, 7],
        'BAR': range(4)
    }
)
print(df)
res = df.loc[:, ['FOO', 'BAR']].agg(list, axis=1)
print(res)

This is the result with the last pre-1.3.0 version of pandas (1.2.5). The values of the two selected columns are "joined" into one list per row.

1.2.5
  PERSON         DAY  FOO  BAR
0   Maya  2016-01-14   12    0
1   Maya  2016-01-14   12    1
2    Jim  2016-02-21    9    2
3    Jim  2016-02-21    7    3

0    [12, 0]
1    [12, 1]
2     [9, 2]
3     [7, 3]
dtype: object

Since pandas 1.3.0, however, the result is just the selected columns, unchanged:

   FOO  BAR
0   12    0
1   12    1
2    9    2
3    7    3

I looked into the changelog of pandas 1.3.0. There is nothing about agg(), but a lot about apply() and transform(). However, I do not understand the details, and I cannot see which of these points relates to my situation.

I am sure the pandas devs had a good reason to change this behavior. If I understand the background of that decision, I may be able to find a solution.

Side note: in my production code I would like to do things like this:

df['NEW_COLUMN'] = df.loc[:, ['FOO', 'BAR']].agg(list, axis=1)
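One way to get the pre-1.3.0 result without relying on agg is to build the per-row lists directly. The sketch below is a workaround, not the fix that pandas adopted; it assumes the selected columns can share one NumPy dtype (FOO and BAR are both integers here).

```python
import pandas as pd

df = pd.DataFrame(
    {
        'PERSON': ['Maya', 'Maya', 'Jim', 'Jim'],
        'DAY': ['2016-01-14', '2016-01-14', '2016-02-21', '2016-02-21'],
        'FOO': [12, 12, 9, 7],
        'BAR': range(4)
    }
)

# to_numpy() gives a 2-D array with one row per DataFrame row;
# tolist() turns each array row into a plain Python list.
rows = df[['FOO', 'BAR']].to_numpy().tolist()

# Wrapping the list of lists in a Series stores one list per cell
# and aligns it with the original index.
df['NEW_COLUMN'] = pd.Series(rows, index=df.index)
print(df['NEW_COLUMN'])
```

If the selected columns had different dtypes, to_numpy() may upcast them (for example ints to floats), so the lists would not necessarily keep the original cell types.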
@buhtz buhtz added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jul 26, 2021
MarcoGorelli (Member) commented Jul 26, 2021

Thanks for the report - looks like this changed with #40428, so tagging @rhshadrach

(labelling as "regression" for now, though note that apart from running git bisect I haven't looked into this yet)

@MarcoGorelli MarcoGorelli added Apply Apply, Aggregate, Transform, Map Regression Functionality that used to work in a prior pandas version and removed Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Jul 26, 2021
rhshadrach (Member) commented

Thanks @Codeberg-AsGithubAlternative-buhtz and @MarcoGorelli. Agreed this is a regression. The issue is that agg falls back to apply. Prior to the PR identified above, we would do

df[['FOO', 'BAR']].apply(list, axis=1)

giving the pre-1.3.0 result. After the PR, we now do

df[['FOO', 'BAR']].T.apply(list, axis=0)

When axis=0, apply interprets each list result as being "Series-like" and expands it, so the result is a DataFrame again and, once transposed back, the whole operation amounts to the identity. When axis=1, this conversion does not happen and each list is kept as a single value.
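The two code paths described above can be compared by calling apply directly. This sketch uses only the two calls quoted in the comment; the small DataFrame keeps just the FOO and BAR columns from the original example.

```python
import pandas as pd

df = pd.DataFrame({'FOO': [12, 12, 9, 7], 'BAR': range(4)})

# Pre-1.3.0 path: apply row-wise. Each returned list is kept as one
# value, so the result is a Series of lists.
by_row = df[['FOO', 'BAR']].apply(list, axis=1)
print(type(by_row).__name__)  # Series
print(by_row.tolist())

# 1.3.0 path: transpose, then apply column-wise. Each returned list is
# treated as Series-like and expanded, so the result is a DataFrame.
by_col = df[['FOO', 'BAR']].T.apply(list, axis=0)
print(type(by_col).__name__)  # DataFrame

# Transposing back gives the selected columns unchanged.
print(by_col.T)
```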

Ideally we would not fall back to apply here, but to fix this regression I do not see another option besides reverting #40428. Will do so for 1.3.2.


3 participants