PERF: Groupby aggregations with Categorical #52120

Merged: 1 commit, Mar 22, 2023

jbrockmendel
Member

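import numpy as np
import pandas as pd

np.random.seed(94356)

# 500,000 values drawn from 5 distinct categories
arr = np.arange(5).repeat(10**5)
# Random group labels in {0, 1, 2}
grps = np.random.randint(0, 3, size=arr.size)

# Ordered Categorical, so that min() is defined
cat = pd.Categorical(arr, ordered=True)
ser = pd.Series(cat)
gb = ser.groupby(grps)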

%timeit gb.min()
5.4 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- main
1.61 ms ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  # <- PR

%timeit gb.first()
5.59 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- main
1.64 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  # <- PR
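
The timings above come from IPython's `%timeit`. For anyone who wants to repeat the comparison outside IPython, the following is a minimal sketch using the standard-library `timeit` module. The helper names `build_groupby` and `time_ms` are hypothetical and not part of the PR, and the helper reports a best-of-N mean rather than the mean ± std. dev. that `%timeit` prints, so the numbers will not match exactly.

```python
import timeit

import numpy as np
import pandas as pd


def build_groupby(n_repeat=10**5, seed=94356):
    """Rebuild the ordered-Categorical groupby from the benchmark setup.

    Hypothetical helper: the name and the parameters are illustrative only.
    """
    np.random.seed(seed)
    arr = np.arange(5).repeat(n_repeat)
    grps = np.random.randint(0, 3, size=arr.size)
    ser = pd.Series(pd.Categorical(arr, ordered=True))
    return ser.groupby(grps)


def time_ms(func, number=20, repeat=5):
    """Return the best-of-`repeat` mean time per call, in milliseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e3


gb = build_groupby()
for name in ("min", "first"):
    # getattr(gb, name) is the bound method, e.g. gb.min
    print(f"gb.{name}(): {time_ms(getattr(gb, name)):.2f} ms per call")
```

Running this once on `main` and once on the PR branch gives the two numbers to compare. For reference, the figures quoted above work out to roughly a 3.4x speedup (5.4 ms / 1.61 ms ≈ 3.35 for `min`, 5.59 ms / 1.64 ms ≈ 3.41 for `first`).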

mroeschke added the Groupby, Performance (memory or execution speed performance), and Categorical (Categorical Data Type) labels on Mar 22, 2023
mroeschke added this to the 2.1 milestone on Mar 22, 2023
mroeschke merged commit 14affe0 into pandas-dev:main on Mar 22, 2023
@mroeschke
Member

Thanks @jbrockmendel
