PERF: GroupBy.quantile #51385

Closed
jbrockmendel opened this issue Feb 14, 2023 · 1 comment · Fixed by #51722
Labels
Groupby, Performance (memory or execution speed), quantile (quantile method)
Comments

@jbrockmendel
Member

jbrockmendel commented Feb 14, 2023

import numpy as np
import pandas as pd

nrows = 10**7
ncols = 10
ngroups = 6

qs = [0.5, 0.75]
arr = np.random.randn(nrows, ncols)
df = pd.DataFrame(arr)
df["A"] = np.random.randint(ngroups, size=nrows)

gb = df.groupby("A")

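# Assignments made inside %timeit do not persist, so compute v1 and v2 separately.
v1 = gb.quantile(qs)
%timeit gb.quantile(qs)
39.6 s ± 1.74 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
%timeit {key: gb.get_group(key).quantile(qs) for key in gb.groups}
3.37 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Stack the per-group results and align them with the GroupBy.quantile output.
alt = pd.concat(v2).drop("A", axis=1)
alt.index.names = ["A", None]
assert alt.equals(v1)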

Is GroupBy.quantile doing dramatically too much work?
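For anyone who wants to check the equivalence outside IPython, here is a scaled-down sketch of the comparison above. The helper name `per_group_quantile`, the seed, and the sizes (10,000 rows rather than 10**7) are assumptions chosen for illustration, and the check uses a floating-point tolerance instead of `equals`. It is not taken from the pandas code base.

```python
import numpy as np
import pandas as pd


def per_group_quantile(df, key, qs):
    """Compute quantiles one group at a time and stack the results.

    Hypothetical helper for illustration; mirrors the dict comprehension
    in the report but iterates the GroupBy object directly.
    """
    gb = df.groupby(key)
    # Iterating a GroupBy yields (group label, sub-frame) pairs in sorted key order.
    pieces = {label: sub.drop(columns=key).quantile(qs) for label, sub in gb}
    out = pd.concat(pieces)
    out.index.names = [key, None]
    return out


rng = np.random.default_rng(0)
nrows, ncols, ngroups = 10_000, 4, 6  # assumed sizes, far smaller than the report
qs = [0.5, 0.75]

df = pd.DataFrame(rng.standard_normal((nrows, ncols)))
df["A"] = rng.integers(ngroups, size=nrows)

v1 = df.groupby("A").quantile(qs)
alt = per_group_quantile(df, "A", qs)

# Compare values with a tolerance and index entries as plain tuples.
values_match = np.allclose(v1.to_numpy(), alt.to_numpy())
index_match = list(v1.index) == list(alt.index)
print(values_match, index_match)
```

At this size both paths finish quickly, so the sketch only confirms that the two approaches agree; the timing gap reported above needs the full 10**7 rows to show up clearly.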

@jbrockmendel added the Bug and Needs Triage labels Feb 14, 2023
@lithomas1 added the Groupby, Performance, and quantile labels and removed the Needs Triage and Bug labels Feb 14, 2023
@jbrockmendel
Member Author

It looks like once ngroups is high enough (parity is around 10**4), the non-Cython version becomes slower than the Cython version.
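One way to look for that crossover is to time both approaches over a range of group counts. The sketch below is a rough illustration only: `time_once`, `compare_quantile_timings`, and the sizes are hypothetical, each measurement is a single run rather than a `%timeit` average, and the small default sizes will not necessarily reproduce the parity point mentioned above.

```python
import time

import numpy as np
import pandas as pd


def time_once(func):
    """Return the wall-clock seconds taken by a single call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def compare_quantile_timings(nrows, ncols, ngroups_list, qs, seed=0):
    """Time GroupBy.quantile against a per-group Python loop for each ngroups."""
    rng = np.random.default_rng(seed)
    values = rng.standard_normal((nrows, ncols))
    rows = []
    for ngroups in ngroups_list:
        df = pd.DataFrame(values)
        df["A"] = rng.integers(ngroups, size=nrows)
        gb = df.groupby("A")
        # Built-in path: one call that handles all groups.
        builtin = time_once(lambda: gb.quantile(qs))
        # Looped path: one DataFrame.quantile call per group.
        looped = time_once(lambda: {k: sub.quantile(qs) for k, sub in gb})
        rows.append({"ngroups": ngroups, "builtin_s": builtin, "looped_s": looped})
    return pd.DataFrame(rows)


# Assumed small sizes so the sketch runs quickly; scale up to probe the crossover.
timings = compare_quantile_timings(20_000, 3, [5, 50, 500], [0.5, 0.75])
print(timings)
```

Single-run timings are noisy, so a real comparison would repeat each measurement and push `nrows` and `ngroups_list` toward the values discussed in this thread.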

3 participants