ENH: Feature Request for Ungroup Method for Grouped Data Frames #43902
Comments
-1 as …
That would be a nice feature and may come in handy at times.
It's definitely a handy tool to implement that would help beginners seeking to extract groups. It also parallels R's tidyverse, which has ungroup() in dplyr, so it might make it easier for R users to transition to pandas.
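A minimal sketch of what such a wrapper could look like, purely for illustration (the ungroup helper name is hypothetical; it simply hands back the GroupBy object's .obj attribute, i.e. the frame it was built from):

```python
import pandas as pd

def ungroup(grouped):
    """Hypothetical helper mirroring dplyr's ungroup(): return the plain
    DataFrame that a DataFrameGroupBy was built from."""
    # DataFrameGroupBy.obj holds a reference to the original, ungrouped frame.
    return grouped.obj

df = pd.DataFrame({"group": [1, 1, 2, 2], "value": [1, 2, 3, 4]})
assert ungroup(df.groupby("group")).equals(df)
```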
For me, this feature could be useful; it would really come into its own if assign could be used on grouped DataFrames. For reference, the current approach is something like:

(df.assign(normalised_value = lambda x: x['value'] / x.groupby('group').transform('sum')['value'],
           normalising_value = lambda x: x.groupby('group').transform('sum')['value'])
   .more_methods...()
)

It'd be nice to have something like:

(df.groupby('group')
   .assign(normalised_value = lambda x: x['value'] / x['value'].sum(),
           normalising_value = lambda x: x['value'].sum())
   .ungroup()
   .more_methods...()
)

The R dplyr equivalent being:

df %>%
  group_by(group) %>%
  mutate(normalised_value = value / sum(value),
         normalising_value = sum(value)) %>%
  ungroup() %>%
  more_methods...()
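As a point of reference, here is a runnable version of the "current approach" chain above on a toy DataFrame (the more_methods placeholder is dropped; column names and data are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 2, 2], "value": [1, 2, 3, 4]})

# What works today: transform('sum') broadcasts the group totals back to row
# level, so no ungrouping step is needed, at the cost of repeating the groupby.
out = df.assign(
    normalised_value=lambda x: x["value"] / x.groupby("group")["value"].transform("sum"),
    normalising_value=lambda x: x.groupby("group")["value"].transform("sum"),
)
print(out)
```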
Looks like some of you are leaning toward R/dplyr style. Check out datar. An example with @s-pike's R code:

>>> from datar.all import f, tibble, group_by, mutate, ungroup, row_number, sum
[2022-03-17 11:25:33][datar][WARNING] Builtin name "sum" has been overriden by datar.
>>> df = tibble(group=[1,1,2,2], value=[1,2,3,4])
>>> (
... df
... >> group_by(f.group)
... >> mutate(normalised_value=f.value/sum(f.value), normalising_value=sum(f.value))
... >> ungroup()
... >> mutate(n=row_number())
... )
group value normalised_value normalising_value n
<int64> <int64> <float64> <int64> <float64>
0 1 1 0.333333 3 1.0
1 1 2 0.666667 3 2.0
2 2 3 0.428571 7 3.0
3 2 4 0.571429 7 4.0
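For comparison, a plain-pandas sketch that produces the same columns today (toy data; the row number, which is the step an ungroup() would make natural inside one chain, is added with numpy over the full frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 2, 2], "value": [1, 2, 3, 4]})

out = df.assign(
    # group-wise pieces, broadcast back to the original rows via transform()
    normalised_value=lambda x: x["value"] / x.groupby("group")["value"].transform("sum"),
    normalising_value=lambda x: x.groupby("group")["value"].transform("sum"),
    # the "after ungroup()" step: a row number over the whole, ungrouped frame
    n=lambda x: np.arange(1, len(x) + 1),
)
print(out)
```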
ungroup as a simple wrapper seems like a no-brainer, especially for people new to Python who came from R. But in general, why would you write …
If you think this is useful, then show a complete example; the above is not very compelling.
I'm not sure I understand what you're looking for, @jreback, especially if you're referring to @s-pike's example. An example of why it might be useful to have a wrapper for ungrouping a dataframe? If you need to recover the original order, as is common with unlabeled numpy data for machine learning, having a df ordered by group makes matching the two datasets difficult. This is a task that happens to me frequently.
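A small illustration of that ordering point, assuming the original integer index is still intact, which is what makes recovery possible at all (data here is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [2, 1, 2, 1], "value": [1.0, 2.0, 3.0, 4.0]})
features = np.array([10, 20, 30, 40])  # unlabeled rows aligned with df's original order

# Rebuilding the frame from its groups orders the rows by group key...
regrouped = pd.concat([g for _, g in df.groupby("group")])

# ...so the original order has to be restored before matching rows to `features`.
restored = regrouped.sort_index()
assert list(restored.index) == list(df.index)
```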
A compelling example in code, not words.
Can you answer my question by any chance? That'll make it easier for me to know what you're looking for.
Yes, if you have something that could be a useful API, I need a compelling example.
The point isn't that the above can't be used before …
PS: because you are being kind of rude in how you're responding to me and the other users: more people in this thread think this would be useful than do not, so it would be great if you could explain why you think this isn't useful, beyond an appeal to tradition that there's a more "idiomatic" way of doing it.
@M-Harrington it's amazing how these comments just hurt open source maintainers - whoa, if I actually criticized something. That said, your example still doesn't explain how ungroup actually adds anything to the syntax, clarity, or understanding of the code. I was expecting a lot more from someone who teaches.
@jreback Here's a slimmed-down version of the code:

def combine_multiple_datasets(backing_dses: List[Dataset]) -> Dataset:
    """A simple wiring together of multiple Datasets into one Dataset that is effectively the children, combined."""
    assert len(backing_dses), "Should have at least one backing dataset"
    ds_instance = Dataset.__new__(Dataset)
    # elided: copy fields (grouping_key, feature_columns, etc.) from children.
    # Combining Groupby's manually
    all_data = [
        (group_name, df)
        for groupby in map(attrgetter('grouped'), backing_dses)
        for group_name, df in groupby
    ]
    ds_instance.grouped = pd.concat([df for _, df in all_data]).groupby(ds_instance.grouping_key)
    return ds_instance

With an ungroup() method available, the preferred version would be:

def combine_multiple_datasets_PREFERED(backing_dses: List[Dataset]) -> Dataset:
    """A simple wiring together of multiple Datasets into one Dataset that is effectively the children, combined."""
    assert len(backing_dses), "Should have at least one backing dataframe"
    ds_instance = Dataset.__new__(Dataset)
    # elided: copy fields (grouping_key, feature_columns, etc.) from children.
    # Combining Groupby's with DataFrameGroupby.ungroup()
    ds_instance.grouped = pd.concat([ds.grouped.ungroup() for ds in backing_dses]).groupby(ds_instance.grouping_key)
    return ds_instance

Also, I don't have an R background. I just do OOP occasionally and want to leverage convenient lower-level abstractions in an elegant way.
jreback, nobody is forcing you to ad hominem. If that's what being part of the open source community means to you, by all means, please stop doing so. No, seriously, just don't respond to this issue or this comment; somebody else will pick it up, or not, and then whatever. When you treat the people who use your package poorly, you're not doing anyone a service, either the package or the people who are trying to use and learn about it. As @stephenjfox said, we're just asking for something that "leverage[s] convenient lower-level abstractions in an elegant way". Other benefits include a chance to implement it more efficiently than allocating more memory for an object that already exists within the groupby object as df.groupby.obj.
Hi, thanks for your work developing pandas. I'd like to request a feature: an ungroup() method for grouped data frames. It's related to this StackOverflow question, where I've developed a hack that uses .obj to pull the original data frame out of the grouped data frame. However, it would be helpful to have a proper method that does the extraction, to prevent users from depending on my hack.
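For readers landing here, a minimal illustration of the .obj hack described above, on toy data (the issue treats .obj as a workaround rather than a supported method, which is the motivation for a dedicated ungroup()):

```python
import pandas as pd

df = pd.DataFrame({"group": [1, 1, 2, 2], "value": [1, 2, 3, 4]})
grouped = df.groupby("group")

# The hack: the GroupBy object keeps a reference to the frame it was built
# from, so .obj recovers the original, ungrouped DataFrame.
original = grouped.obj
assert original.equals(df)
```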