Weighted eCDF #5119

Merged
merged 14 commits into from
May 20, 2024

teunbrand
Collaborator

This PR aims to fix #5058.

Briefly, it adds an optional weight aesthetic to stat_ecdf(). If a weight is present, we calculate the eCDF differently: each observation contributes its weight divided by the sum of all weights in the group.
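
For readers who want to see the idea concretely, here is a minimal base R sketch of a weighted eCDF built as a step function. The name `weighted_ecdf_sketch` and its argument checks are hypothetical and chosen for illustration; this is not the `wecdf()` helper from the PR, whose body is not shown in this thread.

```r
# Hypothetical helper for illustration only (not the PR's wecdf()).
# Returns a right-continuous step function: F(q) = sum(w[x <= q]) / sum(w).
weighted_ecdf_sketch <- function(x, weights = rep(1, length(x))) {
  stopifnot(
    length(x) == length(weights),
    all(weights >= 0),
    sum(weights) > 0
  )

  # Sort observations and carry the weights along
  ord <- order(x)
  x <- x[ord]
  weights <- weights[ord]

  # Cumulative share of the total weight at each sorted observation
  cum <- cumsum(weights) / sum(weights)

  # For tied x values keep only the last entry, which holds the full
  # cumulative weight for that value
  keep <- !duplicated(x, fromLast = TRUE)

  stats::approxfun(
    x[keep], cum[keep],
    method = "constant", f = 0,
    yleft = 0, yright = 1,
    ties = "ordered"
  )
}

# Usage: the observation at 3 carries half of the total weight
f <- weighted_ecdf_sketch(c(3, 1, 2), weights = c(2, 1, 1))
f(c(0, 1, 2, 2.5, 3))  # values: 0, 0.25, 0.5, 0.5, 1
```

The step function is built with `stats::approxfun()` in the same way `stats::ecdf()` builds its result, so the two can be evaluated on a grid of x values interchangeably.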

@thomasp85
Member

I have no experience with eCDFs, so I can't really comment on the correctness of the weighted implementation. @clauswilke or @yutannihilation, do you feel confident in reviewing this?

@yutannihilation
Member

I too am not familiar with eCDF, sorry...

@clauswilke
Member

clauswilke commented Mar 22, 2023

The calculation looks correct at first glance. I might want to read it through a little more carefully before signing off on it, but fundamentally it's very simple. An eCDF is simply the cumulative count of the ordered observations, divided by the total count. To make this weighted, you sum each observation's weight instead of counting it as 1, and divide by the total weight.

I have one concern though: I don't particularly like using a built-in function to calculate eCDF and a custom function to calculate weCDF. Why not use the same function for both and set the weights to 1 if not provided?
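
For reference, the calculation described in this comment can be written out as below. The notation is an assumption added for illustration and does not come from the PR: observations $x_1, \dots, x_n$ in a group, with non-negative weights $w_1, \dots, w_n$ that do not all equal zero.

```latex
\hat{F}_w(x) = \frac{\sum_{i=1}^{n} w_i \, \mathbf{1}\{x_i \le x\}}{\sum_{i=1}^{n} w_i},
\qquad
w_i = 1 \ \text{for all } i
\;\Longrightarrow\;
\hat{F}_w(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}.
```

The right-hand case is the ordinary eCDF, which is why a single function with weights defaulting to 1 would cover both situations.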

@teunbrand
Collaborator Author

> I have one concern though: I don't particularly like using a built-in function to calculate eCDF and a custom function to calculate weCDF. Why not use the same function for both and set the weights to 1 if not provided?

Fair point, I don't think there is a particular reason I did it this way. Setting the weights to 1 should indeed give identical output.
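
A quick way to convince yourself of this is to compare a direct weighted calculation against `stats::ecdf()` on the same data. The sketch below uses a hypothetical helper, `weighted_cdf_at`, written only for this check; it is not code from the PR.

```r
# Hypothetical helper for illustration: share of the total weight
# at or below each query point q. Weights default to 1.
weighted_cdf_at <- function(q, x, w = rep(1, length(x))) {
  vapply(q, function(qi) sum(w[x <= qi]) / sum(w), numeric(1))
}

set.seed(1)
x <- rnorm(50)
q <- seq(-3, 3, by = 0.25)

unit_weights <- weighted_cdf_at(q, x)   # all weights equal to 1
builtin      <- stats::ecdf(x)(q)       # base R eCDF

# Should be TRUE if unit weights reproduce the built-in eCDF
isTRUE(all.equal(unit_weights, builtin))

# Any constant weight gives the same curve, since only relative weight matters
isTRUE(all.equal(weighted_cdf_at(q, x, rep(2, length(x))), builtin))
```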

Merge branch 'main' into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
#	tests/testthat/test-stat-ecdf.R
Merge branch 'main' into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
#	tests/testthat/test-stat-ecdf.R
Merge branch 'weighted_ecdf' of https://github.com/teunbrand/ggplot2 into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
@teunbrand teunbrand added the feature (a feature request or enhancement) and layers 📈 labels Jul 9, 2023
Member

@thomasp85 thomasp85 left a comment

LGTM apart from the one comment

R/stat-ecdf.R Outdated
Comment on lines 124 to 128
if (is.null(data$weight)) {
  data_ecdf <- ecdf(data$x)(x)
} else {
  data_ecdf <- wecdf(data$x, data$weight)(x)
}
Member

We should just use wecdf() in both cases, as per the discussion above.

teunbrand added 4 commits May 20, 2024 10:39
Merge branch 'main' into weighted_ecdf

# Conflicts:
#	R/stat-ecdf.R
@teunbrand teunbrand merged commit e942833 into tidyverse:main May 20, 2024
11 checks passed
@teunbrand teunbrand deleted the weighted_ecdf branch May 20, 2024 09:09
Labels
feature (a feature request or enhancement), layers 📈
Development

Successfully merging this pull request may close these issues.

Add weights to stat_ecdf
4 participants