-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Weighted eCDF #5119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Weighted eCDF #5119
Conversation
I have no experience with ecdfs so I can't really comment on the correctness of the weighted implementation. @clauswilke or @yutannihilation do you feel confident in reviewing this? |
I too am not familiar with eCDF, sorry... |
The calculation looks correct on first glance. I might want to read it through a little more carefully before signing off on it, but fundamentally it's very simple. An eCDF is simply a cumulative sum of the ordered values, divided by the total sum. To make this weighted, you multiply each value by a weight before you sum. I have one concern though: I don't particularly like using a built-in function to calculate eCDF and a custom function to calculate weCDF. Why not use the same function for both and set the weights to 1 if not provided? |
Fair point, I don't think there is a particular reason I did it this way. Setting the weights to 1 should indeed give identical output. |
Merge branch 'main' into weighted_ecdf # Conflicts: # R/stat-ecdf.R # tests/testthat/test-stat-ecdf.R
Merge branch 'main' into weighted_ecdf # Conflicts: # R/stat-ecdf.R # tests/testthat/test-stat-ecdf.R
Merge branch 'weighted_ecdf' of https://github.com/teunbrand/ggplot2 into weighted_ecdf # Conflicts: # R/stat-ecdf.R
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart from the one comment
R/stat-ecdf.R
Outdated
if (is.null(data$weight)) { | ||
data_ecdf <- ecdf(data$x)(x) | ||
} else { | ||
data_ecdf <- wecdf(data$x, data$weight)(x) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just use wecdf()
as per the discussion
Merge branch 'main' into weighted_ecdf # Conflicts: # R/stat-ecdf.R
This PR aims to fix #5058.
Briefly, it adds an optional
weight
aesthetic tostat_ecdf()
. If a weight is present, we calculate the ecdf in a different way, wherein each observations is weighted by the amount of the observation's weight relative to the sum of all weights in the group.