Scaled densities/counts in 2d density/bins plots (#2679) #2680

bjreisman · 2018-06-01T15:57:51Z

I added 'scaled' statistics to the stat_bin2d, stat_binhex, stat_density_2d, and stat_contour functions for plotting 2d distributions normalized to a common height. This would be very useful for faceted 2d plots, where the maximum density/count can vary greatly between panels.
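The idea behind per-panel scaling can be illustrated with a minimal base-R sketch. This is not ggplot2's implementation. The data frame `dens` and its `panel` column are hypothetical stand-ins for the per-panel output of a 2d stat, assuming one group per panel. Each density is divided by the maximum of its own panel, so every panel peaks at 1.

```r
# Hypothetical per-panel density estimates: panel "B" is much more
# concentrated, so its raw maximum is ten times that of panel "A".
dens <- data.frame(
  panel   = c("A", "A", "A", "B", "B", "B"),
  density = c(0.01, 0.02, 0.04, 0.10, 0.20, 0.40)
)

# Divide each value by the maximum of its own panel (ignoring NAs),
# so the peak of every panel maps to exactly 1.
dens$ndensity <- ave(
  dens$density, dens$panel,
  FUN = function(d) d / max(d, na.rm = TRUE)
)

print(dens)
```

With a shared, unscaled fill scale, panel A would only use the low end of the palette. After rescaling, both panels span the same range.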

(This is my attempt to add the feature I had requested in issue #2679.)

I tried my best to mirror the corresponding statistics from the 1d functions:

| stat_bin | stat_bin2d | stat_binhex | stat_bin2d (updated) | stat_binhex (updated) |
|---|---|---|---|---|
| `count` | `count` | `count` | `count` | `count` |
| `ncount` | - | - | `ncount` | `ncount` |
| `density` | `density` | `density` | `density` | `density` |
| `ndensity` | - | - | `ndensity` | `ndensity` |
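The new `ncount` and `ndensity` columns follow the 1d `stat_bin()` convention of dividing by the maximum. Below is a base-R sketch of that computation on a hypothetical table of bins. `bins` and `add_normalized_bin_stats()` are illustrative names, not ggplot2 internals. It also assumes that `density` is each bin's share of the total count.

```r
# Hypothetical helper: add 1d-style normalized columns to a table of bins.
add_normalized_bin_stats <- function(bins) {
  # Assumption: density is each bin's share of the total count.
  bins$density  <- bins$count / sum(bins$count, na.rm = TRUE)
  # Scale both statistics so that the fullest bin equals 1.
  bins$ncount   <- bins$count / max(bins$count, na.rm = TRUE)
  bins$ndensity <- bins$density / max(bins$density, na.rm = TRUE)
  bins
}

bins <- data.frame(x = c(1, 2, 1, 2), y = c(1, 1, 2, 2), count = c(2, 8, 4, 6))
out <- add_normalized_bin_stats(bins)
print(out)
```

Under this assumption, density is proportional to count within a panel, so `ncount` and `ndensity` coincide. They differ only in which raw column they are derived from.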

It was a little harder to get things to match up one-to-one for the density-based functions, so some adjustment to the syntax may be needed.

| stat_density | stat_contour | stat_density2d | stat_contour (updated) | stat_density2d (updated) |
|---|---|---|---|---|
| `count` | - | - | - | - |
| `density` | - | `density` | - | `density` |
| `scaled` | - | - | - | `scaled` |
| - | `pieces` | - | `pieces` | - |
| - | `level` | - | `level` | - |
| - | - | - | `nlevel` | - |

Here is an example of the revised functions in action:

library(ggplot2)

# Contour levels on each panel's own (unscaled) density scale
ggplot(diamonds, aes(x = x, y = depth)) +
  stat_density_2d(aes(fill = stat(level)),
                  geom = "polygon",
                  n = 100,
                  bins = 10,
                  contour = TRUE) +
  facet_wrap(~clarity) +
  scale_fill_viridis_c(option = "A")

# Same plot, with contour levels scaled to a maximum of 1 via nlevel
ggplot(diamonds, aes(x = x, y = depth)) +
  stat_density_2d(aes(fill = stat(nlevel)),
                  geom = "polygon",
                  n = 100,
                  bins = 10,
                  contour = TRUE) +
  facet_wrap(~clarity) +
  scale_fill_viridis_c(option = "A")

clauswilke · 2018-06-01T18:40:43Z

This seems like a good idea to me. Unfortunately it's too late for 2.3.0, we're in a soft freeze.

A couple of comments:

Please make sure your coding style matches the ggplot2 style (e.g., spaces around /).
The documentation should be updated. Already, the "computed variables" sections of geom_density_2d() and geom_contour() seem incomplete, and they will be even more incomplete after this change.

clauswilke · 2018-06-01T18:43:07Z

R/stat-density-2d.r

@@ -60,12 +60,13 @@ StatDensity2d <- ggproto("StatDensity2d", Stat,
      lims = c(scales$x$dimension(), scales$y$dimension())
    )
    df <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
+    df$nz <- df$z/max(df$z)


Why nz? This naming seems strange.

I wasn't sure what to call this one. I went with nz for "normalized z", as in ndensity and ncount. In my own code, I normally name derived variables using the pattern "x.difference", so z would become z.scaled or similar, but I wasn't sure what the convention was for ggplot2.

Alternatively, we could just call it scaled, but then the renaming that happens a few lines down would be somewhat redundant.

I find it strange to introduce a variable name and then change it 5 lines down. Also, what happens if the if (contour) { branch is executed? What does StatContour do to the nz column?

I'm inclined to agree; however, I tried to preserve as much of the approach of the original stat_density_2d code as I could, and that was the approach taken there:

# excerpt from stat_density_2d, line 62
df <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
df$group <- data$group[1]
if (contour) {
  StatContour$compute_panel(df, scales, bins, binwidth)
} else {
  names(df) <- c("x", "y", "density", "group")
  df$level <- 1
  df$piece <- 1
  df
}

StatContour doesn't do anything with the nz column, so to avoid having to rename things we could just name it when we create it.

df <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
df$scaled <- df$z / max(df$z) ## <- here
df$group <- data$group[1]
if (contour) {
  StatContour$compute_panel(df, scales, bins, binwidth)
} else {
  names(df) <- c("x", "y", "density", "scaled", "group")
  df$level <- 1
  df$piece <- 1
  df
}

Or we could create it later on, which avoids passing StatContour a column it won't use:

df <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
df$group <- data$group[1]
if (contour) {
  StatContour$compute_panel(df, scales, bins, binwidth)
} else {
  names(df) <- c("x", "y", "density", "scaled", "group")
  df$scaled <- df$density / max(df$density) ## <- here
  df$level <- 1
  df$piece <- 1
  df
}

Or perhaps there's a third even cleaner method to achieve the same goal. Which approach do you think would be best?

I think the later creation is better, though you shouldn't name it before you add it to the data frame. :-)
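The problem with the second variant is that `names<-` receives five names while the data frame still has only four columns, which R rejects. Below is a minimal base-R sketch of the corrected order, which renames first and adds `scaled` afterwards. It runs on a hypothetical toy grid. `df` here only stands in for the grid built inside the stat; this is not the actual ggplot2 code.

```r
# Toy stand-in for the density grid built inside the stat.
df <- data.frame(expand.grid(x = 1:2, y = 1:2), z = c(0.1, 0.4, 0.2, 0.3))
df$group <- 1

# Assigning five names to a four-column data frame fails.
res <- tryCatch({
  bad <- df
  names(bad) <- c("x", "y", "density", "scaled", "group")
  "no error"
}, error = function(e) "error")

# Corrected order: rename the existing columns, then add the new one.
names(df) <- c("x", "y", "density", "group")
df$scaled <- df$density / max(df$density, na.rm = TRUE)
df$level <- 1
df$piece <- 1
print(df)
```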

Fixed! ✨

This may belong in a separate branch/pull request, but it might be worthwhile to make stat_density2d return an error or a warning when aes(fill = stat(level)) is used with contour = FALSE, or aes(fill = stat(density)) with contour = TRUE. Neither of those statistics is meaningful in the wrong context, and they can produce some cryptic errors, as seen below. 😬

library(ggplot2)
library(dplyr)
library(viridis)

# 2d density plot filled by density, with contour = TRUE
diamonds %>%
  ggplot(aes(x = x, y = depth)) +
  stat_density_2d(aes(fill = stat(density)),
                  geom = "raster",
                  n = 100,
                  bins = 10,
                  contour = TRUE) +
  facet_wrap(clarity ~ .) +
  scale_fill_viridis(option = "A")
#> Error in is.finite(x) : default method not implemented for type 'closure'

# 2d density plot filled by level, with contour = FALSE
diamonds %>%
  ggplot(aes(x = x, y = depth)) +
  stat_density_2d(aes(fill = stat(level)),
                  geom = "polygon",
                  n = 100,
                  bins = 10,
                  contour = FALSE) +
  facet_wrap(clarity ~ .) +
  scale_fill_viridis(option = "A")
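Here is one way such a check could look, sketched in base R. `check_density_2d_var()` is a hypothetical helper and not part of ggplot2. It assumes the stat knows which computed variable was requested and whether `contour` is set. When `contour = FALSE`, `level` and `piece` are only constant placeholders (`df$level <- 1`), so they are treated as contour-only.

```r
# Hypothetical helper: warn when a requested computed variable is not
# meaningful for the chosen value of `contour`.
check_density_2d_var <- function(var, contour) {
  contour_only <- c("level", "nlevel", "piece")  # only real for contour output
  grid_only    <- c("density", "ndensity")       # only real for the raw grid
  if (contour && var %in% grid_only) {
    warning("`", var, "` is not meaningful when contour = TRUE", call. = FALSE)
    return(FALSE)
  }
  if (!contour && var %in% contour_only) {
    warning("`", var, "` is not meaningful when contour = FALSE", call. = FALSE)
    return(FALSE)
  }
  TRUE
}

check_density_2d_var("level", contour = TRUE)    # returns TRUE
check_density_2d_var("density", contour = TRUE)  # warns and returns FALSE
```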

bjreisman · 2018-06-01T19:31:56Z

Thanks for the tips! I've updated the styling and documentation as suggested. I wasn't sure how to best describe the pieces computed variable in stat_contour, so that may need more revision than the others.

hadley · 2018-07-08T22:47:53Z

@clauswilke can you please finish this off by approving the PR (if you think it's ready to be merged)

clauswilke · 2018-07-08T23:19:40Z

@hadley Yes. One thing that I think needs to be fixed in the ggplot2 codebase first is a new section in NEWS.md for post 3.0.0 development.

@bjreisman At a minimum, I think roxygen needs to be rerun so all the documentation changes are processed correctly. I would also suggest adding a line to NEWS.md (not NEWS!). It would have to go above the current ggplot2 3.0.0 section. I'll read the code over carefully tomorrow and let you know if I see anything else.

hadley · 2018-07-08T23:24:15Z

Oops yes, done.

clauswilke · 2018-07-09T05:25:05Z

R/stat-density-2d.r

@@ -66,6 +71,7 @@ StatDensity2d <- ggproto("StatDensity2d", Stat,
      StatContour$compute_panel(df, scales, bins, binwidth)
    } else {
      names(df) <- c("x", "y", "density", "group")
+      df$scaled <- df$density / max(df$density)


Is there a reason this is called scaled and not ndensity as in the other stats? Also, does the max function need an argument na.rm = TRUE?
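The `na.rm = TRUE` question matters because base R's `max()` propagates missing values. A single `NA` in the density column would turn the whole rescaled column into `NA`. Here is a small base-R illustration with made-up values:

```r
d <- c(0.1, NA, 0.4, 0.2)

# Without na.rm, max() returns NA and every scaled value becomes NA.
scaled_bad <- d / max(d)

# With na.rm = TRUE, only the missing entry stays NA.
scaled_ok <- d / max(d, na.rm = TRUE)
print(scaled_ok)
```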

Thanks for taking the time to consider this addition!

I wasn't entirely happy with the switch from ndensity to scaled either, and would actually prefer to use ndensity here. However, in stat_density (the 1d version) the rescaled statistic is called scaled, while in stat_bin it's called ndensity or ncount.

Since stat_density_2d is the 2d version of stat_density, I tried to mirror the syntax by naming it scaled. I think it might make the most sense to use one syntax throughout all the stat functions (ndensity), but I will leave that to your discretion. To summarize, I think there are three options:

(1) Mirror the syntax from the 1d versions of the stat layers, such that stat_density_2d and stat_density use scaled while stat_bin2d, stat_binhex, and stat_bin use ndensity and ncount (implemented in this PR).

(2a) Switch over to a single syntax for the 2d stats with this version - ndensity replaces scaled in stat_density_2d.

(2b) Switch over to a single syntax for all density-based stats: ndensity replaces scaled in both stat_density and stat_density_2d. I suppose you would want to duplicate scaled and name the new variable ndensity to avoid breaking existing code.
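Option 2b stays backward compatible by adding `ndensity` next to the existing `scaled` column instead of renaming it. Here is a base-R sketch on a hypothetical 1d density table. `dens` is illustrative and is not ggplot2's internal object.

```r
dens <- data.frame(x = 1:4, density = c(0.1, 0.4, 0.2, 0.3))

# Existing 1d statistic: density scaled to a maximum of 1.
dens$scaled <- dens$density / max(dens$density, na.rm = TRUE)

# New alias under the stat_bin()-style name; `scaled` is kept so that
# existing code mapping stat(scaled) continues to work.
dens$ndensity <- dens$scaled

print(dens)
```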

--
RE: na.rm = TRUE; you're correct, I've added that to my local version for my next PR. I can submit that change once scaled vs. ndensity is settled and I'll update the documentation and news as well.

I think option 2b (uniform naming with duplicated scaled column for stat_density) is best. But I'd like to hear @hadley's input as well.

Yeah, 2b sounds good (assuming it doesn't affect existing code, i.e. the new variable is added instead of replacing the existing variable name).

bjreisman · 2018-07-10T22:06:22Z

Okedoke, I've implemented option 2b with the latest commit, as well as updating NEWS.md and adding an example to geom_density2d() for the new functionality.

I tried running roxygen to update the documentation, but for some reason it didn't recognize the \section{Aesthetics}{ tag, so those were dropped from every geom on my local version. It may have something to do with the following warning: "Version of roxygen2 last used with this package is 6.0.1.9000. You only have version 6.0.1." When I tried to install the dev version from github it wouldn't run. I think I'm running into this error: Having problems with slash direction in Roxygen2 package. Any tips on how to overcome this?

clauswilke · 2018-07-10T22:47:54Z

NEWS.md

+
+* `stat_density()` now includes the calculated statistic `nlevel`, an alias 
+  for `scaled`, to better match the syntax of `stat_bin` (@bjreisman, #2680)
+


Unfortunately this whole part seems to have ended up in the wrong section of the file. It should be at the very top, under ggplot2 3.0.0.9000. You may have to rebase. Also, the whole section should only be 1-2 sentences, such as "stat_contour(), stat_density2d(), stat_bin2d(), stat_binhex() all now also calculate normalized densities, levels, and counts (@bjreisman, #2680)".

Apologies, one more comment: The issue this is closing is #2679, not #2680. #2680 is the pull request. And one of the git commit messages should contain the phrase "Closes #2679" or "Fixes #2679" so that github automatically closes the issue once the pull request is integrated. You can add this to the next commit you make.

clauswilke · 2018-07-10T23:00:28Z

R/stat-density.r

@@ -21,6 +21,8 @@
 #'   \item{count}{density * number of points - useful for stacked density
 #'      plots}
 #'   \item{scaled}{density estimate, scaled to maximum of 1}
+#'   \item{ndensity}{alias for stat(scaled), to mirror the syntax of
+#'    geom_bin}


Can you put stat(scaled) and geom_bin in backticks so they show up as code? And also please write geom_bin() (with parentheses).

clauswilke · 2018-07-10T23:02:49Z

I suspect I can fix the roxygen issue if you give me access to your repo fork. Won't get to it until later in the week though.

bjreisman · 2018-07-11T12:32:41Z

Done! Thanks for taking the time to review these changes.


clauswilke · 2018-07-11T17:21:06Z

@bjreisman I have rebased your PR and made some documentation cleanups. Let's see if all the checks pass.

Most importantly, you were using scale_fill_viridis() in two examples, but that scale does not exist in ggplot2 out of the box. I changed those to use scale_fill_viridis_c(), which does exist. It's important that all examples can be run as written without any unspecified dependencies. Otherwise CRAN check will fail.

clauswilke · 2018-07-11T20:40:17Z

I just merged this, thanks!

lock · 2019-01-07T21:31:20Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

clauswilke reviewed Jun 1, 2018


clauswilke reviewed Jul 9, 2018


clauswilke reviewed Jul 10, 2018


bjreisman and others added 8 commits July 11, 2018 14:04

Added scaled statistics to 2d density plots, contour plots, and 2d hi…

00a4d1f

…stograms (bin2d and binhex).

+ ggplot2 styling (spaces around "/")

8b80283

+ updated documentation (revisions suggested by clauswilke) Notes: Wasn't sure how to best describe the 'pieces' computed variable generated by stat_contour; 'nz' variable remains for now in stat_density_2d, open to suggestions for a better name.

Changed where the 'scaled' variable is computed and when it is named …

bc9ca1c

…to minimize renaming of existing variables. 🎨

scaled => ndensity

7885f49

- swapped `scaled` for `ndensity` in `stat_density2d()`; - added `ndensity` to `stat_density` as an alias for `scaled`; - added an example to `geom_densityd2()`; - updated NEWS.md to reflect these changes.

updated test-stat-density with the addition of ndensity.

a0372e6

Fixes tidyverse#2679

7c64c91

update docs

ab633f8

documentation cleanups

3efa400

clauswilke force-pushed the master branch from 814d0d1 to 3efa400 Compare July 11, 2018 17:17

clauswilke approved these changes Jul 11, 2018


clauswilke merged commit 08e5f6c into tidyverse:master Jul 11, 2018

bjreisman mentioned this pull request Jul 11, 2018

Feature request: "Scaled" statistic for height. wilkelab/ggridges#24

Closed

lock bot locked and limited conversation to collaborators Jan 7, 2019


		* `stat_density()` now includes the calculated statistic `nlevel`, an alias
		for `scaled`, to better match the syntax of `stat_bin` (@bjreisman, #2680)

Scaled densities/counts in 2d density/bins plots (#2679) #2680

Scaled densities/counts in 2d density/bins plots (#2679) #2680

Uh oh!

Conversation

bjreisman commented Jun 1, 2018

Uh oh!

clauswilke commented Jun 1, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bjreisman Jun 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bjreisman commented Jun 1, 2018

Uh oh!

hadley commented Jul 8, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clauswilke commented Jul 8, 2018

Uh oh!

hadley commented Jul 8, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bjreisman commented Jul 10, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clauswilke commented Jul 10, 2018

Uh oh!

bjreisman commented Jul 11, 2018

Uh oh!

clauswilke commented Jul 11, 2018

Uh oh!

clauswilke commented Jul 11, 2018

Uh oh!

lock bot commented Jan 7, 2019

Uh oh!

Uh oh!

bjreisman Jun 1, 2018 •

edited

Loading

hadley commented Jul 8, 2018 •

edited

Loading

bjreisman commented Jul 10, 2018 •

edited

Loading