Skip to content

calculate z.range in stat_contour and stat_contour_filled on the full layer data #3883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

thomasp85
Copy link
Member

@thomasp85 thomasp85 commented Mar 10, 2020

Fixes #3875

This PR fixes the issue as discussed in the thread. The issue is also present in the standard stat_contour where the levels of the contour lines would not be equivalent across panels

Old buggy behaviour

library(ggplot2)
ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour_filled() + 
  facet_wrap(~waiting < 70)

Created on 2020-03-10 by the reprex package (v0.3.0.9001)

Working example with PR

library(ggplot2)
ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour_filled() + 
  facet_wrap(~waiting < 70)

Created on 2020-03-10 by the reprex package (v0.3.0.9001)

@thomasp85 thomasp85 added this to the ggplot2 3.3.1 milestone Mar 10, 2020
@thomasp85 thomasp85 requested a review from clauswilke March 10, 2020 13:29
@clauswilke
Copy link
Member

clauswilke commented Mar 10, 2020

I think this breaks the use of StatContour by StatDensity2d:

dens <- MASS::kde2d(
data$x, data$y, h = h, n = n,
lims = c(scales$x$dimension(), scales$y$dimension())
)
df <- expand.grid(x = dens$x, y = dens$y)
df$z <- as.vector(dens$z)
df$group <- data$group[1]
if (contour) {
StatContour$compute_panel(df, scales, bins, binwidth)

We don't know the overall z range of inferred densities until we have inferred them for all groups.

Does the following reprex work with your branch?

library(ggplot2)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
ggplot(dsmall, aes(x, y)) + 
  geom_density_2d(aes(colour = after_stat(level))) + 
  facet_grid(. ~ cut)

Created on 2020-03-10 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member

Continuing discussion from here: #3875 (comment)

Moving to numeric will loose information and remove the possibility of using binned guides which is a very bad move

Just to be clear: I think we should return both. So this is primarily a discussion about what the default mapping for fill should be, and whether the column called level should be numeric or categorical. Maybe the compromise is to keep level numeric for consistency but by default map to the column that returns the factor, whatever its name? :-)

Also, unless I'm missing something, it's fairly easy to go from the numeric limits to the ordered factor. It's much harder the other way round, because we don't have a simple inverse to glue().

library(ggplot2)
library(glue)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

ggplot(dsmall, aes(x, y)) +
  geom_density_2d_filled(aes(fill = after_stat(bin))) +
  scale_fill_viridis_d(name = NULL)

  
ggplot(dsmall, aes(x, y)) +
  geom_density_2d_filled(
    aes(fill = after_stat(ordered(glue("({level_low}, {level_high}]"))))
  ) +
  scale_fill_viridis_d(name = NULL)

Created on 2020-03-10 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member

I just realized that there may be a clean way to resolve all of these issues at the same time. Let me give it a shot.

@thomasp85
Copy link
Member Author

Well, if you return level_low and level_high then no information is lost, but a single numeric level will not be able to capture the binning range. We should continue to default map to the ordered factor. There is no mapping to a single numeric variable that is truthful to the data.

You are correct that your example with density_2d doesn't work with this PR. I'm sure that is fixable, but I'll let you work on your solution for now

@clauswilke
Copy link
Member

The latest iteration of my PR #3864 now incorporates the approach you proposed here and fixes all associated issues. See example below. Could you take a look and let me know whether I'm on the right track? If yes I'll clean up the code and rewrite the documentation.

library(ggplot2)

ggplot(faithfuld, aes(waiting, eruptions, z = density)) + 
  geom_contour_filled() + 
  facet_wrap(~waiting < 70)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsmall, aes(x, y))

d + geom_density_2d_filled() +
  facet_wrap(~cut)

d + geom_density_2d_filled(contour_var = "ndensity") +
  facet_wrap(~cut)

d + geom_density_2d_filled(contour_var = "count") +
  facet_wrap(~cut)

d + geom_density_2d(aes(color = after_stat(level))) + 
  facet_wrap(~cut)

d + geom_density_2d(aes(color = after_stat(level)), contour_var = "ndensity") + 
  facet_wrap(~cut)

d + geom_density_2d(aes(color = after_stat(level)), contour_var = "count") + 
  facet_wrap(~cut)
#> Warning: stat_contour(): Zero contours were generated
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf

Created on 2020-03-11 by the reprex package (v0.3.0)

@clauswilke
Copy link
Member

The key idea of my revised approach is to implement a true chaining of the 2d-density stat and the contour stat, by reimplementing compute_layer().

https://github.com/clauswilke/ggplot2/blob/6546535fc46797a5520de1adcfe4c7c1d281ae0f/R/stat-density-2d.r#L94-L123

@thomasp85
Copy link
Member Author

Closing this now. All in all we are on the same page and I'm happy with #3864

@thomasp85 thomasp85 closed this Mar 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filled contours don't play well with faceting
2 participants