Skip to content

Filled contours don't play well with faceting #3875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
clauswilke opened this issue Mar 9, 2020 · 5 comments · Fixed by sthagen/tidyverse-ggplot2#10
Closed

Filled contours don't play well with faceting #3875

clauswilke opened this issue Mar 9, 2020 · 5 comments · Fixed by sthagen/tidyverse-ggplot2#10

Comments

@clauswilke
Copy link
Member

This is technically a bug report against my PR #3864, but the bug isn't caused by that PR, it's just revealed by it. The same bug exists with geom_contour_filled(), just requires more code to generate a reprex.

I have noticed that when I draw filled density contours across facets, the color levels in the legend get jumbled. (Carefully look at the levels in the legend, and you'll see they're not even in order.)

library(ggplot2)

set.seed(4393)
dsmall <- diamonds[sample(nrow(diamonds), 1000), ]

ggplot(dsmall, aes(x, y)) +
  geom_density_2d_filled() +
  facet_grid(. ~ cut)

What happens is that the breaks are calculated separately for each facet, and then they don't match across facets. It's possible to work around this by manually specifying breaks, but I think the default settings should work with facets.

Fundamentally, the problem is that the level variable for contour bands is an ordered factor specifying the bin boundaries:

path_df$level <- ordered(path_df$level, levels = names(isobands))

By contrast, the level variable for contour lines is a numeric:

path_df$level <- as.numeric(path_df$level)

I think instead we should return a numeric value for level in all cases and by default map that to fill for filled contours. We can still return the ordered factor as well as an option, under a different name. The numeric value I would use for level would be the mid-point between the lower and the upper level boundary. This is consistent with other binning approaches, e.g. the placement of bars in histograms.

The result would be something like the following (but without the aes(fill = ...) statement needed):

ggplot(dsmall, aes(x, y)) +
  geom_density_2d_filled(aes(fill = after_stat(level_high))) +
  facet_grid(. ~ cut) +
  scale_fill_viridis_c()

This will be consistent with how contour lines work already:

ggplot(dsmall, aes(x, y)) +
  geom_density_2d(aes(color = after_stat(level))) +
  facet_grid(. ~ cut) +
  scale_color_viridis_c()

If we agree on this approach I'll incorporate it into #3864.

@clauswilke clauswilke added this to the ggplot2 3.3.1 milestone Mar 9, 2020
clauswilke added a commit to wilkelab/ggplot2_archive that referenced this issue Mar 10, 2020
@thomasp85
Copy link
Member

I think it absolutely makes sense to return an ordered factor (I believe we discussed this before).

Can this not be solved by modifying the compute_layer function to correctly assemble the level columns?

@thomasp85
Copy link
Member

thomasp85 commented Mar 10, 2020

I don't think your fix in fab9732 correctly addresses the problem (ignoring the discussion about ordered factors vs numerics)

The root problem is that the breaks are calculated per-group, rather than for the full layer. I think this is fundamentally wrong, as you expect the contouring to be comparable between panels/groups.

If you precalculate the breaks in setup_params or compute_layer I believe all these issues will vanish without needing to move away from ordered factors

@clauswilke
Copy link
Member Author

The root problem is that the breaks are calculated per-group, rather than for the full layer. I think this is fundamentally wrong, as you expect the contouring to be comparable between panels/groups.

Yes, correct, I'm muddling two things together. This is a case where a lot of thinking on my part led to the same outcome as no thinking at all would have. Essentially, I have convinced myself that (i) returning an ordered factor by default is not the right solution and (ii) getting the right bins across facets is fundamentally not possible, and thus I went with my approach.

The argument for (i) is that I got really annoyed when going back and forth between plots like the following:

library(ggplot2)
d <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  xlim(0.5, 6) +
  ylim(40, 110)

d + geom_density_2d_filled()  # assuming default behavior you suggest

# normalized to maximum intensity
d + geom_density_2d_filled(aes(fill = after_stat(nlevel)))

# normalized to sample size
d + geom_density_2d_filled(aes(fill = after_stat(count)))

Created on 2020-03-10 by the reprex package (v0.3.0)

The three plots are conceptually similar but look totally different, and one uses a discrete scale whereas the others use a continuous scale. This is very confusing for the enduser.

Now point (ii): Why not return ordered factors in all these cases? Because it doesn't work, in particular under faceting. We need different sets of bins for level vs nlevel vs count. A proper solution would have to know which variable you map and then calculate the bins based on that. This may be possible somehow but it's way beyond the scope of this PR.

I'm also concerned about the bugs that will come in complaining that count doesn't return an ordered factor when level does.

So, after all this thinking, I decided a lot of problems would go away when level is numeric, and then the other problem (consistent bins across facets for level) seemed less urgent.

@clauswilke
Copy link
Member Author

Just saw your PR #3883. I have to run now and will look carefully later, but I agree this should also be done. It's separate from the numeric vs. factor discussion, as explained above.

@thomasp85
Copy link
Member

Let’s continue discussions in #3883. I’m convinced ordered factors is the correct way. Moving to numeric will loose information and remove the possibility of using binned guides which is a very bad move

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants