drop = FALSE is not working properly (enforcing empty space for some missing levels in a boxplot) #4877

Closed
LuisLauM opened this issue Jun 16, 2022 · 12 comments

@LuisLauM

In the following example (using the iris dataset), I create a factor variable for which one of the species has no observations at level C. When I make the plot, I cannot find a way to stop ggplot from dropping the empty level (virginica-C). I assumed that the argument drop = FALSE would do that, but it does not.

require(dplyr)
require(ggplot2)

iris %>% 
  
  mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE), 
                        levels = c("A", "B", "C"))) %>% 
  
  filter(!(Species == "virginica" & fct_x == "C")) %>% 
  
  ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  
  geom_boxplot() +
  
  scale_fill_discrete(drop = FALSE)

In other words, as the attached figure shows, the virginica group does NOT show an empty space for group C (because there are no observations of type virginica-C). Showing that empty space in the figure is exactly what I want to achieve.

Rplot
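
The missing combination can be confirmed at the data level before plotting. The following is a minimal sketch in base R, assuming a fixed seed (the original example sets none) and the hypothetical name plotdata; it cross-tabulates Species against fct_x so that the empty virginica-C cell shows up as a zero count.

```r
# Rebuild the example data with base R. The seed is an assumption added
# here so that the result is reproducible.
set.seed(1)
plotdata <- iris
plotdata$fct_x <- factor(
  sample(c("A", "B", "C"), size = nrow(plotdata), replace = TRUE),
  levels = c("A", "B", "C")
)
plotdata <- plotdata[!(plotdata$Species == "virginica" & plotdata$fct_x == "C"), ]

# Cross-tabulate the two grouping variables. Factor levels are kept even
# when they are empty, so the virginica-C cell is reported with a zero count.
counts <- table(plotdata$Species, plotdata$fct_x)
print(counts)
```

The level C still exists in the factor; only the virginica-C combination has no rows, which is why the dodged boxes for virginica end up with two groups instead of three.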

@LuisLauM LuisLauM changed the title drop = FALSE is not working properly (enforcing empty space for some missing levels in a plot) drop = FALSE is not working properly (enforcing empty space for some missing levels in a boxplot) Jun 16, 2022
@smouksassi

The position scale is not related to the fill scale. You can control the position through the position argument instead:

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
require(ggplot2)
#> Loading required package: ggplot2

set.seed( 123456)
iris %>%
  mutate(fct_x = factor(
    x = sample(
      x = c("A", "B", "C"),
      size = nrow(.),
      replace = TRUE
    ),
    levels = c("A", "B", "C")
  )) %>%
  filter(!(Species == "virginica" & fct_x == "C")) %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  geom_boxplot(position = position_dodge(preserve = "single")) 

Created on 2022-06-16 by the reprex package (v2.0.1)

@LuisLauM
Author

Well, according to the documentation (?scale_fill_hue), drop is explicitly listed as an available argument for scale_fill_hue (and for scale_fill_discrete as well), so I think drop should also control position spacing. Your solution is useful only for fixing the boxplot widths, not for leaving empty spaces for missing levels.

@smouksassi

Can you elaborate on the behavior you expect? See below for how things differ when toggling scale_x_discrete(drop = TRUE) vs. scale_fill_discrete(drop = TRUE):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(patchwork)

set.seed( 123456)
a <- iris %>%
  mutate(fct_x = factor(
    x = sample(
      x = c("A", "B", "C",NA),
      size = nrow(.),
      replace = TRUE
    ),
    levels = c("A", "B", "C")
  )) %>%
  filter(!(Species == "virginica"),! fct_x=="C") %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  geom_boxplot(position = position_dodge(preserve = "single"))+
  scale_x_discrete(drop=FALSE)+
  scale_fill_discrete(drop=FALSE)

b <- a +
  scale_x_discrete(drop=FALSE)+
  scale_fill_discrete(drop=TRUE)
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
#> Scale for 'fill' is already present. Adding another scale for 'fill', which
#> will replace the existing scale.

c <- a +
  scale_x_discrete(drop=TRUE)+
  scale_fill_discrete(drop=FALSE)
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
#> Scale for 'fill' is already present. Adding another scale for 'fill', which
#> will replace the existing scale.

d<- a +
  scale_x_discrete(drop=TRUE)+
  scale_fill_discrete(drop=TRUE)
#> Scale for 'x' is already present. Adding another scale for 'x', which will
#> replace the existing scale.
#> Scale for 'fill' is already present. Adding another scale for 'fill', which
#> will replace the existing scale.

(a | b) /
  (c | d)

Created on 2022-06-16 by the reprex package (v2.0.1)

@LuisLauM
Author

I expected that scale_fill_discrete(drop = FALSE) would produce something like this:

expected

As you can see, the (empty) space for virginica-C is preserved. That is what I understand drop = FALSE to mean, just as with scale_x_discrete(drop = FALSE).

PS: This last figure was created in MS Paint (I cut and pasted parts of the original figure to illustrate my point).

@smouksassi

This is not how ggplot2 is designed to work. scale_fill_discrete(drop = FALSE) affects only the fill scale, that is, whether the color of the missing level is kept. It does not affect the positioning on the x-axis.
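
The distinction can be checked by inspecting what the layer actually draws. This is a rough sketch, assuming base R data preparation and the hypothetical names plotdata, p and boxes; it uses layer_data() to list the computed boxes, which reflect only the groups present in the data regardless of the drop setting on the fill scale.

```r
library(ggplot2)

set.seed(123456)
plotdata <- iris
plotdata$fct_x <- factor(
  sample(c("A", "B", "C"), size = nrow(plotdata), replace = TRUE),
  levels = c("A", "B", "C")
)
plotdata <- plotdata[!(plotdata$Species == "virginica" & plotdata$fct_x == "C"), ]

p <- ggplot(plotdata, aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  geom_boxplot(position = position_dodge(preserve = "single")) +
  scale_fill_discrete(drop = FALSE)

# layer_data() returns one row per drawn box, with its dodged x position.
boxes <- layer_data(p)

# Count boxes per x-axis category (1 = setosa, 2 = versicolor, 3 = virginica).
# No box is computed for the virginica-C combination, because the position
# adjustment only sees groups that have data.
print(table(round(as.numeric(boxes$x))))
```

Setting drop = FALSE on the fill scale keeps C in the legend, but the number of boxes per category is determined before the fill scale is consulted for colors.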

@LuisLauM
Author

So, there is no way to preserve empty space for a missing level, is that right?

@smouksassi

My reply above with
geom_boxplot(position = position_dodge(preserve = "single"))

produces exactly what you edited in Paint. Please reuse the code and the seed so things stay reproducible.
I still don't understand exactly what you want.

@LuisLauM
Author

LuisLauM commented Jun 16, 2022

geom_boxplot(position = position_dodge(preserve = "single"))

I am talking about actually leaving space for a missing level.

If you run this piece of code, you will notice that in the output figure the groups A and C are squeezed together, omitting the (empty) space for B:

require(dplyr)
require(ggplot2)

set.seed(666)

iris %>% 
  
  mutate(fct_x = factor(x = sample(x = c("A", "B", "C"), size = nrow(.), replace = TRUE), 
                        levels = c("A", "B", "C"))) %>% 
  
  filter(!(Species == "virginica" & fct_x == "B")) %>% 
  
  ggplot(aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  
  geom_boxplot(position = position_dodge(preserve = "single"))

Rplot02

So it is not a matter of preserving widths, but spaces for missing levels.

@smouksassi

A way to do it is below

require(dplyr)
require(ggplot2)
set.seed(666)
plotdata <- iris %>%
  mutate(fct_x = factor(x = sample(x = c("A", "B", "C"),
                                   size = nrow(.), replace = TRUE),
                        levels = c("A", "B", "C"))) %>%
  filter(!(Species == "virginica" & fct_x == "B"))

ggplot(plotdata, aes(x = fct_x, y = Sepal.Length, fill = fct_x)) +
  geom_boxplot(position = position_dodge2(preserve = "single")) +
  facet_grid(~Species, switch = "x") +
  theme(strip.placement = "outside",
        axis.title.x.bottom = element_blank())

Created on 2022-06-16 by the reprex package (v2.0.1)

@LuisLauM
Author

Yes, I got the same suggestion from people on Stack Overflow. I am using that approach for now, but I hope the ggplot2 team can come up with a more elegant solution. I now know that this is already covered by #3345, but it seems there is no easy solution.
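
For readers who need all boxes in a single panel, one possible alternative to faceting is sketched below. It is not proposed anywhere in this thread and is a hack under stated assumptions: a hypothetical placeholder row is added for the missing virginica-B combination with a y value far outside the data range, and coord_cartesian() zooms the view back to the real data so the placeholder box is never visible. The names placeholder, padded, p and boxes are illustrative only.

```r
library(ggplot2)

set.seed(666)
plotdata <- iris
plotdata$fct_x <- factor(
  sample(c("A", "B", "C"), size = nrow(plotdata), replace = TRUE),
  levels = c("A", "B", "C")
)
plotdata <- plotdata[!(plotdata$Species == "virginica" & plotdata$fct_x == "B"), ]

# Hypothetical placeholder row for the missing virginica-B combination.
# Its y value lies well above the real data, so it falls outside the view.
placeholder <- plotdata[1, ]
placeholder$Species <- factor("virginica", levels = levels(plotdata$Species))
placeholder$fct_x <- factor("B", levels = levels(plotdata$fct_x))
placeholder$Sepal.Length <- max(plotdata$Sepal.Length) + 10
padded <- rbind(plotdata, placeholder)

# coord_cartesian() zooms without discarding data, so the placeholder group
# still takes part in the dodging and reserves its slot.
p <- ggplot(padded, aes(x = Species, y = Sepal.Length, fill = fct_x)) +
  geom_boxplot() +
  coord_cartesian(ylim = range(plotdata$Sepal.Length))

# One row per computed box, including the off-screen placeholder.
boxes <- layer_data(p)
print(nrow(boxes))
```

The trade-off is that the plotted data no longer matches the real data, so any statistic computed from padded (rather than plotdata) would be contaminated by the placeholder value.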

@LuisLauM LuisLauM closed this as not planned Jun 16, 2022
@danbebber

danbebber commented Aug 7, 2024

It's 2 years since the last comment here. This functionality really should be implemented (maintaining correct position of second category boxes)

@teunbrand
Collaborator

The issue is tracked in #3345, let's keep the discussion in one place.
