Automatic dodging or position_dodge not enforced on boxplots with missing values #688

Closed
apontejosea opened this issue Oct 11, 2012 · 5 comments

@apontejosea

I originally posted this issue as a question on Stack Overflow:
http://stackoverflow.com/questions/12806260/how-to-enforce-ggplots-position-dodge-on-categories-with-no-data

I'm trying to draw boxplots of two different signals (ind) that share the same categories (cat). When a category has data for one signal but not for the other, the boxplot for the signal with data fills the whole horizontal space and ignores the position_dodge setting for that category. In the example below, signal x has no data for category B, so the space that position_dodge would otherwise reserve for it is lost.

Thanks in advance.

library(ggplot2)

# Signal x has only missing values in category B
data <- data.frame(cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
                   values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
                   ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

print(ggplot() +
        scale_colour_hue(guide = 'none') +
        geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
                     position = position_dodge(width = .60),
                     data = data,
                     outlier.size = 1.2,
                     na.rm = TRUE))

[Figure: graph with original problem]

Here is what I have attempted so far:

  1. I first tried to change the color of the boxes with missing data. I couldn't figure out how to set the color of a single box directly, because the colors are generated automatically from the signals (fill = ind in this case).
  2. Since that did not work, I added a dummy value for the empty category and pushed it far outside the normal y-axis range (-1000). Doing so revealed something interesting that looks like a ggplot2 bug:

Here is the example data with a dummy value for the category with missing data:

data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3,2,1,4,NA,NA,4,5,6,7,8,9,-1000),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'))

p <- ggplot() +
  scale_colour_hue(guide = 'none') +
  geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
               position = position_dodge(width = .60),
               data = data,
               outlier.size = 1.2,
               na.rm = TRUE)

As shown below, printing the plot as is, without defining y-axis limits, gives the correct behavior on the x-axis. Of course, the unwanted dummy boxplot also shows up.

print(p)

[Figure: with no y-axis limits]

But if we add the y limits, we still get the original undesired behavior.

print(p + ylim(0, 10))

[Figure: with y-axis limits]
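
A note on the difference between the two plots above, offered as a sketch rather than a confirmed diagnosis of this issue: `ylim()` sets scale limits, and ggplot2 converts values outside the scale limits to `NA` before the boxplot statistics are computed. The -1000 dummy is therefore discarded, and category B is again left with a single box. `coord_cartesian()` zooms the view without dropping data, so the dummy box should keep its dodge slot while sitting outside the visible range. The object names `p_scale` and `p_zoom` are illustrative only.

```r
library(ggplot2)

# Same data as above, including the -1000 dummy value for signal x in category B
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3,2,1,4,NA,NA,4,5,6,7,8,9,-1000),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'))

p <- ggplot() +
  geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
               position = position_dodge(width = .60),
               data = data,
               outlier.size = 1.2,
               na.rm = TRUE)

# Scale limits: the out-of-range dummy value becomes NA and is dropped,
# so only one box is computed for category B
p_scale <- p + ylim(0, 10)

# Coordinate limits: the view is zoomed, but the dummy value stays in the
# data, so a second (off-screen) box is still computed for category B
p_zoom <- p + coord_cartesian(ylim = c(0, 10))

print(p_zoom)
```

Comparing `layer_data(p_scale, 1)` with `layer_data(p_zoom, 1)` shows how many boxes each variant computes.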

@hadley
Member

hadley commented Oct 11, 2012

That's by design of position_dodge. Try using faceting for the behaviour you want.

hadley closed this as completed Oct 11, 2012
@apontejosea
Author

After trying some workarounds, I got (more or less) the result I was looking for:

# Dummy value of 0 for signal x in category B reserves its dodge slot
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3,2,1,4,NA,NA,4,5,6,7,8,9,0),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'))

p <- ggplot() +
  scale_colour_hue(guide = 'none') +
  geom_boxplot(aes(x = as.factor(cat), y = values, fill = ind),
               position = position_dodge(width = .60),
               data = data,
               outlier.size = 1.2,
               na.rm = TRUE) +
  # White line at y = 0 drawn over the dummy box to hide it
  geom_line(aes(x = x, y = y),
            data = data.frame(x = c(0, 3), y = rep(0, 2)),
            size = 1,
            col = 'white')
print(p)

[Figure: solution with workaround]

Unfortunately, faceting doesn't give the effect I'm looking for. The final graph I was looking for is shown below:

[Figure: final graph]

Notice that the white major grid line at y = 10 is thicker than the others. This thicker line is the geom_line with size=1 that hides the unwanted boxplots.

I wish we could combine different geom objects more seamlessly. I guess I'm using ggplot2 in a non-standard way, and workarounds are the way to go for these kinds of issues. Anyhow, thank you for your fast response, Hadley. And thanks for putting together ggplot2, which I love.

@hadley
Member

hadley commented Oct 15, 2012

Hmmm, maybe we should have an option to position_dodge (maybe drop = FALSE) that would support this behaviour. It is a fairly common request.
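
For readers arriving later: ggplot2 releases published after this thread added a `preserve` argument to `position_dodge()`, which addresses this kind of request. The following is a minimal sketch, assuming a ggplot2 version that supports `preserve = "single"`. It uses the original data with no dummy value, and the name `p_single` is illustrative only. It keeps every box the same width, though the placement of the lone box within its category may still differ from the dummy-value workaround.

```r
library(ggplot2)

# Original data: signal x has only missing values in category B
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
  values = c(3,2,1,4,NA,NA,4,5,6,7,8,9),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'))

# preserve = "single" keeps the width of each individual box constant,
# instead of letting a lone box expand to the total width of its category
p_single <- ggplot(data, aes(x = cat, y = values, fill = ind)) +
  geom_boxplot(position = position_dodge(width = .60, preserve = "single"),
               outlier.size = 1.2,
               na.rm = TRUE)

print(p_single)
```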

hadley reopened this Oct 15, 2012
@hadley
Member

hadley commented Feb 24, 2014

This sounds like a great feature, but unfortunately we don't currently have the development bandwidth to support it. If you'd like to submit a pull request that implements this feature, please follow the instructions in the development vignette.

hadley closed this as completed Feb 24, 2014
@larry77

larry77 commented Jun 9, 2017

Any progress on this? I don't have the technical skills to implement this myself, but it would be a nice feature to have!

lock bot locked as resolved and limited conversation to collaborators Jun 19, 2018