
small update to position-dodge2.r #2481


Closed
wants to merge 2 commits into from


frostell

Small update to position-dodge2.r that lets geom_boxplot() and geom_point() be aligned through position_dodge2() when the data contain NAs. Possible patch for #2480.

@frostell
Author

Hmm, I realise reading the logs for the failed checks that my suggested update breaks functionality as defined in test-position-dodge2.r (it gives a different result for geom_col() and geom_rect() when there is x-overlap, but no grouping).

  • changed one word in a new commit and I hope the tests will pass now!

@frostell
Author

I don't think the travis-ci check broke because of my change; it passes all your test scripts. @karawoo, what do you think?

@karawoo
Member

karawoo commented Apr 29, 2018

Thanks for looking into this, @frostell. When I try your update I still get misaligned points and boxes on the left. Does this match what you see?

# using example from #2480
library(ggplot2)
set.seed(5820)
dat <- data.frame("value" = rnorm(n=30, mean=2, sd=0.5),
                  "group" = LETTERS[1:3],
                  "x" = factor(1:2))

dat$value[dat$group=="A"&dat$x=="1"] <- NA

ggplot(dat, aes(x=x, y=value)) +
  geom_boxplot(aes(fill=group), alpha=0.3) +
  geom_point(aes(colour=group), position=position_dodge2(width=0.75), size=3, alpha=0.5)
#> Warning: Removed 5 rows containing non-finite values (stat_boxplot).
#> Warning: Removed 5 rows containing missing values (geom_point).

@frostell
Author

frostell commented Apr 29, 2018

Hello @karawoo!
Hmm, the exact same code gives the correct alignment on my installation...

Which version of ggplot2 are you using? I have 2.2.1.9000 installed; maybe that's the problem?
In your example, it looks like geom_boxplot() is using position="dodge" and not position="dodge2".

I think you will get the correct alignment if you write:

ggplot(dat, aes(x=x, y=value)) +
  geom_boxplot(aes(fill=group), position="dodge2", alpha=0.3) +
  geom_point(aes(colour=group), position=position_dodge2(width=0.75), size=3, alpha=0.5)

Does that work for you?

The other possibility is that the default preserve argument differs, then you could try:

ggplot(dat, aes(x=x, y=value)) +
  geom_boxplot(aes(fill=group), position=position_dodge2(width=0.75, preserve = "single"), alpha=0.3) +
  geom_point(aes(colour=group), position=position_dodge2(width=0.75, preserve = "single"), size=3, alpha=0.5)

(or "total" instead of "single")

@karawoo
Member

karawoo commented Apr 29, 2018

Ah, I see what's happened. I applied your changes onto the current master branch. position_dodge2() was updated in #2386 to make preserve = "total" the default. While looking into this, I realized that I neglected to make one change in that PR, which caused the default to revert to "single" for the points but not for the boxes, hence the mismatch. This is addressed in #2545.

@hadley
Member

hadley commented May 2, 2018

Can you start by describing the key problem with the existing algorithm and how your code fixes it? The current logic is not obvious to me.

@frostell
Author

frostell commented May 2, 2018

With this small update, pos_dodge2() also works with geom_point(), so boxplots and points can be aligned even in trickier cases where some unique combinations are all NAs.

Background:
Displaying raw data points together with box plots can be useful. When using a fill aesthetic for the box plots, position_dodge2() gives great positioning options. Raw data can be plotted with geom_point() and position_dodge() for alignment.

Problem:
When one of the unique combinations of a category and fill/color is all NAs, position_dodge() and position_dodge2() give different x-positions, and the boxes and points end up misaligned. For geom_point(), the current position_dodge2() does not work because it treats every unique point as a category that should be dodged from all the other points. See #2480 for a reprex.

Solution:
Change pos_dodge2() so that:
- It evaluates what kind of data is fed into it by checking for existing column names.
- If the data does not come from a stat, it saves the full df and then reduces it to just the unique combinations of fill/color and x-category (the rest of pos_dodge2() runs as before and calculates appropriate x-positions).
- At the end, it repopulates the full df with the calculated x-positions.
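
For readers following along, the reduce / compute / repopulate idea in the list above can be sketched in base R. This is only an illustration, not the code from this PR and not ggplot2's internal implementation: the function name dodge_by_combination() and the column names x, group, y and x_dodged are hypothetical. It also assumes that rows with a missing y are dropped first and that each x slot is split evenly among the groups actually present there.

```r
# Hypothetical sketch; NOT ggplot2's pos_dodge2() and NOT the patch in this PR.
dodge_by_combination <- function(df, width = 0.75) {
  # Assumption: rows with missing y are dropped up front, so an all-NA
  # combination does not reserve a slot.
  full <- df[!is.na(df$y), ]

  # Step 1: reduce to one row per unique x/group combination.
  combos <- unique(full[, c("x", "group")])
  combos <- combos[order(combos$x, combos$group), ]

  # Step 2: compute one dodged position per combination.
  # n = number of groups sharing an x, i = rank of the group within that x.
  n <- ave(seq_along(combos$x), combos$x, FUN = length)
  i <- ave(seq_along(combos$x), combos$x, FUN = seq_along)
  combos$x_dodged <- combos$x + width * ((i - 0.5) / n - 0.5)

  # Step 3: repopulate the full data with the computed positions.
  merge(full, combos, by = c("x", "group"), sort = FALSE)
}

dat <- data.frame(
  x     = rep(c(1, 2), times = 6),
  group = rep(c("A", "B", "C"), times = 4),
  y     = as.numeric(seq_len(12))
)
dat$y[dat$group == "A" & dat$x == 1] <- NA  # one combination is all NA

dodged <- dodge_by_combination(dat)
print(dodged)
```

Because the position is computed once per combination rather than once per row, every point in a group lands on the same x, which is the property the points need in order to line up with a box.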

@hadley
Member

hadley commented May 9, 2018

Your logic doesn't seem quite general enough to me. I'll take a stab at it myself.

@hadley hadley closed this May 9, 2018
@frostell
Author

frostell commented May 9, 2018

ok, I admit it was patchy even though it worked for the specific problem :)

I guess ideally there would be just one position_dodge function to handle all geoms and stats, with appropriate options to choose the styling of the "dodging"... Looking forward to seeing your solution!

@lock

lock bot commented Nov 5, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 5, 2018