
text is misplaced with position_dodge() #3022


Closed
slowkow opened this issue Dec 2, 2018 · 16 comments · Fixed by #6100
Labels
feature a feature request or enhancement positions 🥇

Comments

@slowkow
Contributor

slowkow commented Dec 2, 2018

In the example below, I would expect all of the text labels to be positioned perfectly on top of the data points. Instead, some of the text labels are not positioned correctly.

I think the issue is due to position_dodge(). I'm not sure exactly where to look to find the relevant code.

In the last example, I use ggrepel to help illustrate the problem more clearly. You can see the blue labels 34 and 290 are not pointing to the correct positions. It seems like they're pointing to the "undodged" positions instead of the "dodged" positions.

This issue was originally reported by @raviselker in ggrepel issues: slowkow/ggrepel#122

library(tidyverse)
library(ggrepel)
# remotes::install_github("thomasp85/patchwork")
library(patchwork)

set.seed(1337)

df <- tibble(
  x = rnorm(500),
  g1 = factor(sample(c("A", "B"), 500, replace = TRUE)),
  g2 = factor(sample(c("A", "B"), 500, replace = TRUE)),
  rownames = 1:500
)

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

df_outliers <- df %>% group_by(g1, g2) %>% mutate(outlier = is_outlier(x))

p1 <- ggplot(df_outliers, aes(x = g1, y = x, fill = g2)) +
  geom_boxplot(width = 0.3, position = position_dodge(0.5))

p2 <- p1 +
  geom_text(
    data = . %>% filter(outlier),
    mapping = aes(label = rownames),
    position = position_dodge(0.5)
  )

p1 + p2

ggplot(df_outliers, aes(x = g1, y = x, fill = g2)) +
  geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
  ggrepel::geom_label_repel(
    min.segment.length = 0,
    data = . %>% filter(outlier),
    mapping = aes(label = rownames),
    position = position_dodge(0.5)
  )

Created on 2018-12-02 by the reprex package (v0.2.1)

@clauswilke
Member

The underlying issue is that dodging doesn't work as one might expect when some groups are absent at a given x position.

library(ggplot2)
df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)

ggplot(df, aes(x, 1, color = type)) +
  geom_point(position = position_dodge(width = .5), size = 5)

Created on 2018-12-02 by the reprex package (v0.2.1)

I'm not sure this can be fixed with the current positioning approach, because the position adjustments never see the entire dataset. The question is whether we can come up with some delicate surgery that fixes this problem without completely changing how position adjustments work.

@clauswilke
Member

Maybe I spoke too soon. It appears that the various position functions do receive the entire dataset, at least the dataset per panel:

ggplot2/R/position-.r

Lines 16 to 34 in 5e4a6ef

#' - `compute_layer(self, data, params, panel)` is called once
#' per layer. `panel` is currently an internal data structure, so
#' this method should not be overridden.
#'
#' - `compute_panel(self, data, params, panel)` is called once per
#' panel and should return a modified data frame.
#'
#' `data` is a data frame containing the variables named according
#' to the aesthetics that they're mapped to. `scales` is a list
#' containing the `x` and `y` scales. These functions are called
#' before the facets are trained, so they are global scales, not local
#' to the individual panels. `params` contains the parameters returned by
#' `setup_params()`.
#' - `setup_params(data, params)`: called once for each layer.
#' Used to setup defaults that need the complete dataset, and to inform
#' the user of important choices. Should return list of parameters.
#' - `setup_data(data, params)`: called once for each layer,
#' after `setup_params()`. Should return modified `data`.
#' Default checks that required aesthetics are present.

So this should be fixable. The relevant code is here:

ggplot2/R/position-dodge.r

Lines 117 to 156 in 23a23cd

  compute_panel = function(data, params, scales) {
    collide(
      data,
      params$width,
      name = "position_dodge",
      strategy = pos_dodge,
      n = params$n,
      check.width = FALSE
    )
  }
)

# Dodge overlapping interval.
# Assumes that each set has the same horizontal position.
pos_dodge <- function(df, width, n = NULL) {
  if (is.null(n)) {
    n <- length(unique(df$group))
  }

  if (n == 1)
    return(df)

  if (!all(c("xmin", "xmax") %in% names(df))) {
    df$xmin <- df$x
    df$xmax <- df$x
  }

  d_width <- max(df$xmax - df$xmin)

  # Have a new group index from 1 to number of groups.
  # This might be needed if the group numbers in this set don't include all of 1:n
  groupidx <- match(df$group, sort(unique(df$group)))

  # Find the center for each group, then use that to calculate xmin and xmax
  df$x <- df$x + width * ((groupidx - 0.5) / n - .5)
  df$xmin <- df$x - d_width / n / 2
  df$xmax <- df$x + d_width / n / 2

  df
}
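To make the misplacement concrete, here is a standalone sketch of the offset arithmetic in the `pos_dodge()` code quoted above, applied to the three-row example from the previous comment with "A" and "B" at x positions 1 and 2. `dodge_x()` is a hypothetical helper, not a ggplot2 function, and it assumes the rows have already been split by x position, as the comment "Assumes that each set has the same horizontal position" implies.

```r
# Hypothetical helper reproducing the offset arithmetic of pos_dodge() above
# for the rows that share a single x position.
dodge_x <- function(x, group, width, n = NULL) {
  if (is.null(n)) n <- length(unique(group))  # groups present at *this* x
  if (n == 1) return(x)                        # a lone group is not moved
  groupidx <- match(group, sort(unique(group)))
  x + width * ((groupidx - 0.5) / n - 0.5)
}

# x = "A" (position 1) has types "a" and "b";
# x = "B" (position 2) has only type "a".
dodge_x(c(1, 1), group = c(1, 2), width = 0.5)
#> [1] 0.875 1.125
dodge_x(2, group = 1, width = 0.5)
#> [1] 2
```

The lone type "a" at B stays at 2 instead of moving to 1.875, the left slot it occupies at A. Presumably the same thing happens in the boxplot reprex: the text layer only contains the outlier rows, so wherever only one fill group has outliers at a given x, `n` is 1 there and the label keeps the undodged position while its boxplot is shifted.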

@yutannihilation
Member

It appears that the various position functions do receive the entire dataset, at least the dataset per panel

I'm afraid not. Position$compute_panel() is called from Position$compute_layer(), and Position$compute_layer() is called from Layer$compute_position(), which is called once per layer with that layer's data. So it doesn't know about the other layers' data.

data <- by_layer(function(l, d) l$compute_position(d, layout))

BTW, I feel this description is not quite right. Maybe it should say "once per panel per layer"?

ggplot2/R/position-.r

Lines 20 to 21 in 5e4a6ef

#' - `compute_panel(self, data, params, panel)` is called once per
#' panel and should return a modified data frame.

@clauswilke
Member

But that should still be good enough to get the dodging right within each layer and panel. I think the other problem is that we're not using an explicit dodging aesthetic. position_dodge() simply finds all distinct groups at each x position and spreads them out. If we gave it an explicit aesthetic, e.g. aes(dodge = type), or maybe an optional argument to position_dodge(), e.g. position_dodge(dodge_by = type), then the position adjustment could make smarter decisions about where to place which data points.

@slowkow
Contributor Author

slowkow commented Dec 3, 2018

Here is another example, building on Claus' code.

It seems that color and fill are not treated the same way by ggplot2. I found this surprising and unexpected. Is this intended behavior?

library(ggplot2)
df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)

pos <- position_dodge(width = 0.5)

p <- ggplot(df) +
  geom_point(position = pos, shape = 21, size = 10, stroke = 1) +
  geom_text(aes(label = type), color = "black", position = pos)

p + aes(x, 1, color = type)


p + aes(x, 1, color = type, group = type)


p + aes(x, 1, fill = type)


Created on 2018-12-02 by the reprex package (v0.2.1)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       macOS High Sierra 10.13.6   
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2018-12-02                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.0)
#>  backports     1.1.2      2017-12-13 [1] CRAN (R 3.5.0)
#>  base64enc     0.1-3      2015-07-28 [1] CRAN (R 3.5.0)
#>  bindr         0.1.1      2018-03-13 [1] CRAN (R 3.5.0)
#>  bindrcpp      0.2.2      2018-03-29 [1] CRAN (R 3.5.0)
#>  callr         3.0.0      2018-08-24 [1] CRAN (R 3.5.0)
#>  cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.0)
#>  colorspace    1.3-2      2016-12-14 [1] CRAN (R 3.5.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)
#>  curl          3.2        2018-03-28 [1] CRAN (R 3.5.0)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)
#>  devtools      2.0.1      2018-10-26 [1] CRAN (R 3.5.1)
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.0)
#>  dplyr         0.7.8      2018-11-10 [1] CRAN (R 3.5.0)
#>  evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.0)
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.0)
#>  ggplot2     * 3.1.0.9000 2018-12-02 [1] Github (tidyverse/ggplot2@23a23cd)
#>  glue          1.3.0      2018-07-17 [1] CRAN (R 3.5.0)
#>  gtable        0.2.0      2016-02-26 [1] CRAN (R 3.5.0)
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)
#>  httr          1.3.1      2017-08-20 [1] CRAN (R 3.5.0)
#>  knitr         1.20       2018-02-20 [1] CRAN (R 3.5.0)
#>  labeling      0.3        2014-08-23 [1] CRAN (R 3.5.0)
#>  lazyeval      0.2.1      2017-10-29 [1] CRAN (R 3.5.0)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)
#>  mime          0.6        2018-10-05 [1] CRAN (R 3.5.0)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 3.5.0)
#>  pillar        1.3.0      2018-07-14 [1] CRAN (R 3.5.0)
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.0)
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.0)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.0)
#>  plyr          1.8.4      2016-06-08 [1] CRAN (R 3.5.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)
#>  processx      3.2.0      2018-08-16 [1] CRAN (R 3.5.0)
#>  ps            1.2.1      2018-11-06 [1] CRAN (R 3.5.0)
#>  purrr         0.2.5      2018-05-29 [1] CRAN (R 3.5.0)
#>  R6            2.3.0      2018-10-04 [1] CRAN (R 3.5.0)
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.0)
#>  remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.0)
#>  rlang         0.3.0.1    2018-10-25 [1] CRAN (R 3.5.0)
#>  rmarkdown     1.10       2018-06-11 [1] CRAN (R 3.5.0)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)
#>  scales        1.0.0      2018-08-09 [1] CRAN (R 3.5.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.0)
#>  stringi       1.2.4      2018-07-20 [1] CRAN (R 3.5.0)
#>  stringr       1.3.1      2018-05-10 [1] CRAN (R 3.5.0)
#>  testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.0)
#>  tibble        1.4.2      2018-01-22 [1] CRAN (R 3.5.0)
#>  tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.5.0)
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.0)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)
#>  xml2          1.2.0      2018-01-24 [1] CRAN (R 3.5.0)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library

@clauswilke
Member

@slowkow What you're seeing is color = "black" shadowing the color aesthetic in the text layer. Apparently the label aesthetic is not considered when groups are calculated.

library(ggplot2)
df <- data.frame(
  x = c("A", "A", "B"),
  type = c("a", "b", "a")
)

pos <- position_dodge(width = 0.5)

p <- ggplot(df) +
  geom_point(position = pos, shape = 21, size = 10, stroke = 1) +
  geom_text(aes(label = type), position = pos)

p + aes(x, 1, color = type)

Created on 2018-12-02 by the reprex package (v0.2.1)

@clauswilke
Member

Yes, labels are not considered when calculating grouping, and that is done by design. (Presumably because it's not uncommon for labels to be all different even within a group.)

ggplot2/R/grouping.r

Lines 7 to 10 in 1c09bae

# If the `group` variable is not present, then a new group
# variable is generated from the interaction of all discrete (factor or
# character) vectors, excluding `label`. The special value `NO_GROUP`
# is used for all observations if no discrete variables exist.
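The documented rule can be sketched in a few lines of base R to show why the text layer with `color = "black"` is not dodged. `sketch_group()` is hypothetical, not ggplot2's implementation (ggplot2's own group numbering may differ), and -1 is only a stand-in for `NO_GROUP`.

```r
# Hypothetical sketch of the documented grouping rule: when no `group`
# column exists, groups come from the interaction of every discrete
# (factor or character) column except `label`.
sketch_group <- function(data) {
  discrete <- vapply(data, function(v) is.factor(v) || is.character(v), logical(1))
  discrete[names(data) == "label"] <- FALSE
  if (!any(discrete)) {
    return(rep(-1L, nrow(data)))  # stand-in for NO_GROUP
  }
  as.integer(interaction(data[discrete], drop = TRUE, lex.order = TRUE))
}

# Text layer with colour set to "black" as a parameter: the colour mapping
# is gone, so only x is left to define the groups (one group per x).
text_layer <- data.frame(x = c("A", "A", "B"), label = c("a", "b", "a"))
sketch_group(text_layer)
#> [1] 1 1 2

# With colour = type still mapped, each x/type combination is its own group.
point_layer <- data.frame(x = c("A", "A", "B"), colour = c("a", "b", "a"))
sketch_group(point_layer)
#> [1] 1 2 3
```

With one group per x in the text layer, `pos_dodge()` sees `n == 1` everywhere and leaves every label at its undodged position.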

@yutannihilation
Member

yutannihilation commented Dec 3, 2018

to get the dodging right within each layer and panel.

Sorry, I don't get the point yet... Aren't we talking about the inconsistency of positions between layers, rather than within each layer?

Letting positions have aesthetics sounds cool to me, as you also suggested in #2977 (comment).

@clauswilke
Member

I am talking within each layer. I think there should be an option that guarantees that dodging always looks the same across all x values. In the example here, we would want type = "a" always to be dodged to the left and type = "b" always to be dodged to the right, regardless of whether the other type is present at a given x or not. As a side effect, this would fix the original problem.
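One way to sketch this proposal: give each level of the dodging variable a fixed slot, numbered by its position in the full set of levels rather than by its rank among the groups present at a given x. `dodge_by_level()` and its `levels` argument are hypothetical, not part of ggplot2; the offset formula is the one from `pos_dodge()`, and the sketch assumes every layer is given the same level set.

```r
# Hypothetical fixed-slot dodge: slot = index of the level in `levels`,
# so a given level lands in the same place at every x and in every layer.
dodge_by_level <- function(x, by, width, levels = sort(unique(by))) {
  n <- length(levels)
  slot <- match(by, levels)
  x + width * ((slot - 0.5) / n - 0.5)
}

# Three-row example: "a" is left of centre at both x positions.
dodge_by_level(c(1, 1, 2), by = c("a", "b", "a"), width = 0.5)
#> [1] 0.875 1.125 1.875

# A layer whose data only contains "b" (e.g. labels for a subset of points):
# passing the shared level set keeps it in the right-hand slot.
dodge_by_level(1, by = "b", width = 0.5, levels = c("a", "b"))
#> [1] 1.125
```

Without `levels`, that one-level layer would put its only slot at the centre and the label would stay at x = 1, which is the misplacement discussed here. The sketch sidesteps the cross-layer question by requiring the caller to pass the shared level set explicitly.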

@clauswilke
Member

On a related note, see this closed PR that wasn't merged, and the issue of violins moving to the wrong spot under preserve = "single": #2813

It's the same problem. The dodging doesn't know about the variable that it is dodging by, and therefore it does strange things.

@yutannihilation
Member

Thanks, I got what you mean. It's still unclear to me how to map groups to dodged positions without training over all layers, but I think I'll figure it out later :)

In case this is still useful, here's another version of reprex which I believe is minimal for this issue:

library(ggplot2)

d <- data.frame(x = c("x", "x"), g = c("a", "b"), stringsAsFactors = FALSE)
pos <- position_dodge(width = .5)

ggplot(mapping = aes(x, 0, colour = g, label = g)) +
  geom_point(data = d, size = 5, position = pos) +
  geom_label(data = d[2, ], size = 5, position = pos)

Created on 2018-12-03 by the reprex package (v0.2.1)

@karawoo
Member

karawoo commented Dec 4, 2018

I think there should be an option that guarantees that dodging always looks the same across all x values. In the example here, we would want type = "a" always to be dodged to the left and type = "b" always to be dodged to the right, regardless of whether the other type is present at a given x or not. As a side effect, this would fix the original problem.

This has been requested before in #2076 and I agree that it would be a nice feature to have, though if I remember correctly it would require some significant refactoring. We'd also have to think through how geoms with different widths across groups would get placed (e.g. box plots with varwidth = TRUE). For this reason I don't know that fixing this would solve the original problem unless the position calculation knew about other layers. One of the things that's tricky about dodging points and labels in particular is that they have no width in the data space, so the position calculations that calculate where things go based on width don't work right.

@paleolimbot paleolimbot added feature a feature request or enhancement positions 🥇 labels May 23, 2019
@hadley
Member

hadley commented Jun 18, 2019

Is this the same issue as #2480?

@karawoo
Member

karawoo commented Jun 18, 2019

Yes, I think so.

@teunbrand
Collaborator

I think this issue was fixed in #5928, where we can now use position_dodge(preserve = "single") for points. As such, I'll close this issue.

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

set.seed(1337)

df <- data.frame(
  x = rnorm(500),
  g1 = factor(sample(c("A", "B"), 500, replace = TRUE)),
  g2 = factor(sample(c("A", "B"), 500, replace = TRUE)),
  rownames = 1:500
)

is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

df_outliers <- df |> dplyr::group_by(g1, g2) |> dplyr::mutate(outlier = is_outlier(x))

ggplot(df_outliers, aes(x = g1, y = x, fill = g2)) +
  geom_boxplot(width = 0.3, position = position_dodge(0.5)) +
  ggrepel::geom_label_repel(
    min.segment.length = 0,
    data = ~ dplyr::filter(.x, outlier),
    mapping = aes(label = rownames),
    position = position_dodge(0.5, preserve = "single")
  )

Created on 2024-12-05 with reprex v2.1.1
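A rough sketch of what preserve = "single" changes in the 2018 `pos_dodge()` arithmetic quoted earlier in this thread: `n` is no longer counted per x but fixed per layer to roughly the largest number of groups found at any one x position. In that code the group index is still a per-x rank (`match(df$group, sort(unique(df$group)))`), so a group that appears alone at some x gets slot 1 whatever its level. The data and variable names below are hypothetical, and the sketch only covers the 2018 code, not current ggplot2.

```r
# Hypothetical data: x position 2 only has the second level, "b".
df <- data.frame(x = c(1, 1, 2), type = c("a", "b", "b"))
group <- match(df$type, sort(unique(df$type)))  # global group ids: 1 2 2
width <- 0.5

# preserve = "single" (roughly): n is the largest number of groups at one x
n <- max(tapply(group, df$x, function(g) length(unique(g))))

# Per-x rank, as computed for groupidx in pos_dodge()
rank_in_x <- ave(group, df$x, FUN = function(g) match(g, sort(unique(g))))

# Same offset formula as pos_dodge(), with the per-x rank
# and, for comparison, with the global group id
by_rank  <- df$x + width * ((rank_in_x - 0.5) / n - 0.5)
by_group <- df$x + width * ((group - 0.5) / n - 0.5)
by_rank
#> [1] 0.875 1.125 1.875
by_group
#> [1] 0.875 1.125 2.125
```

With the per-x rank, the lone "b" at x = 2 is pushed into the left slot that "a" uses at x = 1; indexing by the global group id puts it on the right, which is the behaviour requested earlier in the thread.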

@teunbrand
Collaborator

Never mind, I found a flaw with that approach, so I'll reopen this. The flaw will be fixed by #6100, though.


7 participants