Skip to content

Robust position_dodge(preserve = "single") #5928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 4, 2024

Conversation

teunbrand
Copy link
Collaborator

This PR aims to fix #2801 and revives #2813.

Briefly, instead of counting the number of rows per PANEL/xmin interaction, we are counting the number of unique groups per PANEL/xmin interaction. Argueably, this is the metric that should be counted.

The reprex from the issue now shows violin plots of appropriate widths:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(vs))) +
  geom_violin(position = position_dodge(preserve = "single"))
#> Warning: Groups with fewer than two datapoints have been dropped.
#> ℹ Set `drop = FALSE` to consider such groups for position adjustment purposes.

Created on 2024-06-03 with reprex v2.1.0

One concern raised in #2813 (comment) was performance. This PR uses {vctrs} and is faster than the current implementation.

library(vctrs)

old_strategy <- function(data) {
  panels <- unname(split(data, data$PANEL))
  ns <- vapply(panels, function(panel) max(table(panel$xmin)), double(1))
  max(ns)
}

new_strategy <- function(data) {
  n <- vec_unique(data[c("group", "PANEL", "xmin")])
  n <- vec_group_id(n[c("PANEL", "xmin")])
  max(tabulate(n, attr(n, "n")))
}

data <- 
  transform(
    ggplot2::diamonds, 
    PANEL = 1L, 
    group = interaction(color, clarity), 
    xmin = clarity
  )

# Replicate boxplot: 1-row per group/xmin combination
data <- data[!duplicated(data[c("PANEL", "group", "xmin")]), ]

bench::mark(
  old_strategy(data),
  new_strategy(data)
)
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old_strategy(data)   99.6µs  103.6µs     9235.   133.3KB     43.1
#> 2 new_strategy(data)   29.2µs   30.5µs    31879.    16.8KB     38.3

# If we are generous towards the 'old strategy' by having smaller data to split
data <- data[c("PANEL", "group", "xmin")]

bench::mark(
  old_strategy(data),
  new_strategy(data)
)
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old_strategy(data)   58.9µs     61µs    16072.    5.62KB     39.7
#> 2 new_strategy(data)   29.3µs   30.2µs    32595.    4.03KB     39.2

Created on 2024-06-03 with reprex v2.1.0

teunbrand added a commit to teunbrand/ggplot2 that referenced this pull request Jun 3, 2024
Copy link
Member

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

teunbrand added a commit that referenced this pull request Jun 4, 2024
* dodge by max number of groups per panel/position

* fill missing defaults

* `position_jitterdodge()` doesn't fail this test-case anymore

* add news bullet

* Copy faster approach from #5928
@teunbrand teunbrand merged commit 78660a9 into tidyverse:main Jun 4, 2024
11 checks passed
@teunbrand teunbrand deleted the dodge_preserve_single branch June 4, 2024 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

preserve = "single" breaks violin plots
2 participants