New dodging algorithm for box plots #2196

Merged
merged 47 commits into from
Jul 28, 2017
512fb33
New position calculations for box plots
karawoo Jul 1, 2017
4be63d9
Update documentation
karawoo Jul 6, 2017
54c1a2a
Add test to ensure that variable width boxes don't overlap
karawoo Jul 6, 2017
1f4a984
Don't warn about overlapping intervals for box plots
karawoo Jul 6, 2017
698ced7
Scale boxes all at once, rather than by group
karawoo Jul 9, 2017
10942d1
Scale boxes across all the data based on the max number that need to …
karawoo Jul 9, 2017
8e058b9
Dodge boxes when there's a truly continuous x
karawoo Jul 12, 2017
6c5dd86
Make sure boxes are ordered consistently
karawoo Jul 12, 2017
133ed71
Don't overwrite the n argument passed to pos_boxdodge
karawoo Jul 12, 2017
1e08220
Ensure proper behavior when `preserve = "total"`.
karawoo Jul 13, 2017
502ad20
Modify test for overlapping boxes
karawoo Jul 14, 2017
4707fc3
Add padding between boxes that occupy the same position
karawoo Jul 14, 2017
e6e6f00
Add note to NEWS.md
karawoo Jul 14, 2017
e2b1ded
Merge branch 'master' into position-dodge
karawoo Jul 14, 2017
59c014e
Remove print statement in test :flushed:
karawoo Jul 14, 2017
327597f
Indent code in examples
karawoo Jul 14, 2017
d169dc9
Replace rowMeans with (df$xmin + df$xmax) / 2
karawoo Jul 14, 2017
64c3688
Replace plyr code
karawoo Jul 14, 2017
f916986
Find overlapping groups with a for loop
karawoo Jul 14, 2017
8213782
Change padding to 0.05
karawoo Jul 14, 2017
0efaed7
Modifications that make pos_boxdodge work with geom_rect
karawoo Jul 14, 2017
3da2f49
Make sure elements are placed at the correct x location
karawoo Jul 17, 2017
b689a4c
"boxes" -> "elements" since this is no longer only used for boxes
karawoo Jul 17, 2017
03e3d46
Add bar examples to position_boxdodge documentation
karawoo Jul 17, 2017
029a5d0
Ordering in collide_box needs to be the reverse of what it was to mat…
karawoo Jul 17, 2017
dd78a80
PositionBoxdodge should use find_x_overlaps to find n when x is missing
karawoo Jul 17, 2017
ae594df
Drop extra computed columns when they're no longer needed
karawoo Jul 17, 2017
99fe422
Fix bug that was subtly flipping boxes horizontally
karawoo Jul 17, 2017
30b189a
Add tests for position_boxdodge
karawoo Jul 17, 2017
9f7dcbc
Rename position_boxdodge to position_dodge2
karawoo Jul 20, 2017
d05a7df
Merge branch 'master' into position-dodge
karawoo Jul 20, 2017
dbdc9a9
Update geom-bar documentation to mention position_dodge2()
karawoo Jul 20, 2017
79aedfb
Don't dodge if current xmin is *equal* to previous xmax
karawoo Jul 25, 2017
7c9aec8
Set default padding to 0 for position_dodge2, but override for boxes
karawoo Jul 25, 2017
3706b49
Change default box plot padding back to 0.1
karawoo Jul 25, 2017
10e8616
Merge branch 'master' into position-dodge
karawoo Jul 25, 2017
b059e39
Update position_dodge2 documentation
karawoo Jul 25, 2017
4e2052f
collide_box() does need to reorder the differently than collide() in …
karawoo Jul 25, 2017
09ee427
Rename collide_box() to collide2() to match position_dodge2()
karawoo Jul 25, 2017
04666e7
Return to default padding of 0.1 for position_dodge2()
karawoo Jul 26, 2017
8398e7e
Document position_dodge2 together with position_dodge
karawoo Jul 28, 2017
b7553ac
Merge branch 'master' into position-dodge
karawoo Jul 28, 2017
34002fa
Add description of position_dodge2 to NEWS.md
karawoo Jul 28, 2017
b437848
Revert the order of `preserve` arguments for dodge2
karawoo Jul 28, 2017
a05c3e0
Update dodge examples
karawoo Jul 28, 2017
1cdcc09
Add stats:: before aggregate()
karawoo Jul 28, 2017
9cea3ef
Merge branch 'master' into position-dodge
karawoo Jul 28, 2017
1 change: 1 addition & 0 deletions DESCRIPTION
Expand Up @@ -162,6 +162,7 @@ Collate:
'position-.r'
'position-collide.r'
'position-dodge.r'
'position-dodge2.r'
'position-identity.r'
'position-jitter.r'
'position-jitterdodge.R'
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Expand Up @@ -161,6 +161,7 @@ export(GeomViolin)
export(GeomVline)
export(Position)
export(PositionDodge)
export(PositionDodge2)
export(PositionFill)
export(PositionIdentity)
export(PositionJitter)
Expand Down Expand Up @@ -355,6 +356,7 @@ export(merge_element)
export(panel_cols)
export(panel_rows)
export(position_dodge)
export(position_dodge2)
export(position_fill)
export(position_identity)
export(position_jitter)
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
@@ -1,5 +1,12 @@
# ggplot2 2.2.1.9000

* Box plot position is now controlled by `position_dodge2()`, which can also be
used for bars and rectangles. `position_dodge2()` compares the `xmin` and
`xmax` values of each element to determine which ones overlap, and dodges them
accordingly. This makes it possible to dodge box plots created with
`geom_boxplot(varwidth = TRUE)`. The `padding` parameter adds a small amount
of padding between elements (@karawoo, #2143).

* `fortify()` gains a method for tbls (@karawoo, #2218)

* `stat_summary_bin()` now understands the `breaks` parameter (@karawoo, #2214)
Expand Down
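
The NEWS entry above describes dodging and padding only in words. As a rough illustration, the width arithmetic can be sketched in base R. `dodge_widths()` is a hypothetical helper written for this note, not a ggplot2 function, and it assumes every element at the position has the same starting width:

```r
# Hypothetical helper (not part of ggplot2): place k equal-width elements
# side by side around one x position, then shrink each by `padding`.
dodge_widths <- function(center, width, k, padding = 0.1) {
  new_width <- width / k                          # each element's share
  start <- center - (new_width * k) / 2           # left edge of the group
  edges <- cumsum(c(start, rep(new_width, k)))    # boundaries between elements
  x <- (edges[-length(edges)] + edges[-1]) / 2    # centre of each element
  pad_width <- new_width * (1 - padding)          # shrink to leave a gap
  data.frame(x = x, xmin = x - pad_width / 2, xmax = x + pad_width / 2)
}

# Two elements sharing x = 1 with a total width of 0.9
dodge_widths(center = 1, width = 0.9, k = 2)
```

With the default `padding = 0.1`, the two elements end up centred at 0.775 and 1.225, with a gap of 0.045 between them (10% of each element's 0.45 share).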
13 changes: 7 additions & 6 deletions R/geom-bar.r
Expand Up @@ -15,18 +15,19 @@
#' topic}. This is why it doesn't make sense to use a log-scaled y axis with a
#' bar chart.
#'
#' By default, multiple bar occupying the same `x` position will be
#' stacked atop one another by [position_stack()]. If you want them
#' to be dodged side-to-side, use [position_dodge()]. Finally,
#' [position_fill()] shows relative proportions at each `x` by
#' stacking the bars and then standardising each bar to have the same height.
#' By default, multiple bar occupying the same `x` position will be stacked atop
#' one another by [position_stack()]. If you want them to be dodged
#' side-to-side, use [position_dodge()] or [position_dodge2()]. Finally,
#' [position_fill()] shows relative proportions at each `x` by stacking the bars
#' and then standardising each bar to have the same height.
#'
#' @section Aesthetics:
#' \aesthetics{geom}{bar}
#'
#' @seealso
#' [geom_histogram()] for continuous data,
#' [position_dodge()] for creating side-by-side barcharts.
#' [position_dodge()] and [position_dodge2()] for creating side-by-side
#' barcharts.
#' @export
#' @inheritParams layer
#' @inheritParams geom_point
Expand Down
11 changes: 10 additions & 1 deletion R/geom-boxplot.r
Expand Up @@ -95,7 +95,7 @@
#' )
#' }
geom_boxplot <- function(mapping = NULL, data = NULL,
stat = "boxplot", position = "dodge",
stat = "boxplot", position = "dodge2",
...,
outlier.colour = NULL,
outlier.color = NULL,
Expand All @@ -110,6 +110,15 @@ geom_boxplot <- function(mapping = NULL, data = NULL,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {

# varwidth = TRUE is not compatible with preserve = "total"
if (!is.character(position)) {
if (identical(position$preserve, "total") & varwidth == TRUE) {
warning("Can't preserve total widths when varwidth = TRUE.", call. = FALSE)
position$preserve <- "single"
}
}

layer(
data = data,
mapping = mapping,
Expand Down
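
The guard added to `geom_boxplot()` above only runs when `position` is a position object rather than a string such as `"dodge2"`. A minimal stand-alone sketch of the same decision follows. `resolve_preserve()` is a hypothetical name used only for illustration:

```r
# Hypothetical stand-alone version of the varwidth / preserve check.
# Returns the `preserve` value that would actually be used.
resolve_preserve <- function(preserve, varwidth) {
  if (identical(preserve, "total") && isTRUE(varwidth)) {
    warning("Can't preserve total widths when varwidth = TRUE.", call. = FALSE)
    return("single")   # fall back, as the diff does
  }
  preserve
}

resolve_preserve("total", varwidth = FALSE)                   # unchanged
suppressWarnings(resolve_preserve("total", varwidth = TRUE))  # falls back
```

The fallback makes sense because variable-width boxes cannot all be rescaled to fill one fixed total width.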
35 changes: 32 additions & 3 deletions R/position-collide.r
@@ -1,6 +1,7 @@
# Detect and prevent collisions.
# Powers dodging, stacking and filling.
collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE, reverse = FALSE) {
collide_setup <- function(data, width = NULL, name, strategy,
check.width = TRUE, reverse = FALSE) {
# Determine width
if (!is.null(width)) {
# Width set manually
Expand All @@ -26,6 +27,15 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
width <- widths[1]
}

list(data = data, width = width)
}

collide <- function(data, width = NULL, name, strategy,
..., check.width = TRUE, reverse = FALSE) {
dlist <- collide_setup(data, width, name, strategy, check.width, reverse)
data <- dlist$data
width <- dlist$width

# Reorder by x position, then on group. The default stacking order reverses
# the group in order to match the legend order.
if (reverse) {
Expand All @@ -34,7 +44,6 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
data <- data[order(data$xmin, -data$group), ]
}


# Check for overlap
intervals <- as.numeric(t(unique(data[c("xmin", "xmax")])))
intervals <- intervals[!is.na(intervals)]
Expand All @@ -44,7 +53,7 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
# This is where the algorithm from [L. Wilkinson. Dot plots.
# The American Statistician, 1999.] should be used
}

if (!is.null(data$ymax)) {
plyr::ddply(data, "xmin", strategy, ..., width = width)
} else if (!is.null(data$y)) {
Expand All @@ -56,3 +65,23 @@ collide <- function(data, width = NULL, name, strategy, ..., check.width = TRUE,
stop("Neither y nor ymax defined")
}
}

# Alternate version of collide() used by position_dodge2()
collide2 <- function(data, width = NULL, name, strategy,
..., check.width = TRUE, reverse = FALSE) {
dlist <- collide_setup(data, width, name, strategy, check.width, reverse)
data <- dlist$data
width <- dlist$width

# Reorder by x position, then on group. The default stacking order is
# different than for collide() because of the order in which pos_dodge2 places
# elements
if (reverse) {
data <- data[order(data$x, -data$group), ]
} else {
data <- data[order(data$x, data$group), ]
}

pos <- match.fun(strategy)
pos(data, width, ...)
}
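
To make the ordering step in `collide2()` concrete, here is a small base R sketch on a toy data frame. The data are invented for illustration. It only shows what the two `order()` calls do, not the rest of the function:

```r
# Toy data: two x positions, two groups, deliberately out of order
df <- data.frame(x = c(2, 1, 1, 2), group = c(1, 2, 1, 2))

# reverse = FALSE: sort by x, then by ascending group
forward <- df[order(df$x, df$group), ]

# reverse = TRUE: sort by x, then by descending group
backward <- df[order(df$x, -df$group), ]

forward$group   # 1 2 1 2
backward$group  # 2 1 2 1
```

Because `pos_dodge2()` lays elements out left to right in row order, this sort decides which group appears on the left at each position.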
37 changes: 29 additions & 8 deletions R/position-dodge.r
@@ -1,7 +1,9 @@
#' Dodge overlapping objects side-to-side
#'
#' Dodging preserves the vertical position of an geom while adjusting the
#' horizontal position.
#' horizontal position. `position_dodge2` is a special case of `position_dodge`
#' for arranging box plots, which can have variable widths. `position_dodge2`
#' also works with bars and rectangles.
#'
#' @inheritParams position_identity
#' @param width Dodging width, when different to the width of the individual
Expand All @@ -13,17 +15,17 @@
#' @export
#' @examples
#' ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
#' geom_bar(position = "dodge")
#' geom_bar(position = "dodge2")
#'
#' # By default, dodging preserves the total width. You can choose
#' # to preserve the width of each element:
#' # By default, dodging with `position_dodge2()` preserves the width of each
#' # element. You can choose to preserve the total width with:
#' ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
#' geom_bar(position = position_dodge(preserve = "single"))
#' geom_bar(position = position_dodge(preserve = "total"))
#'
#' \donttest{
#' ggplot(diamonds, aes(price, fill = cut)) +
#' geom_histogram(position="dodge")
#' # see ?geom_boxplot and ?geom_bar for more examples
#' geom_histogram(position="dodge2")
#' # see ?geom_bar for more examples
#'
#' # In this case a frequency polygon is probably a better choice
#' ggplot(diamonds, aes(price, colour = cut)) +
Expand Down Expand Up @@ -58,6 +60,19 @@
#' width = 0.2,
#' position = position_dodge(width = 0.9)
#' )
#'
#' # Box plots use position_dodge2 by default, and bars can use it too
#' ggplot(data = iris, aes(Species, Sepal.Length)) +
#' geom_boxplot(aes(colour = Sepal.Width < 3.2))
#'
#' ggplot(data = iris, aes(Species, Sepal.Length)) +
#' geom_boxplot(aes(colour = Sepal.Width < 3.2), varwidth = TRUE)
#'
#' ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
#' geom_bar(position = position_dodge2(preserve = "single"))
#'
#' ggplot(mtcars, aes(factor(cyl), fill = factor(vs))) +
#' geom_bar(position = position_dodge2(preserve = "total"))
position_dodge <- function(width = NULL, preserve = c("total", "single")) {
ggproto(NULL, PositionDodge,
width = width,
Expand All @@ -70,7 +85,6 @@ position_dodge <- function(width = NULL, preserve = c("total", "single")) {
#' @usage NULL
#' @export
PositionDodge <- ggproto("PositionDodge", Position,
required_aes = "x",
width = NULL,
preserve = "total",
setup_params = function(self, data) {
Expand All @@ -91,6 +105,13 @@ PositionDodge <- ggproto("PositionDodge", Position,
)
},

setup_data = function(self, data, params) {
if (!"x" %in% names(data) & all(c("xmin", "xmax") %in% names(data))) {
data$x <- (data$xmin + data$xmax) / 2
}
data
},

compute_panel = function(data, params, scales) {
collide(
data,
Expand Down
133 changes: 133 additions & 0 deletions R/position-dodge2.r
@@ -0,0 +1,133 @@
#' @export
#' @rdname position_dodge
#' @param padding Padding between elements at the same position. Elements are
#' shrunk by this proportion to allow space between them. Defaults to 0.1.
position_dodge2 <- function(width = NULL, preserve = c("single", "total"),
padding = 0.1) {
ggproto(NULL, PositionDodge2,
width = width,
preserve = match.arg(preserve),
padding = padding
)
}

#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @export
PositionDodge2 <- ggproto("PositionDodge2", PositionDodge,
preserve = "single",
padding = 0.1,
setup_params = function(self, data) {
if (is.null(data$xmin) && is.null(data$xmax) && is.null(self$width)) {
warning("Width not defined. Set with `position_dodge2(width = ?)`",
call. = FALSE)
}

if (identical(self$preserve, "total")) {
n <- NULL
} else if ("x" %in% names(data)){
n <- max(table(data$x))
} else {
n <- max(table(find_x_overlaps(data)))
}

list(
width = self$width,
n = n,
padding = self$padding
)
},

compute_panel = function(data, params, scales) {
collide2(
data,
params$width,
name = "position_dodge2",
strategy = pos_dodge2,
n = params$n,
padding = params$padding,
check.width = FALSE
)
}
)

pos_dodge2 <- function(df, width, n = NULL, padding = 0.1) {

if (length(unique(df$group)) == 1) {
return(df)
}

if (!all(c("xmin", "xmax") %in% names(df))) {
df$xmin <- df$x
df$xmax <- df$x
}

# xid represents groups of boxes that share the same position
df$xid <- find_x_overlaps(df)

# based on xid find newx, i.e. the center of each group of overlapping
# elements. for boxes, bars, etc. this should be the same as original x, but
# for arbitrary rects it may not be
newx <- (tapply(df$xmin, df$xid, min) + tapply(df$xmax, df$xid, max)) / 2
df$newx <- newx[df$xid]

if (is.null(n)) {
# If n is null, preserve total widths of elements at each position by
# dividing widths by the number of elements at that position
n <- table(df$xid)
df$new_width <- (df$xmax - df$xmin) / as.numeric(n[df$xid])
} else {
df$new_width <- (df$xmax - df$xmin) / n
}

df$xmin <- df$x - (df$new_width / 2)
df$xmax <- df$x + (df$new_width / 2)

# Find the total width of each group of elements
group_sizes <- stats::aggregate(
list(size = df$new_width),
list(newx = df$newx),
sum
)

# Starting xmin for each group of elements
starts <- group_sizes$newx - (group_sizes$size / 2)

# Set the elements in place
for (i in seq_along(starts)) {
divisions <- cumsum(c(starts[i], df[df$xid == i, "new_width"]))
df[df$xid == i, "xmin"] <- divisions[-length(divisions)]
df[df$xid == i, "xmax"] <- divisions[-1]
}

# x values get moved to between xmin and xmax
df$x <- (df$xmin + df$xmax) / 2

# If no elements occupy the same position, there is no need to add padding
if (!any(duplicated(df$xid))) {
return(df)
}

# Shrink elements to add space between them
df$pad_width <- df$new_width * (1 - padding)
df$xmin <- df$x - (df$pad_width / 2)
df$xmax <- df$x + (df$pad_width / 2)

df[, c("xid", "newx", "new_width", "pad_width")] <- NULL

df
}

# Find groups of overlapping elements that need to be dodged from one another
find_x_overlaps <- function(df) {
overlaps <- vector(mode = "numeric", length = nrow(df))
overlaps[1] <- counter <- 1
for (i in 2:nrow(df)) {
if (df$xmin[i] >= df$xmax[i - 1]) {
counter <- counter + 1
}
overlaps[i] <- counter
}
overlaps
}
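
The grouping rule in `find_x_overlaps()` can be tried out on plain vectors. The sketch below restates it under a hypothetical name, `overlap_ids()`, and assumes the inputs are already sorted by `xmin`. Unlike the version in the diff, which loops over `2:nrow(df)` and so expects at least two rows, it iterates with `seq_along()` so that a single element also works. That guard is an addition for this example, not part of the PR:

```r
# Hypothetical restatement of the overlap rule: an element starts a new
# group when its xmin is at or beyond the previous element's xmax.
overlap_ids <- function(xmin, xmax) {
  ids <- integer(length(xmin))
  counter <- 1L
  for (i in seq_along(xmin)) {
    if (i > 1 && xmin[i] >= xmax[i - 1]) {
      counter <- counter + 1L
    }
    ids[i] <- counter
  }
  ids
}

# The first three intervals chain together; the fourth only touches the
# third (xmin == previous xmax), so it is not dodged against it.
overlap_ids(xmin = c(0, 0.5, 1, 2), xmax = c(1, 1.5, 2, 3))
# 1 1 1 2
```

Note that each element is compared only with the one immediately before it, so the result depends on the rows being sorted first.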
2 changes: 1 addition & 1 deletion R/stat-boxplot.r
Expand Up @@ -14,7 +14,7 @@
#' }
#' @export
stat_boxplot <- function(mapping = NULL, data = NULL,
geom = "boxplot", position = "dodge",
geom = "boxplot", position = "dodge2",
...,
coef = 1.5,
na.rm = FALSE,
Expand Down