Way to apply custom function to labels and variables in a plot (similar to rename_with for a tibble) #4728

Closed
torfason opened this issue Feb 7, 2022 · 1 comment

torfason commented Feb 7, 2022

I have found many situations where it is useful to update both the labels and the variable values used in a ggplot plot with a function that maps character() -> character(). A typical use case is a variable named total_population whose axis label should read Total population. The solutions I have seen or used so far are:

  • Setting labels manually. This is very flexible, but it has to be repeated for every label in every plot, and it can lead to errors if a variable is replaced in a plot and the label is not updated to match.
  • Renaming variables prior to plotting. This works, but names containing spaces must be quoted, and the variable name may have to be updated in many places in the plotting code when all one wants is to change the text displayed on the plot.
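For concreteness, the two workarounds can be sketched as follows. The data frame `df` and its values are made up purely for illustration; only `ggplot()`, `aes()`, `geom_line()` and `labs()` come from ggplot2.

```r
library(ggplot2)

# Hypothetical example data, invented for this sketch
df <- data.frame(
  year = 2000:2004,
  total_population = c(10, 11, 12, 14, 15)
)

# Workaround 1: set each label by hand.
# The label has to be kept in sync with the mapped variable manually.
p1 <- ggplot(df, aes(year, total_population)) +
  geom_line() +
  labs(x = "Year", y = "Total population")

# Workaround 2: rename the column before plotting.
# The new name contains a space, so it must be quoted with backticks
# everywhere it is used in the plotting code.
df2 <- df
names(df2)[names(df2) == "total_population"] <- "Total population"
p2 <- ggplot(df2, aes(year, `Total population`)) +
  geom_line()
```

Both approaches tie the displayed text to a specific plot or a specific data frame, which is the repetition the proposal tries to avoid.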

To address this issue, I have written a function that takes a ggplot object and an arbitrary function that maps one character vector to another. It then applies the function to text elements of the plot. The function also takes arguments that determine whether it is applied to the labels used in the plot, to the variables (factors and strings are supported), or to a chosen subset of either. An ellipsis argument allows additional parameters to be passed to the mapping function.

A reprex demonstrating the use of this function to plot mass against height using the starwars data set is included below.

The feature request is to add this function, or an adaptation of it, to the ggplot2 package. I should note that due to my limited familiarity with the ggplot object system, this function is not currently implemented as a standard ggplot function that is added to a plot using +. Instead, it has to be used separately, using a pipe or a direct function call. It would probably be desirable to rewrite it so that it fits with regular usage.

I have searched widely for a solution to this problem. If there is already a good way to do this, my apologies for the noise :-)

library(tidyverse)
library(snakecase)

# The proposed function
gg_apply <- function(p, fun, ..., .labs=TRUE, .vars=TRUE) {

  # Calculate new label values, and test that fun returns a sane result
  labels_new <- lapply(p$labels, fun, ...)
  stopifnot(
    all(sapply(labels_new, is.character)),
    length(labels_new) == length(p$labels)
  )

  # If .labs is true, we replace the labels in p
  if ( isTRUE(.labs) ) {
    .labs <- names(p$labels)
  }
  if ( isFALSE(.labs) || is.null(.labs) || all(is.na(.labs)) ) {
    .labs <- character()
  }
  stopifnot(is.character(.labs))

  # .labs is now a character vector of labels to replace
  for ( lab_name in .labs ) {
    p$labels[[lab_name]] <- labels_new[[lab_name]]
  }

  # Normalise .vars: TRUE selects every column of the plot data,
  # FALSE/NULL/NA selects none. Otherwise .vars MUST be a
  # character vector, or we bail.
  if ( isTRUE(.vars) ) {
    .vars <- names(p$data)
  }
  if ( isFALSE(.vars) || is.null(.vars) || all(is.na(.vars)) ) {
    .vars <- character()
  }
  stopifnot(is.character(.vars))

  # .vars is now a character vector of variables to replace
  for ( var_name in .vars ) {

    # Process a character variable and do some sanity testing
    if ( is.character(p$data[[var_name]]) ) {
      var_new <- fun(p$data[[var_name]], ...)
      stopifnot(
        is.character(var_new),
        length(var_new) == length(p$data[[var_name]])
      )
      p$data[[var_name]] <- var_new
    }

    # Process a factor variable and do some sanity testing
    if ( is.factor(p$data[[var_name]]) ) {
      var_fct_old <- p$data[[var_name]]
      var_chr_new <- fun(as.character(var_fct_old), ...)
      levels_new  <- fun(levels(var_fct_old), ...)
      stopifnot(
        is.character(var_chr_new),
        length(var_chr_new) == length(var_fct_old)
      )
      # unique() guards against fun mapping two levels to the same string
      p$data[[var_name]] <- factor(var_chr_new, levels=unique(levels_new))
    }

  }

  p
}

# Example usage of the function
p <- starwars %>%
    filter(mass < 1000) %>%
    mutate(species = species %>% fct_infreq %>%  fct_lump(5) %>% fct_explicit_na) %>%
    ggplot(aes(height, mass, color=species, size=birth_year)) +
    geom_point()
p %>% gg_apply(snakecase::to_sentence_case)
#> Warning: Removed 23 rows containing missing values (geom_point).

Created on 2022-02-07 by the reprex package (v2.0.1)

In the plot, note that all the labels are formatted using sentence case, as one would expect in a publication. Also note that any function can be used. For example, to create representations of a plot in multiple languages, one could use a lookup function that maps variable names to different language representations.
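As a rough sketch of the lookup idea, a small closure can turn a named character vector into a character() -> character() function. The names `make_lookup` and `to_spanish` and the dictionary entries are hypothetical, chosen only for illustration; entries missing from the dictionary are passed through unchanged.

```r
# Hypothetical helper: builds a lookup function from a named character vector
make_lookup <- function(dictionary) {
  stopifnot(is.character(dictionary), !is.null(names(dictionary)))
  function(x) {
    x <- as.character(x)
    # Look up each element by name; unmatched elements come back as NA
    translated <- unname(dictionary[x])
    # Fall back to the original text where no translation exists
    untranslated <- is.na(translated)
    translated[untranslated] <- x[untranslated]
    translated
  }
}

# Illustrative dictionary mapping variable names to Spanish labels
to_spanish <- make_lookup(c(
  height  = "Altura",
  mass    = "Masa",
  species = "Especie"
))

to_spanish(c("height", "mass", "birth_year"))
```

Because the result has the same length as its input and is always a character vector, it should satisfy the sanity checks in gg_apply and could be passed as its fun argument.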

yutannihilation (Member) commented

Thanks for the idea, but I'm closing this as there's an existing discussion. Please feel free to comment there.

#4648

Note that you can implement ggplot_add() so that the object can be added to a plot using +.

cf. https://github.com/yutannihilation/gghighlight/blob/4bc54b152796a0631c6557ef936c328bcbf9b5b9/R/gghighlight.R#L109
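A minimal sketch of that suggestion follows, restricted to labels for brevity. The constructor `relabel_with()`, its class name and the helper `to_sentence()` are hypothetical names introduced here; only the `ggplot_add()` generic belongs to ggplot2. The sketch assumes the labels to rewrite are already stored in `plot$labels` when the object is added, which is why they are set explicitly with `labs()`; how default labels are stored, and the recommended extension mechanism, may differ between ggplot2 versions.

```r
library(ggplot2)

# Hypothetical constructor: stores the mapping function and its extra
# arguments in a classed list so that `+` can dispatch on it
relabel_with <- function(fun, ...) {
  structure(list(fun = fun, args = list(...)), class = "relabel_with")
}

# Method for ggplot2's ggplot_add() generic, invoked by `plot + object`
ggplot_add.relabel_with <- function(object, plot, ...) {
  for (lab_name in names(plot$labels)) {
    old <- plot$labels[[lab_name]]
    # Only rewrite plain character labels; leave expressions etc. alone
    if (is.character(old)) {
      plot$labels[[lab_name]] <- do.call(object$fun, c(list(old), object$args))
    }
  }
  plot
}

# Base R stand-in for snakecase::to_sentence_case (illustration only)
to_sentence <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "car_weight", y = "miles_per_gallon") +
  relabel_with(to_sentence)
```

Since the method only sees labels present at the moment it is added, `relabel_with()` would need to come after any `labs()` call it is meant to affect.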
