Skip to content

Allow empty data frame #499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 20, 2012
Merged

Allow empty data frame #499

merged 8 commits into from
Apr 20, 2012

Conversation

wch
Copy link
Member

@wch wch commented Apr 16, 2012

Fix #31 and fix #332. This should be considered just a rough attempt to illustrate how it could be done. I modified facet-null.r, but facet-wrap.r and facet-grid.r should probably also be modified.

Note that there's a change in behavior -- plots with 0 rows of data are now allowed. For example:

empty <- data.frame(x=numeric(0), y=numeric(0))

ggplot(empty, aes(x=x,y=y)) + geom_point()

This works, but there are no points, and no breaks. It's just a gray plotting area with "x" and "y" axis labels.

@wch
Copy link
Member Author

wch commented Apr 16, 2012

This will also fix #176, when #482 is also merged.

@hadley
Copy link
Member

hadley commented Apr 16, 2012

Cool - thanks. I guess another option would be to default to data = waiver() or possible data = inherit() (for a more obvious name in this context) and then NULLs could just be nulls.

@wch
Copy link
Member Author

wch commented Apr 16, 2012

That makes sense. It may require some more extensive waiver checks in the code than what I added here, because there are some operations that work fine with a 0-row data frame, but don't work with NULL.

Do you want to go with waiver() or inherit()?

@hadley
Copy link
Member

hadley commented Apr 16, 2012

I think inherit - it can just be defined as inherit <- waiver, but documented to describe a little bit about the default behaviour.

@wch
Copy link
Member Author

wch commented Apr 16, 2012

Should it also define is.inherit <- is.waive?

@hadley
Copy link
Member

hadley commented Apr 16, 2012

No, it's more of a user-facing change than a developer-facing change. (Although I might uniformly prefer inherit to waiver - I'm not sure)

@wch
Copy link
Member Author

wch commented Apr 16, 2012

Should all the geoms and stats be updated to have a default of data = inherit(), or should they remain as data = NULL? The latter works right now because fortify.null simply returns inherit().

@hadley
Copy link
Member

hadley commented Apr 16, 2012

Probably all updated, but let's hang off on that until the layer rewrite.

@wch
Copy link
Member Author

wch commented Apr 16, 2012

I changed to inherit(), rebased, and pushed. Please check if the help makes sense. You said that inherit() is meant to be only a user-facing change, but the user can't actually see it anywhere at this point, since it's only called from fortify.null -- so should we use waiver() instead?

A couple more things left before it can be merged. I have to test for facet grid and wrap, and figure out why this fails:

qplot(1:5, 1:5)
# Error in data.frame(evaled, PANEL = data$PANEL) : 
#  arguments imply differing number of rows: 5, 0

ggplot() + geom_point(aes(x=1:5, y=1:5))
# Works fine

@wch
Copy link
Member Author

wch commented Apr 18, 2012

Everything works now. There's slightly complicated logic at the end of compute_aesthetics() in layer.r. It was necessary to support weird cases, like when data is not set, but aesthetics are given vectors, like ggplot(mapping=aes(x=1:4, y=1:4)) + geom_point().

I also don't think defining inherit() is necessary, since it's not used in any user-facing code. And, as discussed above, it's tested with is.waive(), which I think is a little odd.

This is the test battery I've been using. (It passes all the regular tests, and visual tests are unchanged.) All of the tests with the one-row data frame are here for comparison, but they could be removed. I'd prefer to turn these into regular (non-visual) tests, but it's not clear to me what parts can be tested this way. Any advice?

# Zero rows of data
df0 <- data.frame(x=numeric(0), y=numeric(0), am=numeric(0), cyl=numeric(0))
# One row of data
df1 <- data.frame(x=1, y=1, am=1, cyl=4)


# No visible points
ggplot(df0, aes(x=x,y=y)) + geom_point()
ggplot() + geom_point(data=df0, aes(x=x,y=y))

# One point at 1, 1
ggplot() + geom_point(data=df1, aes(x=x,y=y))


# Regular mtcars data, x=mpg, y=wt, plus df0 data frame
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0, aes(x=x, y=y))

# Regular mtcars data, no visible points
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point(data=df0, aes(x=x,y=y))

# mtcars data with df1 data, one red point at 1, 1
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df1, aes(x=x,y=y), colour="red")


# == facets ==

# mtcars data plus df0 data frame, facet_wrap
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0, aes(x=x,y=y)) + facet_wrap(~ cyl)
# mtcars data plus df0 data frame, facet_grid
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0, aes(x=x,y=y)) + facet_grid(am ~ cyl)

# mtcars data plus one red point at 1, 1, cyl=4,facet_wrap
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df1, aes(x=x,y=y), colour="red") +
  facet_wrap(~ cyl)
# mtcars data plus one red point at 1, 1, am=1, cyl=4, facet_grid
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df1, aes(x=x,y=y), colour="red") + 
  facet_grid(am ~ cyl)

# Empty data, facet_wrap, throws error
ggplot(df0, aes(x=x, y=y)) + geom_point() + facet_wrap(~ cyl)
# Empty data, facet_grid, throws error
ggplot(df0, aes(x=x, y=y)) + geom_point() + facet_grid(am ~ cyl)

# One data point, facet_wrap
ggplot(df1, aes(x=x, y=y)) + geom_point() + facet_wrap(~ cyl)
# One data point, facet_grid
ggplot(df1, aes(x=x, y=y)) + geom_point() + facet_grid(am ~ cyl)


# Empty data with x and y mapped to vector of values
qplot(1:5, 1:5)
ggplot(mapping=aes(x=1:5, y=1:5)) + geom_point()
qplot(x, y, data=data.frame(x=1:5, y=1:5))
ggplot() + geom_point(aes(x=1:5, y=1:5))

@@ -15,7 +15,7 @@
fortify <- function(model, data, ...) UseMethod("fortify")

fortify.data.frame <- function(model, data, ...) model
fortify.NULL <- function(model, data, ...) data.frame()
fortify.NULL <- function(model, data, ...) inherit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be waiver ?

@hadley
Copy link
Member

hadley commented Apr 18, 2012

Looks good - thanks! For the tests, I'd suggest looking at the output from ggplot_build

@hadley
Copy link
Member

hadley commented Apr 18, 2012

I'm still a bit worried about the change to compute_aesthetics. Could you add a test like:

qplot(mpg, wt, data = mtcars) + 
  facet_wrap(~cyl) + 
  geom_point(data = data.frame(), x = 20, y = 3, colour = "red", size = 5)

@wch
Copy link
Member Author

wch commented Apr 18, 2012

You're right - that added point is getting lost. I'll fix it.

The point shows up, though, when data is NULL:

qplot(mpg, wt, data = mtcars) +
  facet_wrap(~cyl) +
  geom_point(data = NULL, x = 20, y = 3, colour = "red", size = 5)

Now that I think about it, I believe this is the correct behavior. If you want to add a single point, you should use annotate; if you use a zero-row data frame, there's no data for which the aesthetics should be set.

@wch
Copy link
Member Author

wch commented Apr 18, 2012

Here's a roadblock for having data.frame() and NULL result in the same behavior. The fortify methods aren't the same:

fortify.data.frame <- function(model, data, ...) model
fortify.NULL <- function(model, data, ...) waiver()

As things stand right now, if you pass data=NULL to a layer, that means that it should inherit the data. When we replace all the default data=NULL arguments with data=waiver() in the geoms and stats, then we should be able to address this problem better. That is, waiver() should behave one way, and data.frame() and NULL should behave another.

@hadley
Copy link
Member

hadley commented Apr 18, 2012

I think I got confused before - NULL and data.frame() should behave differently. (Eventually they'll be the same when we replace the default NULL behaviour with waiver). NULL is translated to waiver() and so the layer inherits from the plot data frame. data.frame() is for when you want a layer with no data associated with it - this can occur when you're subsetting, or if you wanted to have no scaling, just mapping.

@@ -190,6 +190,7 @@ facet_train_layout.grid <- function(facet, data) {

#' @S3method facet_map_layout grid
facet_map_layout.grid <- function(facet, data, layout) {
if (nrow(data) == 0) return(cbind(data, PANEL = integer(0)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should actually return the whole panel data frame - then there's a row for each panel in the plot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried taking out this line and putting this at the top of locate_grid(), to the same effect as what you suggested:

  if (empty(data))
    return(panels)

But it fails on this test, I think because it later looks for a variable that's not part of the panels data frame:

df0 <- data.frame(mpg=numeric(0), wt=numeric(0), am=numeric(0), cyl=numeric(0))
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0) + facet_grid(am ~ cyl)
# Error in eval(expr, envir, enclos) : object 'wt' not found
#
# Enter a frame number, or 0 to exit   
#
#  1: print(list(data = list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 
#  2: print.ggplot(list(data = list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
#  3: plot-render.r#171: ggplot_build(x)
#  4: plot-build.r#39: dlapply(function(d, p) p$compute_aesthetics(d, plot))
#  5: plot-build.r#26: f(d = data[[i]], p = layers[[i]])
#  6: plot-build.r#26: p$compute_aesthetics(d, plot)
#  7: plot-build.r#26: get(x, envir = this, inherits = inh)(this, ...)
#  8: layer.r#159: compact(eval.quoted(aesthetics, data, plot$plot_env))
#  9: layer.r#159: Filter(Negate(is.null), l)
# 10: layer.r#159: sapply(x, f)
# 11: layer.r#159: lapply(X = X, FUN = FUN, ...)
# 12: layer.r#159: eval.quoted(aesthetics, data, plot$plot_env)
# 13: lapply(exprs, eval, envir = envir, enclos = enclos)
# 14: FUN(X[[2]], ...)
# 15: eval(expr, envir, enclos)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's right - you should instead do geom_point(data=df0, x = 10, y = 10, colour = "red") - if there's no data, there's nothing to plot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the purpose of all these changes to make things work with empty data frames? I suppose facets are a little different though.

df0 <- data.frame(mpg=numeric(0), wt=numeric(0), am=numeric(0), cyl=numeric(0))

# Works
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0)

# Doesn't work with facet_grid
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0) + facet_grid(am ~ cyl)

# Runs (but red point doesn't show up)
ggplot(mtcars, aes(x=mpg, y=wt)) + geom_point() + geom_point(data=df0,x = 10, y = 10, colour = "red") + 
  facet_grid(am ~ cyl)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so confused. Let's try and get it sorted out in our meeting later today.

@wch
Copy link
Member Author

wch commented Apr 19, 2012

I've added more tests, some of which presently fail. It would be good to discuss if the tests are correct.

@wch
Copy link
Member Author

wch commented Apr 19, 2012

Also fixes #506 and fixes #507.

@wch
Copy link
Member Author

wch commented Apr 19, 2012

OK, it should be good to go. It passes all the new and old tests, and visual tests are unchanged.

hadley added a commit that referenced this pull request Apr 20, 2012
Allow empty data frame.
Fixes #506. Fixes #507. Fixes #31. Fixes #332.
@hadley hadley merged commit fe42a64 into tidyverse:master Apr 20, 2012
@lock
Copy link

lock bot commented Jan 19, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mapped variable not found when it is in a zero row data.frame For data, NULL != data.frame()
2 participants