Need position_stack_line() #2883

Closed
mikmart opened this issue Sep 6, 2018 · 16 comments
Labels
feature a feature request or enhancement positions 🥇

Comments

@mikmart
Contributor

mikmart commented Sep 6, 2018

geom_area() doesn't play well with duplicated x values. The docs state:

geom_area is a special case of geom_ribbon, where the ymin is fixed to 0.

But when there are multiple values of y for a given value of x, the two are not equivalent. The geom_ribbon() result matches the intuition of "geom_line() but with the area under the curve filled in":

library(tidyverse)
df <- data.frame(x = c(1, 2, 2, 3), y = 1:4)

ggplot(df, aes(x, y)) + geom_ribbon(aes(ymin = 0, ymax = y))

However, it seems this scenario results in geom_area() having the equivalent of ymin = min(y) and ymax = sum(y) in geom_ribbon().

ggplot(df, aes(x, y)) + geom_area()

I'm happy to dig a bit further and see if I can submit a PR for this.

Created on 2018-09-06 by the reprex package (v0.2.0.9000).

@smouksassi

This is because (as per the docs) the default position for geom_area() is "stack". If you specify position = "identity", you get the same result as with geom_ribbon().

library(ggplot2)

df <- data.frame(x = c(1, 2, 2, 3), y = 1:4)

# Overlay both geoms semi-transparently so any difference would show up
ggplot(df, aes(x, y)) +
  geom_point(color = "red") +
  geom_area(aes(fill = "geom_area"), alpha = 0.5, position = "identity") +
  geom_ribbon(aes(ymin = 0, ymax = y, fill = "geom_ribbon"), alpha = 0.5)

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

That's true, and perhaps points to a potential issue with position_stack() instead:

library(tidyverse)

df <- data.frame(x = c(1, 2, 2, 3), y = 1:4)
df2 <- bind_rows(a = df, b = df, .id = "g")

ggplot(df2, aes(x, y, fill = g)) + geom_area()

ggplot(df2, aes(x, y, fill = g)) +
  geom_ribbon(aes(ymin = 0, ymax = y), position = "stack")

Created on 2018-09-06 by the reprex package (v0.2.0.9000).

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

Then again, position_stack() is just doing what it does: stacking y values that share a common x. The above would look entirely reasonable with, e.g., points. But this doesn't really match the intuition (mine, at least) for area charts.

@smouksassi

What kind of result are you expecting? The two plots above are identical.

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

Yes, that's why this is probably an issue with position_stack(). (If it is an issue at all.)

I would expect to not have the gaps in between the filled in areas.

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

Essentially I would want to get this:

library(tidyverse)

df3 <- data.frame(
  x = c(1, 2, 2, 3, 1, 2, 2, 3),
  y = c(1, 2, 3, 4, 2, 4, 6, 8),
  g = rep(c("b", "a"), each = 4)
)

ggplot(df3, aes(x, y, fill = g)) +
    geom_area(position = "identity")

This matches my intuition of area stacking, but doesn't fit all that well with how position_stack() works. It also isn't really compatible with how, e.g., point stacking should (and does) work.
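
For what it's worth, the hand-built data above can also be derived from the two-group data. This is only a rough sketch in base R, and it assumes every group has the same x values with the same number of duplicates. The helper columns k, ymax and ymin are made up for illustration, and this is not how position_stack() works:

```r
# Two groups with identical data; "b" is listed first so it ends up at the bottom.
df2 <- data.frame(
  g = rep(c("b", "a"), each = 4),
  x = rep(c(1, 2, 2, 3), times = 2),
  y = rep(1:4, times = 2)
)

# k: position of each row among the duplicates of its x value, within its group.
df2$k <- ave(seq_along(df2$x), df2$g, df2$x, FUN = seq_along)

# Stack the k-th duplicate of one group on the k-th duplicate of the other.
df2$ymax <- ave(df2$y, df2$x, df2$k, FUN = cumsum)
df2$ymin <- df2$ymax - df2$y

df2$ymax
#> [1] 1 2 3 4 2 4 6 8
```

The ymax column reproduces the y values I typed in by hand, so plotting it with position = "identity" should give the same picture.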

Created on 2018-09-06 by the reprex package (v0.2.0.9000).

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

The more I think about this, the more it feels like perhaps this is not an issue with position_stack(), but rather area stacking in the above manner would be the job for a different position adjustment.

When presenting data like this, only the maximum of the y values in an "x group" is really a data point: the minimum y is really just a bounding point for a polygon, and doesn't actually represent data (okay it can be data, too: the minimum and maximum together define a range that is the data); and any other y values in between don't do anything.

Perhaps some sort of position_stack_range() might be useful? But I assume out of scope for ggplot2.

@yutannihilation
Member

I too feel that geom_area() goes against the "intuition." I think this feeling comes from the fact that, while we imagine we are stacking areas, we are actually playing with points. For example, if we add a slight jitter to the x values, the areas are no longer stacked as expected. In the following, you can see "a" sink into "b" because it fails to be stacked.

library(tidyverse)

df <- data.frame(x = rep(1:5, each = 2),
                 y = rep(10 + 1:5, each = 2),
                 g = rep(c("a", "b"), times = 5))
df$x_jitter <- df$x + 10e-4 * runif(10)
df
#>    x  y g x_jitter
#> 1  1 11 a 1.000296
#> 2  1 11 b 1.000814
#> 3  2 12 a 2.000061
#> 4  2 12 b 2.000527
#> 5  3 13 a 3.000078
#> 6  3 13 b 3.000748
#> 7  4 14 a 4.000230
#> 8  4 14 b 4.000891
#> 9  5 15 a 5.000595
#> 10 5 15 b 5.000664

library(egg)
#> Loading required package: gridExtra
#> 
#> Attaching package: 'gridExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     combine

ggarrange(
  ggplot(df, aes(x, y, fill = g)) + geom_area(),
  ggplot(df, aes(x_jitter, y, fill = g)) + geom_area()
)

Created on 2018-09-06 by the reprex package (v0.2.0.9000).
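
The same effect can be seen without plotting. The sketch below uses a simplified model of stacking (a cumulative sum of y over rows that share exactly the same x); stack_by_x() is a hypothetical helper, not a ggplot2 function, and the seed is only there to make the example repeatable:

```r
set.seed(1)

df <- data.frame(x = rep(1:5, each = 2),
                 y = rep(10 + 1:5, each = 2),
                 g = rep(c("a", "b"), times = 5))
df$x_jitter <- df$x + 10e-4 * runif(10)

# Simplified stacking: cumulative sum of y within each distinct x value.
stack_by_x <- function(x, y) ave(y, x, FUN = cumsum)

# With exact x values, each pair of rows shares an x and gets stacked.
stack_by_x(df$x, df$y)
#> [1] 11 22 12 24 13 26 14 28 15 30

# With jittered x values, no two rows share an x, so nothing is stacked.
all(stack_by_x(df$x_jitter, df$y) == df$y)
#> [1] TRUE
```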

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

@yutannihilation yeah I think you're right. Both examples make perfect sense with points, bars, etc. but don't quite "work" with areas (even though when reasoning about it, the behaviour is logical).

I read some other issues related to this before posting (#2720, #2802; also just now noticed that this is a duplicate from way back in 2013: #795) and thought this was separate; perhaps not.

Maybe it would make sense to have a position adjustment (somewhere) that would handle all of these problems, sort of simulating physical area stacking. That would require automatic interpolation between groups, too, to address the misaligned x values.
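
To make the interpolation idea a bit more concrete, here is a minimal sketch in base R using stats::approx(). The two small data frames and the interp_on_grid() helper are invented for illustration. It assumes a group contributes 0 outside its own x range, and it does not handle duplicated x values within a group:

```r
# Two groups observed at different (misaligned) x values.
a <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))
b <- data.frame(x = c(1.5, 2.5), y = c(2, 2))

# Common grid: every x value that occurs in either group.
grid <- sort(unique(c(a$x, b$x)))

# Linearly interpolate one group onto the grid; outside the group's
# x range approx() returns NA, which is treated as 0 here (an assumption).
interp_on_grid <- function(d, grid) {
  y <- approx(d$x, d$y, xout = grid)$y
  y[is.na(y)] <- 0
  y
}

ya <- interp_on_grid(a, grid)
yb <- interp_on_grid(b, grid)

# Stack "b" on top of "a" on the shared grid.
stacked <- data.frame(x = grid, top_a = ya, top_b = ya + yb)
stacked$top_b
#> [1] 1.0 3.5 4.0 4.5 3.0
```

Once every group lives on the same grid, ordinary cumulative stacking works again, because all groups share their x values.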

@smouksassi

Yes, stacking time series can be done in several ways. For example, there is this package:
https://github.com/AtherEnergy/ggTimeSeries
and this paper:
http://leebyron.com/streamgraph/stackedgraphs_byron_wattenberg.pdf
Handling duplicated x values may call for a special stat.

@mikmart
Contributor Author

mikmart commented Sep 6, 2018

The Byron & Wattenberg paper is really cool, but I'm not sure it applies to duplicated y's within x's, or to "physical" area stacking. It's essentially a stacked position adjustment with a general baseline function (rather than a fixed 0 baseline) plus smart ordering.
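
As a sketch of what "general baseline function" means here: stack_with_baseline(), zero_baseline() and symmetric_baseline() below are hypothetical helpers, not part of ggplot2 or the paper's code. The symmetric baseline is the ThemeRiver-style layout the paper describes, where the baseline is minus half of the column totals; layer ordering is left out entirely:

```r
# Rows are layers (bottom first), columns are x positions.
f <- rbind(a = c(1, 2, 3), b = c(2, 2, 2))

# Stack the layers on top of an arbitrary baseline g0.
stack_with_baseline <- function(f, baseline) {
  g0 <- baseline(f)
  tops <- sweep(apply(f, 2, cumsum), 2, g0, `+`)
  rbind(baseline = g0, tops)
}

zero_baseline <- function(f) rep(0, ncol(f))        # ordinary stacking
symmetric_baseline <- function(f) -colSums(f) / 2   # centred around y = 0

stack_with_baseline(f, zero_baseline)
stack_with_baseline(f, symmetric_baseline)
```

With the zero baseline this reduces to plain stacking; with the symmetric one the top of the last layer mirrors the baseline around 0.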

@ptoche

ptoche commented Dec 15, 2018

@mikmart, is this a new issue, or was it introduced by an update? I ask out of curiosity because when I filed #2803, I noticed the stacking behaviour had changed over the previous two years or so.

@mikmart
Contributor Author

mikmart commented Dec 17, 2018

I'm not sure, but I would assume it's been like this for quite a while. The way the stacking algorithm works (replacing y values that share an x value with their cumulative sum) seems pretty fundamental.
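
A tiny base R illustration of that cumulative-sum idea, applied to the data from the original report. This is a simplified model rather than ggplot2's actual code, which may also order the values differently:

```r
df <- data.frame(x = c(1, 2, 2, 3), y = 1:4)

# Replace each y with the running total of the y values sharing its x.
df$ymax <- ave(df$y, df$x, FUN = cumsum)
df$ymin <- df$ymax - df$y

df$ymax
#> [1] 1 2 5 4
df$ymin
#> [1] 0 0 2 0
```

At x = 2 the second value is stacked on the first, which is where the spike to 5 in the geom_area() plot comes from.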

@mikmart
Contributor Author

mikmart commented Dec 17, 2018

I had a look and I think what caused the change in #2803 was cf716a3, from which onward positive and negative values have been stacked separately. These seem separate to me.
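
To illustrate what stacking positive and negative values separately means, here is a rough sketch for the values at a single x position. stack_signed() is a made-up helper and only mimics the idea, not the actual implementation in that commit:

```r
# Stack the values found at one x position: positives grow upward from 0,
# negatives grow downward from 0, and the two never offset each other.
stack_signed <- function(y) {
  pos <- y >= 0
  to <- y
  to[pos] <- cumsum(y[pos])     # far edge of each positive segment
  to[!pos] <- cumsum(y[!pos])   # far edge of each negative segment
  data.frame(y = y, from = to - y, to = to)
}

stack_signed(c(2, -1, 3, -2))
#>    y from to
#> 1  2    0  2
#> 2 -1    0 -1
#> 3  3    2  5
#> 4 -2   -1 -3
```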

@yutannihilation
Member

yutannihilation commented Dec 18, 2018

I've played around with this and now feel there's a lot to consider in order to stack areas properly. I'm afraid this is a bit too complex to implement in ggplot2...

https://gist.github.com/yutannihilation/2d3851adc874a02f42914f1655329c71

thomasp85 added the "bug" (an unexpected problem or unintended behavior) label Apr 11, 2019
hadley added the "feature" (a feature request or enhancement) label and removed the "bug" (an unexpected problem or unintended behavior) label Jun 18, 2019
hadley changed the title from "geom_area() stacks y values for duplicated x values" to "Need position_stack_line()" Jun 18, 2019
@yutannihilation
Member

Closed by #4889

library(tidyverse)

df <- data.frame(x = c(1, 2, 2, 3), y = 1:4)
df2 <- bind_rows(a = df, b = df, .id = "g")

ggplot(df2, aes(x, y, fill = g)) + geom_area()

Created on 2022-07-07 by the reprex package (v2.0.1)


7 participants