Consider improving nowcaster/forecaster `epi_slide` sample in `advanced.Rmd` #288

Problem described [here](https://github.com/cmu-delphi/epiprocess/issues/287#issue-1641207903). Allowing negative `after` values would not completely resolve that issue, as the test-time prediction would still need to be made; we're just missing the train/test split. E.g., the version below performs the split (and expands the window to get at most two time steps' worth of training data). However, it fails when a window contains no training data (previously not a problem, because there was always the test data to train on):
```r
edf %>%
  epi_slide(function(d, ...) {
    d_split = d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup() %>%
      split(.$subset) %>%
      lapply(select, -"subset")
    obj <- lm(y ~ x, data = d_split$train)
    return(
      as.data.frame(
        predict(obj, newdata = d_split$test,
                interval = "prediction", level = 0.9)
      ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)
```
This gives a mysterious error message:
```
Error in eval(predvars, data, env) : object 'y' not found
```
due to the carefree coding: when a window has no `"train"` rows, `split` produces no `train` element, so `d_split$train` is `NULL` and `lm` cannot find `y`. We can likely get more appropriate error messages via:
* using a factor instead of a string for the subset indicator
* using `group_split`, `nest_by`, `nest(....., .by=.....)`, etc. + a bunch of awkward indexing (`filter`, `pull`, unwrap)
* filtering all rows once to get the training set, then again to get the test set, e.g.:
```r
edf %>%
  epi_slide(function(d, ...) {
    d_split = d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup()
    obj <- lm(y ~ x, data = d_split %>% filter(subset=="train"))
    return(
      as.data.frame(
        predict(obj, newdata = d_split %>% filter(subset=="test") %>% select(-subset),
                interval = "prediction", level = 0.9)
      ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)
```
This improves the error message to:
```
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases
```
Regardless of whether the error message is intelligible, we still need manual code to skip or complete windows that have no training data (see #256).
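
As a rough illustration of that manual handling, here is a minimal base-R sketch of a per-window function that splits on a factor (so an absent training set becomes a zero-row data frame rather than `NULL`) and returns `NA` predictions when there is too little training data. The names `forecast_window` and `min_train` are hypothetical and not part of `epiprocess`; the sketch assumes `d` holds a single `geo_value`'s window with columns `time_value`, `x`, and `y`.

```r
# Hypothetical helper (not an epiprocess function): fit on every row except
# the latest time_value, then predict for the latest time_value.
forecast_window <- function(d, level = 0.9, min_train = 1) {
  # Fixed factor levels make split() return both elements, even when empty.
  subset <- factor(
    ifelse(d$time_value == max(d$time_value), "test", "train"),
    levels = c("train", "test")
  )
  d_split <- split(d, subset)
  train <- d_split$train
  test <- d_split$test

  # Count training rows that lm() could actually use.
  n_usable <- sum(stats::complete.cases(train[, c("x", "y"), drop = FALSE]))
  if (n_usable < min_train) {
    # Not enough training data: one row of NAs per test row.
    n_test <- nrow(test)
    return(data.frame(
      fit = rep(NA_real_, n_test),
      lwr = rep(NA_real_, n_test),
      upr = rep(NA_real_, n_test)
    ))
  }

  obj <- lm(y ~ x, data = train)
  as.data.frame(
    predict(obj, newdata = test, interval = "prediction", level = level)
  )
}

# Toy window: four training rows and one test row.
d <- data.frame(time_value = 1:5, x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10))
forecast_window(d)       # one row with fit, lwr, upr
forecast_window(d[1, ])  # only a test row: one row of NAs
```

A function like this could presumably stand in for the anonymous function passed to `epi_slide`, so that windows without training rows yield `NA` instead of an error. Whether filling with `NA` or skipping such windows is preferable likely depends on how #256 is resolved, and a larger `min_train` may be needed to get meaningful prediction intervals.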

For now, I plan to just explain the additional problem noted in the linked issue, and hold off on any improvements.  A solution to #256 may give us more options.  Another approach would be to change to using `epipredict` in this example.
