Consider improving nowcaster/forecaster epi_slide sample in advanced.Rmd #288

Closed
@brookslogan

Problem described here. Allowing negative after values would not completely resolve the issue above, as the test-time prediction would still need to be made. We're just missing the train/test split. E.g., this version should perform the split (and expands the window to get at most two time steps' worth of training data). However, it runs into a missing training split whenever no training data is available (previously not a problem, because there was always the test data to train on):

edf %>%
  epi_slide(function(d, ...) {
    d_split = d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup() %>%
      split(.$subset) %>%
      lapply(select, -"subset")
    obj <- lm(y ~ x, data = d_split$train)
    return(
      as.data.frame(
        predict(obj, newdata = d_split$test,
                interval = "prediction", level = 0.9)
      ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)

This gives a mysterious message

Error in eval(predvars, data, env) : object 'y' not found

due to the carefree coding: when a window has no training rows, split() returns no "train" element, so d_split$train is NULL and lm() cannot find y. We can likely get more appropriate error messages via:

  • using a factor instead of string for the subset indicator
  • using group_split, nest_by, nest(....., .by=.....), etc., plus a bunch of awkward indexing (filter, pull, unwrap)
  • filtering all rows once to get the training set, then again to get the test set, e.g.:
edf %>%
  epi_slide(function(d, ...) {
    d_split = d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup()
    obj <- lm(y ~ x, data = d_split %>% filter(subset=="train"))
    return(
      as.data.frame(
        predict(obj, newdata = d_split %>% filter(subset=="test") %>% select(-subset),
                interval = "prediction", level = 0.9)
      ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)

This improves the error message

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

but regardless of whether we get an intelligible error message, we still need manual code to skip or complete instances with no training data (see #256).
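As a rough illustration of what that manual handling could look like, here is a minimal base-R sketch combining the factor idea from the list above with an explicit empty-training check. The helper name fit_predict_window is hypothetical (it is not part of epiprocess), the toy data is made up, and returning all-NA rows is just one possible policy, not a proposed resolution of #256. It assumes each window has geo_value, time_value, x, and y columns, as in the examples above. Windows with only one or two training rows would still produce degenerate fits; this only covers the zero-row case.

```r
# Hypothetical helper (not part of epiprocess): fit on the training rows of one
# window and predict for the test rows, returning NA rows when there is
# nothing to train on.
fit_predict_window <- function(d, level = 0.9) {
  # The latest time_value within each geo_value is "test"; the rest is "train".
  tv <- as.numeric(d$time_value)
  latest <- ave(tv, d$geo_value, FUN = max)
  subset <- factor(ifelse(tv == latest, "test", "train"),
                   levels = c("train", "test"))

  # Because `subset` is a factor, split() always returns both elements, with a
  # zero-row data frame (rather than NULL) for a level that has no rows.
  d_split <- split(d, subset)
  train <- d_split$train
  train <- train[!is.na(train$x) & !is.na(train$y), , drop = FALSE]
  test <- d_split$test

  # Manual handling of "no training data": one NA row per test row, using the
  # same column names that predict(..., interval = "prediction") produces.
  if (nrow(train) == 0L) {
    na <- rep(NA_real_, nrow(test))
    return(data.frame(fit = na, lwr = na, upr = na))
  }

  obj <- lm(y ~ x, data = train)
  as.data.frame(
    predict(obj, newdata = test, interval = "prediction", level = level)
  )
}

# Toy data (made up for illustration only).
toy <- data.frame(
  geo_value = "a",
  time_value = as.Date("2020-01-01") + 0:4,
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

first_window <- fit_predict_window(toy[1, ])  # no training rows: one row of NAs
full_window <- fit_predict_window(toy)        # four training rows: fit with bounds
```

In principle a function like this could replace the anonymous function passed to epi_slide in the examples above, but that pairing is untested here.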

For now, I plan to just explain the additional problem noted in the linked issue, and hold off on any improvements. A solution to #256 may give us more options. Another approach would be to change to using epipredict in this example.

Labels: P2 (low priority), bug (Something isn't working), documentation (Improvements or additions to documentation)
