Description
Problem described here. Allowing negative `after` values would not completely resolve the issue above, as the test-time prediction would still need to be made; we're just missing the train/test split. E.g., this version performs the split (and expands the window to get up to two time steps' worth of training data). However, it runs into issues when there is no training data available (previously not a problem, because there was always the test data to train on):
```r
edf %>%
  epi_slide(function(d, ...) {
    d_split <- d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup() %>%
      split(.$subset) %>%
      lapply(select, -"subset")
    obj <- lm(y ~ x, data = d_split$train)
    return(as.data.frame(
      predict(obj,
        newdata = d_split$test,
        interval = "prediction", level = 0.9
      )
    ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)
```
This gives a mysterious message

```
Error in eval(predvars, data, env) : object 'y' not found
```

due to the carefree coding (we are trying to pull `y` out of a `NULL` training set). We can likely get more appropriate error messages via:
- using a factor instead of a string for the subset indicator
- using `group_split`, `nest_by`, `nest(....., .by = .....)`, etc., plus a bunch of awkward indexing (`filter`/`pull`/unwrap)
- filtering all rows once to get the training set, then again to get the test set, e.g.:
```r
edf %>%
  epi_slide(function(d, ...) {
    d_split <- d %>%
      group_by(geo_value) %>%
      mutate(subset = if_else(time_value == max(time_value), "test", "train")) %>%
      ungroup()
    obj <- lm(y ~ x, data = d_split %>% filter(subset == "train"))
    return(as.data.frame(
      predict(obj,
        newdata = d_split %>% filter(subset == "test") %>% select(-subset),
        interval = "prediction", level = 0.9
      )
    ))
  }, before = 2, new_col_name = "fc", names_sep = NULL)
```
This improves the error message

```
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
  0 (non-NA) cases
```
but regardless of whether we get an intelligible error message, we still need manual code to deal with skipping/completing instances with no training data (see #256).
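For illustration only, a minimal sketch (hypothetical, not a proposed fix) of what that manual handling could look like in the body of the sliding function from the second example above, emitting an all-NA prediction row when the training subset is empty:

```r
# Sketch assuming `d_split` from the second example above (string `subset`
# column with "train"/"test" values). Hypothetical guard: if there are no
# training rows, return an all-NA prediction row rather than letting lm()
# fail with "0 (non-NA) cases".
d_train <- d_split %>% filter(subset == "train")
d_test <- d_split %>% filter(subset == "test") %>% select(-subset)
if (nrow(d_train) == 0) {
  return(data.frame(fit = NA_real_, lwr = NA_real_, upr = NA_real_))
}
obj <- lm(y ~ x, data = d_train)
return(as.data.frame(
  predict(obj, newdata = d_test, interval = "prediction", level = 0.9)
))
```

The NA row keeps the output schema (`fit`/`lwr`/`upr`) consistent across windows, which is what the completion behavior would need anyway.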
For now, I plan to just explain the additional problem noted in the linked issue and hold off on any improvements. A solution to #256 may give us more options. Another approach would be to switch this example to using `epipredict`.