diff --git a/_freeze/smooth-qr/execute-results/html.json b/_freeze/smooth-qr/execute-results/html.json new file mode 100644 index 0000000..f1de746 --- /dev/null +++ b/_freeze/smooth-qr/execute-results/html.json @@ -0,0 +1,17 @@ +{ + "hash": "850753d95316a1b705f4a3efeb3c53e4", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: \"Smooth quantile regression\"\noutput: rmarkdown::html_vignette\nvignette: >\n %\\VignetteIndexEntry{Smooth quantile regression}\n %\\VignetteEngine{knitr::rmarkdown}\n %\\VignetteEncoding{UTF-8}\n---\n\n\n\n\n\n\n# Introducing smooth quantile regression\n\nWhereas other time-series forecasting examples in this package have used\n(direct) models for single horizons, in multi-period forecasting, the goal is to\n(directly) forecast several horizons simultaneously. This is useful in\nepidemiological applications where decisions are based on the trend of a signal.\n\nThe idea underlying smooth quantile regression is that the set of forecast targets can\nbe approximated by a smooth curve. This novel approach from\n[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723)\nenforces smoothness across the\nhorizons and can be applied to point estimation by regression or interval\nprediction by quantile regression. Our focus in this vignette is the latter.\n\n# Built-in function for smooth quantile regression and its parameters\n\nThe built-in smooth quantile regression function, `smooth_quantile_reg()`,\nprovides a model specification for smooth quantile regression that works under\nthe tidymodels framework. It has the following parameters and default values:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_quantile_reg(\n mode = \"regression\",\n engine = \"smoothqr\",\n outcome_locations = NULL,\n quantile_levels = 0.5,\n degree = 3L\n)\n```\n:::\n\n\n\n\nFor smooth quantile regression, the type of model or `mode` is regression.\n\nThe only `engine` that is currently supported is `smooth_qr()` from the\n[`smoothqr` package](https://dajmcdon.github.io/smoothqr/).\n\nThe `outcome_locations` indicate the multiple horizon (i.e., ahead) values. These\nshould be specified by the user.\n\nThe `quantile_levels` parameter is a vector of values that indicates the\nquantiles to be estimated. The default is the median (0.5 quantile).\n\nThe `degree` parameter indicates the degree of the polynomials used for\nsmoothing of the response. It should be no more than the number of aheads. If\nthe degree is precisely equal to the number of aheads, then there is no\nsmoothing. To better understand this parameter and how it works, we should look\nto its origins and how it is used in the model.\n\n# Model form\n\nSmooth quantile regression is a linear auto-regressive model, with the key feature being\na transformation that forces the coefficients to satisfy a smoothing constraint.\nThe purpose of this is for each model coefficient to be a smooth function of\nahead values, and so each such coefficient is set to be a linear combination of\nsmooth basis functions (such as a spline or a polynomial).\n\nThe `degree` parameter controls the number of these polynomials used. It should\nbe no greater than the number of responses. This is a tuning parameter, and so\nit can be chosen by performing a grid search with cross-validation. Intuitively,\n$d = 1$ corresponds to the constant model, $d = 2$ gives straight line\nforecasts, while $d = 3$ gives quadratic forecasts. 
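\n\nTo make the smoothing constraint concrete, here is a minimal sketch of our own (not the internal `smoothqr` implementation): we project a hypothetical vector of per-ahead coefficients onto a degree-3 polynomial basis, so that the result is a linear combination of $d = 3$ basis functions $h_i(a)$.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\naheads <- 1:28\nbeta_raw <- rnorm(28) # hypothetical unsmoothed coefficients, one per ahead\nH <- cbind(1, poly(aheads, degree = 2)) # d = 3 polynomial basis functions h_i(a)\ntheta <- qr.solve(H, beta_raw) # least-squares coefficients theta_i\nbeta_smooth <- H %*% theta # a smooth function of the aheads\n```\n:::\n\n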
Since a degree of 3 was\nfound to work well in the tested applications (see Section 9 of\n[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723)),\nit is the default value.\n\n# Demonstration of smooth quantile regression\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(epipredict)\nlibrary(dplyr)\nlibrary(purrr)\nlibrary(ggplot2)\ntheme_set(theme_bw())\n```\n:::\n\n\n\n\nWe will now apply smooth quantile regression on the real data used for COVID-19\nforecasting. The built-in dataset we will use is a subset of JHU daily data on\nstate cases and deaths. This sample data ranges from Dec. 31, 2020 to\nDec. 31, 2021.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nedf <- epidatasets::covid_case_death_rates\n```\n:::\n\n\n\n\nWe will set the forecast date to be November 30, 2021 so that we can produce\nforecasts for target dates of 1 to 28 days ahead. We construct our test data,\n`tedf`, from the days beyond this.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfd <- as.Date(\"2021-11-30\")\n\ntedf <- edf %>% filter(time_value >= fd)\n```\n:::\n\n\n\n\nWe will use the most recent 3 months' worth of data up to the forecast date for\ntraining.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nedf <- edf %>% filter(time_value < fd, time_value >= fd - 90L)\n```\n:::\n\n\n\n\nFor plotting, our focus will be on a subset of two states: California and\nUtah.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngeos <- c(\"ut\", \"ca\")\n```\n:::\n\n\n\n\nSuppose that our goal with this data is to predict COVID-19 death rates at\nseveral horizons for each state. On day $t$, we want to predict new deaths $y$\nthat are $a = 1,\\dots, 28$ days ahead at locations $j$ using the death rates\nfrom today, 1 week ago, and 2 weeks ago. So for each location, we'll predict the\nmedian (0.5 quantile) for each of the target dates by using\n$$\n\\hat{y}_{j}(t+a) = \\alpha(a) + \\sum_{l = 0}^2 \\beta_{l}(a) y_{j}(t - 7l)\n$$\nwhere $\\beta_{l}(a) = \\sum_{i=1}^d \\theta_{il} h_i(a)$ is the smoothing\nconstraint, where $\\{h_1(a), \\dots, h_d(a)\\}$ are the set of smooth basis functions\nand $d$ is a hyperparameter that manages the flexibility of $\\beta_{l}(a)$.\nRemember that the goal is to have each $\\beta_{l}(a)$ be a smooth function of\nthe aheads, and that is achieved by imposing the smoothing constraint.\n\nNote that this model is intended to be simple and straightforward. Our only\nmodification to this model is to add case rates as another predictive feature\n(we will leave it to the reader to incorporate additional features beyond this\nand the historical response values). We can update the basic model to incorporate\nthe $k = 2$ predictive features of case and death rates for each location $j$,\n$x_j(t) = (x_{j1}(t), x_{j2}(t))$ as follows:\n\n$$\n\\hat{y}_{j}(t+a) = \\alpha(a) + \\sum_{k = 1}^2 \\sum_{l = 0}^2 \\beta_{kl}(a) x_{jk}(t - 7l)\n$$\nwhere $\\beta_{kl}(a) = \\sum_{i=1}^d \\theta_{ikl} h_i(a)$.\n\nNow, we will create our own forecaster from scratch by building up an\n`epi_workflow` (there is no canned forecaster that is currently available).\nBuilding our own forecaster allows for customization and control over the\npre-processing and post-processing actions we wish to take.\n\nThe pre-processing steps we take in our `epi_recipe` are simply to lag the\npredictor (by 0, 7, and 14 days) and lead the response by the multiple aheads\nspecified by the function user.\n\nThe post-processing layers we add to our `frosting` are nearly as simple. 
We\nfirst predict, unnest the prediction list-cols, omit NAs from them, and enforce\nthat they are greater than 0.\n\nThe third component of an `epi_workflow`, the model, is smooth quantile\nregression, which has three main arguments: the quantiles, aheads, and degree.\n\nAfter creating our `epi_workflow` with these components, we get our test data\nbased on the longest lag period and make the predictions.\n\nWe input our forecaster into a function for ease of use.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_fc <- function(x, aheads = 1:28, degree = 3L, quantiles = 0.5, fd) {\n rec <- epi_recipe(x) %>%\n step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%\n step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%\n step_epi_ahead(death_rate, ahead = aheads)\n\n f <- frosting() %>%\n layer_predict() %>%\n layer_unnest(.pred) %>%\n layer_naomit(distn) %>%\n layer_add_forecast_date() %>%\n layer_threshold(distn)\n\n ee <- smooth_quantile_reg(\n quantile_levels = quantiles,\n outcome_locations = aheads,\n degree = degree\n )\n\n ewf <- epi_workflow(rec, ee, f)\n\n the_fit <- ewf %>% fit(x)\n\n latest <- get_test_data(rec, x)\n\n preds <- predict(the_fit, new_data = latest) %>%\n mutate(forecast_date = fd, target_date = fd + ahead) %>%\n select(geo_value, target_date, distn, ahead) %>%\n pivot_quantiles_wider(distn)\n\n preds\n}\n```\n:::\n\n::: {.cell}\n\n:::\n\n\n\n\nNotice that we allow the function user to specify the aheads, degree, and\nquantiles, as they may want to change these parameter values. We also allow for\ninput of the forecast date, since we fixed it at the onset of this demonstration.\n\nWe can now produce smooth quantile regression predictions for our problem:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_preds <- smooth_fc(edf, fd = fd)\nsmooth_preds\n```\n:::\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1,568 × 4\n#> geo_value target_date ahead `0.5`\n#> <chr> <date> <int> <dbl>\n#> 1 ak 2021-12-01 1 0.323\n#> 2 ak 2021-12-02 2 0.347\n#> 3 ak 2021-12-03 3 0.369\n#> 4 ak 2021-12-04 4 0.389\n#> 5 ak 2021-12-05 5 0.407\n#> 6 ak 2021-12-06 6 0.422\n#> 7 ak 2021-12-07 7 0.436\n#> 8 ak 2021-12-08 8 0.448\n#> 9 ak 2021-12-09 9 0.458\n#> 10 ak 2021-12-10 10 0.465\n#> # ℹ 1,558 more rows\n```\n\n\n:::\n:::\n\n\n\n\n\nMost often, we're not going to want to limit ourselves to just predicting the\nmedian value, as there is uncertainty about the predictions, so let's try to\npredict several different quantiles in addition to the median:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseveral_quantiles <- c(.1, .25, .5, .75, .9)\nsmooth_preds <- smooth_fc(edf, quantiles = several_quantiles, fd = fd)\nsmooth_preds\n```\n:::\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 1,568 × 8\n#> geo_value target_date ahead `0.1` `0.25` `0.5` `0.75` `0.9`\n#> <chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 ak 2021-12-01 1 0.286 0.315 0.323 0.350 0.434\n#> 2 ak 2021-12-02 2 0.292 0.331 0.347 0.383 0.474\n#> 3 ak 2021-12-03 3 0.298 0.345 0.369 0.414 0.511\n#> 4 ak 2021-12-04 4 0.303 0.358 0.389 0.442 0.544\n#> 5 ak 2021-12-05 5 0.307 0.369 0.407 0.467 0.575\n#> 6 ak 2021-12-06 6 0.310 0.378 0.422 0.490 0.603\n#> 7 ak 2021-12-07 7 0.313 0.387 0.436 0.511 0.629\n#> 8 ak 2021-12-08 8 0.314 0.393 0.448 0.529 0.651\n#> 9 ak 2021-12-09 9 0.315 0.398 0.458 0.544 0.670\n#> 10 ak 2021-12-10 10 0.315 0.402 0.465 0.557 0.687\n#> # ℹ 1,558 more rows\n```\n\n\n:::\n:::\n\n\n\n\nWe can see that we have different columns for the different quantile\npredictions.\n\nLet's visualize these results for the sample of two states. 
We will create a\nsimple plotting function, in which the median predictions are drawn as an orange line\nand the surrounding quantiles as blue bands around it. For comparison, we\nwill include the actual values over time as a black line.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) {\n if (!is.null(geos_to_plot)) {\n preds <- preds %>% filter(geo_value %in% geos_to_plot)\n train_test_dat <- train_test_dat %>% filter(geo_value %in% geos_to_plot)\n }\n\n ggplot(preds) +\n geom_ribbon(aes(target_date, ymin = `0.1`, ymax = `0.9`),\n fill = \"cornflowerblue\", alpha = .8\n ) +\n geom_ribbon(aes(target_date, ymin = `0.25`, ymax = `0.75`),\n fill = \"#00488E\", alpha = .8\n ) +\n geom_line(data = train_test_dat, aes(time_value, death_rate)) +\n geom_line(aes(target_date, `0.5`), color = \"orange\") +\n geom_vline(xintercept = fd) +\n facet_wrap(~geo_value) +\n scale_x_date(name = \"\", date_labels = \"%b %Y\", date_breaks = \"2 months\") +\n ylab(\"Deaths per 100K inhabitants\")\n}\n```\n:::\n\n\n\n\nSince we would like to plot the actual death rates for these states over time,\nwe bind the training and testing data together and input this into our plotting\nfunction as follows:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_preds(smooth_preds, geos, bind_rows(tedf, edf), fd)\n```\n\n::: {.cell-output-display}\n![](smooth-qr_files/figure-html/unnamed-chunk-13-1.png){width=100%}\n:::\n:::\n\n\n\n\nWe can see that the predictions are smooth curves for each state, as expected\nwhen using smooth quantile regression. In addition, while the curvature of the\nforecasts matches that of the truth, the forecasts do not look remarkably\naccurate.\n\n## Varying the degrees parameter\n\nWe can test the impact of different degrees by using the `map()` function.\nNoting that this may take some time to run, let's try out all degrees from 1\nto 7:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_preds_list <- map(1:7, function(x) {\n smooth_fc(\n edf,\n degree = x,\n quantiles = c(.1, .25, .5, .75, .9),\n fd = fd\n ) %>%\n mutate(degree = x)\n}) %>% list_rbind()\n```\n:::\n\n\n\n\nOne way to quantify the impact of the degree on the forecasts is to look at the\nmean absolute error (MAE) or mean squared error (MSE) over the degrees. 
We can\nselect the degree that results in the lowest MAE.\n\nSince the MAE compares the predicted values to the actual values, we will first\njoin the test data to the predicted data for our comparisons:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntedf_sub <- tedf %>%\n rename(target_date = time_value, actual = death_rate) %>%\n select(geo_value, target_date, actual)\n```\n:::\n\n\n\n\nAnd then compute the MAE for each of the degrees:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_preds_df_deg <- smooth_preds_list %>%\n left_join(tedf_sub, by = c(\"geo_value\", \"target_date\")) %>%\n group_by(degree) %>%\n mutate(error = abs(`0.5` - actual)) %>%\n summarise(mean = mean(error))\n\n# Arrange the MAE from smallest to largest\nsmooth_preds_df_deg %>% arrange(mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> # A tibble: 7 × 2\n#> degree mean\n#> <int> <dbl>\n#> 1 3 0.201\n#> 2 2 0.202\n#> 3 5 0.203\n#> 4 4 0.203\n#> 5 6 0.203\n#> 6 7 0.203\n#> 7 1 0.206\n```\n\n\n:::\n:::\n\n\n\n\nInstead of just looking at the raw numbers, let's create a simple line plot to\nvisualize how the MAE changes over degrees for this data:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(smooth_preds_df_deg, aes(degree, mean)) +\n geom_line() +\n xlab(\"Degrees of freedom\") +\n ylab(\"Mean MAE\")\n```\n\n::: {.cell-output-display}\n![](smooth-qr_files/figure-html/unnamed-chunk-17-1.png){width=100%}\n:::\n:::\n\n\n\n\nWe can see that the degree that results in the lowest MAE is 3. Hence, we could\npick this degree for future forecasting work on this data.\n\n## A brief comparison between smoothing and no smoothing\n\nNow, we will briefly compare the results from using smooth quantile regression\nto those obtained without smoothing. The latter approach amounts to ordinary\nquantile regression to get predictions for the intended target date. The main\ndrawback is that it ignores the fact that the responses all represent the same\nsignal, just for different ahead values. In contrast, the smooth quantile\nregression approach utilizes this information about the data structure: the\nfact that the aheads are not independent of each other, but are\nnaturally related over time by a smooth curve.\n\nTo get the basic quantile regression results, we can utilize the forecaster that\nwe've already built. 
We can simply set the degree to be the number of ahead\nvalues to re-run the code without smoothing.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbaseline_preds <- smooth_fc(\n edf,\n degree = 28L, quantiles = several_quantiles, fd = fd\n)\n```\n:::\n\n\n\n\nAnd we can produce the corresponding plot to inspect the predictions obtained\nunder the baseline model:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_preds(baseline_preds, geos, bind_rows(tedf, edf), fd)\n```\n\n::: {.cell-output-display}\n![](smooth-qr_files/figure-html/unnamed-chunk-19-1.png){width=100%}\n:::\n:::\n\n\n\n\nUnlike for smooth quantile regression, the resulting forecasts are not smooth\ncurves, but rather jagged and irregular in shape.\n\nFor a more formal comparison between the two approaches, we could compare the\ntest performance in terms of accuracy by calculating either the MAE or\nMSE, where the performance measure of choice can be calculated over all\ntimes and locations for each ahead value\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbaseline_preds_mae_df <- baseline_preds %>%\n left_join(tedf_sub, by = c(\"geo_value\", \"target_date\")) %>%\n group_by(ahead) %>%\n mutate(error = abs(`0.5` - actual)) %>%\n summarise(mean = mean(error)) %>%\n mutate(type = \"baseline\")\n\nsmooth_preds_mae_df <- smooth_preds %>%\n left_join(tedf_sub, by = c(\"geo_value\", \"target_date\")) %>%\n group_by(ahead) %>%\n mutate(error = abs(`0.5` - actual)) %>%\n summarise(mean = mean(error)) %>%\n mutate(type = \"smooth\")\n\npreds_mae_df <- bind_rows(baseline_preds_mae_df, smooth_preds_mae_df)\n\nggplot(preds_mae_df, aes(ahead, mean, color = type)) +\n geom_line() +\n xlab(\"Ahead\") +\n ylab(\"Mean MAE\") +\n scale_color_manual(values = c(\"darkred\", \"#063970\"), name = \"\")\n```\n\n::: {.cell-output-display}\n![](smooth-qr_files/figure-html/unnamed-chunk-20-1.png){width=100%}\n:::\n:::\n\n\n\n\nor over all aheads, times, and locations for a single numerical summary.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(baseline_preds_mae_df$mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.203589\n```\n\n\n:::\n\n```{.r .cell-code}\nmean(smooth_preds_mae_df$mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.2011083\n```\n\n\n:::\n:::\n\n\n\n\nThe former shows that forecasts for the immediate future and for the distant\nfuture are more inaccurate for both models under consideration. The latter shows\nthat the smooth quantile regression model and baseline models perform very\nsimilarly overall, with the smooth quantile regression model only slightly\nbeating the baseline model in terms of overall average MAE.\n\nOne other commonly used metric is the Weighted Interval Score\n(WIS, [Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)),\nwhich is a scoring rule that is based on the predicted quantiles. The point is to\nscore the interval, whereas MAE only evaluates the accuracy of the point\nforecast.\n\nLet $F$ be a forecast composed of predicted quantiles $q_{\\tau}$ for the set of\nquantile levels $\\tau$. Then, in terms of the predicted quantiles, the WIS for\ntarget variable $Y$ is represented as follows\n([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)):\n\n$$\nWIS(F, Y) = 2 \\sum_{\\tau} \\phi_{\\tau} (Y - q_{\\tau})\n$$\nwhere $\\phi_{\\tau}(x) = \\tau |x|$ for $x \\geq 0$\nand $\\phi_{\\tau}(x) = (1 - \\tau) |x|$ for $x < 0$.\n\nThis form is general as it can accommodate both symmetric and asymmetric\nquantile levels. 
If the quantile levels are symmetric, then we can alternatively\nexpress the WIS as a collection of central prediction intervals\n($\\ell_{\\alpha}, u_{\\alpha}$) parametrized by the exclusion probability\n$\\alpha$:\n\n$$\nWIS(F, Y) = \\sum_{\\alpha} \\{ (u_{\\alpha} - \\ell_{\\alpha}) + 2 \\cdot \\text{dist}(Y, [\\ell_{\\alpha}, u_{\\alpha}]) \\}\n$$\nwhere $\\text{dist}(a,S)$ is the smallest distance between point $a$ and an\nelement of set $S$.\n\nWhile we implement the former representation, we mention this form because it\nshows that the score can be decomposed into the sum of a sharpness\ncomponent (first term in the summand) and an under/overprediction component\n(second term in the summand). This alternative representation is useful because\nfrom it, we more easily see the major limitation of the WIS, which is that the\nscore tends to prioritize sharpness (how wide the interval is) relative to\ncoverage (whether the interval contains the truth).\n\nNow, we write a simple function for the first representation of the score that\nis compatible with the latest version of `epipredict` (adapted from the\ncorresponding function in\n[smoothmpf-epipredict](https://github.com/dajmcdon/smoothmpf-epipredict)). The\ninputs for it are the actual and predicted values and the quantile levels.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwis_dist_quantile <- function(actual, values, quantile_levels) {\n 2 * mean(pmax(\n quantile_levels * (actual - values),\n (1 - quantile_levels) * (values - actual),\n na.rm = TRUE\n ))\n}\n```\n:::\n\n\n\n\nNext, we apply the `wis_dist_quantile` function to get a WIS score for each\nstate on each target date. We then compute the mean WIS for each ahead value\nover all of the states. The results for each of the smooth and baseline\nforecasters are shown in the same style of line plot as we chose for MAE:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsmooth_preds_wis_df <- smooth_preds %>%\n left_join(tedf_sub, by = c(\"geo_value\", \"target_date\")) %>%\n rowwise() %>%\n mutate(wis = wis_dist_quantile(\n actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`),\n several_quantiles\n )) %>%\n group_by(ahead) %>%\n summarise(mean = mean(wis)) %>%\n mutate(type = \"smooth\")\n\nbaseline_preds_wis_df <- baseline_preds %>%\n left_join(tedf_sub, by = c(\"geo_value\", \"target_date\")) %>%\n rowwise() %>%\n mutate(wis = wis_dist_quantile(\n actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`),\n several_quantiles\n )) %>%\n group_by(ahead) %>%\n summarise(mean = mean(wis)) %>%\n mutate(type = \"baseline\")\n\npreds_wis_df <- bind_rows(smooth_preds_wis_df, baseline_preds_wis_df)\n\nggplot(preds_wis_df, aes(ahead, mean, color = type)) +\n geom_line() +\n xlab(\"Ahead\") +\n ylab(\"Mean WIS\") +\n scale_color_manual(values = c(\"darkred\", \"#063970\"), name = \"\")\n```\n\n::: {.cell-output-display}\n![](smooth-qr_files/figure-html/unnamed-chunk-23-1.png){width=100%}\n:::\n:::\n\n\n\n\nThe results are consistent with what we saw for MAE: The forecasts for the near\nand distant future tend to be inaccurate for both models. 
The smooth quantile\nregression model only slightly outperforms the baseline model.\n\nThough averaging the WIS score over location and time tends to be the primary\naggregation scheme used in evaluation and model comparisons (see, for example,\n[McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)),\nwe can also obtain a single numerical summary by averaging over the aheads,\ntimes, and locations:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(baseline_preds_wis_df$mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.1676765\n```\n\n\n:::\n\n```{.r .cell-code}\nmean(smooth_preds_wis_df$mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n#> [1] 0.1647412\n```\n\n\n:::\n:::\n\n\n\n\nOverall, both perspectives agree that the smooth quantile regression model tends\nto perform only slightly better than the baseline model in terms of average WIS,\nillustrating the difficulty of this forecasting problem.\n\n# What we've learned in a nutshell\n\nSmooth quantile regression is used in multi-period forecasting for predicting\nseveral horizons simultaneously with a single smooth curve. It operates under\nthe key assumption that the future of the response can be approximated well by a\nsmooth curve.\n\n# Attribution\n\nThe information presented on smooth quantile regression is from\n[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723).\n", + "supporting": [ + "smooth-qr_files" + ], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/_freeze/smooth-qr/figure-html/unnamed-chunk-13-1.png b/_freeze/smooth-qr/figure-html/unnamed-chunk-13-1.png new file mode 100644 index 0000000..cf80429 Binary files /dev/null and b/_freeze/smooth-qr/figure-html/unnamed-chunk-13-1.png differ diff --git a/_freeze/smooth-qr/figure-html/unnamed-chunk-17-1.png b/_freeze/smooth-qr/figure-html/unnamed-chunk-17-1.png new file mode 100644 index 0000000..3db2a65 Binary files /dev/null and b/_freeze/smooth-qr/figure-html/unnamed-chunk-17-1.png differ diff --git a/_freeze/smooth-qr/figure-html/unnamed-chunk-19-1.png b/_freeze/smooth-qr/figure-html/unnamed-chunk-19-1.png new file mode 100644 index 0000000..2874fb0 Binary files /dev/null and b/_freeze/smooth-qr/figure-html/unnamed-chunk-19-1.png differ diff --git a/_freeze/smooth-qr/figure-html/unnamed-chunk-20-1.png b/_freeze/smooth-qr/figure-html/unnamed-chunk-20-1.png new file mode 100644 index 0000000..909422b Binary files /dev/null and b/_freeze/smooth-qr/figure-html/unnamed-chunk-20-1.png differ diff --git a/_freeze/smooth-qr/figure-html/unnamed-chunk-23-1.png b/_freeze/smooth-qr/figure-html/unnamed-chunk-23-1.png new file mode 100644 index 0000000..652a104 Binary files /dev/null and b/_freeze/smooth-qr/figure-html/unnamed-chunk-23-1.png differ diff --git a/_quarto.yml b/_quarto.yml index 49e58ee..1322d8b 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -44,6 +44,7 @@ book: - tidymodels-regression.qmd - preprocessing-and-models.qmd - sliding-forecasters.qmd + - smooth-qr.qmd - references.qmd bibliography: [packages.bib, references.bib] diff --git a/arx-classifier.qmd b/arx-classifier.qmd new file mode 100644 index 0000000..6d9bee4 --- /dev/null +++ b/arx-classifier.qmd @@ -0,0 +1,252 @@ +# ARX Classifier basics +The `arx_classifier()` is an autoregressive classification model for `epi_df` +data that is used to predict a discrete class for each case under consideration. 
+
It is a direct forecaster in that it estimates the classes at a specific horizon +or ahead value. + +To get a sense of how the `arx_classifier()` works, let's consider a simple +example with minimal inputs. For this, we will use the built-in +`covid_case_death_rates` that contains confirmed COVID-19 cases and deaths from +JHU CSSE for all states from Dec 31, 2020 to Dec 31, 2021. From this, we'll take +a subset of data for five states from June 4, 2021 to December 31, 2021. Our +objective is to predict whether the case rates are increasing when considering +the 0, 7, and 14 day case rates: + +```{r} +jhu <- covid_case_death_rates %>% + filter( + time_value >= "2021-06-04", + time_value <= "2021-12-31", + geo_value %in% c("ca", "fl", "tx", "ny", "nj") + ) + +out <- arx_classifier(jhu, outcome = "case_rate", predictors = "case_rate") + +out$predictions +``` + +The key takeaway from the predictions is that there are two prediction classes: +(-Inf, 0.25] and (0.25, Inf). This is because, for our goal of classification, +the classes must be discrete. The discretization of the real-valued outcome is +controlled by the `breaks` argument, which defaults to 0.25. Such breaks will be +automatically extended to cover the entire real line. For example, the default +break of 0.25 is silently extended to `breaks = c(-Inf, .25, Inf)` and, therefore, +results in two classes: (-Inf, 0.25] and (0.25, Inf). These two classes are +used to discretize the outcome. The conversion of the outcome to such classes is +handled internally. So if discrete classes already exist for the outcome in the +`epi_df`, then we recommend coding a classifier from scratch using the +`epi_workflow` framework for more control. + +The `trainer` is a `parsnip` model describing the type of estimation such that +`mode = "classification"` is enforced. The two typical trainers that are used +are `parsnip::logistic_reg()` for two classes or `parsnip::multinom_reg()` for +more than two classes. + +```{r} +workflows::extract_spec_parsnip(out$epi_workflow) +``` + +From the parsnip model specification, we can see that the trainer used is +logistic regression, which is expected for our binary outcome. More complicated +trainers like `parsnip::naive_Bayes()` or `parsnip::rand_forest()` may also be +used (however, we will stick to the basics in this gentle introduction to the +classifier). + +If you use the default trainer of logistic regression for binary classification +and you decide against using the default break of 0.25, then you should only +input one break so that there are two classification bins to properly +dichotomize the outcome. For example, let's set a break of 0.5 instead of +relying on the default of 0.25. We can do this by passing 0.5 to the `breaks` +argument in `arx_class_args_list()` as follows: + +```{r} +out_break_0.5 <- arx_classifier( + jhu, + outcome = "case_rate", + predictors = "case_rate", + args_list = arx_class_args_list( + breaks = 0.5 + ) +) + +out_break_0.5$predictions +``` +Indeed, we can observe that the two `.pred_class` values are now (-Inf, 0.5] and (0.5, +Inf). See `help(arx_class_args_list)` for other available modifications. + +Additional arguments that may be supplied to `arx_class_args_list()` include the +expected `lags` and `ahead` arguments for an autoregressive-type model. These +have default values of 0, 7, and 14 days for the lags of the predictors and 7 +days ahead of the forecast date for predicting the outcome. 
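+
+For instance, here is a sketch (with purely illustrative values) of overriding
+these defaults to produce a 14-day-ahead classifier while keeping the default
+predictor lags:
+
+```{r, eval=FALSE}
+out_14 <- arx_classifier(
+  jhu,
+  outcome = "case_rate",
+  predictors = "case_rate",
+  args_list = arx_class_args_list(
+    lags = c(0L, 7L, 14L), # the default predictor lags, in days
+    ahead = 14L # forecast 14 days ahead instead of the default 7
+  )
+)
+```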
There is also +`n_training` to indicate the upper bound for the number of training rows per +key. If you would like some practice with using this, then remove the filtering +command to obtain data within "2021-06-04" and "2021-12-31" and instead set +`n_training` to be the number of days between these two dates, inclusive of the +end points. The end results should be the same. In addition to `n_training`, +there are `forecast_date` and `target_date` to specify the date that the +forecast is created and intended, respectively. We will not dwell on such +arguments here as they are not unique to this classifier or absolutely essential +to understanding how it operates. The remaining arguments will be discussed +organically, as they are needed to serve our purposes. For information on any +remaining arguments that are not discussed here, please see the function +documentation for a complete list and their definitions. + +## Example of using the ARX classifier + +Now, to demonstrate the power and utility of this built-in ARX classifier, we +will loosely adapt the classification example that was written from scratch in +`vignette("preprocessing-and-models")`. However, to keep things simple and not +merely a direct translation, we will only consider two prediction categories and +leave the extension to three as an exercise for the reader. + +To motivate this example, a major use of autoregressive classification models is +to predict upswings or downswings like in hotspot prediction models to +anticipate the direction of the outcome (see [McDonald, Bien, Green, Hu, et al. +(2021)](https://www.pnas.org/doi/full/10.1073/pnas.2111453118) for more on +these). In our case, one simple question that such models can help answer is... +Do we expect that the future will have increased case rates or not relative to +the present? + +To answer this question, we can create a predictive model for upswings and +downswings of case rates rather than the raw case rates themselves. For this +situation, our target is to predict whether there is an increase in case rates +or not. Following +[McDonald, Bien, Green, Hu, et al. (2021)](https://www.pnas.org/doi/full/10.1073/pnas.2111453118), +we look at the +relative change between $Y_{l,t}$ and $Y_{l, t+a}$, where the former is the case +rate at location $l$ at time $t$ and the latter is the rate for that location at +time $t+a$. Using these variables, we define a categorical response variable +with two classes: + +$$\begin{align} +Z_{l,t} = \left\{\begin{matrix} +\text{up,} & \text{if } Y_{l,t}^\Delta > 0.25\\ +\text{not up,} & \text{otherwise} +\end{matrix}\right. +\end{align}$$ +where $Y_{l,t}^\Delta = (Y_{l, t} - Y_{l, t-7}) / Y_{l, t-7}$. If $Y_{l,t}^\Delta > 0.25$, meaning that the number of new cases over the week has increased by over 25\%, then $Z_{l,t}$ is up. This is the criterion for location $l$ to be a hotspot at time $t$. On the other hand, if $Y_{l,t}^\Delta \leq 0.25$, then $Z_{l,t}$ is categorized as not up, meaning that there has not been a >25\% increase in the new cases over the past week. 
+ +The logistic regression model we use to predict this binary response can be +considered a simplification of the multinomial regression model presented +in `vignette("preprocessing-and-models")`: + +$$\begin{align} +\pi_{\text{up}}(x) &= Pr(Z_{l, t} = \text{up}|x) = \frac{e^{g_{\text{up}}(x)}}{1 + e^{g_{\text{up}}(x)}}, \\ +\pi_{\text{not up}}(x)&= Pr(Z_{l, t} = \text{not up}|x) = 1 - Pr(Z_{l, t} = \text{up}|x) = \frac{1}{1 + e^{g_{\text{up}}(x)}} +\end{align}$$ +where + +$$ +g_{\text{up}}(x) = \log\left ( \frac{\Pr(Z_{l, t} = \text{up} \vert x)}{\Pr(Z_{l, t} = \text{not up} \vert x)} \right ) = \beta_{10} + \beta_{11}Y_{l,t}^\Delta + \beta_{12}Y_{l,t-7}^\Delta + \beta_{13}Y_{l,t-14}^\Delta. +$$ + +Now, we will operate on the same subset of the `covid_case_death_rates` +that we used in our above example. This time, we will use it to investigate +whether the number of newly reported cases over the past 7 days has increased by +at least 25% compared to the preceding week for our sample of states. + +Notice that by using the `arx_classifier()` function we've completely eliminated +the need to manually categorize the response variable and implement +pre-processing steps, which was necessary in +`vignette("preprocessing-and-models")`. + +```{r} +log_res <- arx_classifier( + jhu, + outcome = "case_rate", + predictors = "case_rate", + args_list = arx_class_args_list( + breaks = 0.25 / 7 # division by 7 gives weekly not daily + ) +) + +log_res$epi_workflow +``` + +Comparing the pre-processing steps for this to those in the other vignette, we +can see that they are not precisely the same, but they cover the same essentials +of transforming `case_rate` to the growth rate scale (`step_growth_rate()`), +lagging the predictors (`step_epi_lag()`), leading the response +(`step_epi_ahead()`), which are both constructed from the growth rates, and +constructing the binary classification response variable (`step_mutate()`). + +On this topic, it is important to understand that we are not actually concerned +about the case values themselves. Rather, we are concerned with whether the quantity +of cases in the future is a lot larger than that in the present. For this +reason, the outcome does not remain as cases, but rather it is transformed by +using either growth rates (as the predictors and outcome in our example are) or +lagged differences. While the latter is closer to the requirements for the +[2022-23 CDC Flusight Hospitalization Experimental Target](https://github.com/cdcepi/Flusight-forecast-data/blob/745511c436923e1dc201dea0f4181f21a8217b52/data-experimental/README.md), +and it is conceptually easy to understand because it is simply the change of the +value for the horizon, it is not the default. The default is `growth_rate`. One +reason for this choice is that the growth rate is on a rate scale, not on the +absolute scale, so it fosters comparability across locations without any +conscious effort (on the other hand, when using `lag_difference` one would +need to take care to operate on rates per 100k and not raw counts). We utilize +`epiprocess::growth_rate()` to create the outcome using some of the additional +arguments. One important argument for the growth rate calculation is the +`method`. Only `rel_change` for relative change should be used as the method +because the test data is the only data that is accessible and the other methods +require access to the training data. 
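+
+As a quick, hedged illustration of what this step computes (the column name
+`case_gr` is our own, and the call assumes `epiprocess`'s current
+`growth_rate()` interface), here is the relative-change growth rate with a
+7-day bandwidth computed directly:
+
+```{r, eval=FALSE}
+jhu %>%
+  group_by(geo_value) %>%
+  mutate(
+    case_gr = growth_rate(time_value, case_rate, method = "rel_change", h = 7)
+  ) %>%
+  ungroup()
+```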
+ +The other optional arguments for controlling the growth rate calculation (that +can be inputted as `additional_gr_args`) can be found in the documentation for +`epiprocess::growth_rate()` and the related +`vignette("growth_rate", package = "epiprocess")`. + +### Visualizing the results + +To visualize the prediction classes across the states for the target date, we +can plot our results as a heatmap. However, if we were to plot the results for +only one target date, like our 7-day ahead predictions, then that would be a +pretty sad heatmap (which would look more like a bar chart than a heatmap)... So +instead of doing that, let's get predictions for several aheads and plot a +heatmap across the target dates. To get the predictions across several ahead +values, we will use the `map()` function in the same way that we did in other +vignettes: + +```{r} +multi_log_res <- map(1:40, ~ arx_classifier( + jhu, + outcome = "case_rate", + predictors = "case_rate", + args_list = arx_class_args_list( + breaks = 0.25 / 7, # division by 7 gives weekly not daily + ahead = .x + ) +)$predictions) %>% purrr::list_rbind() +``` + +We can plot a heatmap of the results over the aheads to see if there's +anything novel or interesting to take away: + +```{r} +ggplot2::ggplot(multi_log_res, aes(target_date, geo_value, fill = .pred_class)) + + ggplot2::geom_tile() + + ggplot2::ylab("State") + + ggplot2::xlab("Target date") + + ggplot2::scale_fill_brewer(palette = "Set1") +``` + +While there is a bit of variability near the end, we can clearly see that +there are upswings for all states starting from the beginning of January 2022, +which we can recall was when there was a massive spike in cases for many states. +So our results seem to align well with what actually happened at the beginning +of January 2022. + +## A brief reflection + +The most noticeable benefit of using the `arx_classifier()` function is the +simplification and reduction of the manual implementation of the classifier from +about 30 down to 3 lines. However, as we noted before, the trade-off for +simplicity is control over the precise pre-processing, post-processing, and +additional features embedded in the coding of a classifier. So the good thing is +that `epipredict` provides both: a built-in `arx_classifier()` and the means to +implement your own classifier from scratch by using the `epi_workflow` +framework. Which you choose will depend on the circumstances. Our advice is +to start with using the built-in classifier for relatively simple projects and +begin to implement your own when the modelling project takes a complicated turn. +To get some practice on coding up a classifier by hand, consider translating +this binary classification model example to an `epi_workflow`, akin to that in +`vignette("preprocessing-and-models")`. 
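+
+As a rough starting point for that exercise, here is an untested sketch of the
+pre-processing and fitting; the generated column names (e.g.
+`gr_7_rel_change_case_rate`) follow `epipredict`'s naming conventions at the
+time of writing and may need adjusting for your version:
+
+```{r, eval=FALSE}
+r <- epi_recipe(jhu) %>%
+  step_growth_rate(case_rate) %>% # creates gr_7_rel_change_case_rate
+  step_epi_lag(gr_7_rel_change_case_rate, lag = c(0, 7, 14)) %>%
+  step_epi_ahead(gr_7_rel_change_case_rate, ahead = 7) %>%
+  step_mutate(
+    response = factor(
+      ahead_7_gr_7_rel_change_case_rate > 0.25 / 7,
+      labels = c("not up", "up")
+    ),
+    role = "outcome"
+  ) %>%
+  step_rm(ahead_7_gr_7_rel_change_case_rate)
+
+wf <- epi_workflow(r, parsnip::logistic_reg()) %>% fit(jhu)
+```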
+ diff --git a/renv.lock b/renv.lock index 59ee76b..fbb9919 100644 --- a/renv.lock +++ b/renv.lock @@ -1,6 +1,6 @@ { "R": { - "Version": "4.3.3", + "Version": "4.4.1", "Repositories": [ { "Name": "RSPM", @@ -11,10 +11,10 @@ "Packages": { "BH": { "Package": "BH", - "Version": "1.84.0-0", + "Version": "1.87.0-1", "Source": "Repository", "Repository": "RSPM", - "Hash": "a8235afbcd6316e6e91433ea47661013" + "Hash": "468d9a03ba57f22ebde50060fd13ba9f" }, "DBI": { "Package": "DBI", @@ -50,18 +50,18 @@ }, "KernSmooth": { "Package": "KernSmooth", - "Version": "2.23-24", + "Version": "2.23-26", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "stats" ], - "Hash": "9f33a1ee37bbe8919eb2ec4b9f2473a5" + "Hash": "2fb39782c07b5ad422b0448ae83f64c4" }, "MASS": { "Package": "MASS", - "Version": "7.3-60", + "Version": "7.3-64", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -72,7 +72,7 @@ "stats", "utils" ], - "Hash": "a56a6365b3fa73293ea8d084be0d9bb0" + "Hash": "49d2d8090b74c1179df1aff16201caf8" }, "MMWRweek": { "Package": "MMWRweek", @@ -83,9 +83,9 @@ }, "Matrix": { "Package": "Matrix", - "Version": "1.6-5", + "Version": "1.7-2", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "R", "grDevices", @@ -96,17 +96,17 @@ "stats", "utils" ], - "Hash": "8c7115cd3a0e048bda2a7cd110549f7a" + "Hash": "f6a81550f166acbe2cd5229e9ca079b9" }, "R6": { "Package": "R6", - "Version": "2.5.1", + "Version": "2.6.1", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "R" ], - "Hash": "470851b6d5d0ac559e9d01bb352b4021" + "Hash": "d4335fe7207f1c01ab8c41762f5840d4" }, "RColorBrewer": { "Package": "RColorBrewer", @@ -120,14 +120,14 @@ }, "Rcpp": { "Package": "Rcpp", - "Version": "1.0.13", + "Version": "1.0.14", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "methods", "utils" ], - "Hash": "f27411eb6d9c3dada5edd444b8416675" + "Hash": "e7bdd9ee90e96921ca8a0f1972d66682" }, "RcppEigen": { "Package": "RcppEigen", @@ -165,7 +165,7 @@ }, "anytime": { "Package": "anytime", - "Version": "0.3.9", + "Version": "0.3.11", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -173,17 +173,17 @@ "R", "Rcpp" ], - "Hash": "74a64813f17b492da9c6afda6b128e3d" + "Hash": "a1db66ce27bcfb58cb67ad4e04d48b79" }, "askpass": { "Package": "askpass", - "Version": "1.2.0", + "Version": "1.2.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "sys" ], - "Hash": "cad6cf7f1d5f6e906700b9d3e718c796" + "Hash": "c39f4155b3ceb1a9a2799d700fbd4b6a" }, "backports": { "Package": "backports", @@ -207,7 +207,7 @@ }, "bayestestR": { "Package": "bayestestR", - "Version": "0.14.0", + "Version": "0.15.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -219,31 +219,32 @@ "stats", "utils" ], - "Hash": "71e7da5d38487173de67a1f0d763ceef" + "Hash": "768acac338a7a17a063e8bb88c58e611" }, "bit": { "Package": "bit", - "Version": "4.5.0", + "Version": "4.5.0.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R" ], - "Hash": "5dc7b2677d65d0e874fc4aaf0e879987" + "Hash": "f89f074e0e49bf1dbe3eba0a15a91476" }, "bit64": { "Package": "bit64", - "Version": "4.5.2", + "Version": "4.6.0-1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "bit", + "graphics", "methods", "stats", "utils" ], - "Hash": "e84984bf5f12a18628d9a02322128dfd" + "Hash": "4f572fbc586294afff277db583b9060f" }, "blob": { "Package": "blob", @@ -279,7 +280,7 @@ }, "bslib": { "Package": "bslib", - "Version": "0.8.0", 
+ "Version": "0.9.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -297,7 +298,7 @@ "rlang", "sass" ], - "Hash": "b299c6741ca9746fb227debcb0f9fb6c" + "Hash": "70a6489cc254171fb9b4a7f130f44dca" }, "cachem": { "Package": "cachem", @@ -349,27 +350,27 @@ }, "class": { "Package": "class", - "Version": "7.3-22", + "Version": "7.3-23", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "MASS", "R", "stats", "utils" ], - "Hash": "f91f6b29f38b8c280f2b9477787d4bb2" + "Hash": "d0cb9cc838c3b43560bd958fc4317fdc" }, "cli": { "Package": "cli", - "Version": "3.6.3", + "Version": "3.6.4", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "utils" ], - "Hash": "b21916dd77a27642b447374a5d30ecf3" + "Hash": "491c34d3d9dd0d2fe13d9f278bb90795" }, "clipr": { "Package": "clipr", @@ -383,7 +384,7 @@ }, "clock": { "Package": "clock", - "Version": "0.7.1", + "Version": "0.7.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -395,7 +396,7 @@ "tzdb", "vctrs" ], - "Hash": "3dcaebd52554438d12989e5061e15de8" + "Hash": "64780087d11447eb617fc566f26bee15" }, "codetools": { "Package": "codetools", @@ -436,7 +437,7 @@ }, "correlation": { "Package": "correlation", - "Version": "0.8.5", + "Version": "0.8.6", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -448,17 +449,17 @@ "parameters", "stats" ], - "Hash": "0995955fd59a01caf80918913bc5066c" + "Hash": "8b6512d8f60685736ee3bafdc292190d" }, "cpp11": { "Package": "cpp11", - "Version": "0.5.0", + "Version": "0.5.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R" ], - "Hash": "91570bba75d0c9d3f1040c835cee8fba" + "Hash": "9df43854f1c84685d095ed6270b52387" }, "crayon": { "Package": "crayon", @@ -474,7 +475,7 @@ }, "credentials": { "Package": "credentials", - "Version": "2.0.1", + "Version": "2.0.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -484,32 +485,32 @@ "openssl", "sys" ], - "Hash": "c7844b32098dcbd1c59cbd8dddb4ecc6" + "Hash": "09fd631e607a236f8cc7f9604db32cb8" }, "curl": { "Package": "curl", - "Version": "5.2.3", + "Version": "6.2.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R" ], - "Hash": "d91263322a58af798f6cf3b13fd56dde" + "Hash": "d65efa2a203bb99faa78690eee928f30" }, "data.table": { "Package": "data.table", - "Version": "1.16.0", + "Version": "1.17.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "methods" ], - "Hash": "fb24e05d4a91d8b1c7ff8e284bde834a" + "Hash": "d06cd1a942bb17b607df7d57048d6840" }, "datawizard": { "Package": "datawizard", - "Version": "0.12.3", + "Version": "1.0.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -518,7 +519,7 @@ "stats", "utils" ], - "Hash": "611537168bbb78b57720de109ec1ad19" + "Hash": "8a37ad13b9b0db70bf32d6eb2fc0a2aa" }, "dbplyr": { "Package": "dbplyr", @@ -576,7 +577,7 @@ }, "dials": { "Package": "dials", - "Version": "1.3.0", + "Version": "1.4.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -597,7 +598,7 @@ "vctrs", "withr" ], - "Hash": "f2fbe4e90fab23fc1f95bffcd3662878" + "Hash": "0a10c1d456fc2b9ac84b12e5ce60cf73" }, "diffobj": { "Package": "diffobj", @@ -702,7 +703,7 @@ }, "effectsize": { "Package": "effectsize", - "Version": "0.8.9", + "Version": "1.0.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -715,7 +716,7 @@ "stats", "utils" ], - "Hash": "7aceb5e07b6d48171c6b56714cc305ea" + "Hash": "b289af101f62cb7b46c53d1a45ed5198" }, "ellipsis": { "Package": 
"ellipsis", @@ -730,29 +731,24 @@ }, "epidatasets": { "Package": "epidatasets", - "Version": "0.0.1", + "Version": "0.0.2", "Source": "GitHub", "RemoteType": "github", "RemoteUsername": "cmu-delphi", "RemoteRepo": "epidatasets", "RemoteRef": "main", - "RemoteSha": "0632a77dc30655bbbb8c9667d7365f99ad0d5622", + "RemoteSha": "2c45cc1b11235c2fd323bb27d445e8631b1a985c", "RemoteHost": "api.github.com", "Requirements": [ "R" ], - "Hash": "d8715113bd6e6fbbddb664144d999dd0" + "Hash": "be7f01dda806dba9b69030deb8c08d03" }, "epidatr": { "Package": "epidatr", "Version": "1.2.0", - "Source": "GitHub", - "RemoteType": "github", - "RemoteUsername": "cmu-delphi", - "RemoteRepo": "epidatr", - "RemoteRef": "dev", - "RemoteSha": "0b3480889091063e5b03358cea10670292a803e6", - "RemoteHost": "api.github.com", + "Source": "Repository", + "Repository": "RSPM", "Requirements": [ "MMWRweek", "R", @@ -771,24 +767,25 @@ "usethis", "xml2" ], - "Hash": "3ad6e3cc0f0a1ff4b1e976b00ba3654d" + "Hash": "addc79c7bdb24b1e865083d42bae56b1" }, "epipredict": { "Package": "epipredict", - "Version": "0.0.24", + "Version": "0.1.11", "Source": "GitHub", "RemoteType": "github", "RemoteHost": "api.github.com", "RemoteUsername": "cmu-delphi", "RemoteRepo": "epipredict", "RemoteRef": "dev", - "RemoteSha": "36c4c0a88f77861302b35e95b815609f9014e90d", + "RemoteSha": "a805c2ae78e5af6a8697c6a4492a6d47bedcdfbe", "Requirements": [ "R", "checkmate", "cli", "distributional", "dplyr", + "epidatasets", "epiprocess", "generics", "ggplot2", @@ -807,25 +804,25 @@ "vctrs", "workflows" ], - "Hash": "2015c74d601879eaeb391c269cb7551d" + "Hash": "3b6c44efe33276f2bc5ea594cb53ac47" }, "epiprocess": { "Package": "epiprocess", - "Version": "0.9.0", + "Version": "0.10.5", "Source": "GitHub", "RemoteType": "github", - "RemoteHost": "api.github.com", "RemoteUsername": "cmu-delphi", "RemoteRepo": "epiprocess", "RemoteRef": "dev", - "RemoteSha": "44e70950a0e3c3c2bd8da52e5234dc505d99bb00", + "RemoteSha": "da3fa5f170bffb062af8462e16ebc395e6a5e35f", + "RemoteHost": "api.github.com", "Requirements": [ "R", "checkmate", "cli", "data.table", "dplyr", - "genlasso", + "epidatasets", "ggplot2", "glue", "lifecycle", @@ -837,22 +834,23 @@ "tibble", "tidyr", "tidyselect", + "tools", "tsibble", "utils", "vctrs", "waldo" ], - "Hash": "057dc098224b8288fdbaf47e9fd5ed86" + "Hash": "28141ab6f9f6b39a3e41bf582e92663a" }, "evaluate": { "Package": "evaluate", - "Version": "1.0.0", + "Version": "1.0.3", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R" ], - "Hash": "6b567375113ceb7d9f800de4dd42218e" + "Hash": "e9651417729bbe7472e32b5027370e79" }, "fansi": { "Package": "fansi", @@ -882,7 +880,7 @@ }, "fontawesome": { "Package": "fontawesome", - "Version": "0.5.2", + "Version": "0.5.3", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -890,7 +888,7 @@ "htmltools", "rlang" ], - "Hash": "c2efdd5f0bcd1ea861c2d4e2a883a67d" + "Hash": "bd1297f9b5b1fc1372d19e2c4cd82215" }, "forcats": { "Package": "forcats", @@ -923,14 +921,14 @@ }, "fs": { "Package": "fs", - "Version": "1.6.4", + "Version": "1.6.5", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "methods" ], - "Hash": "15aeb8c27f5ea5161f9f6a641fafd93a" + "Hash": "7f48af39fa27711ea5fbd183b399920d" }, "furrr": { "Package": "furrr", @@ -965,7 +963,7 @@ }, "future.apply": { "Package": "future.apply", - "Version": "1.11.2", + "Version": "1.11.3", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -975,7 +973,7 @@ "parallel", "utils" ], - "Hash": "afe1507511629f44572e6c53b9baeb7c" 
+ "Hash": "0efead26b03bb60716ed29971b1953dc" }, "gargle": { "Package": "gargle", @@ -1010,26 +1008,9 @@ ], "Hash": "15e9634c0fcd294799e9b2e929ed1b86" }, - "genlasso": { - "Package": "genlasso", - "Version": "1.6.1", - "Source": "GitHub", - "RemoteType": "github", - "RemoteHost": "api.github.com", - "RemoteUsername": "glmgen", - "RemoteRepo": "genlasso", - "RemoteRef": "master", - "RemoteSha": "5adbeb9976597ca27f4380555e96e27b6ded4215", - "Requirements": [ - "Matrix", - "R", - "igraph" - ], - "Hash": "276baa70b52ea97ac281175bae6d1952" - }, "gert": { "Package": "gert", - "Version": "2.1.2", + "Version": "2.1.4", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1040,7 +1021,7 @@ "sys", "zip" ], - "Hash": "347d104ed332650b737f509a703c9c7f" + "Hash": "ae855ad6d7be20dd7b05d43d25700398" }, "ggplot2": { "Package": "ggplot2", @@ -1108,14 +1089,14 @@ }, "glue": { "Package": "glue", - "Version": "1.7.0", + "Version": "1.8.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "methods" ], - "Hash": "e0b3a53876554bd45879e596cdb10a52" + "Hash": "5899f1eaa825580172bb56c08266f37c" }, "googledrive": { "Package": "googledrive", @@ -1172,14 +1153,14 @@ }, "gower": { "Package": "gower", - "Version": "1.0.1", + "Version": "1.0.2", "Source": "Repository", "Repository": "RSPM", - "Hash": "7a0051eef852c301b5efe2f7913dd45f" + "Hash": "7a37062c38f12229b5191cf5f593f91e" }, "gtable": { "Package": "gtable", - "Version": "0.3.5", + "Version": "0.3.6", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1188,13 +1169,14 @@ "glue", "grid", "lifecycle", - "rlang" + "rlang", + "stats" ], - "Hash": "e18861963cbc65a27736e02b3cd3c4a0" + "Hash": "de949855009e2d4d0e52a844e30617ae" }, "hardhat": { "Package": "hardhat", - "Version": "1.4.0", + "Version": "1.4.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1202,10 +1184,11 @@ "cli", "glue", "rlang", + "sparsevctrs", "tibble", "vctrs" ], - "Hash": "e7aabf81944f6c6cbbcec1f85827a279" + "Hash": "42700da57c44a7f594270799803e20be" }, "haven": { "Package": "haven", @@ -1228,6 +1211,16 @@ ], "Hash": "9171f898db9d9c4c1b2c745adc2c1ef1" }, + "here": { + "Package": "here", + "Version": "1.0.1", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "rprojroot" + ], + "Hash": "24b224366f9c2e7534d2344d10d59211" + }, "highr": { "Package": "highr", "Version": "0.11", @@ -1286,7 +1279,7 @@ }, "httr2": { "Package": "httr2", - "Version": "1.0.5", + "Version": "1.1.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1303,7 +1296,7 @@ "vctrs", "withr" ], - "Hash": "d84e4c33206aaace37714901ac2b00c3" + "Hash": "0f14199bbd820a9fca398f2df40994f1" }, "ids": { "Package": "ids", @@ -1316,29 +1309,6 @@ ], "Hash": "99df65cfef20e525ed38c3d2577f7190" }, - "igraph": { - "Package": "igraph", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "Matrix", - "R", - "cli", - "cpp11", - "grDevices", - "graphics", - "lifecycle", - "magrittr", - "methods", - "pkgconfig", - "rlang", - "stats", - "utils", - "vctrs" - ], - "Hash": "c3b7d801d722e26e4cd888e042bf9af5" - }, "infer": { "Package": "infer", "Version": "1.0.7", @@ -1374,7 +1344,7 @@ }, "insight": { "Package": "insight", - "Version": "0.20.4", + "Version": "1.0.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1383,7 +1353,7 @@ "stats", "utils" ], - "Hash": "8457d6e682a49f2c87b698a830527b09" + "Hash": "ba2606f588c01adc84699ed43e8d162a" }, "ipred": { "Package": "ipred", @@ -1435,17 +1405,17 
@@ }, "jsonlite": { "Package": "jsonlite", - "Version": "1.8.9", + "Version": "1.9.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "methods" ], - "Hash": "4e993b65c2c3ffbffce7bb3e2c6f832b" + "Hash": "a61860f091bd20d8dd6c3fd8ac7f6e90" }, "knitr": { "Package": "knitr", - "Version": "1.48", + "Version": "1.49", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1457,7 +1427,7 @@ "xfun", "yaml" ], - "Hash": "acf380f300c721da9fde7df115a5f86f" + "Hash": "9fcb189926d93c636dea94fbe4f44480" }, "labeling": { "Package": "labeling", @@ -1487,7 +1457,7 @@ }, "lava": { "Package": "lava", - "Version": "1.8.0", + "Version": "1.8.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1504,7 +1474,7 @@ "survival", "utils" ], - "Hash": "579303ca1e817d94cea694b319803380" + "Hash": "b74911f7265f9c908027dab63801a754" }, "lhs": { "Package": "lhs", @@ -1542,7 +1512,7 @@ }, "lubridate": { "Package": "lubridate", - "Version": "1.9.3", + "Version": "1.9.4", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1551,7 +1521,7 @@ "methods", "timechange" ], - "Hash": "680ad542fbcf801442c83a6ac5a2126c" + "Hash": "be38bc740fc51783a78edb5a157e4104" }, "magrittr": { "Package": "magrittr", @@ -1603,22 +1573,21 @@ }, "modelbased": { "Package": "modelbased", - "Version": "0.8.8", + "Version": "0.9.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "bayestestR", "datawizard", - "effectsize", "graphics", "insight", "parameters", - "performance", "stats", + "tools", "utils" ], - "Hash": "be0465e9a8078f1c5a15344a2c130266" + "Hash": "1e8ee59c3cb38d41df8bbc93d7f6361d" }, "modeldata": { "Package": "modeldata", @@ -1637,16 +1606,17 @@ }, "modelenv": { "Package": "modelenv", - "Version": "0.1.1", + "Version": "0.2.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ + "cli", "glue", "rlang", "tibble", "vctrs" ], - "Hash": "fc2e59a68030885555c7be34ee7765a1" + "Hash": "d3c0bebed774ca58574f6f1d7e9b9a62" }, "modelr": { "Package": "modelr", @@ -1679,7 +1649,7 @@ }, "nlme": { "Package": "nlme", - "Version": "3.1-166", + "Version": "3.1-167", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1689,19 +1659,19 @@ "stats", "utils" ], - "Hash": "ccbb8846be320b627e6aa2b4616a2ded" + "Hash": "abd9318b4073223646c0f9d3ed641904" }, "nnet": { "Package": "nnet", - "Version": "7.3-19", + "Version": "7.3-20", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "R", "stats", "utils" ], - "Hash": "2c797b46eea7fb58ede195bc0b1f1138" + "Hash": "c955edf99ff24a32e96bd0a22645af60" }, "numDeriv": { "Package": "numDeriv", @@ -1715,13 +1685,13 @@ }, "openssl": { "Package": "openssl", - "Version": "2.2.2", + "Version": "2.3.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "askpass" ], - "Hash": "d413e0fef796c9401a4419485f709ca1" + "Hash": "bc54d87ebf858b28de18df4bca6528d3" }, "outbreaks": { "Package": "outbreaks", @@ -1735,7 +1705,7 @@ }, "parallelly": { "Package": "parallelly", - "Version": "1.38.0", + "Version": "1.42.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1743,11 +1713,11 @@ "tools", "utils" ], - "Hash": "6e8b139c1904f5e9e14c69db64453bbe" + "Hash": "78f830734a4b488f2c72bf00cde6381e" }, "parameters": { "Package": "parameters", - "Version": "0.22.2", + "Version": "0.24.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1760,11 +1730,11 @@ "stats", "utils" ], - "Hash": "ee0115da94a9cf7c451615415ce65c03" + "Hash": 
"1cff6878a759914640ff9ed51dd745a2" }, "parsnip": { "Package": "parsnip", - "Version": "1.2.1", + "Version": "1.3.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1782,6 +1752,7 @@ "prettyunits", "purrr", "rlang", + "sparsevctrs", "stats", "tibble", "tidyr", @@ -1789,7 +1760,7 @@ "vctrs", "withr" ], - "Hash": "ace928adf616e06ece817d970faa2d03" + "Hash": "d7126709b5c62a6e96b3b570e758f1e7" }, "patchwork": { "Package": "patchwork", @@ -1812,7 +1783,7 @@ }, "performance": { "Package": "performance", - "Version": "0.12.3", + "Version": "0.13.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1823,16 +1794,15 @@ "stats", "utils" ], - "Hash": "92be9503bc3394c464688fb6b03002e3" + "Hash": "35de0c9474563ca8054029abb0b6e72c" }, "pillar": { "Package": "pillar", - "Version": "1.9.0", + "Version": "1.10.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "cli", - "fansi", "glue", "lifecycle", "rlang", @@ -1840,7 +1810,7 @@ "utils", "vctrs" ], - "Hash": "15da5a8412f317beeee6175fbc76f4bb" + "Hash": "8b16b6097daef84cd3c40a6a7c5c9d86" }, "pkgconfig": { "Package": "pkgconfig", @@ -1883,7 +1853,7 @@ }, "processx": { "Package": "processx", - "Version": "3.8.4", + "Version": "3.8.6", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1892,7 +1862,7 @@ "ps", "utils" ], - "Hash": "0c90a7d71988856bad2a2a45dd871bb9" + "Hash": "720161b280b0a35f4d1490ead2fe81d0" }, "prodlim": { "Package": "prodlim", @@ -1929,7 +1899,7 @@ }, "progressr": { "Package": "progressr", - "Version": "0.14.0", + "Version": "0.15.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1937,22 +1907,22 @@ "digest", "utils" ], - "Hash": "ac50c4ffa8f6a46580dd4d7813add3c4" + "Hash": "f8e9876b6ffff900e83f171081cacd6f" }, "ps": { "Package": "ps", - "Version": "1.8.0", + "Version": "1.9.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "utils" ], - "Hash": "4b9c8485b0c7eecdf0a9ba5132a45576" + "Hash": "8609dabea57eccf858a5923e4a9e77d4" }, "purrr": { "Package": "purrr", - "Version": "1.0.2", + "Version": "1.0.4", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1963,7 +1933,7 @@ "rlang", "vctrs" ], - "Hash": "1cba04a4e9414bdefc9dcaa99649a8dc" + "Hash": "cc8b5d43f90551fa6df0a6be5d640a4f" }, "ragg": { "Package": "ragg", @@ -1978,7 +1948,7 @@ }, "ranger": { "Package": "ranger", - "Version": "0.16.0", + "Version": "0.17.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -1987,7 +1957,7 @@ "Rcpp", "RcppEigen" ], - "Hash": "d5ca3a8d00f088042ea3b638534e0f3d" + "Hash": "9fe0f505fe36cac3207e0790d21b3676" }, "rappdirs": { "Package": "rappdirs", @@ -2039,7 +2009,7 @@ }, "recipes": { "Package": "recipes", - "Version": "1.1.0", + "Version": "1.1.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2058,6 +2028,7 @@ "magrittr", "purrr", "rlang", + "sparsevctrs", "stats", "tibble", "tidyr", @@ -2067,7 +2038,7 @@ "vctrs", "withr" ], - "Hash": "fc6672e55fcd1b5c461a3529ff6b1b08" + "Hash": "55b88203a511e8fe68f724284ad73ba2" }, "rematch": { "Package": "rematch", @@ -2088,13 +2059,13 @@ }, "renv": { "Package": "renv", - "Version": "1.0.9", + "Version": "1.1.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "utils" ], - "Hash": "ef233f0e9064fc88c898b340c9add5c2" + "Hash": "e660d5fdd4939225e5f13a0f7784d67c" }, "reprex": { "Package": "reprex", @@ -2120,18 +2091,18 @@ }, "rlang": { "Package": "rlang", - "Version": "1.1.4", + "Version": "1.1.5", "Source": "Repository", "Repository": "RSPM", 
"Requirements": [ "R", "utils" ], - "Hash": "3eec01f8b1dee337674b2e34ab1f9bc1" + "Hash": "724dcc1490cd7071ee75ca2994a5446e" }, "rmarkdown": { "Package": "rmarkdown", - "Version": "2.28", + "Version": "2.29", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2150,20 +2121,20 @@ "xfun", "yaml" ], - "Hash": "062470668513dcda416927085ee9bdc7" + "Hash": "df99277f63d01c34e95e3d2f06a79736" }, "rpart": { "Package": "rpart", - "Version": "4.1.23", + "Version": "4.1.24", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "R", "grDevices", "graphics", "stats" ], - "Hash": "b3d390424f41d04174cccf84d49676c2" + "Hash": "ad31b457482eda7a12e51c5d8e7b0be4" }, "rprojroot": { "Package": "rprojroot", @@ -2202,10 +2173,10 @@ }, "rstudioapi": { "Package": "rstudioapi", - "Version": "0.16.0", + "Version": "0.17.1", "Source": "Repository", "Repository": "RSPM", - "Hash": "96710351d642b70e8f02ddeb237c46a7" + "Hash": "5f90cd73946d706cfe26024294236113" }, "rvest": { "Package": "rvest", @@ -2262,7 +2233,7 @@ }, "see": { "Package": "see", - "Version": "0.9.0", + "Version": "0.10.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2277,10 +2248,11 @@ "insight", "modelbased", "parameters", + "patchwork", "performance", "stats" ], - "Hash": "743de04e180938d89e913f392dc9a104" + "Hash": "e1d3163e1c87f9ccd337cb937e7c3437" }, "selectr": { "Package": "selectr", @@ -2297,16 +2269,16 @@ }, "sessioninfo": { "Package": "sessioninfo", - "Version": "1.2.2", + "Version": "1.2.3", "Source": "Repository", - "Repository": "CRAN", + "Repository": "RSPM", "Requirements": [ "R", "cli", "tools", "utils" ], - "Hash": "3f9796a8d0a0e8c6eb49a4b029359d1f" + "Hash": "bf169c6e52cdbded916e448dc1254913" }, "sfd": { "Package": "sfd", @@ -2336,7 +2308,7 @@ }, "slider": { "Package": "slider", - "Version": "0.3.1", + "Version": "0.3.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2346,7 +2318,20 @@ "vctrs", "warp" ], - "Hash": "a584625e2b9e4fad4be135c8ea5c99aa" + "Hash": "5dbae6fb8e910b174bc3428164f4e8f8" + }, + "sparsevctrs": { + "Package": "sparsevctrs", + "Version": "0.2.0", + "Source": "Repository", + "Repository": "RSPM", + "Requirements": [ + "R", + "cli", + "rlang", + "vctrs" + ], + "Hash": "7369d4acd39e9f753c555f3943beeea2" }, "stringi": { "Package": "stringi", @@ -2380,7 +2365,7 @@ }, "survival": { "Package": "survival", - "Version": "3.7-0", + "Version": "3.8-3", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2392,39 +2377,46 @@ "stats", "utils" ], - "Hash": "5aaa9cbaf4aba20f8e06fdea1850a398" + "Hash": "fe42836742a4f065b3f3f5de81fccab9" }, "sys": { "Package": "sys", - "Version": "3.4.2", + "Version": "3.4.3", "Source": "Repository", "Repository": "RSPM", - "Hash": "3a1be13d68d47a8cd0bfd74739ca1555" + "Hash": "de342ebfebdbf40477d0758d05426646" }, "systemfonts": { "Package": "systemfonts", - "Version": "1.1.0", + "Version": "1.2.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "cpp11", - "lifecycle" + "grid", + "jsonlite", + "lifecycle", + "tools", + "utils" ], - "Hash": "213b6b8ed5afbf934843e6c3b090d418" + "Hash": "f8b2924480a2679e2bad9750646112fe" }, "textshaping": { "Package": "textshaping", - "Version": "0.4.0", + "Version": "1.0.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "R", "cpp11", "lifecycle", - "systemfonts" + "stats", + "stringi", + "systemfonts", + "utils" ], - "Hash": "5142f8bc78ed3d819d26461b641627ce" + "Hash": "5d44adc8145c718066b0bc374d142ca1" }, "tibble": { 
"Package": "tibble", @@ -2447,7 +2439,7 @@ }, "tidymodels": { "Package": "tidymodels", - "Version": "1.2.0", + "Version": "1.3.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2474,7 +2466,7 @@ "workflowsets", "yardstick" ], - "Hash": "c3296bbe8389a31fafc1ee07e69889a7" + "Hash": "404488681fdfecc30a2ac59cb24fd395" }, "tidyr": { "Package": "tidyr", @@ -2582,17 +2574,17 @@ }, "tinytex": { "Package": "tinytex", - "Version": "0.53", + "Version": "0.56", "Source": "Repository", "Repository": "RSPM", "Requirements": [ "xfun" ], - "Hash": "9db859e8aabbb474293dde3097839420" + "Hash": "68e6032b8c16f33fab8756e40a0dca1a" }, "tsibble": { "Package": "tsibble", - "Version": "1.1.5", + "Version": "1.1.6", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2609,11 +2601,11 @@ "tidyselect", "vctrs" ], - "Hash": "a75e397766b45996310908b5b32557ba" + "Hash": "10a8318a2264880cf5e83b32241306d2" }, "tune": { "Package": "tune", - "Version": "1.2.1", + "Version": "1.3.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2643,7 +2635,7 @@ "workflows", "yardstick" ], - "Hash": "7fbdbcd58e7a63957b23ddb751b346af" + "Hash": "e774223f0f27d8579e4b55fa9fe8938f" }, "tzdb": { "Package": "tzdb", @@ -2658,7 +2650,7 @@ }, "usethis": { "Package": "usethis", - "Version": "3.0.0", + "Version": "3.1.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2680,12 +2672,13 @@ "rprojroot", "rstudioapi", "stats", + "tools", "utils", "whisker", "withr", "yaml" ], - "Hash": "b2fbf93c2127bedd2cbe9b799530d5d2" + "Hash": "0d7f5ca181f9b1e68b217bd93b6cc703" }, "utf8": { "Package": "utf8", @@ -2759,7 +2752,7 @@ }, "waldo": { "Package": "waldo", - "Version": "0.5.3", + "Version": "0.6.1", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2768,11 +2761,9 @@ "diffobj", "glue", "methods", - "rematch2", - "rlang", - "tibble" + "rlang" ], - "Hash": "16aa934a49658677d8041df9017329b9" + "Hash": "52f574062a7b66e56926988c3fbdb3b7" }, "warp": { "Package": "warp", @@ -2793,7 +2784,7 @@ }, "withr": { "Package": "withr", - "Version": "3.0.1", + "Version": "3.0.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2801,11 +2792,11 @@ "grDevices", "graphics" ], - "Hash": "07909200e8bbe90426fbfeb73e1e27aa" + "Hash": "cc2d62c76458d425210d1eb1478b30b4" }, "workflows": { "Package": "workflows", - "Version": "1.1.4", + "Version": "1.2.0", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2817,11 +2808,14 @@ "lifecycle", "modelenv", "parsnip", + "recipes", "rlang", + "sparsevctrs", "tidyselect", - "vctrs" + "vctrs", + "withr" ], - "Hash": "f2c2cefdf6babfed4594b33479d19fc3" + "Hash": "3339eafba56a1ccdc0a59ce7ab7be055" }, "workflowsets": { "Package": "workflowsets", @@ -2855,7 +2849,7 @@ }, "xfun": { "Package": "xfun", - "Version": "0.47", + "Version": "0.51", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2864,7 +2858,7 @@ "stats", "tools" ], - "Hash": "36ab21660e2d095fef0d83f689e0477c" + "Hash": "e1a3c06389a46d065c18bd4bbc27c64c" }, "xgboost": { "Package": "xgboost", @@ -2902,7 +2896,7 @@ }, "yardstick": { "Package": "yardstick", - "Version": "1.3.1", + "Version": "1.3.2", "Source": "Repository", "Repository": "RSPM", "Requirements": [ @@ -2919,14 +2913,14 @@ "vctrs", "withr" ], - "Hash": "9ce4117141b326c4fffc7c42e56e0f88" + "Hash": "04af7b76ce6c3472553a8b508c073896" }, "zip": { "Package": "zip", - "Version": "2.3.1", + "Version": "2.3.2", "Source": "Repository", "Repository": "RSPM", - "Hash": 
"fcc4bd8e6da2d2011eb64a5e5cc685ab" + "Hash": "2f2ac1424654714391fe94dd69e196a9" } } } diff --git a/renv/activate.R b/renv/activate.R index c360bf2..2fe247d 100644 --- a/renv/activate.R +++ b/renv/activate.R @@ -2,7 +2,7 @@ local({ # the requested version of renv - version <- "1.0.9" + version <- "1.1.1" attr(version, "sha") <- NULL # the project directory @@ -42,7 +42,7 @@ local({ return(FALSE) # next, check environment variables - # TODO: prefer using the configuration one in the future + # prefer using the configuration one in the future envvars <- c( "RENV_CONFIG_AUTOLOADER_ENABLED", "RENV_AUTOLOADER_ENABLED", @@ -135,12 +135,12 @@ local({ # R help links pattern <- "`\\?(renv::(?:[^`])+)`" - replacement <- "`\033]8;;ide:help:\\1\a?\\1\033]8;;\a`" + replacement <- "`\033]8;;x-r-help:\\1\a?\\1\033]8;;\a`" text <- gsub(pattern, replacement, text, perl = TRUE) # runnable code pattern <- "`(renv::(?:[^`])+)`" - replacement <- "`\033]8;;ide:run:\\1\a\\1\033]8;;\a`" + replacement <- "`\033]8;;x-r-run:\\1\a\\1\033]8;;\a`" text <- gsub(pattern, replacement, text, perl = TRUE) # return ansified text @@ -209,10 +209,6 @@ local({ } - startswith <- function(string, prefix) { - substring(string, 1, nchar(prefix)) == prefix - } - bootstrap <- function(version, library) { friendly <- renv_bootstrap_version_friendly(version) @@ -368,8 +364,7 @@ local({ quiet = TRUE ) - if ("headers" %in% names(formals(utils::download.file))) - { + if ("headers" %in% names(formals(utils::download.file))) { headers <- renv_bootstrap_download_custom_headers(url) if (length(headers) && is.character(headers)) args$headers <- headers @@ -457,9 +452,8 @@ local({ # add custom headers if available -- note that # utils::available.packages() will pass this to download.file() - if ("headers" %in% names(formals(utils::download.file))) - { - headers <- renv_bootstrap_download_custom_headers(url) + if ("headers" %in% names(formals(utils::download.file))) { + headers <- renv_bootstrap_download_custom_headers(repos) if (length(headers) && is.character(headers)) args$headers <- headers } @@ -565,6 +559,9 @@ local({ # prepare download options token <- renv_bootstrap_github_token() + if (is.null(token)) + token <- "" + if (nzchar(Sys.which("curl")) && nzchar(token)) { fmt <- "--location --fail --header \"Authorization: token %s\"" extra <- sprintf(fmt, token) @@ -953,8 +950,14 @@ local({ } renv_bootstrap_validate_version_dev <- function(version, description) { + expected <- description[["RemoteSha"]] - is.character(expected) && startswith(expected, version) + if (!is.character(expected)) + return(FALSE) + + pattern <- sprintf("^\\Q%s\\E", version) + grepl(pattern, expected, perl = TRUE) + } renv_bootstrap_validate_version_release <- function(version, description) { @@ -1134,10 +1137,10 @@ local({ renv_bootstrap_exec <- function(project, libpath, version) { if (!renv_bootstrap_load(project, libpath, version)) - renv_bootstrap_run(version, libpath) + renv_bootstrap_run(project, libpath, version) } - renv_bootstrap_run <- function(version, libpath) { + renv_bootstrap_run <- function(project, libpath, version) { # perform bootstrap bootstrap(version, libpath) @@ -1148,7 +1151,7 @@ local({ # try again to load if (requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) { - return(renv::load(project = getwd())) + return(renv::load(project = project)) } # failed to download or load renv; warn the user @@ -1194,98 +1197,101 @@ local({ jsonlite::fromJSON(txt = text, simplifyVector = FALSE) } - renv_json_read_default <- function(file = NULL, 
text = NULL) { - - # find strings in the JSON - text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") - pattern <- '["](?:(?:\\\\.)|(?:[^"\\\\]))*?["]' - locs <- gregexpr(pattern, text, perl = TRUE)[[1]] - - # if any are found, replace them with placeholders - replaced <- text - strings <- character() - replacements <- character() - - if (!identical(c(locs), -1L)) { - - # get the string values - starts <- locs - ends <- locs + attr(locs, "match.length") - 1L - strings <- substring(text, starts, ends) - - # only keep those requiring escaping - strings <- grep("[[\\]{}:]", strings, perl = TRUE, value = TRUE) - - # compute replacements - replacements <- sprintf('"\032%i\032"', seq_along(strings)) - - # replace the strings - mapply(function(string, replacement) { - replaced <<- sub(string, replacement, replaced, fixed = TRUE) - }, strings, replacements) + renv_json_read_patterns <- function() { + + list( + + # objects + list("{", "\t\n\tobject(\t\n\t"), + list("}", "\t\n\t)\t\n\t"), + + # arrays + list("[", "\t\n\tarray(\t\n\t"), + list("]", "\n\t\n)\n\t\n"), + + # maps + list(":", "\t\n\t=\t\n\t") + + ) + + } + renv_json_read_envir <- function() { + + envir <- new.env(parent = emptyenv()) + + envir[["+"]] <- `+` + envir[["-"]] <- `-` + + envir[["object"]] <- function(...) { + result <- list(...) + names(result) <- as.character(names(result)) + result } - - # transform the JSON into something the R parser understands - transformed <- replaced - transformed <- gsub("{}", "`names<-`(list(), character())", transformed, fixed = TRUE) - transformed <- gsub("[[{]", "list(", transformed, perl = TRUE) - transformed <- gsub("[]}]", ")", transformed, perl = TRUE) - transformed <- gsub(":", "=", transformed, fixed = TRUE) - text <- paste(transformed, collapse = "\n") - - # parse it - json <- parse(text = text, keep.source = FALSE, srcfile = NULL)[[1L]] - - # construct map between source strings, replaced strings - map <- as.character(parse(text = strings)) - names(map) <- as.character(parse(text = replacements)) - - # convert to list - map <- as.list(map) - - # remap strings in object - remapped <- renv_json_read_remap(json, map) - - # evaluate - eval(remapped, envir = baseenv()) - + + envir[["array"]] <- list + + envir[["true"]] <- TRUE + envir[["false"]] <- FALSE + envir[["null"]] <- NULL + + envir + } - renv_json_read_remap <- function(json, map) { - - # fix names - if (!is.null(names(json))) { - lhs <- match(names(json), names(map), nomatch = 0L) - rhs <- match(names(map), names(json), nomatch = 0L) - names(json)[rhs] <- map[lhs] + renv_json_read_remap <- function(object, patterns) { + + # repair names if necessary + if (!is.null(names(object))) { + + nms <- names(object) + for (pattern in patterns) + nms <- gsub(pattern[[2L]], pattern[[1L]], nms, fixed = TRUE) + names(object) <- nms + } - - # fix values - if (is.character(json)) - return(map[[json]] %||% json) - - # handle true, false, null - if (is.name(json)) { - text <- as.character(json) - if (text == "true") - return(TRUE) - else if (text == "false") - return(FALSE) - else if (text == "null") - return(NULL) + + # repair strings if necessary + if (is.character(object)) { + for (pattern in patterns) + object <- gsub(pattern[[2L]], pattern[[1L]], object, fixed = TRUE) } + + # recurse for other objects + if (is.recursive(object)) + for (i in seq_along(object)) + object[i] <- list(renv_json_read_remap(object[[i]], patterns)) + + # return remapped object + object + + } - # recurse - if (is.recursive(json)) { - for (i in 
seq_along(json)) {
-        json[i] <- list(renv_json_read_remap(json[[i]], map))
-      }
-    }
+  renv_json_read_default <- function(file = NULL, text = NULL) {
-    json
+    # read json text
+    text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n")
+
+    # convert into something the R parser will understand
+    patterns <- renv_json_read_patterns()
+    transformed <- text
+    for (pattern in patterns)
+      transformed <- gsub(pattern[[1L]], pattern[[2L]], transformed, fixed = TRUE)
+
+    # parse it
+    rfile <- tempfile("renv-json-", fileext = ".R")
+    on.exit(unlink(rfile), add = TRUE)
+    writeLines(transformed, con = rfile)
+    json <- parse(rfile, keep.source = FALSE, srcfile = NULL)[[1L]]
+    # evaluate in safe environment
+    result <- eval(json, envir = renv_json_read_envir())
+
+    # fix up strings if necessary
+    renv_json_read_remap(result, patterns)
+
+  }
+
   # load the renv profile, if any
   renv_bootstrap_profile_load(project)
diff --git a/smooth-qr-files/smooth-qr_baseline_preds.rds b/smooth-qr-files/smooth-qr_baseline_preds.rds
new file mode 100644
index 0000000..5c14399
Binary files /dev/null and b/smooth-qr-files/smooth-qr_baseline_preds.rds differ
diff --git a/smooth-qr-files/smooth-qr_smooth_preds_list.rds b/smooth-qr-files/smooth-qr_smooth_preds_list.rds
new file mode 100644
index 0000000..a32f265
Binary files /dev/null and b/smooth-qr-files/smooth-qr_smooth_preds_list.rds differ
diff --git a/smooth-qr.qmd b/smooth-qr.qmd
new file mode 100644
index 0000000..aae8cc8
--- /dev/null
+++ b/smooth-qr.qmd
@@ -0,0 +1,544 @@
+---
+title: "Smooth quantile regression"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Smooth quantile regression}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = FALSE,
+  comment = "#>",
+  warning = FALSE,
+  message = FALSE,
+  out.width = "100%"
+)
+```
+
+# Introducing smooth quantile regression
+
+Whereas other time-series forecasting examples in this package have used
+(direct) models for single horizons, in multi-period forecasting, the goal is to
+(directly) forecast several horizons simultaneously. This is useful in
+epidemiological applications where decisions are based on the trend of a signal.
+
+The idea underlying smooth quantile regression is that the set of forecast
+targets can be approximated by a smooth curve. This novel approach from
+[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723)
+enforces smoothness across the
+horizons and can be applied to point estimation by regression or interval
+prediction by quantile regression. Our focus in this vignette is the latter.
+
+# Built-in function for smooth quantile regression and its parameters
+
+The built-in smooth quantile regression function, `smooth_quantile_reg()`,
+provides a model specification for smooth quantile regression that works under
+the tidymodels framework. It has the following parameters and default values:
+
+```{r, eval = FALSE}
+smooth_quantile_reg(
+  mode = "regression",
+  engine = "smoothqr",
+  outcome_locations = NULL,
+  quantile_levels = 0.5,
+  degree = 3L
+)
+```
+
+For smooth quantile regression, the type of model, or `mode`, is regression.
+
+The only `engine` that is currently supported is `smooth_qr()` from the
+[`smoothqr` package](https://dajmcdon.github.io/smoothqr/).
+
+The `outcome_locations` indicate the multiple horizon (i.e., ahead) values.
+These should be specified by the user.
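+
+For example, a hypothetical specification matching the demonstration later in
+this vignette (the remaining parameters are described below) would request 28
+daily horizons at five quantile levels:
+
+```{r, eval = FALSE}
+# A sketch, not run: 1 to 28 days ahead, five quantile levels, default degree
+spec <- smooth_quantile_reg(
+  outcome_locations = 1:28,
+  quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9),
+  degree = 3L
+)
+```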
+
+The `quantile_levels` parameter is a vector of values that indicates the
+quantiles to be estimated. The default is the median (0.5 quantile).
+
+The `degree` parameter indicates the degree of the polynomials used for
+smoothing of the response. It should be no more than the number of aheads. If
+the degree is precisely equal to the number of aheads, then there is no
+smoothing. To better understand this parameter and how it works, we should look
+to its origins and how it is used in the model.
+
+# Model form
+
+Smooth quantile regression is linear auto-regressive, with the key feature being
+a transformation that forces the coefficients to satisfy a smoothing constraint.
+The purpose of this is for each model coefficient to be a smooth function of
+ahead values, and so each such coefficient is set to be a linear combination of
+smooth basis functions (such as a spline or a polynomial).
+
+The `degree` parameter controls the number of these polynomials used. It should
+be no greater than the number of responses. This is a tuning parameter, and so
+it can be chosen by performing a grid search with cross-validation. Intuitively,
+$d = 1$ corresponds to the constant model, $d = 2$ gives straight line
+forecasts, while $d = 3$ gives quadratic forecasts. Since a degree of 3 was
+found to work well in the tested applications (see Section 9 of
+[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723)),
+it is the default value.
+
+# Demonstration of smooth quantile regression
+
+```{r, message = FALSE}
+library(epipredict)
+library(dplyr)
+library(purrr)
+library(ggplot2)
+theme_set(theme_bw())
+```
+
+We will now apply smooth quantile regression to the real data used for COVID-19
+forecasting. The built-in dataset we will use is a subset of JHU daily data on
+state cases and deaths. This sample data ranges from Dec. 31, 2020 to
+Dec. 31, 2021.
+
+```{r}
+edf <- epidatasets::covid_case_death_rates
+```
+
+We will set the forecast date to be November 30, 2021 so that we can produce
+forecasts for target dates of 1 to 28 days ahead. We construct our test data,
+`tedf`, from the days beyond this.
+
+```{r}
+fd <- as.Date("2021-11-30")
+
+tedf <- edf %>% filter(time_value >= fd)
+```
+
+We will use the most recent 3 months' worth of data up to the forecast date for
+training.
+
+```{r}
+edf <- edf %>% filter(time_value < fd, time_value >= fd - 90L)
+```
+
+And for plotting, our focus will be on a subset of two states: California and
+Utah.
+
+```{r}
+geos <- c("ut", "ca")
+```
+
+Suppose that our goal with this data is to predict COVID-19 death rates at
+several horizons for each state. On day $t$, we want to predict new deaths $y$
+that are $a = 1,\dots, 28$ days ahead at locations $j$ using the death rates
+from today, 1 week ago, and 2 weeks ago. So for each location, we'll predict the
+median (0.5 quantile) for each of the target dates by using
+$$
+\hat{y}_{j}(t+a) = \alpha(a) + \sum_{l = 0}^2 \beta_{l}(a) y_{j}(t - 7l)
+$$
+where $\beta_{l}(a) = \sum_{i=1}^d \theta_{il} h_i(a)$ encodes the smoothing
+constraint, $\{h_1(a), \dots, h_d(a)\}$ is the set of smooth basis functions,
+and $d$ is a hyperparameter that manages the flexibility of $\beta_{l}(a)$.
+Remember that the goal is to have each $\beta_{l}(a)$ be a smooth function of
+the aheads, and that is achieved through imposing the smoothing constraint.
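+
+To see what the constraint does, consider a small worked case (our own
+illustration, not from the paper): take $d = 2$ with the monomial basis
+$h_1(a) = 1$ and $h_2(a) = a$. Then each lag coefficient is constrained to be
+linear in the ahead value,
+$$
+\beta_{l}(a) = \theta_{1l} + \theta_{2l} \, a,
+$$
+so all 28 per-horizon coefficients for lag $l$ are determined by just two
+parameters, and the fitted medians trace out straight lines across the target
+dates, matching the intuition for $d = 2$ given above.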
+
+Note that this model is intended to be simple and straightforward. Our only
+modification to this model is to add case rates as another predictive feature
+(we will leave it to the reader to incorporate additional features beyond this
+and the historical response values). We can update the basic model to
+incorporate the $k = 2$ predictive features of case and death rates for each
+location $j$, $x_j(t) = (x_{j1}(t), x_{j2}(t))$, as follows:
+
+$$
+\hat{y}_{j}(t+a) = \alpha(a) + \sum_{k = 1}^2 \sum_{l = 0}^2 \beta_{kl}(a) x_{jk}(t - 7l)
+$$
+where $\beta_{kl}(a) = \sum_{i=1}^d \theta_{ikl} h_i(a)$.
+
+Now, we will create our own forecaster from scratch by building up an
+`epi_workflow` (there is no canned forecaster that is currently available).
+Building our own forecaster allows for customization and control over the
+pre-processing and post-processing actions we wish to take.
+
+The pre-processing steps we take in our `epi_recipe` are simply to lag the
+predictors (by 0, 7, and 14 days) and lead the response by the multiple aheads
+specified by the function user.
+
+The post-processing layers we add to our `frosting` are nearly as simple. We
+first predict, unnest the prediction list-cols, omit NAs from them, and enforce
+that they are greater than 0.
+
+The third component of an `epi_workflow`, the model, is smooth quantile
+regression, which has three main arguments: the quantiles, aheads, and degree.
+
+After creating our `epi_workflow` with these components, we get our test data
+based on the longest lag period and make the predictions.
+
+We input our forecaster into a function for ease of use.
+
+```{r}
+smooth_fc <- function(x, aheads = 1:28, degree = 3L, quantiles = 0.5, fd) {
+  rec <- epi_recipe(x) %>%
+    step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
+    step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
+    step_epi_ahead(death_rate, ahead = aheads)
+
+  f <- frosting() %>%
+    layer_predict() %>%
+    layer_unnest(.pred) %>%
+    layer_naomit(distn) %>%
+    layer_add_forecast_date() %>%
+    layer_threshold(distn)
+
+  ee <- smooth_quantile_reg(
+    quantile_levels = quantiles,
+    outcome_locations = aheads,
+    degree = degree
+  )
+
+  ewf <- epi_workflow(rec, ee, f)
+
+  the_fit <- ewf %>% fit(x)
+
+  latest <- get_test_data(rec, x)
+
+  preds <- predict(the_fit, new_data = latest) %>%
+    mutate(forecast_date = fd, target_date = fd + ahead) %>%
+    select(geo_value, target_date, distn, ahead) %>%
+    pivot_quantiles_wider(distn)
+
+  preds
+}
+```
+
+```{r load-stored-preds, echo=FALSE}
+smooth_preds_list <- readRDS(here::here("smooth-qr-files/smooth-qr_smooth_preds_list.rds"))
+baseline_preds <- readRDS(here::here("smooth-qr-files/smooth-qr_baseline_preds.rds"))
+smooth_preds <- smooth_preds_list %>%
+  filter(degree == 3L) %>%
+  select(geo_value:ahead, `0.5`)
+```
+
+Notice that we allow the function user to specify the aheads, degree, and
+quantiles, as they may want to change these parameter values. We also allow for
+input of the forecast date, as we fixed that at the onset of this demonstration.
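+
+As a quick, hypothetical illustration of that flexibility (a sketch only, not
+run here), one could restrict attention to a single state and shorter horizons:
+
+```{r, eval = FALSE}
+# A sketch, not run: median-only forecasts for California up to 14 days ahead
+smooth_fc(edf %>% filter(geo_value == "ca"), aheads = 1:14, fd = fd)
+```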
+
+We can now produce smooth quantile regression predictions for our problem:
+
+```{r, eval = FALSE}
+smooth_preds <- smooth_fc(edf, fd = fd)
+smooth_preds
+```
+
+```{r, echo=FALSE}
+smooth_preds
+smooth_preds <- smooth_preds_list %>%
+  filter(degree == 3L) %>%
+  select(-degree)
+```
+
+Most often, we're not going to want to limit ourselves to just predicting the
+median value, as there is uncertainty about the predictions, so let's try to
+predict several different quantiles in addition to the median:
+
+```{r, eval = FALSE}
+several_quantiles <- c(.1, .25, .5, .75, .9)
+smooth_preds <- smooth_fc(edf, quantiles = several_quantiles, fd = fd)
+smooth_preds
+```
+
+```{r, echo = FALSE}
+several_quantiles <- c(.1, .25, .5, .75, .9)
+smooth_preds
+```
+
+We can see that we have different columns for the different quantile
+predictions.
+
+Let's visualize these results for the sample of two states. We will create a
+simple plotting function, under which the median predictions are an orange line
+and the surrounding quantiles are blue bands around this. For comparison, we
+will include the actual values over time as a black line.
+
+```{r}
+plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) {
+  if (!is.null(geos_to_plot)) {
+    preds <- preds %>% filter(geo_value %in% geos_to_plot)
+    train_test_dat <- train_test_dat %>% filter(geo_value %in% geos_to_plot)
+  }
+
+  ggplot(preds) +
+    geom_ribbon(aes(target_date, ymin = `0.1`, ymax = `0.9`),
+      fill = "cornflowerblue", alpha = .8
+    ) +
+    geom_ribbon(aes(target_date, ymin = `0.25`, ymax = `0.75`),
+      fill = "#00488E", alpha = .8
+    ) +
+    geom_line(data = train_test_dat, aes(time_value, death_rate)) +
+    geom_line(aes(target_date, `0.5`), color = "orange") +
+    geom_vline(xintercept = fd) +
+    facet_wrap(~geo_value) +
+    scale_x_date(name = "", date_labels = "%b %Y", date_breaks = "2 months") +
+    ylab("Deaths per 100K inhabitants")
+}
+```
+
+Since we would like to plot the actual death rates for these states over time,
+we bind the training and testing data together and input this into our plotting
+function as follows:
+
+```{r, warning = FALSE}
+plot_preds(smooth_preds, geos, bind_rows(tedf, edf), fd)
+```
+
+We can see that the predictions are smooth curves for each state, as expected
+when using smooth quantile regression. In addition, while the curvature of the
+forecasts matches that of the truth, the forecasts do not look remarkably
+accurate.
+
+## Varying the degrees parameter
+
+We can test the impact of different degrees by using the `map()` function.
+Noting that this may take some time to run, let's try out all degrees from 1
+to 7:
+
+```{r, eval = FALSE}
+smooth_preds_list <- map(1:7, function(x) {
+  smooth_fc(
+    edf,
+    degree = x,
+    quantiles = c(.1, .25, .5, .75, .9),
+    fd = fd
+  ) %>%
+    mutate(degree = x)
+}) %>% list_rbind()
+```
+
+One way to quantify the impact of these degrees on the forecasts is to look at
+the mean absolute error (MAE) or mean squared error (MSE) over the degrees. We
+can select the degree that results in the lowest MAE.
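+
+To be precise about the criterion (our own formalization; the code below
+computes exactly this for the median forecasts), for a candidate degree $d$ we
+average the absolute error of the predicted median over all locations and
+target dates:
+
+$$
+\text{MAE}(d) = \frac{1}{N} \sum_{j, t, a} \left| \hat{y}^{(d)}_{j}(t+a) - y_{j}(t+a) \right|
+$$
+where $\hat{y}^{(d)}_{j}$ denotes the predicted median under degree $d$ and $N$
+is the number of location and target date pairs being scored.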
+
+Since the MAE compares the predicted values to the actual values, we will first
+join the test data to the predicted data for our comparisons:
+```{r, message = FALSE}
+tedf_sub <- tedf %>%
+  rename(target_date = time_value, actual = death_rate) %>%
+  select(geo_value, target_date, actual)
+```
+
+And then compute the MAE for each of the degrees:
+```{r, message = FALSE}
+smooth_preds_df_deg <- smooth_preds_list %>%
+  left_join(tedf_sub, by = c("geo_value", "target_date")) %>%
+  group_by(degree) %>%
+  mutate(error = abs(`0.5` - actual)) %>%
+  summarise(mean = mean(error))
+
+# Arrange the MAE from smallest to largest
+smooth_preds_df_deg %>% arrange(mean)
+```
+
+Instead of just looking at the raw numbers, let's create a simple line plot to
+visualize how the MAE changes over degrees for this data:
+
+```{r}
+ggplot(smooth_preds_df_deg, aes(degree, mean)) +
+  geom_line() +
+  xlab("Degrees of freedom") +
+  ylab("Mean MAE")
+```
+
+We can see that the degree that results in the lowest MAE is 3. Hence, we could
+pick this degree for future forecasting work on this data.
+
+## A brief comparison between smoothing and no smoothing
+
+Now, we will briefly compare the results from using smooth quantile regression
+to those obtained without smoothing. The latter approach amounts to ordinary
+quantile regression to get predictions for the intended target date. The main
+drawback is that it ignores the fact that the responses all represent the same
+signal, just for different ahead values. In contrast, the smooth quantile
+regression approach utilizes this information about the data structure: the
+fact that the aheads are not independent of each other, but rather are
+naturally related over time by a smooth curve.
+
+To get the basic quantile regression results, we can utilize the forecaster
+that we've already built. We can simply set the degree to be the number of
+ahead values to re-run the code without smoothing.
+
+```{r, eval = FALSE}
+baseline_preds <- smooth_fc(
+  edf,
+  degree = 28L, quantiles = several_quantiles, fd = fd
+)
+```
+
+And we can produce the corresponding plot to inspect the predictions obtained
+under the baseline model:
+
+```{r, warning = FALSE}
+plot_preds(baseline_preds, geos, bind_rows(tedf, edf), fd)
+```
+
+Unlike for smooth quantile regression, the resulting forecasts are not smooth
+curves, but rather jagged and irregular in shape.
+
+For a more formal comparison between the two approaches, we can compare the
+test performance in terms of accuracy by calculating either the MAE or MSE,
+where the performance measure of choice can be calculated over all times and
+locations for each ahead value
+
+```{r, message = FALSE}
+baseline_preds_mae_df <- baseline_preds %>%
+  left_join(tedf_sub, by = c("geo_value", "target_date")) %>%
+  group_by(ahead) %>%
+  mutate(error = abs(`0.5` - actual)) %>%
+  summarise(mean = mean(error)) %>%
+  mutate(type = "baseline")
+
+smooth_preds_mae_df <- smooth_preds %>%
+  left_join(tedf_sub, by = c("geo_value", "target_date")) %>%
+  group_by(ahead) %>%
+  mutate(error = abs(`0.5` - actual)) %>%
+  summarise(mean = mean(error)) %>%
+  mutate(type = "smooth")
+
+preds_mae_df <- bind_rows(baseline_preds_mae_df, smooth_preds_mae_df)
+
+ggplot(preds_mae_df, aes(ahead, mean, color = type)) +
+  geom_line() +
+  xlab("Ahead") +
+  ylab("Mean MAE") +
+  scale_color_manual(values = c("darkred", "#063970"), name = "")
+```
+
+or over all aheads, times, and locations for a single numerical summary.
+
+```{r}
+mean(baseline_preds_mae_df$mean)
+mean(smooth_preds_mae_df$mean)
+```
+
+The former shows that forecasts for the immediate future and for the distant
+future are more inaccurate for both models under consideration. The latter
+shows that the smooth quantile regression model and the baseline model perform
+very similarly overall, with the smooth quantile regression model only slightly
+beating the baseline model in terms of overall average MAE.
+
+One other commonly used metric is the Weighted Interval Score
+(WIS, [Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)),
+which is a scoring rule that is based on the population quantiles. The point is
+to score the interval, whereas MAE only evaluates the accuracy of the point
+forecast.
+
+Let $F$ be a forecast composed of predicted quantiles $q_{\tau}$ for the set of
+quantile levels $\tau$. Then, in terms of the predicted quantiles, the WIS for
+target variable $Y$ is represented as follows
+([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)):
+
+$$
+WIS(F, Y) = 2 \sum_{\tau} \phi_{\tau} (Y - q_{\tau})
+$$
+where $\phi_{\tau}(x) = \tau |x|$ for $x \geq 0$
+and $\phi_{\tau}(x) = (1 - \tau) |x|$ for $x < 0$.
+
+This form is general, as it can accommodate both symmetric and asymmetric
+quantile levels. If the quantile levels are symmetric, then we can alternatively
+express the WIS as a collection of central prediction intervals
+($\ell_{\alpha}, u_{\alpha}$) parametrized by the exclusion probability
+$\alpha$:
+
+$$
+WIS(F, Y) = \sum_{\alpha} \{ (u_{\alpha} - \ell_{\alpha}) + 2 \cdot \text{dist}(Y, [\ell_{\alpha}, u_{\alpha}]) \}
+$$
+where $\text{dist}(a, S)$ is the smallest distance between point $a$ and an
+element of set $S$.
+
+While we implement the former representation, we mention this form because it
+shows that the score can be decomposed into the sum of a sharpness component
+(the first term in the summand) and an under/overprediction component (the
+second term in the summand). This alternative representation is useful because,
+from it, we can more easily see the major limitation of the WIS, which is that
+the score tends to prioritize sharpness (how wide the interval is) relative to
+coverage (whether the interval contains the truth).
+
+Now, we write a simple function for the first representation of the score that
+is compatible with the latest version of `epipredict` (adapted from the
+corresponding function in
+[smoothmpf-epipredict](https://github.com/dajmcdon/smoothmpf-epipredict)). The
+inputs for it are the actual and predicted values and the quantile levels.
+
+```{r}
+wis_dist_quantile <- function(actual, values, quantile_levels) {
+  2 * mean(pmax(
+    quantile_levels * (actual - values),
+    (1 - quantile_levels) * (values - actual),
+    na.rm = TRUE
+  ))
+}
+```
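+
+As a quick sanity check of the function (with made-up numbers, not from the
+analysis), the score for a single actual value against one set of predicted
+quantiles can be computed directly; a lower score is better:
+
+```{r, eval = FALSE}
+# A sketch with made-up values: one actual value vs. five predicted quantiles
+wis_dist_quantile(
+  actual = 0.5,
+  values = c(0.2, 0.35, 0.5, 0.65, 0.8),
+  quantile_levels = several_quantiles
+)
+```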
+
+Next, we apply the `wis_dist_quantile` function to get a WIS score for each
+state on each target date. We then compute the mean WIS for each ahead value
+over all of the states. The results for each of the smooth and baseline
+forecasters are shown in a line plot of the same style as the one we used for
+MAE:
+
+```{r}
+smooth_preds_wis_df <- smooth_preds %>%
+  left_join(tedf_sub, by = c("geo_value", "target_date")) %>%
+  rowwise() %>%
+  mutate(wis = wis_dist_quantile(
+    actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`),
+    several_quantiles
+  )) %>%
+  group_by(ahead) %>%
+  summarise(mean = mean(wis)) %>%
+  mutate(type = "smooth")
+
+baseline_preds_wis_df <- baseline_preds %>%
+  left_join(tedf_sub, by = c("geo_value", "target_date")) %>%
+  rowwise() %>%
+  mutate(wis = wis_dist_quantile(
+    actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`),
+    several_quantiles
+  )) %>%
+  group_by(ahead) %>%
+  summarise(mean = mean(wis)) %>%
+  mutate(type = "baseline")
+
+preds_wis_df <- bind_rows(smooth_preds_wis_df, baseline_preds_wis_df)
+
+ggplot(preds_wis_df, aes(ahead, mean, color = type)) +
+  geom_line() +
+  xlab("Ahead") +
+  ylab("Mean WIS") +
+  scale_color_manual(values = c("darkred", "#063970"), name = "")
+```
+
+The results are consistent with what we saw for MAE: the forecasts for the near
+and distant future tend to be inaccurate for both models. The smooth quantile
+regression model only slightly outperforms the baseline model.
+
+Though averaging the WIS score over location and time tends to be the primary
+aggregation scheme used in evaluation and model comparisons (see, for example,
+[McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)),
+we can also obtain a single numerical summary by averaging over the aheads,
+times, and locations:
+
+```{r}
+mean(baseline_preds_wis_df$mean)
+mean(smooth_preds_wis_df$mean)
+```
+
+Overall, both perspectives agree that the smooth quantile regression model tends
+to perform only slightly better than the baseline model in terms of average WIS,
+illustrating the difficulty of this forecasting problem.
+
+# What we've learned in a nutshell
+
+Smooth quantile regression is used in multi-period forecasting for predicting
+several horizons simultaneously with a single smooth curve. It operates under
+the key assumption that the future of the response can be approximated well by
+a smooth curve.
+
+# Attribution
+
+The information presented on smooth quantile regression is from
+[Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723).