Commit e2b2dce

brookslogan, dajmcdon, and dshemetov authored
fix: update tooling book to epiprocess==0.9.0 and others (#12)
* bump epiprocess, epipredict, epidatr versions, update due to breaking changes
* bump versions on all dependencies, update renv.lock.
* replace both `pivot_quantiles` and `unnest` `pivot_wider` patterns to `pivot_quantiles_wider`
* fix some `epi_recipe` and `frosting` printing that doesn't play well with knitr now.
* update a few plots
* update for epiprocess R6 refactor
* remove references to R6 and mutation
* fix the authors section of DESCRIPTION
* integrate Rprofile with user Rprofile
* add a README file
* fix broken formatting in packages.bib
* get missing data for sliding-forecasters.qmd online instead of local files

Co-authored-by: Daniel McDonald <[email protected]>
Co-authored-by: Dmitry Shemetov <[email protected]>
1 parent 4c3830c commit e2b2dce

68 files changed

+67659
-46966
lines changed

.Rprofile

Lines changed: 6 additions & 0 deletions
@@ -1 +1,7 @@
 source("renv/activate.R")
+
+# Check if user .Rprofile exists
+if (file.exists("~/.Rprofile")) {
+  # Source user .Rprofile
+  source("~/.Rprofile")
+}

DESCRIPTION

Lines changed: 6 additions & 5 deletions
@@ -2,11 +2,12 @@ Package: delphitoolingbook
 Title: Delphi Tooling
 Version: 0.0.0.9999
 Authors@R: c(
-person("Daniel", "McDonald", "J.", "[email protected]", role = c("cre", "aut"),
-person("Logan", "Brooks", role = c("cre","aut"),
-person("Rachel", "Lobay", role = "aut"))
-person("Ryan", "Tibshirani", "J.", "[email protected]", role = "aut"),
-Description:
+person("Daniel", "McDonald", "J.", "[email protected]", role = c("cre", "aut")),
+person("Logan", "Brooks", role = c("cre","aut")),
+person("Rachel", "Lobay", role = "aut"),
+person("Ryan", "Tibshirani", "J.", "[email protected]", role = "aut")
+)
+Description:
 | This book is a longform introduction to analysing and forecasting epidemiological data.
 License: MIT + file LICENSE
 Imports:

README.md

Lines changed: 22 additions & 0 deletions

_common.R

Lines changed: 11 additions & 0 deletions
@@ -42,3 +42,14 @@ options(
 
 ggplot2::theme_set(ggplot2::theme_bw())
 
+# Workaround for interleaved `cat`s and `message`s (from `cli`) getting
+# intercepted and not combined properly by `collapse: true`:
+with_messages_cat_to_stdout <- function(code) {
+  withCallingHandlers(
+    code,
+    message = function(m) {
+      cat(m$message)
+      tryInvokeRestart("muffleMessage")
+    }
+  )
+}
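
Reviewer note: a minimal usage sketch of the helper added to `_common.R`, showing how it moves `message()` output onto stdout so it interleaves with `cat()` output in order. The `noisy()` function and the use of `capture.output()` are illustrative assumptions and are not part of this commit; `tryInvokeRestart()` requires R 4.0.0 or later.

```r
# Helper copied from the `_common.R` change in this commit.
with_messages_cat_to_stdout <- function(code) {
  withCallingHandlers(
    code,
    message = function(m) {
      cat(m$message)
      tryInvokeRestart("muffleMessage")
    }
  )
}

# Hypothetical function (not in the commit) that mixes stdout and stderr output.
noisy <- function() {
  cat("step 1\n")
  message("a message")
  cat("step 2\n")
  invisible(NULL)
}

# `capture.output()` records stdout only, so it shows what lands on that stream.
# `invisible()` keeps the return value itself from being auto-printed.
without_helper <- suppressMessages(capture.output(invisible(noisy())))
with_helper <- capture.output(invisible(with_messages_cat_to_stdout(noisy())))
```

In this sketch, `without_helper` holds only the two `cat()` lines, while `with_helper` also holds the message text between them, which is the ordering the workaround is meant to preserve in rendered chunks.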

_freeze/archive/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/archive/figure-html/unnamed-chunk-8-1.svg

Lines changed: 1465 additions & 0 deletions

_freeze/archive/figure-html/unnamed-chunk-9-1.svg

Lines changed: 588 additions & 591 deletions

_freeze/correlations/figure-html/unnamed-chunk-10-1.svg

Lines changed: 147 additions & 150 deletions

_freeze/correlations/figure-html/unnamed-chunk-4-1.svg

Lines changed: 171 additions & 174 deletions

_freeze/correlations/figure-html/unnamed-chunk-6-1.svg

Lines changed: 189 additions & 192 deletions

_freeze/correlations/figure-html/unnamed-chunk-8-1.svg

Lines changed: 134 additions & 137 deletions

_freeze/epidf/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/epidf/figure-html/unnamed-chunk-11-1.svg

Lines changed: 474 additions & 470 deletions

_freeze/epidf/figure-html/unnamed-chunk-13-1.svg

Lines changed: 715 additions & 711 deletions

_freeze/epidf/figure-html/unnamed-chunk-15-1.svg

Lines changed: 1998 additions & 1994 deletions

_freeze/epipredict/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/flatline-forecaster/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/flatline-forecaster/figure-html/unnamed-chunk-12-1.svg

Lines changed: 344 additions & 347 deletions

_freeze/flatline-forecaster/figure-html/unnamed-chunk-13-1.svg

Lines changed: 349 additions & 464 deletions

_freeze/flatline-forecaster/figure-html/unnamed-chunk-14-1.svg

Lines changed: 436 additions & 439 deletions

_freeze/flatline-forecaster/figure-html/unnamed-chunk-15-1.svg

Lines changed: 879 additions & 0 deletions

_freeze/forecast-framework/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/growth-rates/execute-results/html.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

_freeze/growth-rates/figure-html/unnamed-chunk-11-1.svg

Lines changed: 298 additions & 301 deletions

_freeze/growth-rates/figure-html/unnamed-chunk-11-2.svg

Lines changed: 325 additions & 328 deletions

_freeze/growth-rates/figure-html/unnamed-chunk-4-1.svg

Lines changed: 1888 additions & 1911 deletions

_freeze/growth-rates/figure-html/unnamed-chunk-5-1.svg

Lines changed: 187 additions & 190 deletions

_freeze/growth-rates/figure-html/unnamed-chunk-7-1.svg

Lines changed: 304 additions & 307 deletions

_freeze/growth-rates/figure-html/unnamed-chunk-9-1.svg

Lines changed: 306 additions & 309 deletions

_freeze/index/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/index/figure-html/unnamed-chunk-8-1.svg

Lines changed: 208 additions & 210 deletions

_freeze/outliers/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/outliers/figure-html/unnamed-chunk-3-1.svg

Lines changed: 532 additions & 528 deletions

_freeze/outliers/figure-html/unnamed-chunk-7-1.svg

Lines changed: 1244 additions & 1042 deletions

_freeze/outliers/figure-html/unnamed-chunk-7-2.svg

Lines changed: 1222 additions & 1028 deletions

_freeze/outliers/figure-html/unnamed-chunk-9-1.svg

Lines changed: 538 additions & 534 deletions

_freeze/preprocessing-and-models/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/preprocessing-and-models/figure-html/unnamed-chunk-9-1.svg

Lines changed: 239 additions & 242 deletions

_freeze/slide/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/slide/figure-html/unnamed-chunk-10-1.svg

Lines changed: 308 additions & 0 deletions

_freeze/slide/figure-html/unnamed-chunk-12-1.svg

Lines changed: 16561 additions & 2557 deletions

_freeze/slide/figure-html/unnamed-chunk-16-1.svg

Lines changed: 2860 additions & 0 deletions

_freeze/slide/figure-html/unnamed-chunk-8-1.svg

Lines changed: 12068 additions & 12103 deletions

_freeze/sliding-forecasters/execute-results/html.json

Lines changed: 2 additions & 2 deletions
Large diffs are not rendered by default.

_freeze/sliding-forecasters/figure-html/plot-ar-asof-1.svg

Lines changed: 1771 additions & 1768 deletions

_freeze/sliding-forecasters/figure-html/plot-arx-1.svg

Lines changed: 1739 additions & 1725 deletions

_freeze/sliding-forecasters/figure-html/plot-can-fc-boost-1.svg

Lines changed: 4993 additions & 4708 deletions

_freeze/sliding-forecasters/figure-html/plot-can-fc-lr-1.svg

Lines changed: 4969 additions & 4476 deletions

_freeze/tidymodels-intro/execute-results/html.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

_freeze/tidymodels-intro/figure-html/unnamed-chunk-23-1.svg

Lines changed: 322 additions & 325 deletions

_freeze/tidymodels-intro/figure-html/unnamed-chunk-26-1.svg

Lines changed: 136 additions & 137 deletions

_freeze/tidymodels-regression/execute-results/html.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

_freeze/tidymodels-regression/figure-html/unnamed-chunk-21-1.svg

Lines changed: 2614 additions & 2620 deletions

_freeze/tidymodels-regression/figure-html/unnamed-chunk-24-1.svg

Lines changed: 2584 additions & 2596 deletions

archive.qmd

Lines changed: 51 additions & 79 deletions
@@ -16,18 +16,17 @@ claims, available through the [COVIDcast
 API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html). This
 signal is subject to very heavy and regular revision; you can read more about it
 on its [API documentation
-page](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html). We'll use the offline version stored in `{epidatasets}`.
-
+page](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html).
+We'll use the offline version stored in `{epidatasets}`.
 
 ```{r, include=FALSE}
 source("_common.R")
 ```
 
 ## Getting data into `epi_archive` format
 
-An `epi_archive` object
-can be constructed from a data frame, data table, or tibble, provided that it
-has (at least) the following columns:
+An `epi_archive` object can be constructed from a data frame, data table, or
+tibble, provided that it has (at least) the following columns:
 
 * `geo_value`: the geographic value associated with each row of measurements.
 * `time_value`: the time value associated with each row of measurements.
@@ -37,7 +36,7 @@ has (at least) the following columns:
 the data for January 14, 2022 that were available one day later.
 
 As we can see from the above, the data frame returned by
-`epidatr::covidcast()` has the columns required for the `epi_archive`
+`epidatr::pub_covidcast()` has the columns required for the `epi_archive`
 format, so we use
 `as_epi_archive()` to cast it into `epi_archive` format.[^1]
 
@@ -48,17 +47,17 @@ to the [compactify vignette](https://cmu-delphi.github.io/epiprocess/articles/co
 
 ```{r}
 x <- archive_cases_dv_subset_dt %>%
-  select(geo_value, time_value, version, percent_cli) %>%
+  select(geo_value, time_value, version, percent_cli) %>%
   as_epi_archive(compactify = TRUE)
 
 class(x)
 print(x)
 ```
 
-An `epi_archive` is special kind of class called an R6 class. Its primary field
-is a data table `DT`, which is of class `data.table` (from the `data.table`
-package), and has columns `geo_value`, `time_value`, `version`, as well as any
-number of additional columns.
+An `epi_archive` is an S3 class. Its primary field is a data table `DT`, which
+is of class `data.table` (from the `{data.table}` package), and has columns
+`geo_value`, `time_value`, `version`, as well as any number of additional
+columns.
 
 ```{r}
 class(x$DT)
@@ -70,33 +69,18 @@ for the data table, as well as any other specified in the metadata (described
 below). There can only be a single row per unique combination of key variables,
 and therefore the key variables are critical for figuring out how to generate a
 snapshot of data from the archive, as of a given version (also described below).
-
-```{r, error=TRUE}
-key(x$DT)
-```
-
-In general, the last version of each observation is carried forward (LOCF) to
-fill in data between recorded versions. **A word of caution:** R6 objects,
-unlike most other objects in R, have reference semantics. An important
-consequence of this is that objects are not copied when modified.
-
+
 ```{r}
-original_value <- x$DT$percent_cli[1]
-y <- x # This DOES NOT make a copy of x
-y$DT$percent_cli[1] = 0
-head(y$DT)
-head(x$DT)
-x$DT$percent_cli[1] <- original_value
+data.table::key(x$DT)
 ```
 
-To make a copy, we can use the `clone()` method for an R6 class, as in `y <-
-x$clone()`. You can read more about reference semantics in Hadley Wickham's
-[Advanced R](https://adv-r.hadley.nz/r6.html#r6-semantics) book.
+In general, the last version of each observation is carried forward (LOCF) to
+fill in data between recorded versions.
 
 ## Some details on metadata
 
 The following pieces of metadata are included as fields in an `epi_archive`
-object:
+object:
 
 * `geo_type`: the type for the geo values.
 * `time_type`: the type for the time values.
@@ -112,20 +96,18 @@ call (as it did in the case above).
 
 A key method of an `epi_archive` class is `as_of()`, which generates a snapshot
 of the archive in `epi_df` format. This represents the most up-to-date values of
-the signal variables as of a given version. This can be accessed via `x$as_of()`
-for an `epi_archive` object `x`, but the package also provides a simple wrapper
-function `epix_as_of()` since this is likely a more familiar interface for users
-not familiar with R6 (or object-oriented programming).
+the signal variables as of a given version. This can be accessed via
+`epix_as_of()`.
 
 ```{r}
-x_snapshot <- epix_as_of(x, max_version = as.Date("2021-06-01"))
+x_snapshot <- epix_as_of(x, version = as.Date("2021-06-01"))
 class(x_snapshot)
 x_snapshot
 max(x_snapshot$time_value)
 attributes(x_snapshot)$metadata$as_of
 ```
 
-We can see that the max time value in the `epi_df` object `x_snapshot` that was
+We can see that the max time value in the `epi_df` object `x_snapshot` that was
 generated from the archive is May 29, 2021, even though the specified version
 date was June 1, 2021. From this we can infer that the doctor's visits signal
 was 2 days latent on June 1. Also, we can see that the metadata in the `epi_df`
@@ -134,65 +116,67 @@ object has the version date recorded in the `as_of` field.
 By default, using the maximum of the `version` column in the underlying data table in an
 `epi_archive` object itself generates a snapshot of the latest values of signal
 variables in the entire archive. The `epix_as_of()` function issues a warning in
-this case, since updates to the current version may still come in at a later
+this case, since updates to the current version may still come in at a later
 point in time, due to various reasons, such as synchronization issues.
 
 ```{r}
-x_latest <- epix_as_of(x, max_version = max(x$DT$version))
+x_latest <- epix_as_of(x, version = max(x$DT$version))
 ```
 
 Below, we pull several snapshots from the archive, spaced one month apart. We
 overlay the corresponding signal curves as colored lines, with the version dates
-marked by dotted vertical lines, and draw the latest curve in black (from the
+marked by dotted vertical lines, and draw the latest curve in black (from the
 latest snapshot `x_latest` that the archive can provide).
 
 ```{r, fig.width = 8, fig.height = 7}
 self_max <- max(x$DT$version)
 versions <- seq(as.Date("2020-06-01"), self_max - 1, by = "1 month")
 snapshots <- map(
-  versions,
-  function(v) {
-    epix_as_of(x, max_version = v) %>% mutate(version = v)
-  }) %>%
+  versions,
+  function(v) {
+    epix_as_of(x, version = v) %>% mutate(version = v)
+  }
+) %>%
   list_rbind() %>%
   bind_rows(x_latest %>% mutate(version = self_max)) %>%
   mutate(latest = version == self_max)
 ```
 
 ```{r, fig.height=7}
 #| code-fold: true
-ggplot(snapshots %>% filter(!latest),
-       aes(x = time_value, y = percent_cli)) +
-  geom_line(aes(color = factor(version)), na.rm = TRUE) +
+ggplot(
+  snapshots %>% filter(!latest),
+  aes(x = time_value, y = percent_cli)
+) +
+  geom_line(aes(color = factor(version)), na.rm = TRUE) +
   geom_vline(aes(color = factor(version), xintercept = version), lty = 2) +
-  facet_wrap(~ geo_value, scales = "free_y", ncol = 1) +
+  facet_wrap(~geo_value, scales = "free_y", ncol = 1) +
   scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
   scale_color_viridis_d(option = "A", end = .9) +
-  labs(x = "Date", y = "% of doctor's visits with CLI") +
+  labs(x = "Date", y = "% of doctor's visits with CLI") +
   theme(legend.position = "none") +
-  geom_line(data = snapshots %>% filter(latest),
-            aes(x = time_value, y = percent_cli),
-            inherit.aes = FALSE, color = "black", na.rm = TRUE)
+  geom_line(
+    data = snapshots %>% filter(latest),
+    aes(x = time_value, y = percent_cli),
+    inherit.aes = FALSE, color = "black", na.rm = TRUE
+  )
 ```
 
 We can see some interesting and highly nontrivial revision behavior: at some
 points in time the provisional data snapshots grossly underestimate the latest
 curve (look in particular at Florida close to the end of 2021), and at others
-they overestimate it (both states towards the beginning of 2021), though not
+they overestimate it (both states towards the beginning of 2021), though not
 quite as dramatically. Modeling the revision process, which is often called
 *backfill modeling*, is an important statistical problem in it of itself.
 
-
-## Merging `epi_archive` objects
+## Merging `epi_archive` objects
 
 Now we demonstrate how to merge two `epi_archive` objects together, e.g., so
 that grabbing data from multiple sources as of a particular version can be
-performed with a single `as_of` call. The `epi_archive` class provides a method
-`merge()` precisely for this purpose. The wrapper function is called
-`epix_merge()`; this wrapper avoids mutating its inputs, while `x$merge` will
-mutate `x`. Below we merge the working `epi_archive` of versioned percentage CLI
-from outpatient visits to another one of versioned COVID-19 case reporting data,
-which we fetch the from the [COVIDcast
+performed with a single `as_of` call. The `epiprocess` packages provides
+`epix_merge()` for this purpose. Below we merge the working `epi_archive` of
+versioned percentage CLI from outpatient visits to another one of versioned
+COVID-19 case reporting data, which we fetch the from the [COVIDcast
 API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html/), on the
 rate scale (counts per 100,000 people in the population).
 
@@ -209,39 +193,27 @@ When merging archives, unless the archives have identical data release patterns,
 the other).
 
 ```{r, message = FALSE, warning = FALSE,eval=FALSE}
-# This code is for illustration and doesn't run.
+# This code is for illustration and doesn't run.
 # The result is saved/loaded in the (hidden) next chunk from `{epidatasets}`
-y <- covidcast(
-  data_source = "jhu-csse",
+y <- pub_covidcast(
+  source = "jhu-csse",
   signals = "confirmed_7dav_incidence_prop",
   time_type = "day",
   geo_type = "state",
   time_values = epirange(20200601, 20211201),
   geo_values = "ca,fl,ny,tx",
   issues = epirange(20200601, 20211201)
 ) %>%
-  fetch() %>%
   select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>%
   as_epi_archive(compactify = TRUE)
 
-x$merge(y, sync = "locf", compactify = FALSE)
+x <- epix_merge(x, y, sync = "locf", compactify = FALSE)
 print(x)
 head(x$DT)
 ```
 
-```{r, echo=FALSE}
-x <- archive_cases_dv_subset
-print(x)
-head(x$DT)
-```
-
-Importantly, see that `x$merge` mutated `x` to hold the result of the merge. We
-could also have used `xy = epix_merge(x, y)` to avoid mutating `x`. See the
-documentation for either for more detailed descriptions of what mutation,
-pointer aliasing, and pointer reseating is possible.
-
 ## Sliding version-aware computations
-
+
 ::: {.callout-note}
 TODO: need a simple example here.
-:::
+:::
