diff --git a/DESCRIPTION b/DESCRIPTION index 3e02fafe..3e1fc47e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -44,6 +44,8 @@ Suggests: dplyr, ggplot2, knitr, + maps, + mapproj, rmarkdown, rlang, testthat (>= 3.1.5), diff --git a/README.Rmd b/README.Rmd index cd43d3c9..792ed89f 100644 --- a/README.Rmd +++ b/README.Rmd @@ -21,86 +21,34 @@ ggplot2::theme_set(ggplot2::theme_bw()) [![codecov](https://codecov.io/gh/dsweber2/epidatr/branch/dev/graph/badge.svg?token=jVHL9eHZNZ)](https://app.codecov.io/gh/dsweber2/epidatr) -The [Delphi Epidatr package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases for the USA at various geographical resolutions, both from official government sources such as the [Center for Disease Control (CDC)](https://www.cdc.gov/datastatistics/index.html) and [Google Trends](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/google-symptoms.html) and private partners such as [Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) and [Change Healthcare](https://www.changehealthcare.com/). It is built and maintained by the Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). - -This package is designed to streamline the downloading and usage of data from the [Delphi Epidata -API](https://cmu-delphi.github.io/delphi-epidata/). It provides a simple R interface to the API, including functions for downloading data, parsing the results, and converting the data into a tidy format. The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. 
We also provide packages for downstream data processing ([epiprocess](https://github.com/cmu-delphi/epiprocess)) and modeling ([epipredict](https://github.com/cmu-delphi/epipredict)). - -## Usage - -You can find detailed docs here: - -```{r} -library(epidatr) -# Obtain the smoothed covid-like illness (CLI) signal from the -# Facebook survey as it was on April 10, 2021 for the US -epidata <- pub_covidcast( - source = "fb-survey", - signals = "smoothed_cli", - geo_type = "nation", - time_type = "day", - geo_values = "us", - time_values = epirange(20210101, 20210601), - as_of = "2021-06-01" -) -epidata -``` +The [Delphi `epidatr` package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases. `epidatr` is built and maintained by the Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). -```{r fb-cli-signal} -# Plot this data -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "US, 2021", - x = "Date", - y = "CLI" - ) -``` +Data is available for the United States and a handful of other countries at various geographical resolutions, both from official government sources such as the [US Centers for Disease Control and Prevention (CDC)](https://www.cdc.gov/datastatistics/index.html), and private partners such as [Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) and [Change Healthcare](https://www.changehealthcare.com/). The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. +`epidatr` is designed to streamline the downloading and usage of data from the Epidata API. 
The package provides a simple R interface to the API, including functions for downloading data, parsing the results, and converting the data into a tidy format. We also provide the [epiprocess](https://github.com/cmu-delphi/epiprocess) package for downstream data processing and [epipredict](https://github.com/cmu-delphi/epipredict) for modeling. -## Installation +Consult the [Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) for details on the data included in the API, API key registration, licensing, and how to cite this data in your work. The documentation lists all the data sources and signals available through this API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). -You can install the stable version of this package from CRAN: +**To get started** using this package, view the Getting Started guide at `vignette("epidatr")`. -```R -install.packages("epidatr") -pak::pkg_install("epidatr") -renv::install("epidatr") -``` +## For users of the `covidcast` R package -Or if you want the development version, install from GitHub: +`epidatr` is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. -```R -# Install the dev version using `pak` or `remotes` -pak::pkg_install("cmu-delphi/epidatr") -remotes::install_github("cmu-delphi/epidatr") -renv::install("cmu-delphi/epidatr") -``` +## Get updates + +**You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** to be notified of package updates, new data sources, corrections, and other updates. -### API Keys +## Usage terms and citation -The Delphi API requires a (free) API key for full functionality. 
To generate -your key, register for a pseudo-anonymous account -[here](https://api.delphi.cmu.edu/epidata/admin/registration_form) and see more -discussion on the [general API -website](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html). See the -`save_api_key()` function documentation for details on how to use your API key. +We request that if you use the `epidatr` package in your work, or use any of the data provided by the Delphi Epidata API through non-`covidcast` endpoints, that you cite us using the citation given by [`citation("epidatr")`](https://cmu-delphi.github.io/epidatr/dev/authors.html#citation). If you use any of the data from the `covidcast` endpoint, please use the [COVIDcast citation](https://cmu-delphi.github.io/covidcast/covidcastR/authors.html#citation) as well. See the [COVIDcast licensing documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_licensing.html) and the [licensing documentation for other endpoints](https://cmu-delphi.github.io/delphi-epidata/api/README.html#data-licensing) for information about citing the datasets provided by the API. + +**Warning:** If you use data from the Epidata API to power a product, dashboard, app, or other service, please download the data you need and store it centrally rather than making API requests for every user. Our server resources are limited and cannot support high-volume interactive use. + +See also the [Terms of Use](https://delphi.cmu.edu/covidcast/terms-of-use/), noting that the data is a research product and not warranted for a particular purpose. -Note that the private endpoints (i.e. those prefixed with `pvt_`) require a -separate key that needs to be passed as an argument. These endpoints require -specific data use agreements to access. 
[mit-image]: https://img.shields.io/badge/License-MIT-yellow.svg [mit-url]: https://opensource.org/license/mit/ [github-actions-image]: https://github.com/cmu-delphi/epidatr/workflows/ci/badge.svg [github-actions-url]: https://github.com/cmu-delphi/epidatr/actions - -## Get updates - -You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api) to be notified of package updates, new data sources, corrections, and other updates. - -## For users of the `covidcast` R package - -The `epidatr` package is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. diff --git a/README.md b/README.md index ea27a6fd..c44270a8 100644 --- a/README.md +++ b/README.md @@ -12,126 +12,80 @@ Actions](https://github.com/cmu-delphi/epidatr/workflows/ci/badge.svg)](https:// [![codecov](https://codecov.io/gh/dsweber2/epidatr/branch/dev/graph/badge.svg?token=jVHL9eHZNZ)](https://app.codecov.io/gh/dsweber2/epidatr) -The [Delphi Epidatr package](https://cmu-delphi.github.io/epidatr/) is +The [Delphi `epidatr` package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, -COVID-19, and other diseases for the USA at various geographical -resolutions, both from official government sources such as the [Center -for Disease Control -(CDC)](https://www.cdc.gov/datastatistics/index.html) and [Google -Trends](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/google-symptoms.html) -and private partners such as -[Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) -and [Change Healthcare](https://www.changehealthcare.com/). 
It is built -and maintained by the Carnegie Mellon University [Delphi research +COVID-19, and other diseases. `epidatr` is built and maintained by the +Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). -This package is designed to streamline the downloading and usage of data -from the [Delphi Epidata -API](https://cmu-delphi.github.io/delphi-epidata/). It provides a simple -R interface to the API, including functions for downloading data, -parsing the results, and converting the data into a tidy format. The API +Data is available for the United States and a handful of other countries +at various geographical resolutions, both from official government +sources such as the [US Centers for Disease Control and Prevention +(CDC)](https://www.cdc.gov/datastatistics/index.html), and private +partners such as +[Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) +and [Change Healthcare](https://www.changehealthcare.com/). The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting -forecasting models. We also provide packages for downstream data -processing ([epiprocess](https://github.com/cmu-delphi/epiprocess)) and -modeling ([epipredict](https://github.com/cmu-delphi/epipredict)). 
- -## Usage - -You can find detailed docs here: - -``` r -library(epidatr) -# Obtain the smoothed covid-like illness (CLI) signal from the -# Facebook survey as it was on April 10, 2021 for the US -epidata <- pub_covidcast( - source = "fb-survey", - signals = "smoothed_cli", - geo_type = "nation", - time_type = "day", - geo_values = "us", - time_values = epirange(20210101, 20210601), - as_of = "2021-06-01" -) -epidata -#> # A tibble: 151 × 15 -#> geo_value signal source geo_type time_type time_value direction issue -#> -#> 1 us smoothed… fb-su… nation day 2021-01-01 NA 2021-01-06 -#> 2 us smoothed… fb-su… nation day 2021-01-02 NA 2021-01-07 -#> 3 us smoothed… fb-su… nation day 2021-01-03 NA 2021-01-08 -#> 4 us smoothed… fb-su… nation day 2021-01-04 NA 2021-01-09 -#> 5 us smoothed… fb-su… nation day 2021-01-05 NA 2021-01-10 -#> 6 us smoothed… fb-su… nation day 2021-01-06 NA 2021-01-29 -#> 7 us smoothed… fb-su… nation day 2021-01-07 NA 2021-01-29 -#> 8 us smoothed… fb-su… nation day 2021-01-08 NA 2021-01-29 -#> 9 us smoothed… fb-su… nation day 2021-01-09 NA 2021-01-29 -#> 10 us smoothed… fb-su… nation day 2021-01-10 NA 2021-01-29 -#> # ℹ 141 more rows -#> # ℹ 7 more variables: lag , missing_value , missing_stderr , -#> # missing_sample_size , value , stderr , sample_size -``` - -``` r -# Plot this data -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "US, 2021", - x = "Date", - y = "CLI" - ) -``` - - - -## Installation - -You can install the stable version of this package from CRAN: - -``` r -install.packages("epidatr") -pak::pkg_install("epidatr") -renv::install("epidatr") -``` - -Or if you want the development version, install from GitHub: - -``` r -# Install the dev version using `pak` or `remotes` -pak::pkg_install("cmu-delphi/epidatr") -remotes::install_github("cmu-delphi/epidatr") -renv::install("cmu-delphi/epidatr") -``` - -### API Keys - -The Delphi API 
requires a (free) API key for full functionality. To -generate your key, register for a pseudo-anonymous account -[here](https://api.delphi.cmu.edu/epidata/admin/registration_form) and -see more discussion on the [general API -website](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html). -See the `save_api_key()` function documentation for details on how to -use your API key. - -Note that the private endpoints (i.e. those prefixed with `pvt_`) -require a separate key that needs to be passed as an argument. These -endpoints require specific data use agreements to access. - -## Get updates - -You should consider subscribing to the [API mailing -list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api) -to be notified of package updates, new data sources, corrections, and -other updates. +forecasting models. + +`epidatr` is designed to streamline the downloading and usage of data +from the Epidata API. The package provides a simple R interface to the +API, including functions for downloading data, parsing the results, and +converting the data into a tidy format. We also provide the +[epiprocess](https://github.com/cmu-delphi/epiprocess) package for +downstream data processing and +[epipredict](https://github.com/cmu-delphi/epipredict) for modeling. + +Consult the [Epidata API +documentation](https://cmu-delphi.github.io/delphi-epidata/) for details +on the data included in the API, API key registration, licensing, and +how to cite this data in your work. The documentation lists all the data +sources and signals available through this API for +[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +and for [other +diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). + +**To get started** using this package, view the Getting Started guide at +`vignette("epidatr")`. 
## For users of the `covidcast` R package -The `epidatr` package is a complete rewrite of the [`covidcast` +`epidatr` is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. + +## Get updates + +**You should consider subscribing to the [API mailing +list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** +to be notified of package updates, new data sources, corrections, and +other updates. + +## Usage terms and citation + +We request that if you use the `epidatr` package in your work, or use +any of the data provided by the Delphi Epidata API through +non-`covidcast` endpoints, that you cite us using the citation given by +[`citation("epidatr")`](https://cmu-delphi.github.io/epidatr/dev/authors.html#citation). +If you use any of the data from the `covidcast` endpoint, please use the +[COVIDcast +citation](https://cmu-delphi.github.io/covidcast/covidcastR/authors.html#citation) +as well. See the [COVIDcast licensing +documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_licensing.html) +and the [licensing documentation for other +endpoints](https://cmu-delphi.github.io/delphi-epidata/api/README.html#data-licensing) +for information about citing the datasets provided by the API. + +**Warning:** If you use data from the Epidata API to power a product, +dashboard, app, or other service, please download the data you need and +store it centrally rather than making API requests for every user. Our +server resources are limited and cannot support high-volume interactive +use. + +See also the [Terms of +Use](https://delphi.cmu.edu/covidcast/terms-of-use/), noting that the +data is a research product and not warranted for a particular purpose. 
diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 6a512bd2..d96a2fdf 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -1,8 +1,10 @@ --- -title: "Delphi Epidata R API Client" -output: rmarkdown::html_vignette +title: "Get started with epidatr" +output: + rmarkdown::html_vignette: + code_folding: show vignette: > - %\VignetteIndexEntry{Delphi Epidata R API Client} + %\VignetteIndexEntry{Get started with epidatr} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{ggplot2} \usepackage[utf8]{inputenc} @@ -11,379 +13,270 @@ vignette: > ```{r, echo = FALSE, message = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) -library(epidatr) -library(dplyr) ``` The epidatr package provides access to all the endpoints of the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), and can be used to make -requests for specific signals on specific dates and in selected geographic +requests for specific signals on specific dates and in select geographic regions. -We recommend you register for an API key. While most endpoints are available -without one, there are [limits on API usage for anonymous -users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html), including -a rate limit. See `save_api_key()` for details on how to obtain an API key and -set this package to use it. -## Basic Usage +## Setup -Fetching some data from the Delphi Epidata API is simple. Suppose we are -interested in the [`covidcast` -endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), which -provides access to a range of data on COVID-19. Reviewing the endpoint -documentation, we see that we need to specify a data source name, a signal name, -a geographic level, a time resolution, and the location and times of interest. 
+### Installation -In this case, the `pub_covidcast()` function lets us specify these parameters for -the endpoint and returns a tibble with the results: +You can install the stable version of this package from CRAN: -```{r} -epidata <- pub_covidcast( - "fb-survey", "smoothed_cli", "state", "day", "pa", - epirange(20210105, 20210410) -) -epidata +```R +install.packages("epidatr") +pak::pkg_install("epidatr") +renv::install("epidatr") ``` -We can then easily plot the data using ggplot2: +Or if you want the development version, install from GitHub: -```{r, out.height="65%"} -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "PA, 2021", - x = "Date", - y = "CLI" - ) +```R +# Install the dev version using `pak` or `remotes` +pak::pkg_install("cmu-delphi/epidatr@dev") +remotes::install_github("cmu-delphi/epidatr", ref = "dev") +renv::install("cmu-delphi/epidatr@dev") ``` -The [Delphi Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) has more information on the available endpoints and arguments. You can also use the `avail_endpoints()` function to get a table of endpoint functions: +### API Keys -```{r} -avail_endpoints() -``` +The Delphi API requires a (free) API key for full functionality. While most +endpoints are available without one, there are +[limits on API usage for anonymous users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html), +including a rate limit. -Example queries with all the endpoint functions available in this package are given [below](#example-queries). +To generate your key, +[register for a pseudo-anonymous account](https://api.delphi.cmu.edu/epidata/admin/registration_form). +See the `save_api_key()` function documentation for details on how to set up +`epidatr` to use your API key. -## Advanced Usage (Experimental) +_Note_ that private endpoints (i.e. 
those prefixed with `pvt_`) require a +separate key that needs to be passed as an argument. These endpoints require +specific data use agreements to access. -The [COVIDcast -endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html) of the -Epidata API contains many separate data sources and signals. It can be difficult -to find the name of the signal you're looking for, so you can use -`covidcast_epidata()` to get help with finding sources and functions without -leaving R. -The `covidcast_epidata()` function fetches a list of all signals, and returns an -object containing fields for every signal: +## Basic Usage -```{r} -epidata <- covidcast_epidata() -epidata$signals -``` +Fetching data from the Delphi Epidata API is simple. Suppose we are +interested in the +[`covidcast` endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), +which provides access to a +[wide range of data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +on COVID-19. Reviewing the endpoint documentation, we see that we +[need to specify](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries) +a data source name, a signal name, a geographic level, a time resolution, and +the location and times of interest. -If you use an editor that supports tab completion, such as RStudio, type -`epidata$signals$` and wait for the tab completion popup. You will be able to -type the name of signals and have the autocomplete feature select them from the -list for you. Note that some signal names have dashes in them, so to access them -we rely on the backtick operator: +The `pub_covidcast()` function lets us access the `covidcast` endpoint: ```{r} -epidata$signals$`fb-survey:smoothed_cli` -``` - -These objects can be used directly to fetch data, without requiring us to use -the `pub_covidcast()` function. 
Simply use the `$call` attribute of the object: +library(epidatr) +library(dplyr) -```{r} -epidata$signals$`fb-survey:smoothed_cli`$call("state", "pa", epirange(20210405, 20210410)) +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for the US +epidata <- pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "nation", + time_type = "day", + geo_values = "us", + time_values = epirange(20210105, 20210410) +) +knitr::kable(head(epidata)) ``` -## Advanced Usage (Debugging) +`pub_covidcast()` returns a `tibble`. (Here we’re using `knitr::kable()` to make +it more readable.) Each row represents one observation for the US on one +day. The location code is given in the `geo_value` column, the date in +the `time_value` column. Here `value` is the requested signal -- in this +case, the smoothed estimate of the percentage of people with COVID-like +illness, based on the symptom surveys, and `stderr` is its standard error. -We can obtain the [`epidata_call`] object underlying a request by setting the -`dry_run` argument to `TRUE` in `fetch_args_list()`: +The Epidata API makes signals available at different geographic levels, +depending on the endpoint. To request signals for all states instead of the +entire US, we set `geo_type` to `"state"` and use `*` for the +`geo_values` argument. (Only some endpoints allow for the use of `*` to +access data at all locations. Check the help for a given endpoint to see if +it supports `*`.) 
-```{r} +```{r, eval = FALSE} +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for all states pub_covidcast( - "fb-survey", "smoothed_cli", "state", "day", "pa", - epirange(20210405, 20210410), - fetch_args = fetch_args_list(dry_run = TRUE) + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "*", + time_values = epirange(20210105, 20210410) ) ``` -## Example Queries - -(Some endpoints allow for the use of `*` to access data at all locations. Check the help for a given endpoint to see if it supports `*`.) - -### COVIDcast Main Endpoint - -API docs: +We can fetch a subset of states by listing out the desired locations: -County geo_values are [FIPS codes](https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county) and are discussed in the API docs [here](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html). The example below is for Orange County, California. - -```{r} +```{r, eval = FALSE} +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for PA, CA, and FL pub_covidcast( source = "fb-survey", - signals = "smoothed_accept_covid_vaccine", - geo_type = "county", + signals = "smoothed_cli", + geo_type = "state", time_type = "day", - time_values = epirange(20201221, 20201225), - geo_values = "06059" + geo_values = c("pa", "ca", "fl"), + time_values = epirange(20210105, 20210410) ) ``` -The `covidcast` endpoint supports `*` in its time and geo fields: +We can also request data for a single location at a time, via the `geo_values` argument. 
```{r} -pub_covidcast( +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for Pennsylvania +epidata <- pub_covidcast( source = "fb-survey", - signals = "smoothed_accept_covid_vaccine", - geo_type = "county", + signals = "smoothed_cli", + geo_type = "state", time_type = "day", - time_values = epirange(20201221, 20201225), - geo_values = "*" + geo_values = "pa", + time_values = epirange(20210105, 20210410) ) +knitr::kable(head(epidata)) ``` -### Other Covid Endpoints +## Getting versioned data -#### COVID-19 Hospitalization: Facility Lookup - -API docs: +The Epidata API stores a historical record of all data, including corrections +and updates, which is particularly useful for accurately backtesting +forecasting models. To fetch versioned data, we can use the `as_of` +argument. ```{r, eval = FALSE} -pub_covid_hosp_facility_lookup(city = "southlake") -pub_covid_hosp_facility_lookup(state = "WY") -# A non-example (there is no city called New York in Wyoming) -pub_covid_hosp_facility_lookup(state = "WY", city = "New York") -``` - -#### COVID-19 Hospitalization by Facility - -API docs: - -```{r, eval = FALSE} -pub_covid_hosp_facility( - hospital_pks = "100075", - collection_weeks = epirange(20200101, 20200501) +# Obtain the smoothed covid-like illness (CLI) signal from the COVID-19 +# Trends and Impact survey for Pennsylvania as it was on 2021-06-01 +pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "pa", + time_values = epirange(20210105, 20210410), + as_of = "2021-06-01" ) ``` -#### COVID-19 Hospitalization by State - -API docs: - -```{r, eval = FALSE} -pub_covid_hosp_state_timeseries(states = "MA", dates = "20200510") -``` - -### Flu Endpoints - -#### Delphi's ILINet forecasts - -API docs: - -```{r, eval = FALSE} -del <- pub_delphi(system = "ec", epiweek = 201501) -names(del[[1L]]$forecast) -``` - -#### FluSurv 
hospitalization data - -API docs: - -```{r, eval = FALSE} -pub_flusurv(locations = "ca", epiweeks = 202001) -``` - -#### Fluview data - -API docs: - -```{r, eval = FALSE} -pub_fluview(regions = "nat", epiweeks = epirange(201201, 202001)) -``` - -#### Fluview virological data from clinical labs - -API docs: - -```{r, eval = FALSE} -pub_fluview_clinical(regions = "nat", epiweeks = epirange(201601, 201701)) -``` - -#### Fluview metadata - -API docs: - -```{r, eval = FALSE} -pub_fluview_meta() -``` - -#### Google Flu Trends data - -API docs: +See `vignette("versioned-data")` for details and more ways to specify versioned data. -```{r, eval = FALSE} -pub_gft(locations = "hhs1", epiweeks = epirange(201201, 202001)) -``` -#### ECDC ILI +## Plotting -API docs: +Because the output data is in a standard `tibble` format, we can easily plot +it using `ggplot2`: -```{r, eval = FALSE} -pub_ecdc_ili(regions = "Armenia", epiweeks = 201840) +```{r, out.height="65%"} +library(ggplot2) +ggplot(epidata, aes(x = time_value, y = value)) + + geom_line() + + labs( + title = "Smoothed CLI from Facebook Survey", + subtitle = "PA, 2021", + x = "Date", + y = "CLI" + ) ``` -#### KCDC ILI - -API docs: +`ggplot2` can also be used to [create choropleths](https://r-graphics.org/recipe-miscgraph-choropleth). 
-```{r, eval = FALSE} -pub_kcdc_ili(regions = "ROK", epiweeks = 200436) -``` -#### NIDSS Flu +```{r, class.source = "fold-hide", out.height="65%"} +library(maps) -API docs: +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for all states on a single day +cli_states <- pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "*", + time_values = 20210410 +) -```{r, eval = FALSE} -pub_nidss_flu(regions = "taipei", epiweeks = epirange(200901, 201301)) -``` +# Get a mapping of states to longitude/latitude coordinates +states_map <- map_data("state") -#### ILI Nearby Nowcast +# Convert state abbreviations into state names +cli_states <- mutate( + cli_states, + state = ifelse( + geo_value == "dc", + "district of columbia", + state.name[match(geo_value, tolower(state.abb))] %>% tolower() + ) +) -API docs: +# Add coordinates for each state +cli_states <- left_join(states_map, cli_states, by = c("region" = "state")) -```{r, eval = FALSE} -pub_nowcast(locations = "ca", epiweeks = epirange(202201, 202319)) +# Plot +ggplot(cli_states, aes(x = long, y = lat, group = group, fill = value)) + + geom_polygon(colour = "black", linewidth = 0.2) + + coord_map("polyconic") + + labs( + title = "Smoothed CLI from Facebook Survey", + subtitle = "All states, 2021-04-10", + x = "Longitude", + y = "Latitude" + ) ``` -### Dengue Endpoints -#### Delphi's Dengue Nowcast +## Finding locations of interest -API docs: +Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations. 
-```{r, eval = FALSE} -pub_dengue_nowcast(locations = "pr", epiweeks = epirange(201401, 202301)) -``` +For endpoints that report US data, see the +[geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html) +for available geographic levels. -#### NIDSS dengue -API docs: +### International data -```{r, eval = FALSE} -pub_nidss_dengue(locations = "taipei", epiweeks = epirange(200301, 201301)) -``` +International data is available via -### PAHO Dengue +- `pub_dengue_nowcast` (North and South America) +- `pub_ecdc_ili` (Europe) +- `pub_kcdc_ili` (Korea) +- `pub_nidss_dengue` (Taiwan) +- `pub_nidss_flu` (Taiwan) +- `pub_paho_dengue` (North and South America) +- `pvt_dengue_sensors` (North and South America) -API docs: -```{r, eval=FALSE} -pub_paho_dengue(regions = "ca", epiweeks = epirange(200201, 202319)) -``` +## Finding data sources and signals of interest -### Other Endpoints +Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), +but the Epidata API includes numerous data streams: medical claims data, cases +and deaths, mobility, and many others. This can make it a challenge to find +the data stream that you are most interested in. -#### Wikipedia Access +The Epidata documentation lists all the data sources and signals available +through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). -API docs: +You can also use the `avail_endpoints()` function to get a table of endpoint functions: ```{r, eval = FALSE} -pub_wiki(language = "en", articles = "influenza", epiweeks = epirange(202001, 202319)) -``` - -### Private methods - -These require private access keys to use (separate from the Delphi Epidata API key). 
-To actually run these locally, you will need to store these secrets in your `.Reviron` file, or set them as environmental variables. - -#### CDC - -API docs: - -```{r, eval=FALSE} -pvt_cdc(auth = Sys.getenv("SECRET_API_AUTH_CDC"), epiweeks = epirange(202003, 202304), locations = "ma") -``` - -#### Dengue Digital Surveillance Sensors - -API docs: - -```{r, eval=FALSE} -pvt_dengue_sensors( - auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), - names = "ght", - locations = "ag", - epiweeks = epirange(201404, 202004) -) -``` - -#### Google Health Trends - -API docs: - -```{r, eval=FALSE} -pvt_ght( - auth = Sys.getenv("SECRET_API_AUTH_GHT"), - epiweeks = epirange(199301, 202304), - locations = "ma", - query = "how to get over the flu" -) -``` - -#### NoroSTAT metadata - -API docs: - -```{r, eval=FALSE} -pvt_meta_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT")) -``` - -#### NoroSTAT data - -API docs: - -```{r, eval=FALSE} -pvt_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT"), locations = "1", epiweeks = 201233) -``` - -#### Quidel Influenza testing - -API docs: - -```{r, eval=FALSE} -pvt_quidel(auth = Sys.getenv("SECRET_API_AUTH_QUIDEL"), locations = "hhs1", epiweeks = epirange(200301, 202105)) +avail_endpoints() ``` -#### Sensors - -API docs: - -```{r, eval=FALSE} -pvt_sensors( - auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), - names = "sar3", - locations = "nat", - epiweeks = epirange(200301, 202105) -) +```{r, echo = FALSE} +invisible(capture.output(endpts <- avail_endpoints())) +knitr::kable(endpts) ``` -#### Twitter - -API docs: - -```{r, eval=FALSE} -pvt_twitter( - auth = Sys.getenv("SECRET_API_AUTH_TWITTER"), - locations = "nat", - epiweeks = epirange(200301, 202105) -) -``` +See `vignette("signal-discovery")` for more information. 
diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd new file mode 100644 index 00000000..8cd7799c --- /dev/null +++ b/vignettes/signal-discovery.Rmd @@ -0,0 +1,374 @@ +--- +title: "Finding data sources and signals of interest" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Finding data sources and signals of interest} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[utf8]{inputenc} +--- + +```{r, echo = FALSE, message = FALSE} +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") +options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) +library(epidatr) +library(dplyr) +``` + +The Epidata API includes numerous data streams -- medical claims data, cases and deaths, mobility, and many others -- covering different geographic regions. This can make it a challenge to find the data stream that you are most interested in. + +Example queries with all the endpoint functions available in this package are +given [below](#example-queries). + + +## Using the documentation + +The Epidata documentation lists all the data sources and signals available +through the API for +[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and +for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). +The site also includes a search tool if you have a keyword (e.g. "Taiwan") in mind. + + +## Signal metadata + +Some endpoints have partner metadata available that, depending on +the endpoint, provides information about the signals that are available, what +time ranges they are available for, and when they have been updated. + +```{r, echo = FALSE} +suppressMessages(invisible(capture.output(endpts <- avail_endpoints()))) +filter(endpts, endsWith(Endpoint, "_meta()")) %>% knitr::kable() +``` + +## Interactive tooling + +We provide a couple of `epidatr` functions to help find data sources and signals.
+ +The `avail_endpoints()` function lists endpoints, each of which, except for +COVIDcast, corresponds to a single data source. `avail_endpoints()` outputs a +`tibble` of endpoints and brief descriptions, which explicitly state when they +cover non-US locations: + +```{r, eval = FALSE} +avail_endpoints() +``` + +```{r, echo = FALSE} +suppressMessages(invisible(capture.output(endpts <- avail_endpoints()))) +knitr::kable(endpts) +``` + +The `covidcast_epidata()` function lets you look more in-depth at the data +sources available through the COVIDcast endpoint. The function describes +all available data sources and signals: + +```{r} +covid_sources <- covidcast_epidata() +head(covid_sources$sources, n = 2) +``` + +Each source is included as an entry in the `covid_sources$sources` list, associated +with a `tibble` describing included signals. + +If you use an editor that supports tab completion, such as RStudio, type +`covid_sources$sources$` and wait for the tab completion popup. You will be able to +browse the list of data sources. + +```{r} +covid_sources$signals +``` + +If you use an editor that supports tab completion, type +`covid_sources$signals$` and wait for the tab completion popup. You will be +able to type the name of signals and have the autocomplete feature select +them from the list for you. In the tab-completion popup, signal names are +prefixed with the name of the data source for filtering convenience. + +_Note_ that some signal names have dashes in them, so to access them +we rely on the backtick operator: + +```{r} +covid_sources$signals$`fb-survey:smoothed_cli` +``` + +These signal objects can be used directly to fetch data, without requiring us to use +the `pub_covidcast()` function.
Simply use the `$call` attribute of the object: + +```{r} +epidata <- covid_sources$signals$`fb-survey:smoothed_cli`$call( + "state", "pa", epirange(20210405, 20210410) +) +knitr::kable(epidata) +``` + + +## Example Queries + +### COVIDcast Main Endpoint + +API docs: + +County geo_values are [FIPS codes](https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county) and are discussed in the API docs [here](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html). The example below is for Orange County, California. + +```{r} +pub_covidcast( + source = "fb-survey", + signals = "smoothed_accept_covid_vaccine", + geo_type = "county", + time_type = "day", + time_values = epirange(20201221, 20201225), + geo_values = "06059" +) +``` + +The `covidcast` endpoint supports `*` in its time and geo fields: + +```{r} +pub_covidcast( + source = "fb-survey", + signals = "smoothed_accept_covid_vaccine", + geo_type = "county", + time_type = "day", + time_values = epirange(20201221, 20201225), + geo_values = "*" +) +``` + +### Other Covid Endpoints + +#### COVID-19 Hospitalization: Facility Lookup + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_facility_lookup(city = "southlake") +pub_covid_hosp_facility_lookup(state = "WY") +# A non-example (there is no city called New York in Wyoming) +pub_covid_hosp_facility_lookup(state = "WY", city = "New York") +``` + +#### COVID-19 Hospitalization by Facility + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_facility( + hospital_pks = "100075", + collection_weeks = epirange(20200101, 20200501) +) +``` + +#### COVID-19 Hospitalization by State + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_state_timeseries(states = "MA", dates = "20200510") +``` + +### Flu Endpoints + +#### Delphi's ILINet forecasts + +API docs: + +```{r, eval = FALSE} +del <- pub_delphi(system = "ec", epiweek = 201501) +names(del[[1L]]$forecast) +``` + +#### FluSurv hospitalization data + +API docs: + +```{r, eval = FALSE} 
+pub_flusurv(locations = "ca", epiweeks = 202001) +``` + +#### Fluview data + +API docs: + +```{r, eval = FALSE} +pub_fluview(regions = "nat", epiweeks = epirange(201201, 202001)) +``` + +#### Fluview virological data from clinical labs + +API docs: + +```{r, eval = FALSE} +pub_fluview_clinical(regions = "nat", epiweeks = epirange(201601, 201701)) +``` + +#### Fluview metadata + +API docs: + +```{r, eval = FALSE} +pub_fluview_meta() +``` + +#### Google Flu Trends data + +API docs: + +```{r, eval = FALSE} +pub_gft(locations = "hhs1", epiweeks = epirange(201201, 202001)) +``` + +#### ECDC ILI + +API docs: + +```{r, eval = FALSE} +pub_ecdc_ili(regions = "Armenia", epiweeks = 201840) +``` + +#### KCDC ILI + +API docs: + +```{r, eval = FALSE} +pub_kcdc_ili(regions = "ROK", epiweeks = 200436) +``` + +#### NIDSS Flu + +API docs: + +```{r, eval = FALSE} +pub_nidss_flu(regions = "taipei", epiweeks = epirange(200901, 201301)) +``` + +#### ILI Nearby Nowcast + +API docs: + +```{r, eval = FALSE} +pub_nowcast(locations = "ca", epiweeks = epirange(202201, 202319)) +``` + +### Dengue Endpoints + +#### Delphi's Dengue Nowcast + +API docs: + +```{r, eval = FALSE} +pub_dengue_nowcast(locations = "pr", epiweeks = epirange(201401, 202301)) +``` + +#### NIDSS dengue + +API docs: + +```{r, eval = FALSE} +pub_nidss_dengue(locations = "taipei", epiweeks = epirange(200301, 201301)) +``` + +#### PAHO Dengue + +API docs: + +```{r, eval=FALSE} +pub_paho_dengue(regions = "ca", epiweeks = epirange(200201, 202319)) +``` + +### Other Endpoints + +#### Wikipedia Access + +API docs: + +```{r, eval = FALSE} +pub_wiki(language = "en", articles = "influenza", epiweeks = epirange(202001, 202319)) +``` + +### Private methods + +These require private access keys to use (separate from the Delphi Epidata API key). +To actually run these locally, you will need to store these secrets in your `.Renviron` file, or set them as environment variables.
+ +#### CDC + +API docs: + +```{r, eval=FALSE} +pvt_cdc(auth = Sys.getenv("SECRET_API_AUTH_CDC"), epiweeks = epirange(202003, 202304), locations = "ma") +``` + +#### Dengue Digital Surveillance Sensors + +API docs: + +```{r, eval=FALSE} +pvt_dengue_sensors( + auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), + names = "ght", + locations = "ag", + epiweeks = epirange(201404, 202004) +) +``` + +#### Google Health Trends + +API docs: + +```{r, eval=FALSE} +pvt_ght( + auth = Sys.getenv("SECRET_API_AUTH_GHT"), + epiweeks = epirange(199301, 202304), + locations = "ma", + query = "how to get over the flu" +) +``` + +#### NoroSTAT metadata + +API docs: + +```{r, eval=FALSE} +pvt_meta_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT")) +``` + +#### NoroSTAT data + +API docs: + +```{r, eval=FALSE} +pvt_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT"), locations = "1", epiweeks = 201233) +``` + +#### Quidel Influenza testing + +API docs: + +```{r, eval=FALSE} +pvt_quidel(auth = Sys.getenv("SECRET_API_AUTH_QUIDEL"), locations = "hhs1", epiweeks = epirange(200301, 202105)) +``` + +#### Sensors + +API docs: + +```{r, eval=FALSE} +pvt_sensors( + auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), + names = "sar3", + locations = "nat", + epiweeks = epirange(200301, 202105) +) +``` + +#### Twitter + +API docs: + +```{r, eval=FALSE} +pvt_twitter( + auth = Sys.getenv("SECRET_API_AUTH_TWITTER"), + locations = "nat", + epiweeks = epirange(200301, 202105) +) +``` diff --git a/vignettes/versioned-data.Rmd b/vignettes/versioned-data.Rmd new file mode 100644 index 00000000..27271c6c --- /dev/null +++ b/vignettes/versioned-data.Rmd @@ -0,0 +1,163 @@ +--- +title: "Understanding and accessing versioned data" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Understanding and accessing versioned data} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[utf8]{inputenc} +--- + +```{r, echo = FALSE, message = FALSE} +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") 
+options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) +library(epidatr) +library(dplyr) +``` + + +The Epidata API records not just each signal's estimate for a given location +on a given day, but also *when* that estimate was made, and all updates to that +estimate. + +For example, let's look at the [doctor visits +signal](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html) +from the [`covidcast` endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), +which estimates the percentage of outpatient doctor visits that are +COVID-related. Consider a result row with `time_value` 2020-05-01 for +`geo_values = "pa"`. This is an estimate for Pennsylvania on +May 1, 2020. That estimate was *issued* on May 5, 2020, the delay being due to +the aggregation of data by our source and the time taken by the Epidata API to +ingest the data provided. Later, the estimate for May 1st could be updated, +perhaps because additional visit data from May 1st arrived at our source and was +reported to us. This constitutes a new *issue* of the data. + + +### Data known "as of" a specific date + +By default, endpoint functions fetch the most recent issue available. This +is the best option for users who simply want to graph the latest data or +construct dashboards. But if we are interested in knowing *when* data was +reported, we can request specific data versions using the `as_of`, `issues`, or +`lag` arguments. + +_Note_ that these are mutually exclusive; only one can be specified +at a time. Also, not all endpoints support all three parameters, so please +check the documentation for that specific endpoint. 
+ +First, we can request the data that was available *as of* a specific date, using +the `as_of` argument: + + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + as_of = "2020-05-07" +) +knitr::kable(epidata) +``` + +This shows that an estimate of about 2.3% was issued on May 7. If we don't +specify `as_of`, we get the most recent estimate available: + + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa" +) +knitr::kable(epidata) +``` + +Note the substantial change in the estimate, from less than 3% to almost 6%, +reflecting new data that became available after May 7 about visits *occurring on* +May 1. This illustrates the importance of issue date tracking, particularly +for forecasting tasks. To backtest a forecasting model on past data, it is +important to use the data that would have been available *at the time* the model +was or would have been fit, not data that arrived much later. + + +### Multiple issues of observations + +By using the `issues` argument, we can request all issues in a certain time +period: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + issues = epirange("2020-05-01", "2020-05-15") +) +knitr::kable(epidata) +``` + +This estimate was clearly updated many times as new data for May 1st arrived. + +Note that these results include only data issued or updated between +(inclusive) 2020-05-01 and 2020-05-15. 
If a value was first reported on +2020-04-15 and never updated, a query for issues between 2020-05-01 and +2020-05-15 will not include that value among its results. + +The `issues` parameter also accepts a vector of dates. + +```{r, eval = FALSE} +pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + issues = c("2020-05-07", "2020-05-09", "2020-05-15") +) +``` + + +### Observations issued with a specific lag + +Finally, we can use the `lag` argument to request only data reported with a +certain lag. For example, requesting a lag of 7 days fetches only data issued +exactly 7 days after the corresponding `time_value`: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-07"), + geo_type = "state", + geo_values = "pa", + lag = 7 +) +knitr::kable(epidata) +``` + +Note that though this query requested all values between 2020-05-01 and +2020-05-07, May 3rd and May 4th were *not* included in the result set. This is +because the query will only include a result for May 3rd if a value was issued +on May 10th (a 7-day lag), but in fact the value was not updated on that day: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-03", "2020-05-03"), + geo_type = "state", + geo_values = "pa", + issues = epirange("2020-05-09", "2020-05-15") +) +knitr::kable(epidata) +```
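The relationship between issues, lag, and an "as of" snapshot can also be sketched offline. The minimal example below assumes a data frame with `time_value`, `issue`, and `value` columns similar to those in the results above. The rows are made-up values and `latest_as_of()` is a hypothetical helper, not part of `epidatr`. It only illustrates the idea of keeping, for each `time_value`, the most recent issue on or before a given date; it is not a description of how the API implements `as_of`.

```r
# Hypothetical issue history for one location; the values are made up.
issues_df <- data.frame(
  time_value = as.Date(c("2020-05-01", "2020-05-01", "2020-05-01", "2020-05-02")),
  issue      = as.Date(c("2020-05-05", "2020-05-07", "2020-05-12", "2020-05-08")),
  value      = c(2.0, 2.3, 5.9, 3.1)
)

# Hypothetical helper: for each time_value, keep the latest issue
# published on or before `as_of`.
latest_as_of <- function(df, as_of) {
  as_of <- as.Date(as_of)
  # Drop issues that were not yet published on the as_of date.
  known <- df[df$issue <= as_of, , drop = FALSE]
  # Sort so that the newest issue for each time_value comes first.
  known <- known[order(known$time_value, -as.numeric(known$issue)), , drop = FALSE]
  # Keep only the first (newest) row per time_value.
  known <- known[!duplicated(known$time_value), , drop = FALSE]
  # Lag in days between the observation date and the issue date.
  known$lag <- as.integer(known$issue - known$time_value)
  rownames(known) <- NULL
  known
}

snapshot <- latest_as_of(issues_df, "2020-05-07")
print(snapshot)
```

With a later `as_of` date, the same helper would pick up the newer issue for May 1 as well as the first issue for May 2, which mirrors why backtesting should use a snapshot from the date the model would have been fit.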