Add checks & test for additional_metadata format in epi_df #182

Closed
@mgyliu

Action items

  • Update the examples in as_epi_df so that additional_metadata is always a list() type.
  • Add an example to as_epi_df with multiple other_keys.

Related to issue cmu-delphi/epipredict#114

The problem

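other_keys is stored differently in the epi_df metadata depending on whether it is supplied as a vector or as a list. The epi_df constructor expects additional_metadata to be a list, but passing a named vector still "works" (i.e., raises no error) and splits the keys into separate entries (other_keys1, other_keys2, ...). Example 1 infers other_keys from the tsibble key and stores them as a single character vector. Example 2 passes additional_metadata as a named vector and gets the split entries. Both examples are adapted from the as_epi_df reference.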
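Here is a minimal sketch of the kind of check the issue title asks for. It assumes the check runs at the start of the constructor, before additional_metadata is merged into the metadata. The helper name validate_additional_metadata() and its error message are hypothetical and are not part of epiprocess:

```r
# Hypothetical helper (not part of epiprocess): reject anything that is not a list,
# so a named vector such as c(other_keys = c("state", "pol")) raises an error
# instead of being silently split into other_keys1, other_keys2, ...
validate_additional_metadata <- function(additional_metadata) {
  if (!is.list(additional_metadata)) {
    stop("`additional_metadata` must be a list type.", call. = FALSE)
  }
  invisible(additional_metadata)
}

# A list passes the check and is returned unchanged
ok <- validate_additional_metadata(list(other_keys = c("state", "pol")))

# A named vector is rejected; capture the error message instead of aborting
msg <- tryCatch(
  validate_additional_metadata(c(other_keys = c("state", "pol"))),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because the check uses is.list(), an empty list() also passes (is.list(list()) is TRUE). The tryCatch() pattern above is the same shape a unit test for this error could take.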

```r
library(dplyr)
library(recipes)
library(epiprocess)
```

Example 1

```r
ex1_input <- tibble::tibble(
  geo_value = rep(c("ca", "fl", "pa"), each = 2),
  county_code = c("06059", "06061", "06067",
                  "12111", "12113", "12117"),
  another_key = 1:6, # <- I added this additional key
  time_value = rep(seq(as.Date("2020-06-01"), as.Date("2020-06-03"),
                       by = "day"), length.out = length(geo_value)),
  value = 1:length(geo_value) + 0.01 * rnorm(length(geo_value))) %>%
  tsibble::as_tsibble(
    index = time_value,
    key = c(geo_value, county_code, another_key))

# The `other_keys` metadata (`"county_code"` and `"another_key"` in this case)
# is automatically inferred from the `tsibble`'s `key`:
ex1 <- as_epi_df(x = ex1_input, geo_type = "state", time_type = "day", as_of = "2020-06-03")
attr(ex1, "metadata")
```

Output:

```
$geo_type
[1] "state"

$time_type
[1] "day"

$as_of
[1] "2020-06-03"

$other_keys
[1] "county_code" "another_key"
```

Example 2 (ex3 in the as_epi_df documentation, so the variable names are kept)

```r
ex3_input <- jhu_csse_county_level_subset %>%
  dplyr::filter(time_value > "2021-12-01", state_name == "Massachusetts") %>%
  dplyr::slice_tail(n = 6) %>%
  tsibble::as_tsibble() %>% # needed to add the additional metadata
  dplyr::mutate(state = rep("MA", 6)) %>%
  dplyr::mutate(pol = rep(c("blue", "swing", "swing"), each = 2)) # extra key

# Note: additional_metadata is a named vector here, not a list
ex3 <- ex3_input %>% as_epi_df(additional_metadata = c(other_keys = c("state", "pol")))
attr(ex3, "metadata")
```

Output:

```
$geo_type
[1] "county"

$time_type
[1] "day"

$as_of
[1] "2022-08-02 15:31:30 PDT"

$other_keys1
[1] "state"

$other_keys2
[1] "pol"
```
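The other_keys1 / other_keys2 names can be reproduced with base R alone. When c() receives a named argument longer than one element, it suffixes that name with an index for each element. list() keeps the whole vector as a single element named other_keys. The last part of this sketch assumes, for illustration only, that the extra metadata is appended to the base metadata with c(). That is not necessarily how as_epi_df() merges it.

```r
# Named vector: c() flattens its arguments and numbers the names
# of any named argument longer than one element
as_vector <- c(other_keys = c("state", "pol"))
print(names(as_vector))  # "other_keys1" "other_keys2"

# List: one element named other_keys that holds the whole character vector
as_list <- list(other_keys = c("state", "pol"))
print(names(as_list))    # "other_keys"

# Illustration only (assumed merge via c(), not taken from epiprocess):
# combining a list with a named vector spreads the vector into separate entries
base_meta <- list(geo_type = "county", time_type = "day")
merged_vector <- c(base_meta, as_vector)
merged_list <- c(base_meta, as_list)
print(names(merged_vector))  # "geo_type" "time_type" "other_keys1" "other_keys2"
print(names(merged_list))    # "geo_type" "time_type" "other_keys"
```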

Labels

bug, documentation