
Correctly parse issue field for pub_covid_hosp_state_timeseries #220


Merged
merged 10 commits on Nov 30, 2023
1 change: 0 additions & 1 deletion R/endpoints.R
@@ -513,7 +513,6 @@ pub_covid_hosp_state_timeseries <- function(
create_epidata_field_info("state", "text"),
create_epidata_field_info("issue", "date"),
create_epidata_field_info("date", "date"),
- create_epidata_field_info("issue", "date"),
create_epidata_field_info("critical_staffing_shortage_today_yes", "bool"),
create_epidata_field_info("critical_staffing_shortage_today_no", "bool"),
create_epidata_field_info("critical_staffing_shortage_today_not_reported", "bool"),
26 changes: 26 additions & 0 deletions R/epidatacall.R
@@ -41,12 +41,38 @@
#' @return
#' - For `create_epidata_call`: an `epidata_call` object
#'
#' @importFrom purrr map_chr map_lgl
create_epidata_call <- function(endpoint, params, meta = NULL,
only_supports_classic = FALSE) {
stopifnot(is.character(endpoint), length(endpoint) == 1)
stopifnot(is.list(params))
stopifnot(is.null(meta) || is.list(meta))
stopifnot(all(map_lgl(meta, ~ inherits(.x, "EpidataFieldInfo"))))
stopifnot(is.logical(only_supports_classic), length(only_supports_classic) == 1)

if (length(unique(meta)) != length(meta)) {
cli::cli_abort(
c(
"List of expected epidata fields contains duplicate entries",
"i" = "duplicates in meta can cause problems parsing fetched data",
"Please fix in `endpoints.R`"
),
class = "epidatr__duplicate_meta_entries"
)
}

meta_field_names <- map_chr(meta, "name")
if (length(meta_field_names) != length(unique(meta_field_names))) {
cli::cli_abort(
c(
"List of expected epidata fields contains duplicate names",
"i" = "duplicates in meta can cause problems parsing fetched data",
"Please fix in `endpoints.R`"
),
class = "epidatr__duplicate_meta_names"
)
}

if (is.null(meta)) {
meta <- list()
}
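For illustration, here is the gist of the two new checks in plain base R, using bare lists as hypothetical stand-ins for `EpidataFieldInfo` objects (this sketch is not part of the PR):

```r
# Stand-ins for create_epidata_field_info() results: same name, different type.
meta <- list(
  list(name = "time_value", type = "date"),
  list(name = "time_value", type = "int")
)

# Entry-level duplicates: unique() compares whole list elements.
has_duplicate_entries <- length(unique(meta)) != length(meta)

# Name-level duplicates: the case this meta actually triggers.
meta_field_names <- vapply(meta, function(x) x$name, character(1))
has_duplicate_names <- length(meta_field_names) != length(unique(meta_field_names))

print(has_duplicate_entries) # FALSE -- the entries differ by type
print(has_duplicate_names)   # TRUE  -- the names collide
```

Both conditions are checked because two entries can share a name while differing in type, which only the name-level check catches.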
21 changes: 19 additions & 2 deletions R/model.R
@@ -122,9 +122,9 @@ parse_value <- function(info, value, disable_date_parsing = FALSE) {

if (is.null(value)) {
return(value)
- } else if (info$type == "date" && !disable_date_parsing) {
+ } else if (info$type == "date" && !disable_date_parsing && !inherits(value, "Date")) {
return(parse_api_date(value))
- } else if (info$type == "epiweek" && !disable_date_parsing) {
+ } else if (info$type == "epiweek" && !disable_date_parsing && !inherits(value, "Date")) {
return(parse_api_week(value))
} else if (info$type == "bool") {
return(as.logical(value))
@@ -138,13 +138,30 @@ parse_value <- function(info, value, disable_date_parsing = FALSE) {
value
}

#' @importFrom purrr map_chr
parse_data_frame <- function(epidata_call, df, disable_date_parsing = FALSE) {
stopifnot(inherits(epidata_call, "epidata_call"))
meta <- epidata_call$meta
df <- as.data.frame(df)

if (length(meta) == 0) {
return(df)
}

meta_field_names <- map_chr(meta, "name")
missing_fields <- setdiff(names(df), meta_field_names)
if (
length(missing_fields) != 0
) {
cli::cli_warn(
c(
"Not all return columns are specified as expected epidata fields",
[Inline review thread]

@nmdefries (Contributor, Author) — Nov 28, 2023:
Adding missing specs for public endpoints in another PR

Contributor:
wait, so #223 specifies the headers of every endpoint? That seems likely to require quite a bit of attention to keep up to date. I don't quite get the motivation.

@nmdefries (Contributor, Author):
Most endpoint fields were already fully specified. #223 adds some that weren't included. @lcbrooks or @dshemetov any context on why we specify headers?

FWIW headers shouldn't change often/fast. Most endpoints aren't adding new fields. Many aren't being updated anymore and at least covidcast uses the same header for every signal.

Contributor:
It just worries me as yet another place where we're duplicating state. I hadn't caught that it was already pretty thoroughly covered though; in for a penny, in for a pound I guess.

@dshemetov (Contributor) — Nov 29, 2023:
It's something we inherited from Sam's original development effort. I am guessing he ran into issues where he couldn't rely on automatic type inference from the JSON output and went in for hard coding all data types.

It's definitely a maintenance burden. We could try parsing without it and see where the issues come up, if they still do. Another possibility is generating these type annotations directly from the SQL schemas.

@nmdefries (Contributor, Author):
I agree that duplicating state is undesirable and adds maintenance load, but relying on auto-classing isn't robust (and there will inevitably be some fields that aren't parsed correctly that we will have to manually specify -- better to specify everything in that case).

> generating these type annotations directly from the SQL schemas

That would be ideal. What approach are you imagining @dshemetov? Would this be parsing the server code or can we query the DB directly for schemas?

[End of review thread]

"i" = "Unspecified fields {missing_fields} may need to be manually converted to more appropriate classes"
),
class = "epidatr__missing_meta_fields"
)
}

columns <- colnames(df)
for (i in seq_len(length(meta))) {
info <- meta[[i]]
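The new extra-column check in `parse_data_frame()` reduces to a `setdiff` between the fetched columns and the declared field names; a minimal self-contained sketch (toy data, not epidatr objects):

```r
# Columns present in the fetched data frame but absent from `meta`
# are collected and reported in the epidatr__missing_meta_fields warning.
df <- data.frame(time_value = 1, value = 2, extra = 3)
meta_field_names <- c("time_value", "value")

missing_fields <- setdiff(names(df), meta_field_names)
print(missing_fields) # "extra"
```

Note the naming follows the PR: `missing_fields` holds columns *missing from meta*, even though they are *extra* in the data frame.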
50 changes: 50 additions & 0 deletions tests/testthat/test-epidatacall.R
@@ -160,3 +160,53 @@ test_that("classic only fetch", {
# making sure that fetch_tbl throws the expected error on classic only
expect_error(epidata_call %>% fetch_tbl(), class = "only_supports_classic_format")
})

test_that("create_epidata_call basic behavior", {
endpoint <- "endpoint"
params <- list()

# Success
meta <- list(
create_epidata_field_info("time_value", "date"),
create_epidata_field_info("value", "float")
)
expected <- list(
endpoint = endpoint,
params = params,
base_url = "https://api.delphi.cmu.edu/epidata/",
meta = meta,
only_supports_classic = FALSE
)
class(expected) <- "epidata_call"
expect_identical(create_epidata_call(endpoint, params, meta = meta), expected)

expected$meta <- list()
expect_identical(create_epidata_call(endpoint, params, meta = NULL), expected)
expect_identical(create_epidata_call(endpoint, params, meta = list()), expected)
})


test_that("create_epidata_call fails when meta arg contains duplicates", {
endpoint <- "endpoint"
params <- list()

# Duplicate names
meta <- list(
create_epidata_field_info("time_value", "date"),
create_epidata_field_info("time_value", "int")
)
expect_error(
create_epidata_call(endpoint, params, meta = meta),
class = "epidatr__duplicate_meta_names"
)

# Duplicate entries
meta <- list(
create_epidata_field_info("time_value", "date"),
create_epidata_field_info("time_value", "date")
)
expect_error(
create_epidata_call(endpoint, params, meta = meta),
class = "epidatr__duplicate_meta_entries"
)
})
31 changes: 28 additions & 3 deletions tests/testthat/test-model.R
@@ -69,7 +69,32 @@ test_that("null parsing", {
})

test_that("parse invalid time", {
- vale <- list(3)
- vale$class <- "my nonexistant class"
- expect_error(parse_timeset_input(vale))
+ value <- list(3)
+ value$class <- "my nonexistant class"
+ expect_error(parse_timeset_input(value))
})

test_that("parse_data_frame warns when df contains fields not listed in meta", {
epidata_call <- pub_flusurv(
locations = "ca",
epiweeks = 202001,
fetch_args = fetch_args_list(dry_run = TRUE)
)
# see generate_test_data.R
mock_df <- as.data.frame(readr::read_rds(testthat::test_path("data/flusurv-epiweeks.rds")))

# Success when meta and df fields match exactly
expect_no_warning(parse_data_frame(epidata_call, mock_df))

# Warning when df contains extra fields
mock_df$extra <- 5
expect_warning(
parse_data_frame(epidata_call, mock_df),
class = "epidatr__missing_meta_fields"
)
mock_df$extra <- NULL

# Success when meta contains extra fields
mock_df$rate_age_0 <- NULL
expect_no_warning(parse_data_frame(epidata_call, mock_df))
})
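As a thought experiment on the schema-generation idea floated in the review thread, here is a minimal sketch of deriving field specs from a database catalog. The table layout, column names, and SQL-to-epidata type mapping below are all assumptions for illustration, not anything epidatr or the Epidata server actually provides:

```r
# Hypothetical mapping from SQL column types to epidata field types.
sql_to_epidata_type <- function(sql_type) {
  switch(tolower(sql_type),
    "date" = "date",
    "int" = "int",
    "float" = "float",
    "varchar" = "text",
    "text" # fallback for unmapped types
  )
}

# Pretend result of querying an information_schema-style catalog.
schema <- data.frame(
  column_name = c("state", "issue", "date"),
  data_type = c("varchar", "date", "date"),
  stringsAsFactors = FALSE
)

# Build one name/type spec per column, analogous to a meta list.
meta <- Map(
  function(nm, tp) list(name = nm, type = sql_to_epidata_type(tp)),
  schema$column_name, schema$data_type
)
print(vapply(meta, function(x) x$type, character(1)))
```

Generating specs this way would keep them in one place, at the cost of needing access to (or a dump of) the server's schemas.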