
Release covidcast-indicators 0.3.12 #1606

Merged
merged 38 commits on May 2, 2022

Changes from all commits
38 commits
1d9f353
Add basic interpolation function and a test stub
dshemetov Mar 10, 2022
a60baa5
Linear interpolation by default, add more tests
dshemetov Mar 23, 2022
4cf6268
Enforce a float type on val and sample_size
dshemetov Mar 25, 2022
ed90cd8
Interpolate standard error
dshemetov Apr 1, 2022
cd5b7bd
Update test
dshemetov Apr 1, 2022
8f9d464
Add gitignore and remove indent in test file
dshemetov Apr 13, 2022
c38db2e
Exclude naat_pct_positive from interpolation
dshemetov Apr 13, 2022
4b03630
Extend files to be processed for interpolation
krivard Apr 14, 2022
d00f42a
appease linter
krivard Apr 14, 2022
1b574bc
Minor comment update
dshemetov Apr 14, 2022
917baef
Merge pull request #1578 from cmu-delphi/krivard/dsew-interp-extend-l…
dshemetov Apr 14, 2022
177e048
Merge branch 'ds/dsew-interpolation' of https://github.com/cmu-delphi…
dshemetov Apr 14, 2022
6dba43d
Merge pull request #1580 from cmu-delphi/bot/sync-prod-main
krivard Apr 14, 2022
edae08a
Make the linter happy
dshemetov Apr 15, 2022
dec92af
Merge branch 'main' into ds/dsew-interpolation
dshemetov Apr 15, 2022
a38fcf8
Merge pull request #1555 from cmu-delphi/ds/dsew-interpolation
krivard Apr 18, 2022
0486d7d
Fix UnboundLocalError
krivard Apr 18, 2022
db20c40
whitespace
krivard Apr 18, 2022
1e5e5cf
Merge pull request #1583 from cmu-delphi/krivard/dsew-keep
krivard Apr 18, 2022
06ae8a8
Add archive params to prod dsew-cpr configuration
krivard Apr 19, 2022
ff3f36e
Merge pull request #1584 from cmu-delphi/krivard/dsew-prod-interp-params
krivard Apr 19, 2022
842a169
chore: bump covidcast-indicators to 0.3.11
Apr 19, 2022
3cb7411
Merge pull request #1586 from cmu-delphi/release/indicators_v0.3.11_u…
krivard Apr 19, 2022
1a44a58
Merge pull request #1587 from cmu-delphi/bot/sync-prod-main
krivard Apr 19, 2022
ab62c9d
Don't drop positivity signal when interpolating
dshemetov Apr 20, 2022
5404e7d
Merge pull request #1590 from cmu-delphi/ds/fix-cpr-positivity
krivard Apr 21, 2022
85887f9
explicitly cast to float before interpolating
nmdefries Apr 21, 2022
939501b
test that interp correctly handles object-type fields
nmdefries Apr 21, 2022
3f99eeb
comment object-type interp test
nmdefries Apr 21, 2022
5957e60
remove second float conversion
nmdefries Apr 22, 2022
449eb1d
drop NAs during interpolation
nmdefries Apr 22, 2022
0d384b5
Merge pull request #1594 from cmu-delphi/ndefries/float-before-interp
korlaxxalrok Apr 26, 2022
c88dba0
compress contingency tables
nmdefries Apr 28, 2022
4797799
Merge pull request #1603 from cmu-delphi/ndefries/compress-contingency
korlaxxalrok Apr 28, 2022
73f4a7c
suppress county, state for appt_location_tried and other_tried
nmdefries May 2, 2022
106e231
Merge pull request #1604 from cmu-delphi/ndefries/suppress-sircal-app…
korlaxxalrok May 2, 2022
799ed00
chore: bump covidcast-indicators to 0.3.12
May 2, 2022
0005e09
Merge branch 'prod' into release/indicators_v0.3.12_utils_v0.3.3
korlaxxalrok May 2, 2022
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.3.11
current_version = 0.3.12
commit = True
message = chore: bump covidcast-indicators to {new_version}
tag = False
4 changes: 3 additions & 1 deletion ansible/templates/sir_complainsalot-params-prod.json.j2
@@ -132,7 +132,9 @@
"smoothed_vaccine_barrier_type_has", "smoothed_wvaccine_barrier_type_has",
"smoothed_vaccine_barrier_none_has", "smoothed_wvaccine_barrier_none_has",
"smoothed_vaccine_barrier_appointment_location_has", "smoothed_wvaccine_barrier_appointment_location_has",
"smoothed_vaccine_barrier_other_has", "smoothed_wvaccine_barrier_other_has"
"smoothed_vaccine_barrier_other_has", "smoothed_wvaccine_barrier_other_has",
["smoothed_vaccine_barrier_appointment_location_tried", "county", "state"],
["smoothed_vaccine_barrier_other_tried", "county", "state"]
]
},
"quidel": {
15 changes: 11 additions & 4 deletions dsew_community_profile/delphi_dsew_community_profile/pull.py
@@ -617,6 +617,7 @@ def interpolate_missing_values(dfs: DataDict) -> DataDict:
# https://github.com/cmu-delphi/covidcast-indicators/issues/1576
_, sig, _ = key
if sig == "positivity":
interpolate_df[key] = df.set_index(["geo_id", "timestamp"]).sort_index().reset_index()
continue

geo_dfs = []
@@ -628,31 +629,37 @@ def interpolate_missing_values(dfs: DataDict) -> DataDict:
if "val" in reindexed_group_df.columns and not reindexed_group_df["val"].isna().all():
reindexed_group_df["val"] = (
reindexed_group_df["val"]
.interpolate(method="linear", limit_area="inside")
.astype(float)
.interpolate(method="linear", limit_area="inside")
)
if "se" in reindexed_group_df.columns:
reindexed_group_df["se"] = (
reindexed_group_df["se"]
.interpolate(method="linear", limit_area="inside")
.astype(float)
.interpolate(method="linear", limit_area="inside")
)
if (
"sample_size" in reindexed_group_df.columns
and not reindexed_group_df["sample_size"].isna().all()
):
reindexed_group_df["sample_size"] = (
reindexed_group_df["sample_size"]
.interpolate(method="linear", limit_area="inside")
.astype(float)
.interpolate(method="linear", limit_area="inside")
)
if "publish_date" in reindexed_group_df.columns:
reindexed_group_df["publish_date"] = reindexed_group_df["publish_date"].fillna(
method="bfill"
)
reindexed_group_df = reindexed_group_df[~reindexed_group_df.val.isna()]
geo_dfs.append(reindexed_group_df)
interpolate_df[key] = (
pd.concat(geo_dfs).reset_index().rename(columns={"index": "timestamp"})
pd.concat(geo_dfs)
.reset_index()
.rename(columns={"index": "timestamp"})
.set_index(["geo_id", "timestamp"])
.sort_index()
.reset_index()
)
return interpolate_df

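The ordering change above is the substance of the pull.py diff: `.astype(float)` now runs before `.interpolate(...)`, and rows whose `val` is still NA after interpolation are dropped. Below is a minimal standalone sketch of that per-geo pattern, assuming the same column names; `interpolate_group` is a hypothetical helper for illustration, not part of the indicator code.

```python
import pandas as pd

def interpolate_group(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the per-geo step: cast numeric columns (which can arrive as
    object dtype after concatenation) to float, linearly fill interior gaps,
    back-fill the publish date, and drop rows whose val is still missing."""
    out = df.copy()
    for col in ["val", "se", "sample_size"]:
        if col in out.columns and not out[col].isna().all():
            out[col] = (
                out[col]
                .astype(float)  # cast first: interpolate() needs a numeric dtype
                .interpolate(method="linear", limit_area="inside")
            )
    if "publish_date" in out.columns:
        out["publish_date"] = out["publish_date"].fillna(method="bfill")
    # limit_area="inside" leaves leading/trailing NAs untouched; drop them here
    return out[~out["val"].isna()]
```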
29 changes: 25 additions & 4 deletions dsew_community_profile/tests/test_pull.py
@@ -4,7 +4,7 @@
from itertools import chain
from typing import Any, Dict, List, Union
import pandas as pd
from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal
import numpy as np
import pytest
from unittest.mock import patch, Mock
@@ -506,7 +506,7 @@ def test_interpolation(self):
"sample_size": [line(i) for i in range(0, 10)],
"publish_date": pd.to_datetime("2022-01-10")
}), dtypes=DTYPES)
# A signal missing everything, should be left alone.
# A signal missing everything, should be dropped since it's all NAs.
missing_sig3 = sig3[(sig3.timestamp <= "2022-01-05") | (sig3.timestamp >= "2022-01-08")]

sig4 = _set_df_dtypes(pd.DataFrame({
@@ -517,12 +517,33 @@
"sample_size": [line(i) for i in range(0, 10)],
"publish_date": pd.to_datetime("2022-01-10")
}), dtypes=DTYPES)
# A signal missing everything except for one point, should be left alone.
# A signal missing everything except for one point, should output a reduced range without NAs.
missing_sig4 = sig4[(sig4.timestamp <= "2022-01-05") | (sig4.timestamp >= "2022-01-08")]

missing_dfs = [missing_sig1, missing_sig2, missing_sig3, missing_sig4]
interpolated_dfs1 = interpolate_missing_values({("src", "sig", False): pd.concat(missing_dfs)})
expected_dfs = pd.concat([sig1, sig2, sig3, sig4])
expected_dfs = pd.concat([sig1, sig2, sig4.loc[9:]])
_assert_frame_equal(interpolated_dfs1[("src", "sig", False)], expected_dfs, index_cols=["geo_id", "timestamp"])

def test_interpolation_object_type(self):
DTYPES = {"geo_id": str, "timestamp": "datetime64[ns]", "val": float, "se": float, "sample_size": float, "publish_date": "datetime64[ns]"}
line = lambda x: 3 * x + 5

sig1 = _set_df_dtypes(pd.DataFrame({
"geo_id": "1",
"timestamp": pd.date_range("2022-01-01", "2022-01-10"),
"val": [line(i) for i in range(2, 12)],
"se": [line(i) for i in range(1, 11)],
"sample_size": [line(i) for i in range(0, 10)],
"publish_date": pd.to_datetime("2022-01-10")
}), dtypes=DTYPES)
# A linear signal missing two days which should be filled exactly by the linear interpolation.
missing_sig1 = sig1[(sig1.timestamp <= "2022-01-05") | (sig1.timestamp >= "2022-01-08")]
# set all columns to object type to simulate the miscast we sometimes see when combining dfs
missing_sig1 = _set_df_dtypes(missing_sig1, {key: object for key in DTYPES.keys()})

interpolated_dfs1 = interpolate_missing_values({("src", "sig", False): missing_sig1})
expected_dfs = pd.concat([sig1])
_assert_frame_equal(interpolated_dfs1[("src", "sig", False)], expected_dfs, index_cols=["geo_id", "timestamp"])

@patch("delphi_dsew_community_profile.pull.INTERP_LENGTH", 2)
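A note on why the expected frames in `test_interpolation` shrink (sig3 is dropped entirely and sig4 keeps only its last row): `limit_area="inside"` fills only gaps bracketed by observed values, so leading and trailing NaNs survive interpolation and are removed by the NA filter afterwards. A quick illustration with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])
filled = s.interpolate(method="linear", limit_area="inside")
print(filled.tolist())  # [nan, 1.0, 2.0, 3.0, nan] -- edge NaNs remain and are dropped later
```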
4 changes: 2 additions & 2 deletions facebook/contingency-combine.R
@@ -26,11 +26,11 @@ suppressPackageStartupMessages({
#' create new ones, relative to the current working directory.
#' @param pattern Regular expression indicating which files in that directory to
#' open. By default, selects all `.csv` files with standard table date prefix.
run_rollup <- function(input_dir, output_dir, pattern = "^[0-9]{8}_[0-9]{8}.*[.]csv$") {
run_rollup <- function(input_dir, output_dir, pattern = "^[0-9]{8}_[0-9]{8}.*[.]csv.gz$") {
if (!dir.exists(output_dir)) { dir.create(output_dir) }

files <- list.files(input_dir, pattern = pattern)
if (length(files) == 0) { stop("No matching data files.") }
if (length(files) == 0) { stop("No matching contingency files to combine.") }

# Get df of input files and corresponding output files. Reformat as a list
# such that input files with same grouping variables (and thus same output
4 changes: 3 additions & 1 deletion facebook/delphiFacebook/R/contingency_write.R
@@ -45,6 +45,7 @@ write_contingency_tables <- function(data, params, geo_type, groupby_vars)

file_name <- get_file_name(params, geo_type, groupby_vars)
msg_df(sprintf("saving contingency table data to %-35s", file_name), data)
# Automatically uses gzip compression based on output file name.
write_csv(data, file.path(params$export_dir, file_name))

} else {
@@ -169,7 +170,8 @@ get_file_name <- function(params, geo_type, groupby_vars) {
if (!is.null(params$debug) && params$debug) {
file_name <- paste0("DebugOn-DoNotShare_", file_name)
}
file_name <- paste0(file_name, ".csv")
# Always use gzip compression.
file_name <- paste0(file_name, ".csv.gz")
return(file_name)
}
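The comment added in this file notes that readr's write_csv picks gzip compression from the `.csv.gz` file name. As a side note, pandas follows the same extension-based convention (`compression="infer"` is the default), so a Python consumer can round-trip these tables without extra flags; a small illustrative snippet with a made-up file name:

```python
import pandas as pd

df = pd.DataFrame({"geo_id": ["us"], "val": [12.3]})

# Like readr's write_csv, pandas infers gzip compression from the ".csv.gz" suffix.
df.to_csv("example_table.csv.gz", index=False)
roundtrip = pd.read_csv("example_table.csv.gz")  # compression is inferred on read as well
print(roundtrip)
```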

@@ -67,10 +67,10 @@ test_that("small dataset produces no output", {
### This test relies on `setup-run.R` to run the full pipeline and tests basic
### properties of the output.
test_that("full synthetic dataset produces expected output format", {
expected_files <- c("20200501_20200531_monthly_nation_gender.csv")
expected_files <- c("20200501_20200531_monthly_nation_gender.csv.gz")
actual_files <- dir(test_path("receiving_contingency_full"))

out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv"))
out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv.gz"))

expect_setequal(expected_files, actual_files)
expect_equal(dir.exists(test_path("receiving_contingency_full")), TRUE)
@@ -101,7 +101,7 @@ test_that("simple equal-weight dataset produces correct percents", {
run_contingency_tables_many_periods(params, base_aggs[2,])

# Expected files
expect_setequal(!!dir(params$export_dir), c("20200501_20200531_monthly_nation_gender.csv"))
expect_setequal(!!dir(params$export_dir), c("20200501_20200531_monthly_nation_gender.csv.gz"))

# Expected file contents
raw_data <- read.csv(test_path("./input/simple_synthetic.csv"))
@@ -112,7 +112,7 @@
"us", "Female", fever_prop * 100, NA, 2000L, 100 * 2000
))

df <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv"))
df <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv.gz"))
expect_equivalent(df, expected_output)
})

@@ -148,7 +148,7 @@ test_that("testing run with multiple aggregations per group", {
represented_pct_heartdisease = 100 * 2000,
)

out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv"))
out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv.gz"))
expect_equivalent(out, expected)
})

@@ -198,7 +198,7 @@ test_that("simple weighted dataset produces correct percents", {
run_contingency_tables_many_periods(params, base_aggs[2,])

# Expected files
expect_equal(!!dir(params$export_dir), c("20200501_20200531_monthly_nation_gender.csv"))
expect_equal(!!dir(params$export_dir), c("20200501_20200531_monthly_nation_gender.csv.gz"))

# Expected file contents
raw_data <- read.csv(test_path("./input/simple_synthetic.csv"))
@@ -209,7 +209,7 @@
"us", "Female", fever_prop * 100, NA, 2000L, sum(rand_weights)
))

out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv"))
out <- read.csv(file.path(params$export_dir, "20200501_20200531_monthly_nation_gender.csv.gz"))
expect_equivalent(out, expected_output)
})

@@ -228,7 +228,7 @@ test_that("production of historical CSVs for range of dates", {

run_contingency_tables_many_periods(params, base_aggs[2,])
# Expected files
expect_equal(!!dir(params$export_dir), c("20200503_20200509_weekly_nation_gender.csv", "20200510_20200516_weekly_nation_gender.csv"))
expect_equal(!!dir(params$export_dir), c("20200503_20200509_weekly_nation_gender.csv.gz", "20200510_20200516_weekly_nation_gender.csv.gz"))
})


@@ -45,9 +45,9 @@ test_that("testing write_contingency_tables command", {
aggregate_range = "week"),
"state",
c("geo_id", "tested"))
expect_setequal(!!dir(tdir), c("20200510_20200516_weekly_state_tested.csv"))
expect_setequal(!!dir(tdir), c("20200510_20200516_weekly_state_tested.csv.gz"))

df <- read_csv(file.path(tdir, "20200510_20200516_weekly_state_tested.csv"))
df <- read_csv(file.path(tdir, "20200510_20200516_weekly_state_tested.csv.gz"))
expect_equivalent(df, test_data)
})

@@ -59,13 +59,13 @@ test_that("testing command to create output filenames", {
end_date=as.Date("2021-01-02")
)
out <- get_file_name(params, "nation", c("gender"))
expected <- "DebugOn-DoNotShare_20210101_20210102_monthly_nation_gender.csv"
expected <- "DebugOn-DoNotShare_20210101_20210102_monthly_nation_gender.csv.gz"

expect_equal(out, expected)

params$debug <- FALSE
out <- get_file_name(params, "nation", c("gender", "race", "ethnicity"))
expected <- "20210101_20210102_monthly_nation_ethnicity_gender_race.csv"
expected <- "20210101_20210102_monthly_nation_ethnicity_gender_race.csv.gz"

expect_equal(out, expected)
})
4 changes: 3 additions & 1 deletion sir_complainsalot/params.json.template
@@ -132,7 +132,9 @@
"smoothed_vaccine_barrier_type_has", "smoothed_wvaccine_barrier_type_has",
"smoothed_vaccine_barrier_none_has", "smoothed_wvaccine_barrier_none_has",
"smoothed_vaccine_barrier_appointment_location_has", "smoothed_wvaccine_barrier_appointment_location_has",
"smoothed_vaccine_barrier_other_has", "smoothed_wvaccine_barrier_other_has"
"smoothed_vaccine_barrier_other_has", "smoothed_wvaccine_barrier_other_has",
["smoothed_vaccine_barrier_appointment_location_tried", "county", "state"],
["smoothed_vaccine_barrier_other_tried", "county", "state"]
]
},
"quidel": {