Description
Because the Community Profile Report (CPR) only includes one (sometimes two) reference dates for each signal, and because the CPR is not published on weekends, the resulting COVIDcast signals are only available 5 days a week. See the green line in this timeseries chart:
We should use simple interpolation to fill these gaps retrospectively (once the next file becomes available). Extrapolation on days where no new files are posted is left as a future research project.
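Here is a minimal sketch of the retrospective fill. It assumes a daily pandas Series indexed by reference date, and the values are made up for illustration. `limit_area="inside"` restricts `Series.interpolate` to gaps that have a valid value on both sides. Trailing days with no newer file therefore stay NA instead of being extrapolated.

```python
import numpy as np
import pandas as pd

# Illustrative values only: Friday 2021-06-04 and Monday 2021-06-07 are reported,
# the weekend is missing, and Tuesday 2021-06-08 has no file yet.
s = pd.Series(
    [10.0, np.nan, np.nan, 16.0, np.nan],
    index=pd.date_range("2021-06-04", periods=5, freq="D"),
)

# Linear interpolation on interior gaps only; the trailing NaN is left alone,
# so nothing is extrapolated past the latest available file.
filled = s.interpolate(method="linear", limit_area="inside")
print(filled.tolist())
```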
We'll need to do something like this:
- When computing the list of CPR files to process, extend the series backwards by one file ("additional CPR file")
- Process the CPR files into a df as usual
- Expand the df to cover a contiguous date sequence, NA-filling
- Impute the missing values
- Drop the per-signal reference dates in the additional CPR file
- Aggregate to nation, export, etc., as usual
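The steps above could be sketched roughly as follows. This is not the indicator's actual code. `select_cpr_files` and `fill_gaps` are hypothetical helpers, and the column names `geo_id`, `timestamp` and `val` are assumptions. The sketch also assumes one signal at a time and at most one row per geo and date.

```python
import numpy as np
import pandas as pd


def select_cpr_files(available, to_process):
    """Hypothetical helper: extend the CPR files to process backwards by one.

    `available` and `to_process` hold sortable publication dates (to_process must
    be non-empty). Returns (files, additional), where `additional` is the extra
    earlier file, or None if no earlier file exists.
    """
    first = min(to_process)
    earlier = [f for f in sorted(available) if f < first]
    additional = earlier[-1] if earlier else None
    files = ([additional] if additional is not None else []) + sorted(to_process)
    return files, additional


def fill_gaps(df, additional_dates):
    """Reindex each geo to a contiguous daily range (NA-filling), linearly impute
    interior gaps, then drop reference dates that came from the additional file.

    Assumes hypothetical columns `geo_id`, `timestamp` (datetime64) and `val`.
    """
    pieces = []
    for geo, grp in df.groupby("geo_id"):
        s = grp.set_index("timestamp")["val"].sort_index()
        full_index = pd.date_range(s.index.min(), s.index.max(), freq="D")
        s = s.reindex(full_index)  # missing days become NaN
        s = s.interpolate(method="linear", limit_area="inside")  # impute, no extrapolation
        pieces.append(pd.DataFrame({"geo_id": geo, "timestamp": s.index, "val": s.to_numpy()}))
    result = pd.concat(pieces, ignore_index=True)
    # Drop the reference dates contributed only by the additional CPR file.
    drop = result["timestamp"].isin(pd.to_datetime(list(additional_dates)))
    return result[~drop].reset_index(drop=True)


# Usage with made-up data: two counties reported on Friday and Monday.
df = pd.DataFrame({
    "geo_id": ["01001", "01001", "01003", "01003"],
    "timestamp": pd.to_datetime(["2021-06-04", "2021-06-07"] * 2),
    "val": [10.0, 16.0, 1.0, 4.0],
})
files, extra = select_cpr_files(["2021-06-03", "2021-06-04", "2021-06-07"], ["2021-06-07"])
result = fill_gaps(df, ["2021-06-04"])
print(result)
```

Dropping the additional file's dates *after* imputation is what lets `x1` anchor the fill without being re-published.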
Complication
The existing imputation utility in `delphi_utils.smooth` assumes that a value for date X is only ever imputed using data from dates Y < X. We will probably need to add support for some kind of symmetrical mode. Expected signal data for this change looks like `[x1, NA, NA, x4]` (literally, since these gaps mostly occur on weekends): `x1` is from the additional CPR file, `x4` is from today's CPR file, and we want to publish the two imputed values and `x4`. This probably means a linear or other low-degree fit that doesn't need much context to work.
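One possible shape for a symmetric mode is sketched below. It is a standalone function, not part of the `delphi_utils.smooth` API, and `impute_symmetric`, `degree` and `window` are hypothetical names. Each missing value is filled by fitting a low-degree polynomial to the nearest observed points on *both* sides. With `window=1` and `degree=1` this reduces to linear interpolation between `x1` and `x4`. Gaps without an observation on one side are left as NaN, so there is no extrapolation.

```python
import numpy as np


def impute_symmetric(values, degree=1, window=1):
    """Hypothetical symmetric imputation.

    Fills each NaN with a polynomial of the given degree fit to the `window`
    nearest observed points on each side. NaNs lacking an observed point on
    either side are left as NaN (no extrapolation).
    """
    y = np.asarray(values, dtype=float)
    out = y.copy()
    t = np.arange(len(y))
    observed = np.flatnonzero(~np.isnan(y))
    for i in np.flatnonzero(np.isnan(y)):
        left = observed[observed < i][-window:]   # nearest observed points before i
        right = observed[observed > i][:window]   # nearest observed points after i
        if len(left) == 0 or len(right) == 0:
            continue  # one-sided gap: leave as NaN
        idx = np.concatenate([left, right])
        deg = min(degree, len(idx) - 1)  # never fit more parameters than points
        coeffs = np.polyfit(t[idx], y[idx], deg)
        out[i] = np.polyval(coeffs, i)
    return out


# The [x1, NA, NA, x4] case with made-up values x1 = 10, x4 = 16.
print(impute_symmetric([10.0, np.nan, np.nan, 16.0]))
```

Only observed values feed each fit, so imputed points never influence one another. Raising `window` (and optionally `degree`) trades a little more context for a smoother fill.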
This revision must be completed before we can begin showing county-level hospital admissions in the visualizations on the website.