Doctor Visits show an unexpected weekly trend #2044

Open
nolangormley opened this issue Aug 30, 2024 · 6 comments
Labels: data quality (Missing data, weird data, broken data)

@nolangormley
Contributor

Actual behavior

When looking at the data from the Doctor Visits signal, it shows a weekly trend, similar to signals that have weekly reporting (where there is a spike at the beginning of the week).

[Plot: docvisit]

Expected behavior

Roni and I were looking through this yesterday and couldn't work out why this happens. Since this is a daily reported signal, we expected it to be much smoother.

Context

Here's some code to replicate the plot above:

import pandas as pd
import wget

docvisit = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")
docvisitadj = wget.download("https://api.covidcast.cmu.edu/epidata/covidcast/csv?signal=doctor-visits:smoothed_adj_cli&start_day=2024-05-29&end_day=2024-08-29&geo_type=nation")

# wget.download returns the local filename it saved to, so read that back directly
df = pd.read_csv(docvisit)
dfadj = pd.read_csv(docvisitadj)

df.time_value = pd.to_datetime(df.time_value, utc=True)
dfadj.time_value = pd.to_datetime(dfadj.time_value, utc=True)
dfadj = dfadj[['time_value', 'value']].rename(columns={'value': 'valueadj'})

foo = df[['time_value', 'value']].merge(dfadj, on='time_value', how='left')
foo.plot(x='time_value', y=['value', 'valueadj'])
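
One way to put a number on the weekly pattern described above is to compare each day's value with the mean of the 7-day window centred on it, then average that ratio per weekday. The following is a minimal sketch under stated assumptions: `weekday_profile` is a hypothetical helper, and the series it runs on is synthetic (a flat signal with an artificial 20% Monday bump), not the actual Doctor Visits data.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekday_profile(dates, values):
    """Average ratio of each value to its centred 7-day mean, keyed by weekday (0 = Monday)."""
    ratios = defaultdict(list)
    # Skip the first and last three days, where a full centred window is unavailable.
    for i in range(3, len(values) - 3):
        window_mean = mean(values[i - 3:i + 4])
        if window_mean > 0:
            ratios[dates[i].weekday()].append(values[i] / window_mean)
    return {day: mean(r) for day, r in sorted(ratios.items())}


# Synthetic stand-in for the signal: flat at 2.0 with a made-up 20% bump every Monday.
start = date(2024, 5, 29)
dates = [start + timedelta(days=k) for k in range(93)]
values = [2.0 * (1.2 if d.weekday() == 0 else 1.0) for d in dates]

profile = weekday_profile(dates, values)
for day, ratio in profile.items():
    print(day, round(ratio, 3))
```

A signal with no weekday effect should give ratios close to 1.0 for every weekday; a ratio that sits consistently above 1.0 for one weekday points to the kind of start-of-week spike reported here. To apply it to the real data, the `time_value` and `value` columns of `df` would need to be converted to plain lists first.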

nolangormley added the "data quality" (Missing data, weird data, broken data) label Aug 30, 2024
nolangormley self-assigned this Aug 30, 2024

@RoniRos
Member

RoniRos commented Sep 1, 2024

Back in July, Peter and I noticed this pattern in the "hospital admissions" signals in Texas. Dmitry investigated it and concluded that (1) it is already present in the raw signal we receive, and (2) in that signal it is present only in data from Texas, not in data from other states. The current signal (Doctor Visits) comes from the same source. :-(

@dshemetov Did I remember your conclusions correctly? And did we ever file an issue about it? If so, we should link/consolidate them.

@dshemetov
Contributor

dshemetov commented Oct 2, 2024

Hi @RoniRos, sorry, this slipped off my radar.

As to (1): I only guessed that it wasn't in the raw data in hospital-admissions. I looked at the data with weekday effects removed (unadjusted) and found that it still had this anomaly, but the unadjusted signal is still downstream of a "left Gaussian linear" smoother, which could have a bug. My personal hunch was that a smoother was unlikely to lead to this pattern, but you had the opposite intuition. I didn't try to look for pre-smoother raw data; I have had very few spare cycles for this investigation.

As to (2): I didn't do a comprehensive comparison across all the states, only a handful or so, and Texas was the only one with this anomaly.
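
The smoother question in (1) can be explored with a toy version. The following is a rough sketch, assuming that "left Gaussian linear" means a local linear fit over the current and past points with Gaussian weights that decay with distance from the current day. The function, its `bandwidth` parameter, and the weight formula are illustrative assumptions, not the pipeline's actual implementation.

```python
import math


def left_gauss_linear(values, bandwidth=7.0):
    """Toy left-sided smoother: Gaussian-weighted linear fit on points 0..t, evaluated at t."""
    smoothed = []
    for t, y_t in enumerate(values):
        if t == 0:
            # A single point cannot define a line; pass it through unchanged.
            smoothed.append(y_t)
            continue
        idx = range(t + 1)
        # Assumed weighting: Gaussian in the distance from the current day t.
        w = [math.exp(-((t - i) ** 2) / (2 * bandwidth ** 2)) for i in idx]
        sw = sum(w)
        x_bar = sum(wi * i for wi, i in zip(w, idx)) / sw
        y_bar = sum(wi * values[i] for wi, i in zip(w, idx)) / sw
        sxx = sum(wi * (i - x_bar) ** 2 for wi, i in zip(w, idx))
        sxy = sum(wi * (i - x_bar) * (values[i] - y_bar) for wi, i in zip(w, idx))
        # Weighted least-squares line, evaluated at the current day.
        smoothed.append(y_bar + (sxy / sxx) * (t - x_bar))
    return smoothed


# Synthetic input with no weekly component: a plain linear trend.
trend = [1.0 + 0.05 * k for k in range(60)]
smoothed_trend = left_gauss_linear(trend)
max_error = max(abs(a - b) for a, b in zip(trend, smoothed_trend))
print(max_error)
```

A correct local linear fit reproduces a linear trend up to floating-point error, so this toy version adds no weekly pattern of its own to such an input. That is only a sanity check on the sketch; it says nothing about whether the production smoother has a bug, which would still require looking at the pre-smoother raw data.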

@RoniRos
Member

RoniRos commented May 23, 2025

A related observation: the dow (day-of-week) adjusted signal exhibits numerous "jumps" that don't seem to reflect the original data. Perhaps the most egregious example in the above indicator pair (doctor-visits:smoothed_cli and doctor-visits:smoothed_adj_cli) for "United States" is on 2024-03-25 to 2024-03-28 (Mon-Thu), where the dow-adjusted signal is bumped up significantly but the original shows no change. It is also visible in individual states at the same time, e.g. PA, CA, TX.
Is this a bug in the dow adjustment algorithm? There shouldn't be a holiday or other calendar effect in late March.
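
Jumps of this kind can be searched for mechanically by flagging days where the adjusted series moves sharply while the original barely moves. The following is a minimal sketch under stated assumptions: `divergent_jumps` and its `jump` and `quiet` thresholds are hypothetical, and the two series are synthetic (an invented 30% bump placed on the dates mentioned above), not values taken from the API.

```python
from datetime import date, timedelta


def divergent_jumps(dates, raw, adj, jump=0.10, quiet=0.02):
    """Dates where adj changes by more than `jump` (relative, day over day)
    while raw changes by less than `quiet`."""
    flagged = []
    for i in range(1, len(dates)):
        if raw[i - 1] == 0 or adj[i - 1] == 0:
            continue  # relative change is undefined
        raw_change = abs(raw[i] - raw[i - 1]) / abs(raw[i - 1])
        adj_change = abs(adj[i] - adj[i - 1]) / abs(adj[i - 1])
        if adj_change > jump and raw_change < quiet:
            flagged.append(dates[i])
    return flagged


# Synthetic series: a slowly drifting original, and an adjusted copy with a
# made-up 30% bump on 2024-03-25 through 2024-03-28.
start = date(2024, 3, 20)
dates = [start + timedelta(days=k) for k in range(14)]
raw = [1.0 + 0.001 * k for k in range(14)]
bump_days = {date(2024, 3, 25) + timedelta(days=k) for k in range(4)}
adj = [r * 1.3 if d in bump_days else r for d, r in zip(dates, raw)]

flagged = divergent_jumps(dates, raw, adj)
print(flagged)
```

On the synthetic data this flags the first day of the bump and the first day after it, which are the two edges of the jump. The thresholds would need tuning against the real signals, since a legitimate day-of-week adjustment also moves the adjusted series relative to the original.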

@melange396
Contributor

(Epivis link to the time window and signals Roni just referenced)

I think this is probably caused by a bug in an engine used by the solver in the weekday effects adjustment routine, similar to what was noticed in issue #1915. Fixes were made in PR #1966 and PR #1975, and then released to production on July 10, 2024.

@RoniRos
Member

RoniRos commented May 23, 2025

Thanks George! So have the fixes been applied retroactively? Given the current behavior for that period, I assume the answer is "no". Is it on our stack to apply it at some point? Maybe together with the move to Airflow?

@melange396
Contributor

It definitely was on the docket at one point; it must've gotten punted so that we could do it with the new pipeline system.
