Fix CPR bugs associated with non-unique timestamps by publish date #1562

Merged
krivard merged 9 commits into main from ndefries/cpr-lenient-check-ts-per-publishdate
Apr 6, 2022

Conversation

nmdefries
Contributor

Description

Fix two CPR bugs:

  • Some asserts are too strict given earlier logic.
  • On 2021-03-17, the reference dates for test volume and test positivity changed from being the same within a report to being different. This caused some timestamps to appear for multiple report dates, which the existing logic can't handle appropriately. Fix by splitting data into before/after that date and processing separately.
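The before/after split described in the second bullet can be sketched roughly as follows. This is a minimal illustration and not the code in pull.py: the `publish_date` column name, the helper names, and the choice to put 2021-03-17 itself in the "after" half are all assumptions.

```python
import pandas as pd

# Date on which the reference dates for test volume and test positivity
# stopped matching within a report (from the PR description).
CUTOFF = pd.Timestamp("2021-03-17")


def split_at_cutoff(df, cutoff=CUTOFF):
    """Return (before, on_or_after) halves of df, split on publish_date.

    Assumption: reports published on the cutoff date already use the new
    layout, so they go in the second half.
    """
    is_before = df["publish_date"] < cutoff
    return df[is_before], df[~is_before]


def process_separately(df, process):
    """Apply `process` to each half independently, then recombine."""
    before, after = split_at_cutoff(df)
    return pd.concat([process(before), process(after)], ignore_index=True)


# Made-up example rows (hypothetical values).
reports = pd.DataFrame({
    "publish_date": pd.to_datetime(["2021-03-16", "2021-03-17", "2021-03-18"]),
    "val": [0.05, 0.06, 0.07],
})
before, after = split_at_cutoff(reports)
```

Processing the halves independently means any deduplication inside `process` never compares a pre-change report against a post-change one.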

Changelog

  • pull.py
  • tests

Fixes

Closes #1556.

Since deduplication keeps latest publish date for a given timestamp, it
is possible for an older publish date to have fewer than two unique
timestamps per geo region-signal type.
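
A small made-up example of the effect described in this commit message. The column names and dates are hypothetical, and the deduplication shown is a generic pandas idiom rather than the exact logic in pull.py.

```python
import pandas as pd

# Two reports for one county and one signal; the 2022-02-22 timestamp
# appears in both reports.
df = pd.DataFrame({
    "geo_id": ["01001"] * 4,
    "signal": ["positivity"] * 4,
    "timestamp": pd.to_datetime(
        ["2022-02-15", "2022-02-22", "2022-02-22", "2022-03-01"]
    ),
    "publish_date": pd.to_datetime(
        ["2022-02-24", "2022-02-24", "2022-02-25", "2022-02-25"]
    ),
})

# Keep only the latest publish date for each (geo, signal, timestamp).
deduped = (
    df.sort_values("publish_date")
      .drop_duplicates(subset=["geo_id", "signal", "timestamp"], keep="last")
)

# Count the unique timestamps left under each publish date.
counts = (
    deduped.groupby(["publish_date", "geo_id", "signal"])["timestamp"]
           .nunique()
)
```

Here the older report (2022-02-24) is left with a single timestamp after deduplication, so an assert that requires two timestamps per publish date would fail even though the data is fine.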

tune asserts
@nmdefries nmdefries requested a review from krivard March 23, 2022 10:17
Contributor

@krivard krivard left a comment

Some documentation is missing, and I may also be confusing two different senses of "two reference dates": one is "last week" vs. "previous week", the other is "last week" positivity vs. volume after 2021-03-17.

Comment on lines 646 to 647
"Each publish date-geo value combination should be available for both " + \
"test positivity and test volume."
Contributor

Community_Profile_Report_20220225.xlsx has 158 counties where positivity is not available for "last week" but volume is, and 7 counties where volume is not available but positivity is. Will this assert inappropriately reject that file? Or do you mean that previous steps should have already dropped the entries with mismatched availability?

Contributor Author

The wording may be ambiguous on my part. "Available" here means that there is a row for a given region and date; it doesn't check whether the signal is NA. pandas.read_excel reads empty fields as NA/NaN, so observations are included as NA where the signal was not available in the original report.
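
As a rough sketch of what "available" means in that sense, a presence check could look like the following. The column names, signal labels, and the `has_both_signals` helper are hypothetical and not taken from pull.py.

```python
import pandas as pd

EXPECTED_SIGNALS = {"positivity", "volume"}  # hypothetical labels


def has_both_signals(df):
    """True if every (publish_date, geo_id) pair has a row for both signals.

    Only row presence is checked; `val` may be NaN.
    """
    relevant = df[df["signal"].isin(EXPECTED_SIGNALS)]
    n_signals = relevant.groupby(["publish_date", "geo_id"])["signal"].nunique()
    return bool((n_signals == len(EXPECTED_SIGNALS)).all())


# Made-up rows: positivity is NaN but the row still exists.
rows = pd.DataFrame({
    "publish_date": ["2022-02-25"] * 2,
    "geo_id": ["01001"] * 2,
    "signal": ["positivity", "volume"],
    "val": [float("nan"), 120.0],
})
```

With these rows, `has_both_signals(rows)` is true despite the NaN, and it is false once the positivity row is removed.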

The way the sample size filter is set up at the moment, any obs with missing (NA) n or n <= 5 are dropped.

Contributor Author

@nmdefries nmdefries Mar 28, 2022

Rows with missing val (test positivity) but sample size (test volume) available and > 5 are included.
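
The filter behaviour described in these two comments can be sketched as below. This is an illustration under assumed column names (`val`, `sample_size`), not the filter as written in pull.py.

```python
import pandas as pd

MIN_SAMPLE_SIZE = 5  # threshold quoted in the comment above


def filter_by_sample_size(df):
    """Drop rows whose sample size is NA or <= 5; `val` is not inspected.

    NaN > 5 evaluates to False, so rows with a missing sample size are
    dropped by the same comparison.
    """
    return df[df["sample_size"] > MIN_SAMPLE_SIZE]


# Made-up observations (hypothetical values).
obs = pd.DataFrame({
    "val": [0.04, float("nan"), 0.10, 0.07],
    "sample_size": [100.0, 40.0, float("nan"), 5.0],
})
kept = filter_by_sample_size(obs)
```

Only the first two rows survive, and the second of them still has a missing `val`, which matches the case called out above.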

Contributor

@krivard krivard left a comment

lgtm! 👍 want me to wait for ananya before merging or no?

@nmdefries
Contributor Author

Prefer to wait for @Ananya-Joshi to take a look given the complexity of some of the logic

@Ananya-Joshi
Contributor

Tests and some sample dates I looked at from March 2022 seem to be working fine and are generally in the right range. Happy to approve pending some information on the comments above. Good job Nat!

@nmdefries nmdefries requested a review from Ananya-Joshi April 6, 2022 14:17
@krivard krivard merged commit 4c03f4e into main Apr 6, 2022
@krivard krivard deleted the ndefries/cpr-lenient-check-ts-per-publishdate branch April 6, 2022 20:06
Development

Successfully merging this pull request may close these issues.

DSEW-CPR testing signals fail for Community_Profile_Report_20220225
3 participants