Skip to content

Return google symptoms to using current pandas version, with mitigations #1497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Oct 17, 2022

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented Jan 27, 2022

Description

Followup to #1496.

Adapt pipeline to work correctly with new version of pandas. Add test to check for nulled-out rows. Fails when using pandas 1.4.0 and passes when using 1.3.5, which matches expected behavior.

Turn on numeric_only=True to suppress future deprecation warning.

Changelog

  • setup.py
  • pull.py
  • geo.py

Fixes

  • Replace deprecated DataFrame.append().
  • Explicitly convert dbdate to datetime64 to prevent data from being nulled out.

Bug probably related to the datetime-like changes in 1.4.0 and an issue they fixed, pandas-dev/pandas#42921

@nmdefries nmdefries marked this pull request as ready for review September 30, 2022 17:40
@nmdefries nmdefries requested a review from krivard September 30, 2022 18:12
@@ -37,7 +37,7 @@ def generate_transition_matrix(geo_res):
if geo_res == "hrr":
map_df["population"] = map_df["population"] * map_df["weight"]

aggregated_pop = map_df.groupby(geo_res).sum().reset_index()
aggregated_pop = map_df.groupby(geo_res).sum(numeric_only=True).reset_index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about the numeric_only=True. In the function arg type hint, it says there's no default, but in the docs below that it says it defaults to bool=True. Guessing that it actually tries to use NANs by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this because I'm getting a warning about to-be-deprecated behavior. Thought I'd just throw it in here.

FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. 
In a future version, numeric_only will default to False. Either specify numeric_only or select 
only columns which should be valid for the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current default does actually seem to be numeric_only=True

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks for that! Curious why they're moving away from numeric_only=True.

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good so long as we expect to only ever iterate over a small number of timestamps

@nmdefries
Copy link
Contributor Author

@krivard This is ready to merge.

@krivard krivard merged commit 2d9df04 into main Oct 17, 2022
@krivard krivard deleted the ndefries/gs-deprecated-pandas-fns branch October 17, 2022 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants