simplify Dockerfile #113

bochocki · 2023-05-09T22:47:46Z

Overview

This PR uses updated versions of Python and prophet to greatly simplify the Python environment setup in the Dockerfile. The code has been tested by building and running a local Docker container, and sample outputs were written to the following tables in moz-fx-data-bq-data-science.bochocki:

tmp_desktop_kpi_forecast
tmp_desktop_kpi_forecast_confidences
tmp_mobile_kpi_forecast
tmp_mobile_kpi_forecast_confidences

Additional Changes

.gitignore: ignore additional filetypes
kpi_forecasting.py: set the confidence-interval target from config instead of relying on the hardcoded "desktop". This target is overwritten in write_confidence_intervals_to_bigquery (https://github.com/mozilla/docker-etl/blob/4cfbec915375343023944d1ca23f527251a5ada8/jobs/kpi-forecasting/kpi-forecasting/Utils/DBWriter.py#L116), but I think this change makes it clear that we're not unintentionally using "desktop" labels on "mobile" forecasts.
PosteriorSampling.py: minor refactoring required to resolve errors and deprecation warnings that are now being raised by pandas as a result of package upgrades.
README.md: update examples
requirements.txt: update packages to get easier-to-install versions of prophet and statsforecast.

Checklist for reviewer:

Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)
Scan the PR and verify that no changes (particularly to .circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logs
Ensure the container image will be using permissions granted to telemetry-airflow responsibly.


bochocki · 2023-05-09T22:52:13Z

jobs/kpi-forecasting/kpi-forecasting/Utils/PosteriorSampling.py

@@ -31,10 +31,9 @@ def get_confidence_intervals(
            uncertainty_samples["ds"] > np.datetime64(final_observed_sample_date)
        ]
        .groupby("{}".format(aggregation_unit_of_time))
-        .sum()
+        .sum(numeric_only=True)


Applying a plain .sum() over a dataframe with non-numeric columns raises a deprecation warning.
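For illustration, here is a minimal sketch of what numeric_only=True does in a grouped sum. The frame and its column names (month, type, sample_0) are made up for this example and are not the real uncertainty samples.

```python
import pandas as pd

# Hypothetical stand-in for the uncertainty samples: a grouping column,
# a non-numeric label column, and one numeric sample column.
samples = pd.DataFrame(
    {
        "month": ["2023-05", "2023-05", "2023-06"],
        "type": ["forecast", "forecast", "forecast"],
        "sample_0": [1.0, 2.0, 3.0],
    }
)

# numeric_only=True restricts the aggregation to numeric columns, so the
# "type" column is left out of the result instead of being aggregated.
monthly = samples.groupby("month").sum(numeric_only=True)
print(monthly)
```

Passing the flag explicitly keeps the result independent of whichever default the installed pandas version uses.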

bochocki · 2023-05-09T22:54:55Z

jobs/kpi-forecasting/kpi-forecasting/Utils/PosteriorSampling.py

+        uncertainty_samples_aggregated.iloc[0, 1:] += observed_aggregated["value"].iloc[
+            -1
+        ]


This is the same intended logic as before. The previous code doesn't work in new versions of pandas because observed_aggregated.iloc[-1].value returns an array of values rather than a single value. Using the . (attribute-style) column access was also confusing, because at first glance it looks like a typo of .values, which casts a pandas column to a numpy array.
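A small sketch of the access pattern used in the new code, on made-up data (the frames and the month, sample_0, and sample_1 columns are hypothetical). It does not try to reproduce the old failure; it only shows that selecting the column first and then taking the last position yields a scalar, which is then added to the first row of the sample columns.

```python
import pandas as pd

# Hypothetical observed history and forecast samples.
observed = pd.DataFrame({"month": ["2023-03", "2023-04"], "value": [90.0, 100.0]})
samples = pd.DataFrame(
    {
        "month": ["2023-05", "2023-06"],
        "sample_0": [10.0, 20.0],
        "sample_1": [12.0, 22.0],
    }
)

# Select the column by name, then take the last element by position.
# This returns a single scalar, not a row or an array.
last_observed = observed["value"].iloc[-1]

# Add that scalar to every sample column (all columns after the first)
# in the first row only; the remaining rows are left unchanged.
samples.iloc[0, 1:] += last_observed
print(samples)
```

Bracket access also avoids any visual confusion between a column named value and the .values attribute.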

bochocki · 2023-05-09T23:00:59Z

jobs/kpi-forecasting/kpi-forecasting/Utils/PosteriorSampling.py

@@ -71,6 +70,8 @@ def get_confidence_intervals(
        columns={"y": "value"}
    ).sort_values(by="{}".format(aggregation_unit_of_time))

+    observed_aggregated = observed_aggregated.astype({"value": np.float64})


observed_aggregated["value"] is being stored as Int64Dtype, which is pandas' nullable integer extension type. For some reason, using this type breaks the following merge on line 100:

    all_aggregated = pd.merge(
        observed_aggregated,
        uncertainty_samples_aggregated,
        on=["{}".format(aggregation_unit_of_time), "value", "type"],
        how="outer",
    )

I think using float64 instead is an acceptable workaround here, since the values in the confidence intervals are reported as float64 anyway.
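A rough sketch of the workaround on toy data. The frames, the month key, and the p50 label are assumptions made for illustration. It does not reproduce the merge failure itself; it only shows the cast from Int64 to float64 followed by an outer merge on keys that include the value column.

```python
import numpy as np
import pandas as pd

# Hypothetical observed data whose "value" column uses the nullable Int64 dtype.
observed = pd.DataFrame(
    {
        "month": ["2023-03", "2023-04"],
        "value": pd.array([100, 120], dtype="Int64"),
        "type": ["actual", "actual"],
    }
)

# Hypothetical forecast data whose "value" column is already float64.
forecast = pd.DataFrame({"month": ["2023-05"], "value": [130.5], "type": ["p50"]})

# Cast only the "value" column so both merge inputs share the float64 dtype.
observed = observed.astype({"value": np.float64})

# Outer merge on all three keys, mirroring the shape of the merge above.
all_aggregated = pd.merge(
    observed, forecast, on=["month", "value", "type"], how="outer"
)
print(all_aggregated)
```

Casting with a column-to-dtype mapping leaves the other columns untouched, which keeps the change narrowly scoped to the merge key that caused trouble.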


perrymcmanis144

Very happy to see this PR. LGTM and matches the expectations I had about this work based on prior conversations we've had 👍

bochocki requested a review from perrymcmanis144 May 9, 2023 23:01

bochocki added 4 commits May 9, 2023 16:07

black format

d230e99

change MAINTAINER label

27229dd

Revert "change MAINTAINER label"

02b5f2a

This reverts commit 27229dd.

include pytest-black

f627ae7

bochocki enabled auto-merge (squash) May 9, 2023 23:36

bochocki requested a review from irrationalagent May 9, 2023 23:36

perrymcmanis144 approved these changes May 10, 2023

bochocki merged commit 545c6f9 into main May 10, 2023

bochocki deleted the kpi-forecasting_simplify-dockerfile branch May 10, 2023 13:48
