style(black): format acquisition with `black`, line-length=200 #1186

dshemetov · 2023-05-27T03:04:03Z

Run the acquisition Python files through the code formatter black.

Summary:

Covers everything but covidcast (do that one after 1078 - Refactor csv_importer.py and csv_to_database.py #1103).
Also makes SonarCloud happy.

Prerequisites:

Unless it is a documentation hotfix it should be merged against the dev branch
Branch is up-to-date with the branch to be merged with, i.e. dev
Build is successful
Code is cleaned up and formatted

sonarqubecloud · 2023-05-30T18:41:30Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
41 Code Smells

No Coverage information
10.8% Duplication

nmdefries

Looks good overall. There are some spots where I prefer the old line length-respecting format, for readability, but I don't feel that strongly about them.

nmdefries · 2023-06-02T14:17:13Z

src/acquisition/covid_hosp/common/utils.py

-      logger.warning(
-        "expensive operation",
-        msg="concatenating more than 7 files may result in long running times",
-        count=len(dfs))


suggestion [non-blocking]: I have a slight preference for the old wrapped formatting here, for readability and to stay within the line-length limit.

question: black's line-wrapping behavior seems inconsistent: sometimes it puts each element of a long list on its own line, but sometimes it collapses lists that were already wrapped across multiple lines back into one line. Is it possible to make line wrapping stricter about line length, or to tune it at all?

I see other lines (lots of Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int), in facility/database.py and in all the other variants of database.py) that had been wrapped manually to standard line length limits that black combined back into a single line.

I've noted other instances of line-wrapping changes where I prefer the old version. All are non-blocking.

black's line-wrap (roughly) joins any expression that fits within its line-length setting (set in pyproject.toml in this repo to 200 by Sam) onto a single line, and only splits expressions that exceed it. see here for Black's docs on it.

so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.

i'm personally fine with having such a long line setting, but i know that it breaks from the common convention of about 80. even black's default setting is 88.

the difficulty with setting black to a short line-length is that it is very likely to break semantic spacing. in the example below, every line is long, but at least the formatting is consistent

Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int),
Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),

if we had set the default line-length shorter, many of these would turn into

Columndef(
    "previous_day_admission_adult_covid_confirmed_18-19_7_day_sum",
    "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum",
    int
),
Columndef(
    "previous_day_admission_adult_covid_confirmed_20-29_7_day_sum",
    "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum",
    int
),

(though not all, if the string args were shorter, it would keep just that particular Columndef a single line)
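
A rough way to picture the join-or-split behavior described above is the toy model below. It is only a sketch: `toy_wrap` is a hypothetical helper made up for illustration, not black's API or its actual algorithm, and it ignores indentation and the other split strategies black tries first.

```python
def toy_wrap(name, args, line_length):
    """Toy join-or-split rule (NOT black's real algorithm).

    Keep the call on one line if it fits within line_length,
    otherwise put each argument on its own line.
    """
    one_line = f"{name}({', '.join(args)}),"
    if len(one_line) <= line_length:
        return one_line
    body = "".join(f"    {arg},\n" for arg in args)
    return f"{name}(\n{body}),"


long_args = [
    '"previous_day_admission_adult_covid_confirmed_18-19_7_day_sum"',
    '"previous_day_admission_adult_covid_confirmed_18_19_7_day_sum"',
    "int",
]
print(toy_wrap("Columndef", long_args, 200))  # fits, so it stays on one line
print(toy_wrap("Columndef", long_args, 100))  # too long, so one argument per line
print(toy_wrap("Columndef", ['"a"', '"b"', "int"], 100))  # short args stay on one line
```

With a limit of 200 the long Columndef stays on one line, with 100 it gets split, and a Columndef with short arguments stays on one line either way, which is the mix of styles discussed in this thread.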

i think both are acceptable and have tradeoffs (e.g. vertical space conservation, alignment, horizontal space conservation, consistency of formatting).

not sure how to arrive at a consensus in this space, where prefs will vary a lot.

> so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.

Gotcha, didn't realize that the line-length setting was longer than the usual 80 characters.

If we're going to be running all of our repos through black, it would be nice to have them formatted with the same settings, so maybe it's worth messing around with those a bit here? Or do you think the 200-length line is already as good as it gets for automated formatting?

Are we turning on automated linting after we finish this pass through black? Do we need to consider how black's settings will interact with that (e.g. trying to minimize errors we'll need to fix manually)?

> not sure how to arrive at a consensus in this space, where prefs will vary a lot.

I want to emphasize that my preferences here are very weak. As long as it's reasonable/readable, it's fine.

I'm with you on the weak preferences thing: I've just gotten used to 200 in this repo, but I will let it go for consistency with e.g. covidcast-indicators (where we use pylint to enforce a default 100 line-length)

Let me give the 100 line-length a shot in a separate PR and see how different it looks

.editorconfig

nmdefries · 2023-06-02T18:18:11Z

src/acquisition/covid_hosp/facility/database.py

+        Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int),
+        Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),
+        Columndef("previous_day_admission_adult_covid_confirmed_30-39_7_day_sum", "previous_day_admission_adult_covid_confirmed_30_39_7_day_sum", int),


note: lots of stuff here that black recombined into a single line that exceeds standard line length limits. See question above for more info.

nmdefries · 2023-06-02T18:21:14Z

src/acquisition/covid_hosp/state_daily/database.py

+        Columndef("total_adult_patients_hospitalized_confirmed_covid", "total_adult_patients_hosp_confirmed_covid", int),
+        Columndef("total_adult_patients_hospitalized_confirmed_covid_coverage", "total_adult_patients_hosp_confirmed_covid_coverage", int),


note: more line-combining here.

nmdefries · 2023-06-02T18:33:52Z

src/acquisition/quidel/quidel.py

+        self.xlsx_uptodate_list = [f[:-5] for f in listdir(self.excel_uptodate_path) if isfile(join(self.excel_uptodate_path, f)) and f[-5:] == ".xlsx"]
+        self.xlsx_history_list = [f[:-5] for f in listdir(self.excel_history_path) if isfile(join(self.excel_history_path, f)) and f[-5:] == ".xlsx"]
+        self.csv_list = [f[:-4] for f in listdir(self.csv_path) if isfile(join(self.csv_path, f)) and f[-4:] == ".csv"]


note: another line recombine.

src/acquisition/paho/paho_download.py

nmdefries · 2023-06-02T18:38:35Z

src/acquisition/norostat/norostat_raw.py

-  long_raw = (long_raw_df, release_date, parse_time, location)
-  return long_raw
+    (wide_raw_df, release_date, parse_time, location) = wide_raw
+    long_raw_df = wide_raw_df.melt(id_vars=["Week"], var_name="measurement_type", value_name="value").rename(index=str, columns={"Week": "week"})


note: another line recombine

nmdefries · 2023-06-02T18:42:45Z

src/acquisition/kcdc/kcdc_update.py

-    }
+    last_season = issue // 100 + (1 if issue % 100 > 35 else 0)
+    url = "https://www.cdc.go.kr/npt/biz/npp/iss/influenzaListAjax.do"
+    params = {"icdNm": "influenza", "startYear": "2004", "endYear": str(last_season)}  # Started in 2004


note: another line recombine.

nmdefries · 2023-06-02T18:45:51Z

src/acquisition/fluview/fluview_update.py

-    'percent_a': nullable_float(row[8]),
-    'percent_b': nullable_float(row[9])
-  }
+    if row[0] == "REGION TYPE" and row != ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "TOTAL A", "TOTAL B", "PERCENT POSITIVE", "PERCENT A", "PERCENT B"]:


note: another line recombine.

nmdefries · 2023-06-02T18:46:13Z

src/acquisition/fluview/fluview_update.py

+    hrow1 = ["REGION TYPE", "REGION", "SEASON_DESCRIPTION", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"]
+    hrow2 = ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"]


note: another line recombine.

nmdefries · 2023-06-20T22:26:08Z

May not be important given our general preference for the 100-length line version, but this is missing the pyproject.toml file.

krivard

most of my irritation in this one comes from:

things that should be f-strings
statements that were split across multiple lines before but are now collapsed onto a single line and significantly less legible

krivard · 2023-06-22T18:35:17Z

src/acquisition/fluview/fluview_update.py

+    insert = cnx.cursor()
+    for row in entries:
+        lag = delta_epiweeks(row["epiweek"], issue)
+        args = [row["total_specimens"], row["total_a_h1n1"], row["total_a_h3"], row["total_a_h3n2v"], row["total_a_no_sub"], row["total_b"], row["total_b_vic"], row["total_b_yam"]]


another line recombine

krivard · 2023-06-22T18:35:36Z

src/acquisition/fluview/fluview_update.py

+    insert = cnx.cursor()
+    for row in entries:
+        lag = delta_epiweeks(row["epiweek"], issue)
+        args = [row["n_ili"], row["n_patients"], row["n_providers"], row["wili"], row["ili"], row["age0"], row["age1"], row["age2"], row["age3"], row["age4"], row["age5"]]


another line recombine

krivard · 2023-06-22T18:41:34Z

src/acquisition/norostat/norostat_add_history.py

+    print(
+        'Successfully uploaded the following snapshots, with the count indicating the number of data-table versions found inside each snapshot (expected to be 1, or maybe 2 if there was a change in capitalization; 0 indicates the NoroSTAT page was not found within a snapshot directory); just "Counter()" indicates no snapshot directories were found:',
+        snapshot_version_counter,
+    )


this isn't great but there aren't really any good options here

fwiw, i like this convention for breaking long strings in Pandas

turns out Python just automatically concatenates strings like that
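
For reference, a minimal sketch of the concatenation being described: adjacent string literals inside parentheses are joined by Python at compile time, so a long message can be wrapped by hand without `+`. The message here is an abridged stand-in for the norostat one, and the variable names are made up for illustration.

```python
# Adjacent string literals are joined into a single string at compile time,
# so no "+" or backslash continuation is needed inside the parentheses.
message = (
    "Successfully uploaded the following snapshots, "
    "with the count indicating the number of data-table versions "
    "found inside each snapshot:"
)

# The same text written as one long literal, for comparison.
single_line = "Successfully uploaded the following snapshots, with the count indicating the number of data-table versions found inside each snapshot:"

print(message == single_line)
```

Each piece needs its own trailing space, since nothing is inserted between the literals when they are joined.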

krivard · 2023-06-22T18:42:50Z

src/acquisition/norostat/norostat_raw.py

+    expect_value_eq(resp.status_code, 200, "Wanted status code {}.  Received: ")
+    expect_value_eq(resp.headers.get("Content-Type"), "text/html", 'Expected Content-Type "{}"; Received ')


this is worse

krivard · 2023-06-22T18:45:08Z

src/acquisition/norostat/norostat_sql.py

+          DROP TABLE IF EXISTS `norostat_point_diffs`,
+                              `norostat_point_version_list`,


this misalignment would make me itch

fwiw, i don't think black touches the formatting in multi-line strings. so this is just there from the original author 🤷

something modified the spacing inside the quotes:

The original has DROP TABLE indented by 6 spaces; the new one has DROP TABLE indented by 10. I do not know why it added 4 spaces to DROP TABLE and only 3 to norostat_point_version_list

ah you're right, very weird. im going to see if it will try to format it back, if I add one more space in here

krivard · 2023-06-22T18:57:28Z

src/acquisition/twtr/healthtweets.py

+    values = ht.get_values(args.state, args.date1, args.date2)
+    print("Daily counts in %s from %s to %s:" % (ht.check_state(args.state), args.date1, args.date2))
+    for date in sorted(list(values.keys())):
+        print("%s: num=%-4d total=%-5d (%.3f%%)" % (date, values[date]["num"], values[date]["total"], 100 * values[date]["num"] / values[date]["total"]))


better as an f-string
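
One possible f-string version of that print, as a sketch only. `format_row` and the sample values are hypothetical stand-ins, not code from healthtweets.py.

```python
def format_row(date, num, total):
    # %-4d becomes :<4d (left-justified, width 4); %.3f%% becomes :.3f plus a literal %
    return f"{date}: num={num:<4d} total={total:<5d} ({100 * num / total:.3f}%)"


# Hypothetical sample values standing in for ht.get_values(...)
values = {"2023-06-01": {"num": 12, "total": 3456}}
for date in sorted(values):
    print(format_row(date, values[date]["num"], values[date]["total"]))
```

Pulling `num` and `total` into names first also shortens the line, which helps regardless of the line-length setting.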

krivard · 2023-06-22T18:58:48Z

src/acquisition/wiki/wiki.py

+    # step 2: run a few jobs
+    print("running jobs...")
+    try:
+        wiki_download.run(secrets.wiki.hmac, download_limit=1024 * 1024 * 1024, job_limit=12)


this is worse

krivard · 2023-06-22T20:44:06Z

src/acquisition/wiki/wiki_download.py

+        for line in f:
+            content = line.strip().split()
+            if len(content) != 4:
+                print("unexpected article format: {0}".format(line))


this can be an f-string

krivard · 2023-06-22T20:44:40Z

src/acquisition/wiki/wiki_download.py

+    for article in articles:
+        if debug_mode:
+            print(" %s" % (article))
+        out = text(subprocess.check_output('LC_ALL=C grep -a -i "^en %s " raw2 | cat' % (article.lower()), shell=True)).strip()


line collapsed; this one is particularly gross

krivard · 2023-06-22T20:45:19Z

src/acquisition/wiki/wiki_download.py

+            # year, month = int(job['name'][11:15]), int(job['name'][15:17])
+            year, month = int(job["name"][10:14]), int(job["name"][14:16])
+            # print 'year=%d | month=%d'%(year, month)
+            url = "https://dumps.wikimedia.org/other/pageviews/%d/%d-%02d/%s" % (year, year, month, job["name"])


this would be better as an f-string
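
A sketch of what the f-string form could look like here. `pageviews_url` is a hypothetical helper and the file name is an assumed example; only the slicing and the URL pattern come from the lines under review.

```python
def pageviews_url(job_name):
    # Same slicing as the reviewed line: characters 10-13 are the year, 14-15 the month
    year, month = int(job_name[10:14]), int(job_name[14:16])
    # %02d becomes :02d (zero-padded to width 2)
    return f"https://dumps.wikimedia.org/other/pageviews/{year}/{year}-{month:02d}/{job_name}"


print(pageviews_url("pageviews-20230601-000000.gz"))  # hypothetical file name
```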

dshemetov · 2023-06-26T19:48:38Z

Closing in favor of #1189

dshemetov added 19 commits May 26, 2023 13:36

style(black): acquisition afhsb

3840136

style(black): acquisition cdcp

546742b

style(black): acquisition covid_hosp

46b28c8

style(black): acquisition covidcast_nowcast

39146ec

style(black): acquisition ecdc

2dad3e8

style(black): acquisition flusurv

61427c5

style(black): acquisition fluview

87f1fac

style(black): acquisition ght

a4da4d2

style(black): acquisition kcdc

8f33ff5

style(black): acquisition nidss

2285514

style(black): acquisition norostat

7c2331d

style(black): acquisition paho

355b8f3

style(black): acquisition quidel

9bf4b91

style(black): acquisition twitter

d07d9f4

style(black): acquisition wiki

1278d27

style: add .editorconfig

43dc3b1

gh: add .git-blame-ignore-revs

35f5836

ci(sonar): fix security warning with tempfiles

3823c75

ci(sonar): https links wow amazing

973e296

dshemetov requested a review from nmdefries May 30, 2023 21:41

nmdefries approved these changes Jun 2, 2023

dshemetov mentioned this pull request Jun 5, 2023

style(black): format acquisition with black, line-length=100 #1189

Merged

4 tasks

krivard changed the title ~~style(black): format acquisition~~ style(black): format acquisition (length=200) Jun 21, 2023

dshemetov changed the title ~~style(black): format acquisition (length=200)~~ style(black): format acquisition with black, line-length=200 Jun 21, 2023

krivard reviewed Jun 22, 2023

dshemetov closed this Jun 26, 2023

dshemetov deleted the ds/format branch June 26, 2023 19:48
