Skip to content

style(black): format acquisition with black, line-length=200 #1186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented May 27, 2023

Run the acquisition Python files through the code formatter black.

Summary:

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 41 Code Smells

No Coverage information No Coverage information
10.8% 10.8% Duplication

@dshemetov dshemetov requested a review from nmdefries May 30, 2023 21:41
Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. There are some spots where I prefer the old line length-respecting format, for readability, but I don't feel that strongly about them.

Comment on lines -152 to -155
logger.warning(
"expensive operation",
msg="concatenating more than 7 files may result in long running times",
count=len(dfs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion [non-blocking]: On this line, I have a slight preference for the old wrapped formatting here for readability/maintaining line-character limit.

question: black's line-wrapping behavior seems inconsistent, e.g. sometimes it turns each element of a long list into its own line but sometimes it combines into one line lists that were already formatted with the multiline approach. Is it possible to change line wrapping to be more strict about line length? Or to tune it at all?

I see other lines (lots of Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int), in facility/database.py and in all the other variants of database.py) that had been wrapped manually to standard line length limits that black combined back into a single line.

I've noted other instances of line-wrapping changes where I prefer the old version. All are non-blocking.

Copy link
Contributor Author

@dshemetov dshemetov Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black's line-wrap (roughly) tries to reformat any expression that contains lines with a line-length less than its line-length setting (set in pyproject.toml in this repo to 200 by Sam). see here for Black's docs on it.

so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.

i'm personally fine with having such a long line setting, but i know that it breaks from the programming standard of 80. even black's default setting is 100-ish.

the difficulty with setting black to short line-lengths is that it is very likely to break semantic spacing in that case, like in the example below, every line is long, but at least the formatting is consistent

Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int),
Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),

if we had set the default line-length shorter, many of these would turn into

Columndef(
    "previous_day_admission_adult_covid_confirmed_18-19_7_day_sum",
    "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", 
    int
),
Columndef(
    "previous_day_admission_adult_covid_confirmed_20-29_7_day_sum",
    "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", 
    int
),

(though not all, if the string args were shorter, it would keep just that particular Columndef a single line)

i think both are acceptable and have tradeoffs (e.g. vertical space conservation, alignment, horizontal space conservation, consistency of formatting).

not sure how to arrive at a consensus in this space, where prefs will vary a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.

Gotcha, didn't realize that the line-length setting was longer than the usual 80 characters.

If we're going to be running all of our repos through black, it would be nice to have them formatted with the same settings, so maybe it's worth messing around with those a bit here? Or do you thing the 200-length line is already as good as it gets for automated formatting?

Are we turning on automated linting after we finish this pass through black? Do we need to consider how black's settings will interact with that (e.g. trying to minimize errors we'll need to fix manually)?

not sure how to arrive at a consensus in this space, where prefs will vary a lot.

I want to emphasize that my preferences here are very weak. As long as it's reasonable/readable, it's fine.

Copy link
Contributor Author

@dshemetov dshemetov Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you on the weak preferences thing: I've just gotten used to 200 in this repo, but I will let it go for consistency with e.g. covidcast-indicators (where we use pylint to enforce a default 100 line-length)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me give the 100 line-length a shot in a separate PR and see how different it looks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +68 to +70
Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int),
Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),
Columndef("previous_day_admission_adult_covid_confirmed_30-39_7_day_sum", "previous_day_admission_adult_covid_confirmed_30_39_7_day_sum", int),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: lots of stuff here that black recombined into a single line that exceeds standard line length limits. See question above for more info.

Comment on lines +127 to +128
Columndef("total_adult_patients_hospitalized_confirmed_covid", "total_adult_patients_hosp_confirmed_covid", int),
Columndef("total_adult_patients_hospitalized_confirmed_covid_coverage", "total_adult_patients_hosp_confirmed_covid_coverage", int),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: more line-combining here.

Comment on lines +91 to +93
self.xlsx_uptodate_list = [f[:-5] for f in listdir(self.excel_uptodate_path) if isfile(join(self.excel_uptodate_path, f)) and f[-5:] == ".xlsx"]
self.xlsx_history_list = [f[:-5] for f in listdir(self.excel_history_path) if isfile(join(self.excel_history_path, f)) and f[-5:] == ".xlsx"]
self.csv_list = [f[:-4] for f in listdir(self.csv_path) if isfile(join(self.csv_path, f)) and f[-4:] == ".csv"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: another line recombine.

long_raw = (long_raw_df, release_date, parse_time, location)
return long_raw
(wide_raw_df, release_date, parse_time, location) = wide_raw
long_raw_df = wide_raw_df.melt(id_vars=["Week"], var_name="measurement_type", value_name="value").rename(index=str, columns={"Week": "week"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: another line recombine

}
last_season = issue // 100 + (1 if issue % 100 > 35 else 0)
url = "https://www.cdc.go.kr/npt/biz/npp/iss/influenzaListAjax.do"
params = {"icdNm": "influenza", "startYear": "2004", "endYear": str(last_season)} # Started in 2004
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: another line recombine.

'percent_a': nullable_float(row[8]),
'percent_b': nullable_float(row[9])
}
if row[0] == "REGION TYPE" and row != ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "TOTAL A", "TOTAL B", "PERCENT POSITIVE", "PERCENT A", "PERCENT B"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: another line recombine.

Comment on lines +218 to +219
hrow1 = ["REGION TYPE", "REGION", "SEASON_DESCRIPTION", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"]
hrow2 = ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: another line recombine.

@nmdefries
Copy link
Contributor

May not be important given our general preference for the 100-length line version, but this is missing the pyproject.toml file.

@krivard krivard changed the title style(black): format acquisition style(black): format acquisition (length=200) Jun 21, 2023
@dshemetov dshemetov changed the title style(black): format acquisition (length=200) style(black): format acquisition with black, line-length=200 Jun 21, 2023
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of my irritation in this one comes from:

  • things that should be f-strings
  • statements that were split across multiple lines before but are now collapsed onto a single line and significantly less legible

insert = cnx.cursor()
for row in entries:
lag = delta_epiweeks(row["epiweek"], issue)
args = [row["total_specimens"], row["total_a_h1n1"], row["total_a_h3"], row["total_a_h3n2v"], row["total_a_no_sub"], row["total_b"], row["total_b_vic"], row["total_b_yam"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another line recombine

insert = cnx.cursor()
for row in entries:
lag = delta_epiweeks(row["epiweek"], issue)
args = [row["n_ili"], row["n_patients"], row["n_providers"], row["wili"], row["ili"], row["age0"], row["age1"], row["age2"], row["age3"], row["age4"], row["age5"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another line recombine

Comment on lines +40 to +43
print(
'Successfully uploaded the following snapshots, with the count indicating the number of data-table versions found inside each snapshot (expected to be 1, or maybe 2 if there was a change in capitalization; 0 indicates the NoroSTAT page was not found within a snapshot directory); just "Counter()" indicates no snapshot directories were found:',
snapshot_version_counter,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't great but there aren't really any good options here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, i like this convention for breaking long strings in Pandas

turns out Python just automatically concatenates strings like that

Comment on lines +31 to +32
expect_value_eq(resp.status_code, 200, "Wanted status code {}. Received: ")
expect_value_eq(resp.headers.get("Content-Type"), "text/html", 'Expected Content-Type "{}"; Received ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is worse

Comment on lines +137 to +138
DROP TABLE IF EXISTS `norostat_point_diffs`,
`norostat_point_version_list`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this misalignment would make me itch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, i don't think black touches the formatting in multi-line strings. so this is just there from the original author 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something modified the spacing inside the quotes:
image

The original has DROP TABLE indented by 6 spaces; the new one has DROP TABLE indented by 10. I do not know why it added 4 spaces to DROP TABLE and only 3 to norostat_point_version_list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah you're right, very weird. im going to see if it will try to format it back, if I add one more space in here

values = ht.get_values(args.state, args.date1, args.date2)
print("Daily counts in %s from %s to %s:" % (ht.check_state(args.state), args.date1, args.date2))
for date in sorted(list(values.keys())):
print("%s: num=%-4d total=%-5d (%.3f%%)" % (date, values[date]["num"], values[date]["total"], 100 * values[date]["num"] / values[date]["total"]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better as an f-string

# step 2: run a few jobs
print("running jobs...")
try:
wiki_download.run(secrets.wiki.hmac, download_limit=1024 * 1024 * 1024, job_limit=12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is worse

for line in f:
content = line.strip().split()
if len(content) != 4:
print("unexpected article format: {0}".format(line))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be an f-string

for article in articles:
if debug_mode:
print(" %s" % (article))
out = text(subprocess.check_output('LC_ALL=C grep -a -i "^en %s " raw2 | cat' % (article.lower()), shell=True)).strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line collapsed; this one is particularly gross

# year, month = int(job['name'][11:15]), int(job['name'][15:17])
year, month = int(job["name"][10:14]), int(job["name"][14:16])
# print 'year=%d | month=%d'%(year, month)
url = "https://dumps.wikimedia.org/other/pageviews/%d/%d-%02d/%s" % (year, year, month, job["name"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be better as an f-string

@dshemetov
Copy link
Contributor Author

Closing in favor of #1189

@dshemetov dshemetov closed this Jun 26, 2023
@dshemetov dshemetov deleted the ds/format branch June 26, 2023 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants