-
Notifications
You must be signed in to change notification settings - Fork 67
style(black): format acquisition with black
, line-length=200
#1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Kudos, SonarCloud Quality Gate passed!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. There are some spots where I prefer the old line length-respecting format, for readability, but I don't feel that strongly about them.
logger.warning( | ||
"expensive operation", | ||
msg="concatenating more than 7 files may result in long running times", | ||
count=len(dfs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion [non-blocking]: On this line, I have a slight preference for the old wrapped formatting here for readability/maintaining line-character limit.
question: black
's line-wrapping behavior seems inconsistent, e.g. sometimes it turns each element of a long list into its own line but sometimes it combines into one line lists that were already formatted with the multiline approach. Is it possible to change line wrapping to be more strict about line length? Or to tune it at all?
I see other lines (lots of Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),
in facility/database.py
and in all the other variants of database.py
) that had been wrapped manually to standard line length limits that black
combined back into a single line.
I've noted other instances of line-wrapping changes where I prefer the old version. All are non-blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
black's line-wrap (roughly) tries to reformat any expression that contains lines with a line-length less than its line-length
setting (set in pyproject.toml
in this repo to 200 by Sam). see here for Black's docs on it.
so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.
i'm personally fine with having such a long line setting, but i know that it breaks from the programming standard of 80. even black's default setting is 100-ish.
the difficulty with setting black to short line-lengths is that it is very likely to break semantic spacing in that case, like in the example below, every line is long, but at least the formatting is consistent
Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int),
Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int),
if we had set the default line-length shorter, many of these would turn into
Columndef(
"previous_day_admission_adult_covid_confirmed_18-19_7_day_sum",
"previous_day_admission_adult_covid_confirmed_18_19_7_day_sum",
int
),
Columndef(
"previous_day_admission_adult_covid_confirmed_20-29_7_day_sum",
"previous_day_admission_adult_covid_confirmed_20_29_7_day_sum",
int
),
(though not all, if the string args were shorter, it would keep just that particular Columndef a single line)
i think both are acceptable and have tradeoffs (e.g. vertical space conservation, alignment, horizontal space conservation, consistency of formatting).
not sure how to arrive at a consensus in this space, where prefs will vary a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the inconsistency you're seeing is because it's only reformatting expressions with lines that are over 200 long.
Gotcha, didn't realize that the line-length setting was longer than the usual 80 characters.
If we're going to be running all of our repos through black
, it would be nice to have them formatted with the same settings, so maybe it's worth messing around with those a bit here? Or do you thing the 200-length line is already as good as it gets for automated formatting?
Are we turning on automated linting after we finish this pass through black
? Do we need to consider how black
's settings will interact with that (e.g. trying to minimize errors we'll need to fix manually)?
not sure how to arrive at a consensus in this space, where prefs will vary a lot.
I want to emphasize that my preferences here are very weak. As long as it's reasonable/readable, it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with you on the weak preferences thing: I've just gotten used to 200 in this repo, but I will let it go for consistency with e.g. covidcast-indicators (where we use pylint to enforce a default 100 line-length)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me give the 100 line-length a shot in a separate PR and see how different it looks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Columndef("previous_day_admission_adult_covid_confirmed_18-19_7_day_sum", "previous_day_admission_adult_covid_confirmed_18_19_7_day_sum", int), | ||
Columndef("previous_day_admission_adult_covid_confirmed_20-29_7_day_sum", "previous_day_admission_adult_covid_confirmed_20_29_7_day_sum", int), | ||
Columndef("previous_day_admission_adult_covid_confirmed_30-39_7_day_sum", "previous_day_admission_adult_covid_confirmed_30_39_7_day_sum", int), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: lots of stuff here that black
recombined into a single line that exceeds standard line length limits. See question above for more info.
Columndef("total_adult_patients_hospitalized_confirmed_covid", "total_adult_patients_hosp_confirmed_covid", int), | ||
Columndef("total_adult_patients_hospitalized_confirmed_covid_coverage", "total_adult_patients_hosp_confirmed_covid_coverage", int), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: more line-combining here.
self.xlsx_uptodate_list = [f[:-5] for f in listdir(self.excel_uptodate_path) if isfile(join(self.excel_uptodate_path, f)) and f[-5:] == ".xlsx"] | ||
self.xlsx_history_list = [f[:-5] for f in listdir(self.excel_history_path) if isfile(join(self.excel_history_path, f)) and f[-5:] == ".xlsx"] | ||
self.csv_list = [f[:-4] for f in listdir(self.csv_path) if isfile(join(self.csv_path, f)) and f[-4:] == ".csv"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: another line recombine.
long_raw = (long_raw_df, release_date, parse_time, location) | ||
return long_raw | ||
(wide_raw_df, release_date, parse_time, location) = wide_raw | ||
long_raw_df = wide_raw_df.melt(id_vars=["Week"], var_name="measurement_type", value_name="value").rename(index=str, columns={"Week": "week"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: another line recombine
} | ||
last_season = issue // 100 + (1 if issue % 100 > 35 else 0) | ||
url = "https://www.cdc.go.kr/npt/biz/npp/iss/influenzaListAjax.do" | ||
params = {"icdNm": "influenza", "startYear": "2004", "endYear": str(last_season)} # Started in 2004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: another line recombine.
'percent_a': nullable_float(row[8]), | ||
'percent_b': nullable_float(row[9]) | ||
} | ||
if row[0] == "REGION TYPE" and row != ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "TOTAL A", "TOTAL B", "PERCENT POSITIVE", "PERCENT A", "PERCENT B"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: another line recombine.
hrow1 = ["REGION TYPE", "REGION", "SEASON_DESCRIPTION", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"] | ||
hrow2 = ["REGION TYPE", "REGION", "YEAR", "WEEK", "TOTAL SPECIMENS", "A (2009 H1N1)", "A (H3)", "A (Subtyping not Performed)", "B", "BVic", "BYam", "H3N2v"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: another line recombine.
May not be important given our general preference for the 100-length line version, but this is missing the |
black
, line-length=200
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most of my irritation in this one comes from:
- things that should be f-strings
- statements that were split across multiple lines before but are now collapsed onto a single line and significantly less legible
insert = cnx.cursor() | ||
for row in entries: | ||
lag = delta_epiweeks(row["epiweek"], issue) | ||
args = [row["total_specimens"], row["total_a_h1n1"], row["total_a_h3"], row["total_a_h3n2v"], row["total_a_no_sub"], row["total_b"], row["total_b_vic"], row["total_b_yam"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another line recombine
insert = cnx.cursor() | ||
for row in entries: | ||
lag = delta_epiweeks(row["epiweek"], issue) | ||
args = [row["n_ili"], row["n_patients"], row["n_providers"], row["wili"], row["ili"], row["age0"], row["age1"], row["age2"], row["age3"], row["age4"], row["age5"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another line recombine
print( | ||
'Successfully uploaded the following snapshots, with the count indicating the number of data-table versions found inside each snapshot (expected to be 1, or maybe 2 if there was a change in capitalization; 0 indicates the NoroSTAT page was not found within a snapshot directory); just "Counter()" indicates no snapshot directories were found:', | ||
snapshot_version_counter, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't great but there aren't really any good options here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, i like this convention for breaking long strings in Pandas
turns out Python just automatically concatenates strings like that
expect_value_eq(resp.status_code, 200, "Wanted status code {}. Received: ") | ||
expect_value_eq(resp.headers.get("Content-Type"), "text/html", 'Expected Content-Type "{}"; Received ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is worse
DROP TABLE IF EXISTS `norostat_point_diffs`, | ||
`norostat_point_version_list`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this misalignment would make me itch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, i don't think black touches the formatting in multi-line strings. so this is just there from the original author 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah you're right, very weird. im going to see if it will try to format it back, if I add one more space in here
values = ht.get_values(args.state, args.date1, args.date2) | ||
print("Daily counts in %s from %s to %s:" % (ht.check_state(args.state), args.date1, args.date2)) | ||
for date in sorted(list(values.keys())): | ||
print("%s: num=%-4d total=%-5d (%.3f%%)" % (date, values[date]["num"], values[date]["total"], 100 * values[date]["num"] / values[date]["total"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better as an f-string
# step 2: run a few jobs | ||
print("running jobs...") | ||
try: | ||
wiki_download.run(secrets.wiki.hmac, download_limit=1024 * 1024 * 1024, job_limit=12) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is worse
for line in f: | ||
content = line.strip().split() | ||
if len(content) != 4: | ||
print("unexpected article format: {0}".format(line)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be an f-string
for article in articles: | ||
if debug_mode: | ||
print(" %s" % (article)) | ||
out = text(subprocess.check_output('LC_ALL=C grep -a -i "^en %s " raw2 | cat' % (article.lower()), shell=True)).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line collapsed; this one is particularly gross
# year, month = int(job['name'][11:15]), int(job['name'][15:17]) | ||
year, month = int(job["name"][10:14]), int(job["name"][14:16]) | ||
# print 'year=%d | month=%d'%(year, month) | ||
url = "https://dumps.wikimedia.org/other/pageviews/%d/%d-%02d/%s" % (year, year, month, job["name"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be better as an f-string
Closing in favor of #1189 |
Run the acquisition Python files through the code formatter black.
Summary:
Prerequisites:
dev
branchdev