[3.9] bpo-46503: Prevent an assert from firing when parsing some invalid \N sequences in f-strings. (GH-30865) #30867

Merged 1 commit on Jan 25, 2022
4 changes: 4 additions & 0 deletions Lib/test/test_fstring.py
@@ -747,12 +747,16 @@ def test_misformed_unicode_character_name(self):
         # differently inside f-strings.
         self.assertAllRaise(SyntaxError, r"\(unicode error\) 'unicodeescape' codec can't decode bytes in position .*: malformed \\N character escape",
                             [r"f'\N'",
+                             r"f'\N '",
+                             r"f'\N  '",  # See bpo-46503.
                              r"f'\N{'",
                              r"f'\N{GREEK CAPITAL LETTER DELTA'",
 
                              # Here are the non-f-string versions,
                              # which should give the same errors.
                              r"'\N'",
+                             r"'\N '",
+                             r"'\N  '",
                              r"'\N{'",
                              r"'\N{GREEK CAPITAL LETTER DELTA'",
                              ])
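
The cases in this test can also be tried outside the test suite. The following is a minimal sketch, assuming a CPython interpreter; `is_rejected` is a hypothetical helper, not part of `test_fstring.py`. It checks only that compilation fails with `SyntaxError`, not the exact error message.

```python
def is_rejected(source: str) -> bool:
    """Return True if compiling `source` as an expression raises SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


# Malformed \N escapes, in f-string and plain-string form.
# The two-space variants are the ones tagged "See bpo-46503" in the test.
cases = [r"f'\N'", r"f'\N '", r"f'\N  '", r"'\N'", r"'\N '", r"'\N  '"]

for src in cases:
    print(src, "rejected" if is_rejected(src) else "accepted")
```

A well-formed escape such as `f'\N{GREEK CAPITAL LETTER DELTA}'` should compile without error under the same helper.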
@@ -0,0 +1 @@
Fix an assert when parsing some invalid \N escape sequences in f-strings.
16 changes: 14 additions & 2 deletions Parser/pegen/parse_string.c
@@ -444,12 +444,23 @@ fstring_find_literal(Parser *p, const char **str, const char *end, int raw,
         if (!raw && ch == '\\' && s < end) {
             ch = *s++;
             if (ch == 'N') {
+                /* We need to look at and skip matching braces for "\N{name}"
+                   sequences because otherwise we'll think the opening '{'
+                   starts an expression, which is not the case with "\N".
+                   Keep looking for either a matched '{' '}' pair, or the end
+                   of the string. */
+
                 if (s < end && *s++ == '{') {
                     while (s < end && *s++ != '}') {
                     }
                     continue;
                 }
-                break;
+
+                /* This is an invalid "\N" sequence, since it's a "\N" not
+                   followed by a "{". Just keep parsing this literal. This
+                   error will be caught later by
+                   decode_unicode_with_escapes(). */
+                continue;
             }
             if (ch == '{' && warn_invalid_escape_sequence(p, ch, t) < 0) {
                 return -1;
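
The control-flow change in this hunk can be modeled in a few lines of Python. This is a simplified sketch, not the parser's code: `scan_literal` and `stops_cleanly` are hypothetical names, doubled-brace handling and escape warnings are omitted, and the `fixed` flag switches between the old `break` and the new `continue`. `stops_cleanly` encodes an assumed reading of the condition the assert checks, namely that scanning ends either at the end of the literal or on a brace.

```python
def scan_literal(body: str, fixed: bool = True) -> int:
    """Return the index at which scanning of the literal part stops."""
    s, end = 0, len(body)
    while s < end:
        ch = body[s]
        s += 1
        if ch == "\\" and s < end:
            ch = body[s]
            s += 1
            if ch == "N":
                if s < end:
                    nxt = body[s]
                    s += 1  # the character after "\N" is consumed either way
                    if nxt == "{":
                        # Skip to the matching '}' or the end of the string.
                        while s < end:
                            c = body[s]
                            s += 1
                            if c == "}":
                                break
                        continue
                if fixed:
                    continue  # new behavior: keep scanning the literal
                break  # old behavior: leave the loop immediately
        if ch in "{}":
            s -= 1  # stop on the brace itself
            break
    return s


def stops_cleanly(body: str, pos: int) -> bool:
    """Assumed invariant: stop at the end of the literal or on a brace."""
    return pos == len(body) or body[pos] in "{}"


two_spaces = "\\N  "
print(stops_cleanly(two_spaces, scan_literal(two_spaces, fixed=False)))
print(stops_cleanly(two_spaces, scan_literal(two_spaces, fixed=True)))
```

In this model, a single trailing space leaves the old path exactly at the end of the literal, while two trailing spaces leave it one character short on a non-brace character. That is consistent with the test adding a two-space case for bpo-46503.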
@@ -493,7 +504,8 @@ fstring_find_literal(Parser *p, const char **str, const char *end, int raw,
             *literal = PyUnicode_DecodeUTF8Stateful(literal_start,
                                                     s - literal_start,
                                                     NULL, NULL);
-        } else {
+        }
+        else {
             *literal = decode_unicode_with_escapes(p, literal_start,
                                                    s - literal_start, t);
         }