gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module #112949
pablogsal commented on Dec 11, 2023 • edited by bedevere-app bot
- Issue: Change in tokenize.generate_tokens behaviour with non-ASCII #112943
…n the tokenize module
@@ -225,7 +225,7 @@ tokenizeriter_next(tokenizeriterobject *it)
         col_offset = _PyPegen_byte_offset_to_character_offset(line, token.start - line_start);
     }
     if (token.end != NULL && token.end >= it->tok->line_start) {
-        end_col_offset = _PyPegen_byte_offset_to_character_offset(line, token.end - it->tok->line_start);
+        end_col_offset = _PyPegen_byte_offset_to_character_offset_raw(it->tok->line_start, token.end - it->tok->line_start);
`it->tok->line_start` holds the last line tokenised (the last line of the multi-line string), while `line` holds the entire multi-line string from start to end.
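The point of this comment can be modelled in a few lines of Python. This is only a sketch, not CPython's implementation: `byte_offset_to_char_offset` is a hypothetical stand-in for the C helpers in the diff, and the sample source string is invented for illustration. It shows that a byte offset measured from the start of the last line yields the wrong column when it is decoded against the whole multi-line token.

```python
def byte_offset_to_char_offset(buf: bytes, byte_offset: int) -> int:
    """Hypothetical helper: count the characters encoded by the
    first `byte_offset` bytes of the UTF-8 buffer `buf`."""
    return len(buf[:byte_offset].decode("utf-8", errors="replace"))


# A two-line string token with non-ASCII characters on the first line only.
source = '"""żółć\nabcdefgh"""'
whole = source.encode("utf-8")            # plays the role of `line`

# Start of the last line, playing the role of `it->tok->line_start`.
last_line_start = whole.rfind(b"\n") + 1
last_line = whole[last_line_start:]

# End of the token as a byte offset relative to the last line,
# like `token.end - it->tok->line_start` in the diff.
end_byte_offset = len(whole) - last_line_start

# Decoding against the whole token counts the two-byte characters
# of the first line, so the column comes out too small.
wrong = byte_offset_to_char_offset(whole, end_byte_offset)

# Decoding against the last line gives the real end column.
right = byte_offset_to_char_offset(last_line, end_byte_offset)

print(wrong, right)
```

The two results differ because the offset and the buffer refer to different starting points. Comparing `right` with the `end` column that `tokenize.generate_tokens` reports for the same source is one way to check a given interpreter.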
Co-authored-by: Serhiy Storchaka <[email protected]>
Thanks @pablogsal for the PR 🌮🎉. I'm working now to backport this PR to: 3.12.
Sorry, @pablogsal, I could not cleanly backport this to
GH-112957 is a backport of this pull request to the 3.12 branch.
…okens in the tokenize module (pythonGH-112949) (cherry picked from commit a135a6d) Co-authored-by: Pablo Galindo Salgado <[email protected]>
…n the tokenize module (python#112949)