gh-112943: Correctly compute end offsets for multiline tokens in the tokenize module #112949

pablogsal · 2023-12-11T00:50:44Z

Issue: Change in tokenize.generate_tokens behaviour with non-ASCII #112943

…n the tokenize module

pablogsal · 2023-12-11T00:51:38Z

Python/Python-tokenize.c

@@ -225,7 +225,7 @@ tokenizeriter_next(tokenizeriterobject *it)
        col_offset = _PyPegen_byte_offset_to_character_offset(line, token.start - line_start);
    }
    if (token.end != NULL && token.end >= it->tok->line_start) {
-        end_col_offset = _PyPegen_byte_offset_to_character_offset(line, token.end - it->tok->line_start);
+        end_col_offset = _PyPegen_byte_offset_to_character_offset_raw(it->tok->line_start, token.end - it->tok->line_start);


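it->tok->line_start points to the start of the last line tokenised (the last line of the multi-line string), while line holds the entire multi-line string from start to end. Because the byte offset token.end - it->tok->line_start is relative to that last line, it has to be converted against it->tok->line_start rather than against line.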
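The mismatch can be illustrated with a small pure-Python sketch. This is not the CPython implementation: `byte_offset_to_char_offset` is a hypothetical stand-in for the C helpers in the diff, and the sample source string is invented for illustration. It assumes those helpers count the characters in the first N bytes of a UTF-8 buffer.

```python
def byte_offset_to_char_offset(buf: bytes, byte_offset: int) -> int:
    """Count the characters encoded by the first `byte_offset` bytes of UTF-8 `buf`.

    Hypothetical stand-in for the C helpers in the diff, not their real code.
    """
    return len(buf[:byte_offset].decode("utf-8", errors="replace"))


# Invented sample: a triple-quoted string whose last line contains non-ASCII text.
source = 'x = """ä\nöö"""\n'
data = source.encode("utf-8")

# Byte index where the last line of the multi-line token starts
# (plays the role of it->tok->line_start).
line_start = data.index(b"\n") + 1
# Byte index just past the closing quotes (plays the role of token.end).
token_end = data.index(b'"""', line_start) + 3
# Byte offset of the token end, relative to the last line.
end_bytes = token_end - line_start

# Old behaviour: the offset is applied to the whole multi-line text.
old_end_col = byte_offset_to_char_offset(data, end_bytes)
# New behaviour: the offset is applied to the buffer starting at the last line.
new_end_col = byte_offset_to_char_offset(data[line_start:], end_bytes)

print(old_end_col, new_end_col)  # prints "7 5"
```

The two results differ because the first 7 bytes of the whole text are ASCII, while the first 7 bytes of the last line include two 2-byte characters. With ASCII-only input both calls agree, which is consistent with the linked issue being reported for non-ASCII input.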

Lib/test/test_tokenize.py

Co-authored-by: Serhiy Storchaka

miss-islington-app · 2023-12-11T11:44:26Z

Thanks @pablogsal for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

miss-islington-app · 2023-12-11T11:44:30Z

Sorry, @pablogsal, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker a135a6d2c6d503b186695f01efa7eed65611b04e 3.12

bedevere-app · 2023-12-11T12:04:43Z

GH-112957 is a backport of this pull request to the 3.12 branch.

…okens in the tokenize module (pythonGH-112949) (cherry picked from commit a135a6d) Co-authored-by: Pablo Galindo Salgado

pythongh-112943: Correctly compute end offsets for multiline tokens i…

de5cc3c

…n the tokenize module

pablogsal requested a review from lysnikolaou as a code owner December 11, 2023 00:50

pablogsal added the needs backport to 3.12 only security fixes label Dec 11, 2023

bedevere-app bot added the awaiting core review label Dec 11, 2023

bedevere-app bot mentioned this pull request Dec 11, 2023

Change in tokenize.generate_tokens behaviour with non-ASCII #112943

Closed

serhiy-storchaka approved these changes Dec 11, 2023

bedevere-app bot added awaiting merge and removed awaiting core review labels Dec 11, 2023

Update test_tokenize.py

947414b

Co-authored-by: Serhiy Storchaka

pablogsal enabled auto-merge (squash) December 11, 2023 11:24

pablogsal merged commit a135a6d into python:main Dec 11, 2023

pablogsal deleted the gh-112943 branch December 11, 2023 11:44

bedevere-app bot removed the awaiting merge label Dec 11, 2023

miss-islington-app bot assigned pablogsal Dec 11, 2023

bedevere-app bot removed the needs backport to 3.12 only security fixes label Dec 11, 2023

pablogsal added a commit that referenced this pull request Dec 11, 2023

[3.12] gh-112943: Correctly compute end offsets for multiline tokens …

e4d2fb2

…in the tokenize module (GH-112949) (#112957) (cherry picked from commit a135a6d)

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024

pythongh-112943: Correctly compute end offsets for multiline tokens i…

79e2905

…n the tokenize module (python#112949)

Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024

pythongh-112943: Correctly compute end offsets for multiline tokens i…

5bc545c

…n the tokenize module (python#112949)
