-
-
Notifications
You must be signed in to change notification settings - Fork 32k
bpo-38056: overhaul Error Handlers section in codecs documentation #15732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @animalize.
As someone who is not particularly experienced with using the codec
module (for my purposes, I've never had much of a need for it), these examples are quite helpful for visually representing the purpose of each of the error handlers.
I'm not entirely certain that we need to provide an example for all four of them, but in general I think that providing an example or two would be very helpful to readers.
If these changes are approved, I think it would be appropriate to backport the changes to the 3.7 and 3.8 docs. Also, I've added a I'll cc the experts for codecs. |
Thanks for your review.
So I would like to include at least these three. I prefered to use this example, it covers a non-BMP character 🏫. >>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='replace')
b'42 ? ? ?'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='xmlcharrefreplace')
b'42 µ ♪ 🏫'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='backslashreplace')
b'42 \\xb5 \\u266a \\U0001f3eb'
>>> '42 µ ♪ 🏫'.encode(encoding='ascii', errors='namereplace')
b'42 \\N{MICRO SIGN} \\N{EIGHTH NOTE} \\N{SCHOOL}' But if paste the statement into the REPL on Windows, the non-BMP character is broken, and get an invalid result >>> '42 µ ♪ �'.encode(encoding='ascii', errors='backslashreplace')
b'42 \\xb5 \\u266a \\ufffd' |
Not that I'm opposed to the changes, but I think the examples alone would be significantly easier and less controversial to review. Also, it's usually more efficient from a workflow perspective to have more multiple focused PRs than a single large overhaul at once. |
Ah, just like suddenly getting excited for this PR. 😄 There are obvious errors, for example the
|
If codec experts want to do this, or who thinks he/she can do better than me, free feel to create a new PR. |
I'm not recommending for anyone else to do the other parts of the current one, you can still potentially apply the all of the changes. I just mean having only the examples in this PR and then splitting the other sections into their own PRs. As the scope of a PR gets larger, the time required to fully review it prior to a merge significantly increases. From my experience, the core developers tend to prefer smaller and more focused PRs when it's possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure reorganizing the table is worth it, what did you try to enhance by replacing the separation "all codecs vs text codecs" by the separation "both encode/decode vs encode only"?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
* Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nits, but not codec expert here.
I am reading the source code these days, hope there will be no mistakes. This PR bases on #15135. |
Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>
Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9. |
Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11. |
Thanks @animalize for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10. |
GH-92526 is a backport of this pull request to the 3.9 branch. |
…ythonGH-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
…ythonGH-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
GH-92527 is a backport of this pull request to the 3.11 branch. |
GH-92528 is a backport of this pull request to the 3.10 branch. |
…ythonGH-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
…H-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
…H-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
…H-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
Thanks for your reviews, which has led to these revisions being improved in every aspects. I just didn't add more examples, but I think too many examples will disturb the readers. |
…ythonGH-15732) * Some handlers were wrongly described as text-encoding only, but actually they can also be used in text-decoding. * Add more description to each handler. * Add two REPL examples. * Add indexes for Error Handler's name. Co-authored-by: Kyle Stanley <[email protected]> Co-authored-by: Victor Stinner <[email protected]> Co-authored-by: Jelle Zijlstra <[email protected]> (cherry picked from commit 5bc2390) Co-authored-by: Ma Lin <[email protected]>
replace
/backslashreplace
/surrogateescape
were wrongly described as encoding only, in fact they can also be used in decoding.surrogatepass
.I read the source code, hope there is no mistake and misleading.
The indexes were tested in .chm documentation.
The term "text encoding" appeared 38 times in the entire document, the change in
glossary.rst
should be correct.Also verified these terms with Wikipedia and Google:
October 2nd effect:

https://bugs.python.org/issue38056