bpo-38056: overhaul Error Handlers section in codecs documentation #15732
@@ -23,10 +23,10 @@ | |
This module defines base classes for standard Python codecs (encoders and | ||
decoders) and provides access to the internal Python codec registry, which | ||
manages the codec and error handling lookup process. Most standard codecs | ||
are :term:`text encodings <text encoding>`, which encode text to bytes, | ||
but there are also codecs provided that encode text to text, and bytes to | ||
bytes. Custom codecs may encode and decode between arbitrary types, but some | ||
module features are restricted to use specifically with | ||
are :term:`text encodings <text encoding>`, which encode text to bytes (and | ||
reverse), but there are also codecs provided that encode text to text, and | ||
bytes to bytes. Custom codecs may encode and decode between arbitrary types, | ||
but some module features are restricted to use specifically with | ||
:term:`text encodings <text encoding>`, or with codecs that encode to | ||
:class:`bytes`. | ||
|
||
|
@@ -290,58 +290,56 @@ codec will handle encoding and decoding errors. | |
Error Handlers | ||
^^^^^^^^^^^^^^ | ||
|
||
To simplify and standardize error handling, | ||
codecs may implement different error handling schemes by | ||
accepting the *errors* string argument. The following string values are | ||
defined and implemented by all standard Python codecs: | ||
To simplify and standardize error handling, codecs may implement different | ||
error handling schemes by accepting the *errors* string argument: | ||
|
||
.. tabularcolumns:: |l|L| | ||
|
||
+-------------------------+-----------------------------------------------+ | ||
| Value | Meaning | | ||
+=========================+===============================================+ | ||
| ``'strict'`` | Raise :exc:`UnicodeError` (or a subclass); | | ||
| | this is the default. Implemented in | | ||
| | :func:`strict_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'ignore'`` | Ignore the malformed data and continue | | ||
| | without further notice. Implemented in | | ||
| | :func:`ignore_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
|
||
The following error handlers are only applicable to | ||
:term:`text encodings <text encoding>`: | ||
>>> 'German ß, ♬'.encode(encoding='ascii', errors='backslashreplace') | ||
b'German \\xdf, \\u266c' | ||
>>> 'German ß, ♬'.encode(encoding='ascii', errors='xmlcharrefreplace') | ||
b'German &#223;, &#9836;' | ||
I like the idea of examples. Would you mind adding an example for all available error handlers? It may be interesting to add an example for surrogatepass, which is an uncommon case.

Currently there are only two REPL examples, which just demonstrate this sentence:
If we add an example for every available error handler, will the page become ugly?
In addition:
>>> '\uD8AA'.encode(encoding='utf-8', errors='surrogatepass')
b'\xed\xa2\xaa'

I'm in agreement with adding an example for anything commonly utilized, but I don't think we should necessarily add one for all of the error handlers.
IMO, we should try to focus on having examples for the common cases. Code examples can be very helpful, but in excess they can become distracting to readers.

I like the idea of example errors as well. I wonder if the table is making this more cluttered. Perhaps something like:
would be more helpful. Alternatively, a blank line between REPL examples would increase readability.

It seems you all like the examples, so I will try to plan a new layout to suit this idea. My idea about the current change:

I think the simple ones could still benefit from an example, just to show the basics of how it works. Even if it's fairly simple, it may not be quite as easy to understand for someone reading over the codecs documentation for the first time.
👍

Sorry, I've been too busy recently, very intense.
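For reference while weighing "one example per handler", here is a minimal sketch (not part of the patch) that runs the same string through each handler that substitutes or drops output on encoding. The names `text`, `handlers`, and `results` are illustrative only.

```python
# Encode one string to ASCII under each non-strict error handler
# that applies to encoding.
text = 'German ß, ♬'
handlers = ['ignore', 'replace', 'backslashreplace',
            'xmlcharrefreplace', 'namereplace']

results = {name: text.encode('ascii', errors=name) for name in handlers}

for name, encoded in results.items():
    print(f'{name:<18} {encoded!r}')
```

Printing the results side by side like this may be one way to keep the page compact if every handler gets an example.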
||
|
||
.. index:: | ||
pair: strict; error handler's name | ||
pair: ignore; error handler's name | ||
pair: replace; error handler's name | ||
pair: backslashreplace; error handler's name | ||
pair: surrogateescape; error handler's name | ||
single: ? (question mark); replacement character | ||
single: \ (backslash); escape sequence | ||
single: \x; escape sequence | ||
single: \u; escape sequence | ||
single: \U; escape sequence | ||
single: \N; escape sequence | ||
|
||
The following error handlers can be used with all :ref:`standard-encodings` | ||
codecs: | ||
|
||
.. tabularcolumns:: |l|L| | ||
|
||
+-------------------------+-----------------------------------------------+ | ||
| Value | Meaning | | ||
+=========================+===============================================+ | ||
| ``'replace'`` | Replace with a suitable replacement | | ||
| | marker; Python will use the official | | ||
| | ``U+FFFD`` REPLACEMENT CHARACTER for the | | ||
| | built-in codecs on decoding, and '?' on | | ||
| | encoding. Implemented in | | ||
| | :func:`replace_errors`. | | ||
| ``'strict'`` | Raise :exc:`UnicodeError` (or a subclass), | | ||
| | this is the default. Implemented in | | ||
| | :func:`strict_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'xmlcharrefreplace'`` | Replace with the appropriate XML character | | ||
| | reference (only for encoding). Implemented | | ||
| | in :func:`xmlcharrefreplace_errors`. | | ||
| ``'ignore'`` | Ignore the malformed data and continue without| | ||
| | further notice. Implemented in | | ||
| | :func:`ignore_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'replace'`` | Replace with a replacement marker. On | | ||
| | encoding, use ``?`` (ASCII character). On | | ||
| | decoding, use ``U+FFFD`` (the official | | ||
| | REPLACEMENT CHARACTER). Implemented in | | ||
| | :func:`replace_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'backslashreplace'`` | Replace with backslashed escape sequences. | | ||
| | On encoding, use hexadecimal form of Unicode | | ||
| | code point with formats ``\xhh`` ``\uxxxx`` | | ||
| | ``\Uxxxxxxxx``. On decoding, use hexadecimal | | ||
| | form of byte value with format ``\xhh``. | | ||
| | Implemented in | | ||
| | :func:`backslashreplace_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'namereplace'`` | Replace with ``\N{...}`` escape sequences | | ||
| | (only for encoding). Implemented in | | ||
| | :func:`namereplace_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'surrogateescape'`` | On decoding, replace byte with individual | | ||
| | surrogate code ranging from ``U+DC80`` to | | ||
| | ``U+DCFF``. This code will then be turned | | ||
|
@@ -351,6 +349,31 @@ The following error handlers are only applicable to | |
| | more.) | | ||
+-------------------------+-----------------------------------------------+ | ||
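A short sketch of the decoding side of the table above, assuming a UTF-8 decoder that is handed one stray Latin-1 byte. The `raw` value is made up for illustration and does not come from the patch.

```python
raw = b'caf\xe9'  # 0xE9 alone is not a complete UTF-8 sequence

dropped = raw.decode('utf-8', errors='ignore')
replaced = raw.decode('utf-8', errors='replace')
escaped = raw.decode('utf-8', errors='backslashreplace')
smuggled = raw.decode('utf-8', errors='surrogateescape')

# 'surrogateescape' is reversible: encoding with the same handler
# turns the lone surrogate back into the original byte.
restored = smuggled.encode('utf-8', errors='surrogateescape')

print(repr(dropped), repr(replaced), repr(escaped), repr(smuggled), restored)
```

Of these, only `'surrogateescape'` preserves enough information to reconstruct the input bytes.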
|
||
.. index:: | ||
pair: xmlcharrefreplace; error handler's name | ||
pair: namereplace; error handler's name | ||
single: \N; escape sequence | ||
|
||
The following error handlers are only applicable to encoding (within | ||
:term:`text encodings <text encoding>`): | ||
|
||
+-------------------------+-----------------------------------------------+ | ||
| Value | Meaning | | ||
+=========================+===============================================+ | ||
| ``'xmlcharrefreplace'`` | Replace with XML/HTML numeric character | | ||
| | reference, which is a decimal form of Unicode | | ||
| | code point with format ``&#num;`` Implemented | | ||
| | in :func:`xmlcharrefreplace_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
| ``'namereplace'`` | Replace with ``\N{...}`` escape sequences, | | ||
| | what appears in the brace is the Name property| | ||
| | from Unicode Character Database. Implemented | | ||
| | in :func:`namereplace_errors`. | | ||
+-------------------------+-----------------------------------------------+ | ||
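The two encoding-only handlers can be related back to `ord()` and `unicodedata.name()`. The following is a sketch under that assumption, using a single illustrative character rather than anything from the patch.

```python
import unicodedata

ch = '\u266c'  # BEAMED SIXTEENTH NOTES

xml_ref = ch.encode('ascii', errors='xmlcharrefreplace')
named = ch.encode('ascii', errors='namereplace')

# The numeric reference is the decimal code point; the \N{...} escape
# carries the character's Unicode name.
expected_xml = ('&#%d;' % ord(ch)).encode('ascii')
expected_named = ('\\N{' + unicodedata.name(ch) + '}').encode('ascii')

print(xml_ref, named)
```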
|
||
.. index:: | ||
pair: surrogatepass; error handler's name | ||
|
||
In addition, the following error handler is specific to the given codecs: | ||
|
||
+-------------------+------------------------+-------------------------------------------+ | ||
|
@@ -365,13 +388,14 @@ In addition, the following error handler is specific to the given codecs: | |
The ``'surrogateescape'`` and ``'surrogatepass'`` error handlers. | ||
|
||
.. versionchanged:: 3.4 | ||
The ``'surrogatepass'`` error handlers now works with utf-16\* and utf-32\* codecs. | ||
The ``'surrogatepass'`` error handler now works with utf-16\* and utf-32\* | ||
codecs. | ||
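As a sketch of the ``'surrogatepass'`` behaviour mentioned in the review thread and in this note, the example below encodes a lone surrogate with the UTF-8 and UTF-16 codecs. The variable names are illustrative, and the UTF-16 case assumes Python 3.4 or later.

```python
lone = '\ud8aa'  # a lone high surrogate, not encodable under 'strict'

try:
    lone.encode('utf-8')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

utf8_bytes = lone.encode('utf-8', errors='surrogatepass')
utf16_bytes = lone.encode('utf-16-le', errors='surrogatepass')

# Decoding with the same handler accepts the surrogate bytes again.
round_tripped = utf8_bytes.decode('utf-8', errors='surrogatepass')

print(strict_failed, utf8_bytes, utf16_bytes)
```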
|
||
.. versionadded:: 3.5 | ||
The ``'namereplace'`` error handler. | ||
|
||
.. versionchanged:: 3.5 | ||
The ``'backslashreplace'`` error handlers now works with decoding and | ||
The ``'backslashreplace'`` error handler now works with decoding and | ||
translating. | ||
|
||
The set of allowed values can be extended by registering a new named error | ||
|
@@ -414,42 +438,58 @@ functions: | |
|
||
.. function:: strict_errors(exception) | ||
|
||
Implements the ``'strict'`` error handling: each encoding or | ||
decoding error raises a :exc:`UnicodeError`. | ||
Implements the ``'strict'`` error handling. | ||
|
||
Each encoding or decoding error raises a :exc:`UnicodeError`. | ||
|
||
.. function:: replace_errors(exception) | ||
|
||
Implements the ``'replace'`` error handling (for :term:`text encodings | ||
<text encoding>` only): substitutes ``'?'`` for encoding errors | ||
(to be encoded by the codec), and ``'\ufffd'`` (the Unicode replacement | ||
character) for decoding errors. | ||
.. function:: ignore_errors(exception) | ||
|
||
Implements the ``'ignore'`` error handling. | ||
|
||
.. function:: ignore_errors(exception) | ||
Malformed data is ignored and encoding or decoding is continued without | ||
further notice. | ||
|
||
Implements the ``'ignore'`` error handling: malformed data is ignored and | ||
encoding or decoding is continued without further notice. | ||
|
||
.. function:: replace_errors(exception) | ||
|
||
.. function:: xmlcharrefreplace_errors(exception) | ||
Implements the ``'replace'`` error handling. | ||
|
||
Implements the ``'xmlcharrefreplace'`` error handling (for encoding with | ||
:term:`text encodings <text encoding>` only): the | ||
unencodable character is replaced by an appropriate XML character reference. | ||
Substitutes ``?`` (ASCII character) for encoding errors, or ``U+FFFD`` (the | ||
official REPLACEMENT CHARACTER) for decoding errors. | ||
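The handler functions can also be called directly with an exception instance; each returns a tuple of the replacement and the position at which to resume. A minimal sketch follows; the exception arguments are fabricated for illustration.

```python
import codecs

# Hand-built exceptions standing in for what a codec would raise.
enc_error = UnicodeEncodeError('ascii', 'ß', 0, 1,
                               'ordinal not in range(128)')
dec_error = UnicodeDecodeError('utf-8', b'\xff', 0, 1,
                               'invalid start byte')

on_encode = codecs.replace_errors(enc_error)
on_decode = codecs.replace_errors(dec_error)
ignored = codecs.ignore_errors(enc_error)

print(on_encode, on_decode, ignored)
```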
|
||
|
||
.. function:: backslashreplace_errors(exception) | ||
|
||
Implements the ``'backslashreplace'`` error handling (for | ||
:term:`text encodings <text encoding>` only): malformed data is | ||
replaced by a backslashed escape sequence. | ||
Implements the ``'backslashreplace'`` error handling. | ||
|
||
Malformed data is replaced by a backslashed escape sequence. | ||
On encoding, use hexadecimal form of Unicode code point with formats | ||
``\xhh`` ``\uxxxx`` ``\Uxxxxxxxx``. On decoding, use hexadecimal form of | ||
byte value with format ``\xhh``. | ||
|
||
.. versionchanged:: 3.5 | ||
now works with decoding and translating. | ||
|
||
|
||
.. function:: xmlcharrefreplace_errors(exception) | ||
|
||
Implements the ``'xmlcharrefreplace'`` error handling (for encoding within | ||
:term:`text encoding` only). | ||
|
||
The unencodable character is replaced by an appropriate XML/HTML numeric | ||
character reference, which is a decimal form of Unicode code point with | ||
format ``&#num;`` | ||
|
||
|
||
.. function:: namereplace_errors(exception) | ||
|
||
Implements the ``'namereplace'`` error handling (for encoding with | ||
:term:`text encodings <text encoding>` only): the | ||
unencodable character is replaced by a ``\N{...}`` escape sequence. | ||
Implements the ``'namereplace'`` error handling (for encoding within | ||
:term:`text encoding` only). | ||
|
||
The unencodable character is replaced by a ``\N{...}`` escape sequence, | ||
what appears in the brace is the Name property from Unicode Character | ||
Database. | ||
|
||
.. versionadded:: 3.5 | ||
|
||
|
@@ -463,7 +503,7 @@ The base :class:`Codec` class defines these methods which also define the | |
function interfaces of the stateless encoder and decoder: | ||
|
||
|
||
.. method:: Codec.encode(input[, errors]) | ||
.. method:: Codec.encode(input[, errors='strict']) | ||
|
||
Encodes the object *input* and returns a tuple (output object, length consumed). | ||
For instance, :term:`text encoding` converts | ||
|
@@ -481,7 +521,7 @@ function interfaces of the stateless encoder and decoder: | |
of the output object type in this situation. | ||
|
||
|
||
.. method:: Codec.decode(input[, errors]) | ||
.. method:: Codec.decode(input[, errors='strict']) | ||
|
||
Decodes the object *input* and returns a tuple (output object, length | ||
consumed). For instance, for a :term:`text encoding`, decoding converts | ||
|
@@ -548,7 +588,7 @@ define in order to be compatible with the Python codec registry. | |
object. | ||
|
||
|
||
.. method:: encode(object[, final]) | ||
.. method:: encode(object[, final=False]) | ||
|
||
Encodes *object* (taking the current state of the encoder into account) | ||
and returns the resulting encoded object. If this is the last call to | ||
|
@@ -605,7 +645,7 @@ define in order to be compatible with the Python codec registry. | |
object. | ||
|
||
|
||
.. method:: decode(object[, final]) | ||
.. method:: decode(object[, final=False]) | ||
|
||
Decodes *object* (taking the current state of the decoder into account) | ||
and returns the resulting decoded object. If this is the last call to | ||
|
@@ -738,7 +778,7 @@ compatible with the Python codec registry. | |
:func:`register_error`. | ||
|
||
|
||
.. method:: read([size[, chars, [firstline]]]) | ||
.. method:: read([size=-1[, chars=-1, [firstline=False]]]) | ||
|
||
Decodes data from the stream and returns the resulting object. | ||
|
||
|
@@ -764,7 +804,7 @@ compatible with the Python codec registry. | |
available on the stream, these should be read too. | ||
|
||
|
||
.. method:: readline([size[, keepends]]) | ||
.. method:: readline([size=None[, keepends=True]]) | ||
|
||
Read one line from the input stream and return the decoded data. | ||
|
||
|
@@ -775,7 +815,7 @@ compatible with the Python codec registry. | |
returned. | ||
|
||
|
||
.. method:: readlines([sizehint[, keepends]]) | ||
.. method:: readlines([sizehint=None[, keepends=True]]) | ||
|
||
Read all lines available on the input stream and return them as a list of | ||
lines. | ||
|
@@ -0,0 +1 @@ | ||
Overhaul :ref:`error-handlers` section in :mod:`codecs` module documentation. | ||