Description
I hope these three instances of type mismatch can help improve the type hints for CPython's standard library.
1. Type mismatch in email.charset.Charset.header_encode_lines
Overview
Target for Modification: Charset.header_encode_lines in email/charset.pyi
Existing Type Hint: (self, string: str, maxlengths: Iterator[int]) -> list[str]
Suggested Type Hint: (self, string: str, maxlengths: Iterator[int]) -> list[Optional[str]]
Explanation
The method Charset.header_encode_lines may return a list containing None.
class Charset:
    def header_encode_lines(self, string, maxlengths):
        ...
        lines = []
        current_line = []
        ...
        # (inside the loop over characters, once the current line is full)
        # Does nothing fit on the first line?
        if not lines and not current_line:
            lines.append(None)
        else:
            joined_line = EMPTYSTRING.join(current_line)
            header_bytes = _encode(joined_line, codec)
            lines.append(encoder(header_bytes))
        ...
        return lines
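Because the excerpt appends None only while lines is still empty, None can only ever be the first element of the result. A caller typed against the suggested hint therefore has to narrow that first element before using it. The sketch below is a hypothetical helper (split_encoded_lines is an illustrative name, not part of the email package) showing that narrowing; under the existing list[str] hint, a checker such as mypy with --warn-unreachable might instead flag the None check as dead code.

```python
from typing import Optional


def split_encoded_lines(
    lines: list[Optional[str]],
) -> tuple[Optional[str], list[str]]:
    """Split header_encode_lines() output into (first chunk, later lines).

    Hypothetical helper for illustration only.
    """
    if not lines:  # defensive guard for an empty list
        return None, []
    first, rest = lines[0], lines[1:]
    later: list[str] = []
    for line in rest:
        # Narrow Optional[str] to str; None is only expected at index 0.
        if line is None:
            raise ValueError("unexpected None after the first element")
        later.append(line)
    # A None first chunk means nothing fit on the current line, so the
    # caller should start a new line instead of extending the current one.
    return first, later


first, later = split_encoded_lines([None, "=?iso-8859-1?q?Gr=FC=DF_Gott!?="])
print(first, later)  # None ['=?iso-8859-1?q?Gr=FC=DF_Gott!?=']
```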
Although the path by which control flow reaches lines.append(None) is not immediately obvious (judging from the comment and the condition, it is taken when not even the first character fits within the first maximum line length), a returned list containing None does occur in CPython's own email unit tests, specifically in TestEmailAsianCodecs.test_japanese_codecs in test/test_email/test_asian_codecs.py. The minimal example below illustrates this type mismatch:
from email.charset import Charset
from email.header import Header
from test.test_email import TestEmailBase

class TestEmailAsianCodecs(TestEmailBase):
    def test_japanese_codecs(self):
        eq = self.ndiffAssertEqual
        jcode = "euc-jp"
        gcode = "iso-8859-1"
        j = Charset(jcode)
        g = Charset(gcode)
        h = Header("Hello World!")
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', jcode)
        ghello = str(b'Gr\xfc\xdf Gott!', gcode)
        h.append(jhello, j)
        h.append(ghello, g)
        # The folded continuation line starts with a single space.
        eq(h.encode(), """\
Hello World! =?iso-2022-jp?b?GyRCJU8lbSE8JW8hPCVrJUkhKhsoQg==?=
 =?iso-8859-1?q?Gr=FC=DF_Gott!?=""")
In this example, the method header_encode_lines returns [None, '=?iso-8859-1?q?Gr=FC=DF_Gott!?=']. This happens through the following control flow:
email.header.Header.encode
-> email.header._ValueFormatter.feed
-> email.charset.Charset.header_encode_lines
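A reduced reproduction that calls header_encode_lines directly may make the branch easier to see. This is a sketch under an assumption: the maxlengths iterator (0 columns of room on the current line, then ordinary 76-column lines) is chosen by hand to force the "nothing fits on the first line" case, presumably similar to what happens in the test once the preceding Japanese encoded word has used up the current line.

```python
import itertools
from email.charset import Charset

# iso-8859-1 headers are encoded with quoted-printable ("q" encoded words).
latin1 = Charset("iso-8859-1")

# Assumed line budgets: 0 columns left on the current line (nothing can fit),
# followed by ordinary 76-column lines for as long as they are requested.
maxlengths = itertools.chain([0], itertools.repeat(76))

lines = latin1.header_encode_lines("Gr\xfc\xdf Gott!", maxlengths)
print(lines)  # [None, '=?iso-8859-1?q?Gr=FC=DF_Gott!?=']
```

The leading None is the same value reported above for the test, and the second element is the encoded word that ends up on the folded continuation line.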
2. Type mismatch in email.charset.Charset.body_encode
Overview
Target for Modification: Charset.body_encode in email/charset.pyi
Existing Type Hint: (overload) (self, string: None) -> None, (self, string: str) -> str
Suggested Type Hint: (overload) (self, string: None) -> None, (self, string: str | bytes) -> str
Explanation
The method Charset.body_encode may be called with bytes as the string parameter, a case the existing type hint does not cover (typeshed/stdlib/email/charset.pyi, lines 23 to 26 at commit cfc5425).
As mentioned in #10429, the Message.set_charset method can pass its self._payload value to Charset.body_encode as the string parameter. However, in some code paths Message.set_charset converts the str value of self._payload to bytes before passing it to Charset.body_encode. The following snippet from the library code demonstrates this:
class Message:
    def set_charset(self, charset):
        ...
        payload = self._payload
        if payload:
            try:
                payload = payload.encode('ascii', 'surrogateescape')  # "str" payload converted to "bytes" here
            except UnicodeError:
                ...
        self._payload = charset.body_encode(payload)
        ...
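At runtime, body_encode already accepts bytes, at least for a charset whose body encoding is base64. A quick check, assuming a utf-8 Charset (utf8, from_str and from_bytes are illustrative names):

```python
from email.charset import Charset

# utf-8 uses base64 as its body encoding.
utf8 = Charset("utf-8")

from_str = utf8.body_encode("hello")
from_bytes = utf8.body_encode(b"hello")  # bytes input, outside the current stub
print(repr(from_str), repr(from_bytes))  # 'aGVsbG8=\n' 'aGVsbG8=\n'
```

Both calls return the same str, which is consistent with the suggested (self, string: str | bytes) -> str overload.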
This mismatch (passing bytes in place of str) was observed in the CPython email unit test TestEmailAsianCodecs.test_payload_encoding_utf8 in test/test_email/test_asian_codecs.py. The compact example below highlights it:
from email.message import Message
from test.test_email import TestEmailBase

class TestEmailAsianCodecs(TestEmailBase):
    def test_payload_encoding_utf8(self):
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
        msg = Message()
        msg.set_payload(jhello, 'utf-8')
In this example, the body_encode method receives bytes through the following control flow:
email.message.Message.set_payload
-> email.message.Message.set_charset
-> email.charset.Charset.body_encode
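To confirm the argument type without stepping through set_charset, one could temporarily wrap Charset.body_encode and record what it receives. This is a hypothetical debugging sketch (recording_body_encode and received_types are illustrative names), not part of the CPython test suite:

```python
from email.charset import Charset
from email.message import Message

# Hypothetical instrumentation: record the type of every value passed to
# Charset.body_encode, then restore the original method.
received_types: list[type] = []
original_body_encode = Charset.body_encode


def recording_body_encode(self, string):
    received_types.append(type(string))
    return original_body_encode(self, string)


Charset.body_encode = recording_body_encode
try:
    jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                 b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
    msg = Message()
    msg.set_payload(jhello, 'utf-8')
finally:
    Charset.body_encode = original_body_encode

print(received_types)
```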
3. Type Mismatch in email.policy.EmailPolicy.header_store_parse
Overview
Target for Modification: EmailPolicy.header_store_parse in email/policy.pyi
Existing Type Hint: (self, name: str, value: str) -> tuple[str, str]
Suggested Type Hint: (self, name: str, value: Any) -> tuple[str, str]
Explanation
The EmailPolicy.header_store_parse method can be called with an object of any type as the value parameter, but the current type hint only allows a str value (typeshed/stdlib/email/policy.pyi, line 36 at commit cfc5425).
Since the special method email.message.Message.__setitem__ forwards its val argument to EmailPolicy.header_store_parse, and __setitem__ itself accepts any value (see its implementation and its typeshed stub below), the EmailPolicy.header_store_parse signature should be (self, name: str, value: Any).
# CPython, Lib/email/message.py
class Message:
    def __setitem__(self, name, val):
        ...
        self._headers.append(self.policy.header_store_parse(name, val))

# typeshed, stdlib/email/message.pyi
_HeaderType: TypeAlias = Any
class Message:
    def __setitem__(self, name: str, val: _HeaderType) -> None: ...
This inconsistency was detected in the CPython email unit test TestBytesGenerator.test_smtp_policy in test/test_email/test_generator.py. The trimmed example below illustrates it:
from email.message import EmailMessage
from email.headerregistry import Address
msg = EmailMessage()
# placeholder address; any valid addr_spec reproduces the mismatch
msg["From"] = Address(addr_spec="paolo@example.com", display_name="Páolo")
In this scenario, the statement msg["From"] = Address(...) invokes message.EmailMessage.__setitem__ (that is, message.Message.__setitem__), which passes the Address object on to header_store_parse as its value parameter.
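One way to observe this directly is to wrap EmailPolicy.header_store_parse temporarily and record the type of value it receives. A hypothetical debugging sketch, assuming the default policy used by EmailMessage is an EmailPolicy instance; recording_store_parse, received_types and the address are illustrative:

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import EmailPolicy

# Hypothetical instrumentation: record the type of the value argument,
# then restore the original method.
received_types: list[type] = []
original_store_parse = EmailPolicy.header_store_parse


def recording_store_parse(self, name, value):
    received_types.append(type(value))
    return original_store_parse(self, name, value)


EmailPolicy.header_store_parse = recording_store_parse
try:
    msg = EmailMessage()
    # placeholder address; any valid addr_spec works
    msg["From"] = Address(addr_spec="paolo@example.com", display_name="Páolo")
finally:
    EmailPolicy.header_store_parse = original_store_parse

print(received_types)
```

If the recorded list contains Address rather than str, header_store_parse was called with a non-str value, which the current (self, name: str, value: str) stub does not allow.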
P.S. All CPython links reference commit ae315991... on the current CPython 3.12 branch.
P.S. When I tried applying existing static type checkers such as mypy, pytype, and pyright to the type hints for the CPython standard library, I ran into various errors and imprecise results. Are there established best practices for this process? Any guidance or tips would be greatly appreciated.