Correcting type mismatches in the email module #10444

Closed
@cushionbadak

I hope these three type mismatches help improve the CPython type hints.

1. Type mismatch in email.charset.Charset.header_encode_lines

Overview

Target for Modification: Charset.header_encode_lines in email/charset.pyi
Existing Type Hint: (self, string: str, maxlengths: Iterator[int]) -> list[str]
Suggested Type Hint: (self, string: str, maxlengths: Iterator[int]) -> list[Optional[str]]

Explanation

The method Charset.header_encode_lines can return a list that contains None.

class Charset:
    def header_encode_lines(self, string, maxlengths):
        ...
        lines = []
        current_line = []
        ...
                # Does nothing fit on the first line?
                if not lines and not current_line:
                    lines.append(None)
                else:
                    joined_line = EMPTYSTRING.join(current_line)
                    header_bytes = _encode(joined_line, codec)
                    lines.append(encoder(header_bytes))
        ...
        return lines

Although it is not immediately clear which control-flow path reaches lines.append(None), a returned list containing None does occur in CPython's email unit tests, specifically in TestEmailAsianCodecs.test_japanese_codecs in test/test_email/test_asian_codecs.py. The minimal example below illustrates this type mismatch:

class TestEmailAsianCodecs(TestEmailBase):
    def test_japanese_codecs(self):
        eq = self.ndiffAssertEqual
        jcode = "euc-jp"
        gcode = "iso-8859-1"
        j = Charset(jcode)
        g = Charset(gcode)
        h = Header("Hello World!")
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', jcode)
        ghello = str(b'Gr\xfc\xdf Gott!', gcode)
        h.append(jhello, j)
        h.append(ghello, g)
        eq(h.encode(), """\
Hello World! =?iso-2022-jp?b?GyRCJU8lbSE8JW8hPCVrJUkhKhsoQg==?=
 =?iso-8859-1?q?Gr=FC=DF_Gott!?=""")

In this example, header_encode_lines returns [None, '=?iso-8859-1?q?Gr=FC=DF_Gott!?=']. This occurs through the following control flow:

email.header.Header.encode
-> email.header._ValueFormatter.feed
-> email.charset.Charset.header_encode_lines
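
The None entry can also be reproduced by calling the method directly. The following is a minimal sketch, not taken from the CPython test suite. It assumes that a first maxlengths value smaller than the RFC 2047 overhead leaves no room for even one character, which triggers the "nothing fits on the first line" branch quoted above. The budget values 1 and 76 and the input "Hello" are arbitrary choices for illustration.

```python
from email.charset import Charset

charset = Charset("iso-8859-1")

# Assumption: a first-line budget of 1 is smaller than the
# "=?charset?q?...?=" overhead, so no character fits on the first line.
maxlengths = iter([1, 76])

lines = charset.header_encode_lines("Hello", maxlengths)

# The first element is None rather than a str, which the existing
# list[str] return annotation does not allow.
print(lines)
print([type(line).__name__ for line in lines])
```

A caller that follows the suggested list[Optional[str]] annotation would need to handle the None entry explicitly before joining or formatting the lines.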

2. Type mismatch in email.charset.Charset.body_encode

Overview

Target for Modification: Charset.body_encode in email/charset.pyi
Existing Type Hint: (overload) (self, string: None) -> None, (self, string: str) -> str
Suggested Type Hint: (overload) (self, string: None) -> None, (self, string: str | bytes) -> str

Explanation

The method Charset.body_encode may be called with bytes as the string parameter, a case not currently covered by the existing type hint:

@overload
def body_encode(self, string: None) -> None: ...
@overload
def body_encode(self, string: str) -> str: ...

As mentioned in #10429, the Message.set_charset method can pass self._payload to Charset.body_encode as the string parameter. However, in some scenarios Message.set_charset first encodes the str value of self._payload to bytes and then passes the result to Charset.body_encode. The following snippet from the library code demonstrates this:

class Message:
    def set_charset(self, charset):
        ...
                payload = self._payload
                if payload:
                    try:
                        payload = payload.encode('ascii', 'surrogateescape') # "str" payload converted to "bytes" here
                    except UnicodeError:
                        ...
                self._payload = charset.body_encode(payload)
        ...

This mismatch (passing bytes in place of str) was noticed in the CPython email unit test, TestEmailAsianCodecs.test_payload_encoding_utf8 found in test/test_email/test_asian_codecs.py. The compact example provided below highlights this type mismatch:

class TestEmailAsianCodecs(TestEmailBase):
    def test_payload_encoding_utf8(self):
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
        msg = Message()
        msg.set_payload(jhello, 'utf-8')

In this example, the body_encode method receives bytes, invoked via the following control flow:

email.message.Message.set_payload
-> email.message.Message.set_charset
-> email.charset.Charset.body_encode
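
One way to observe the argument type at runtime is to record it in a subclass. This is only a sketch: RecordingCharset and seen_types are hypothetical names introduced for illustration and are not part of the email package or its tests. The payload is the same jhello string used in the test above.

```python
from email.charset import Charset
from email.message import Message

# Hypothetical log of the argument types passed to body_encode.
seen_types = []


class RecordingCharset(Charset):
    """Hypothetical helper that records what body_encode receives."""

    def body_encode(self, string):
        seen_types.append(type(string))
        return super().body_encode(string)


jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
             b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')

msg = Message()
# Passing a Charset instance instead of the string 'utf-8' lets the
# subclass intercept the call made inside Message.set_charset.
msg.set_payload(jhello, RecordingCharset('utf-8'))

print(seen_types)
```

If the recorded list contains bytes, the call falls outside both existing overloads, which accept only None or str.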

3. Type mismatch in email.policy.EmailPolicy.header_store_parse

Overview

Target for Modification: EmailPolicy.header_store_parse in email/policy.pyi
Existing Type Hint: (self, name: str, value: str) -> tuple[str, str]
Suggested Type Hint: (self, name: str, value: Any) -> tuple[str, str]

Explanation

The EmailPolicy.header_store_parse method can be invoked with an object of any type as the value parameter, but the current type hint only allows for a str value.

def header_store_parse(self, name: str, value: str) -> tuple[str, str]: ...

email.message.Message's special method __setitem__ passes its val argument straight to EmailPolicy.header_store_parse, and the stub for __setitem__ already types val as _HeaderType (an alias for Any). The parameters of EmailPolicy.header_store_parse should therefore be typed as (self, name: str, value: Any).

# Runtime implementation (email/message.py)
class Message:
    def __setitem__(self, name, val):
        ...
        self._headers.append(self.policy.header_store_parse(name, val))

# Stub (email/message.pyi)
_HeaderType: TypeAlias = Any
class Message:
    def __setitem__(self, name: str, val: _HeaderType) -> None: ...

This inconsistency was detected in the CPython email unit test, TestBytesGenerator.test_smtp_policy found in test/test_email/test_generator.py. The trimmed example provided below illustrates this type mismatch:

from email.message import EmailMessage
from email.headerregistry import Address
msg = EmailMessage()
msg["From"] = Address(addr_spec="[email protected]", display_name="Páolo")

In this scenario, the msg["From"] = Address(...) statement invokes message.EmailMessage.__setitem__ ( = message.Message.__setitem__) and passes an Address object to the value parameter.
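
The same recording approach can confirm what header_store_parse receives. This is a sketch under stated assumptions: RecordingPolicy and seen_types are hypothetical names introduced for illustration, and the address "paolo@example.com" is a placeholder, not the value used in the CPython test.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import EmailPolicy

# Hypothetical log of the value types passed to header_store_parse.
# A module-level list is used because policy instances reject new attributes.
seen_types = []


class RecordingPolicy(EmailPolicy):
    """Hypothetical helper that records the type of each header value."""

    def header_store_parse(self, name, value):
        seen_types.append(type(value))
        return super().header_store_parse(name, value)


msg = EmailMessage(policy=RecordingPolicy())
# Placeholder address, chosen only for this illustration.
msg["From"] = Address(addr_spec="paolo@example.com", display_name="Páolo")

print(seen_types)
```

Here the value argument is an Address instance rather than a str, which the existing annotation value: str rejects.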


P.S. All CPython links refer to commit ae315991... on the current CPython 3.12 branch.

P.S. In my attempts to apply existing static type checking tools such as mypy, pytype, and pyright to the type hints in the cpython standard library, I've encountered various errors and imprecise results. I'm wondering if there are any established best practices for this process. Any guidance or tips would be greatly appreciated.
