Correcting type mismatches in the email module

I hope these three type mismatches help improve the type hints for the CPython standard library.

# 1. Type mismatch in `email.charset.Charset.header_encode_lines`


## Overview
**Target for Modification**: [`Charset.header_encode_lines` in `email/charset.pyi`](https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/charset.pyi#L22)
**Existing Type Hint**: `(self, string: str, maxlengths: Iterator[int]) -> list[str]`
**Suggested Type Hint**: `(self, string: str, maxlengths: Iterator[int]) -> list[Optional[str]]`


## Explanation
The method [`Charset.header_encode_lines`](https://github.com/python/cpython/blob/ae315991431df5172799ef7ccc0202ac7c0841c9/Lib/email/charset.py#L293-L352) can return a list containing `None`.
```python
class Charset:
    def header_encode_lines(self, string, maxlengths):
        ...
        lines = []
        current_line = []
        ...
                # Does nothing fit on the first line?
                if not lines and not current_line:
                    lines.append(None)
                else:
                    joined_line = EMPTYSTRING.join(current_line)
                    header_bytes = _encode(joined_line, codec)
                    lines.append(encoder(header_bytes))
        ...
        return lines
```

The path by which control flow reaches `lines.append(None)` is not obvious: it is taken when not even the first character, together with the RFC 2047 chrome, fits within the first maximum length, i.e. when the current header line has no room left. This case (a returned list containing `None`) does occur in CPython's email unit tests, specifically in [TestEmailAsianCodecs.test_japanese_codecs](https://github.com/python/cpython/blob/ae315991431df5172799ef7ccc0202ac7c0841c9/Lib/test/test_email/test_asian_codecs.py#L22-L59) in `test/test_email/test_asian_codecs.py`. The minimal example below illustrates this type mismatch:
```python
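from email.charset import Charset
from email.header import Header
from test.test_email import TestEmailBase

class TestEmailAsianCodecs(TestEmailBase):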
    def test_japanese_codecs(self):
        eq = self.ndiffAssertEqual
        jcode = "euc-jp"
        gcode = "iso-8859-1"
        j = Charset(jcode)
        g = Charset(gcode)
        h = Header("Hello World!")
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', jcode)
        ghello = str(b'Gr\xfc\xdf Gott!', gcode)
        h.append(jhello, j)
        h.append(ghello, g)
        eq(h.encode(), """\
Hello World! =?iso-2022-jp?b?GyRCJU8lbSE8JW8hPCVrJUkhKhsoQg==?=
 =?iso-8859-1?q?Gr=FC=DF_Gott!?=""")
```

In this example, `header_encode_lines` returns `[None, '=?iso-8859-1?q?Gr=FC=DF_Gott!?=']`, reached through the following control flow:
```text
email.header.Header.encode
-> email.header._ValueFormatter.feed
-> email.charset.Charset.header_encode_lines
```
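The `None` entry can also be reproduced by calling the method directly. This is a small sketch assuming CPython 3.12's `Lib/email/charset.py`; the string `"Grüß"` and the maxlengths `[5, 76]` are arbitrary values chosen so that the first maximum length leaves no room for even one character once the RFC 2047 chrome is subtracted.

```python
from email.charset import Charset

# iso-8859-1 uses quoted-printable ("q") header encoding.
latin1 = Charset("iso-8859-1")

# The first allowed length (5) is smaller than the RFC 2047 chrome alone
# ("=?iso-8859-1?q?" + "?=" is 17 characters), so nothing fits on the first
# line and the method appends None. The text then goes on a 76-char line.
lines = latin1.header_encode_lines("Grüß", iter([5, 76]))

print(lines)  # [None, '=?iso-8859-1?q?Gr=FC=DF?=']
```

Callers such as `email.header._ValueFormatter.feed` already check the first element for `None`, so the stub return type `list[str]` is what is out of step with the implementation.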




# 2. Type mismatch in `email.charset.Charset.body_encode`


## Overview
**Target for Modification**: [`Charset.body_encode` in `email/charset.pyi`](https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/charset.pyi#L23-L26)
**Existing Type Hint**: (overload) `(self, string: None) -> None`, `(self, string: str) -> str`
**Suggested Type Hint**: (overload) `(self, string: None) -> None`, `(self, string: str | bytes) -> str`


## Explanation
The method [`Charset.body_encode`](https://github.com/python/cpython/blob/af06a8ad4d94f78d86d59a6268b3f38543921beb/Lib/email/charset.py#L369-L398) may be called with `bytes` as the `string` parameter, a case not currently covered by the existing type hint:
https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/charset.pyi#L23-L26

As mentioned in #10429, the [`Message.set_charset` method](https://github.com/python/cpython/blob/af06a8ad4d94f78d86d59a6268b3f38543921beb/Lib/email/message.py#L350-L395) can pass `self._payload` to `Charset.body_encode` as the `string` parameter. In some cases, however, `Message.set_charset` first converts a `str` payload to `bytes` and only then passes it to `Charset.body_encode`. The following snippet from the library code shows this:
```python
class Message:
    def set_charset(self, charset):
        ...
                payload = self._payload
                if payload:
                    try:
                        payload = payload.encode('ascii', 'surrogateescape') # "str" payload converted to "bytes" here
                    except UnicodeError:
                        ...
                self._payload = charset.body_encode(payload)
        ...
```

This mismatch (passing `bytes` in place of `str`) was noticed in the CPython email unit test, [`TestEmailAsianCodecs.test_payload_encoding_utf8`](https://github.com/python/cpython/blob/ae315991431df5172799ef7ccc0202ac7c0841c9/Lib/test/test_email/test_asian_codecs.py#L61-L67) found in `test/test_email/test_asian_codecs.py`. The compact example provided below highlights this type mismatch:
```python
from email.message import Message
from test.test_email import TestEmailBase

class TestEmailAsianCodecs(TestEmailBase):
    def test_payload_encoding_utf8(self):
        jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                     b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
        msg = Message()
        msg.set_payload(jhello, 'utf-8')
```

In this example, the `body_encode` method receives `bytes`, invoked via the following control flow:
```text
email.message.Message.set_payload
-> email.message.Message.set_charset
-> email.charset.Charset.body_encode
```
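To observe the `bytes` argument directly, one can temporarily wrap `Charset.body_encode` and record the type it receives. This is a debugging sketch only: `recording_body_encode` and `seen_types` are hypothetical helpers, not part of the email package, and the behaviour assumes CPython 3.12's `set_charset` shown above (for `utf-8` the body encoding is the string `'base64'`, so `cte(self)` raises `TypeError` and the fallback branch runs).

```python
from email.charset import Charset
from email.message import Message

# Hypothetical helper state: the argument types body_encode receives.
seen_types = []
original_body_encode = Charset.body_encode

def recording_body_encode(self, string):
    """Record the argument type, then defer to the real implementation."""
    seen_types.append(type(string))
    return original_body_encode(self, string)

Charset.body_encode = recording_body_encode  # temporary monkeypatch
try:
    jhello = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc'
                 b'\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
    msg = Message()
    msg.set_payload(jhello, 'utf-8')
finally:
    Charset.body_encode = original_body_encode  # always restore

print(seen_types)  # [<class 'bytes'>]
```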




# 3. Type mismatch in `email.policy.EmailPolicy.header_store_parse`


## Overview
**Target for Modification**: [`EmailPolicy.header_store_parse` in `email/policy.pyi`](https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/policy.pyi#L36)
**Existing Type Hint**: `(self, name: str, value: str) -> tuple[str, str]`
**Suggested Type Hint**: `(self, name: str, value: Any) -> tuple[str, str]`


## Explanation
The [`EmailPolicy.header_store_parse` method](https://github.com/python/cpython/blob/af06a8ad4d94f78d86d59a6268b3f38543921beb/Lib/email/policy.py#L131-L148) can be invoked with an object of any type as the `value` parameter, but the current type hint only allows for a `str` value.
https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/policy.pyi#L36

Since [`email.message.Message`'s special method `__setitem__`](https://github.com/python/cpython/blob/af06a8ad4d94f78d86d59a6268b3f38543921beb/Lib/email/message.py#L420-L436) calls `EmailPolicy.header_store_parse`, and the [stub signature of `__setitem__`](https://github.com/python/typeshed/blob/cfc5425cb3fb2985cf105c712e89f96e94f7979c/stdlib/email/message.pyi#L35) accepts `val: _HeaderType` (an alias for `Any`), the `value` parameter of `EmailPolicy.header_store_parse` should also be typed as `Any`, giving `(self, name: str, value: Any)`.
```python
class Message:
    def __setitem__(self, name, val):
        ...
        self._headers.append(self.policy.header_store_parse(name, val))
```
```python
_HeaderType: TypeAlias = Any
class Message:
    def __setitem__(self, name: str, val: _HeaderType) -> None: ...
```

This inconsistency was detected in the CPython email unit test, [`TestBytesGenerator.test_smtp_policy`](https://github.com/python/cpython/blob/af06a8ad4d94f78d86d59a6268b3f38543921beb/Lib/test/test_email/test_generator.py#L295-L314) found in `test/test_email/test_generator.py`. The trimmed example provided below illustrates this type mismatch:
```python
from email.message import EmailMessage
from email.headerregistry import Address
msg = EmailMessage()
msg["From"] = Address(addr_spec="foo@bar.com", display_name="Páolo")
```

In this scenario, the statement `msg["From"] = Address(...)` invokes `email.message.EmailMessage.__setitem__` (inherited from `email.message.Message.__setitem__`), which forwards the `Address` object unchanged to `header_store_parse` as its `value` parameter.
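
A similar wrapper shows what `header_store_parse` actually receives. `RecordingPolicy` and `seen_value_types` are hypothetical names used only for this illustration; the subclass logs the type of `value` and then delegates to `EmailPolicy.header_store_parse`. A module-level list is used because policy instances reject attribute assignment after construction.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import EmailPolicy

# Hypothetical log of the types passed as ``value``.
seen_value_types = []

class RecordingPolicy(EmailPolicy):
    """Hypothetical policy that logs the type of ``value`` and delegates."""

    def header_store_parse(self, name, value):
        seen_value_types.append(type(value))
        return super().header_store_parse(name, value)

msg = EmailMessage(policy=RecordingPolicy())
msg["From"] = Address(addr_spec="foo@bar.com", display_name="Páolo")

print(seen_value_types)  # [<class 'email.headerregistry.Address'>]
print(msg["From"])       # Páolo <foo@bar.com>
```

The stored header is still a `str` subclass, so the `tuple[str, str]` return type is unaffected; only the `value` parameter is too narrow.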


---
P.S. The CPython links reference commits `ae315991...` and `af06a8ad...` from the current CPython 3.12 branch.

P.S. In my attempts to apply existing static type checking tools such as mypy, pytype, and pyright to the type hints in the CPython standard library, I've encountered various errors and imprecise results. I'm wondering if there are any established best practices for this process. Any guidance or tips would be greatly appreciated.