EmailMessage objects break when folding malformed header #132105

Open
davidmcnabnz opened this issue Apr 5, 2025 · 4 comments
Labels
stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error

Comments

@davidmcnabnz

davidmcnabnz commented Apr 5, 2025

Bug report

Bug description:

In certain cases, EmailMessage objects fold encoded headers with a doubled line ending. This breaks the flattened message: email clients display it improperly and attachments become inaccessible.

Python 3.11.11 does not exhibit this behaviour, but numerous other Python versions, e.g. 3.11.9, 3.12.x, 3.10.x and 3.9.x, are affected by the bug.

The specific malformation is where:

  • the header is long
  • the header's value is presented as an RFC 2047 encoded-word (UTF-8, base64)
  • when decoded, the header's payload ends with a newline \n

An example of this:

Subject: =?utf-8?B?Vm9pY2Vib3ggRmlybWE6IFRlc3QtTmFjaHJpY2h0IHVtIDEwOjAzOjM1IDA0LjA0LjI1IC0gQUdGRU8gRVMgNTIyIElUIHVwIC0gc3RhaXBzbmV0Cg==?=
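
The third condition can be confirmed for this sample by decoding the payload directly. This is a minimal sketch using only the base64 module; the variable names are illustrative and not part of the original report.

```python
import base64

# Base64 payload copied from the Subject: header above, i.e. the text
# between "=?utf-8?B?" and "?=".
payload = "Vm9pY2Vib3ggRmlybWE6IFRlc3QtTmFjaHJpY2h0IHVtIDEwOjAzOjM1IDA0LjA0LjI1IC0gQUdGRU8gRVMgNTIyIElUIHVwIC0gc3RhaXBzbmV0Cg=="

decoded = base64.b64decode(payload).decode("utf-8")

# repr() makes the trailing "\n" visible.
print(repr(decoded))
print("ends with newline:", decoded.endswith("\n"))
```

The final base64 group, Cg==, is the single byte 0x0A, which is the newline that later upsets the folding.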

Here is a test script which reproduces the problem on many/most recent Python versions:

#!/usr/bin/env python3
"""
Reproduces a bug with flattening email headers whose encoded payloads end in a newline
"""
import sys
from email import policy
from email.parser import Parser

sampleRaw = """Date: Fri, 04 Apr 2025 10:03:35 +0200\r
From: [email protected]\r
Subject: =?utf-8?B?Vm9pY2Vib3ggRmlybWE6IFRlc3QtTmFjaHJpY2h0IHVtIDEwOjAzOjM1IDA0LjA0LjI1IC0gQUdGRU8gRVMgNTIyIElUIHVwIC0gc3RhaXBzbmV0Cg==?=\r
MIME-Version: 1.0\r
Content-Type: multipart/mixed;\r
 boundary="235711131719"\r
To: [email protected]\r
\r
This is a multi-part message in MIME format.\r
--235711131719\r
Content-Type: text/plain; charset=UTF-8; format=flowed\r
Content-Transfer-Encoding: 7bit\r
\r
This is readable part of the body\r
--235711131719--\r
\r
"""

messageObj = Parser(policy=policy.default).parsestr(sampleRaw)

print(sys.version)
print("--------")
print(messageObj.as_string())
print("--------")

Python 3.11.11 flattens the message correctly. Here is an excerpt in/around the Subject: header:

From: [email protected]
Subject: Voicebox Firma: Test-Nachricht um 10:03:35 04.04.25 - AGFEO ES 522 IT
 up - =?utf-8?q?staipsnet=0A?=
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="235711131719"
To: [email protected]

This is a multi-part message in MIME format.

As can be seen here, the tail of the header has been re-encoded as quoted-printable (which is fine) and the embedded newline is kept inside the encoded payload as =0A, so the folded header ends with only the one line ending. Great.

But the other Python versions I've tested produce:

From: [email protected]
Subject: Voicebox Firma: Test-Nachricht um 10:03:35 04.04.25 - AGFEO ES 522 IT
 up - staipsnet

MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="235711131719"
To: [email protected]

This is a multi-part message in MIME format.
--235711131719
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

This is readable part of the body

Note the blank line after Subject:.
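
One way to check a given interpreter for this breakage is to flatten a message, parse the result again, and see which headers no longer appear as headers. The helper below is only a sketch; lost_headers is a hypothetical name, not a stdlib function, and on some Python versions as_string() may raise an exception for a malformed header instead of returning.

```python
from email import policy
from email.parser import Parser


def lost_headers(raw: str) -> list[str]:
    """Return header names that disappear after a flatten/re-parse round trip.

    Hypothetical helper for illustration only.
    """
    parser = Parser(policy=policy.default)
    original = parser.parsestr(raw)
    # Flatten, then parse the flattened text as a fresh message.
    reparsed = parser.parsestr(original.as_string())
    # A header is "lost" if it was pushed into the body by a stray blank line.
    return [name for name in original.keys() if name not in reparsed]


# A well-formed message should lose nothing on any Python version.
benign = (
    "From: sender@example.com\r\n"
    "Subject: hello\r\n"
    "To: recipient@example.com\r\n"
    "\r\n"
    "body\r\n"
)
print(lost_headers(benign))
```

Passing sampleRaw from the script above to the same helper should return an empty list on an unaffected interpreter, and the headers that follow Subject: on an affected one.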

When messages flattened with this breakage are delivered, MTAs will often add Content-Type: text/plain, because the headers after Subject: have been lost into the body.

The recipient of the above message will see this formatted as plain text in their viewing pane:

MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="235711131719"
To: [email protected]

This is a multi-part message in MIME format.
--235711131719
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

This is readable part of the body

If there were any attachments, their encoded representations would appear after that as gibberish strings, and the mail client would not indicate that an attachment is present.

This has potentially serious security implications. If an email delivery chain has Python stdlib-based mail processing at or near the end, a carefully structured malicious header can:

  • evade sanitisation
  • nullify and/or replace headers added upstream in the chain
  • inject malicious extra headers and MIME-encoded content, which may cause subsequent handling steps and/or the final MUA to be hijacked for exploit attempts.

My workaround has been to subclass a policy and override its _fold() method: call the parent class's _fold(), .rstrip() the result, then append "\r\n".
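
A rough sketch of that workaround follows. It is a reconstruction from the description, not the author's actual code: StripFoldPolicy is a hypothetical name, and the policy's own linesep is appended instead of a hard-coded "\r\n" so that the output stays consistent with whatever line ending is in effect. Note that _fold() is a private method of EmailPolicy, so this may stop working if the stdlib internals change, and it only addresses a trailing newline.

```python
from email import policy
from email.parser import Parser
from email.policy import EmailPolicy


class StripFoldPolicy(EmailPolicy):
    """Hypothetical policy that trims stray trailing newlines after folding."""

    def _fold(self, name, value, refold_binary=False):
        # Let the parent class do the real folding work.
        folded = super()._fold(name, value, refold_binary=refold_binary)
        # Drop trailing whitespace, then end with exactly one line ending.
        return folded.rstrip() + self.linesep


raw = (
    "From: sender@example.com\r\n"
    "Subject: =?utf-8?B?Vm9pY2Vib3ggRmlybWE6IFRlc3QtTmFjaHJpY2h0IHVtIDEwOjAzOjM1IDA0LjA0LjI1IC0gQUdGRU8gRVMgNTIyIElUIHVwIC0gc3RhaXBzbmV0Cg==?=\r\n"
    "MIME-Version: 1.0\r\n"
    "To: recipient@example.com\r\n"
    "\r\n"
    "body\r\n"
)

message = Parser(policy=StripFoldPolicy()).parsestr(raw)
flattened = message.as_string()
print(flattened)

# Re-parse to confirm the headers after Subject: are still headers.
reparsed = Parser(policy=policy.default).parsestr(flattened)
print("MIME-Version" in reparsed, "To" in reparsed)
```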

CPython versions tested on:

3.12

Operating systems tested on:

Linux

@davidmcnabnz davidmcnabnz added the type-bug An unexpected behavior, bug, or error label Apr 5, 2025
@skirpichev skirpichev added stdlib Python modules in the Lib dir topic-email labels Apr 5, 2025
@davidmcnabnz
Author

Two further notes:

  1. Newlines within the encoded header also fold with added newlines, allowing any number of extra headers to be injected, and
  2. I am unable to reproduce any of this vulnerable behaviour in a vanilla Python 3.13.2 which I just built from source on Debian.
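
The first note can be probed with a header whose decoded payload carries a newline followed by header-like text. The sketch below is an assumption-laden illustration: probe_injection, the X-Injected header name and the three result strings are all made up for this example, and the outcome depends on the Python version in use, so no expected output is given.

```python
import base64
from email import errors, policy
from email.parser import Parser


def probe_injection() -> str:
    """Report whether an embedded newline turns into an injected header.

    Hypothetical helper for illustration only.
    """
    # Long enough to force refolding, with a newline in the middle followed
    # by a line that looks like a new header.
    payload = "A subject that is deliberately long enough to need refolding " * 2
    payload += "\nX-Injected: yes"
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    raw = (
        f"Subject: =?utf-8?B?{encoded}?=\r\n"
        "To: recipient@example.com\r\n"
        "\r\n"
        "body\r\n"
    )

    parser = Parser(policy=policy.default)
    try:
        flattened = parser.parsestr(raw).as_string()
    except errors.MessageError:
        # Some versions may refuse to write the header at all.
        return "rejected"
    # If the newline survived folding, X-Injected is now a real header.
    reparsed = parser.parsestr(flattened)
    return "injected" if "X-Injected" in reparsed else "not injected"


print(probe_injection())
```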

So now both 3.11.11 and 3.13.2 wrap illegal line breaks (plus a few characters before and after) in quoted-printable encoding. On the other hand, all other Python versions I've tried have the vulnerability.

@rijo7

rijo7 commented Apr 6, 2025

I’m interested in working on this issue and would appreciate any advice or recommendations you might have to get started.

@davidmcnabnz
Author

@rijo7 a good starting point is comparing the behaviour of different Python sub-versions. Some have the vulnerability, some don't.

@medmunds
Contributor

medmunds commented May 6, 2025

@davidmcnabnz could you identify the specific 3.12.x, 3.10.x and 3.9.x versions where you have seen the bug?

It's very possible this was fixed by either #122754 (last month, so just now showing up in the latest Python releases) or #122233 (last summer).


4 participants