Skip to content

Fix bpo-36041: fix folding of quoted string in display_name violates RFC #12054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Fix bpo-36041: fix folding of quoted string in display_name violates RFC #12054

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Feb 26, 2019

Fix bug 36041: don't unquote quoted strings during email header folding

The refolding process is unable to determine if it is folding an unstructured header (where unquoting and
unescaping the string would be allowed) or a structured header where the quotes are required. Therefore, do the safe thing and keep the quotes and escapes.

https://bugs.python.org/issue36041

Aaryn Tonita added 2 commits February 26, 2019 14:21
For display names that are longer than the policy's maximum line length,
the folding process removes the quotes from the display name which
may violate the structure of an address header if the content of the
display name is specifically formed.
Since the _refold_parse_tree is context unaware, the function is unable
to determine if we can directly encode the text into the header. In the
language of RFC 5322, _refold_parse_tree is unaware if it is folding a
structured header or an unstructured header.

Since a quoted string will always be acceptable for both header types,
when a quoted string needs to be refolded (because it would make the
line overflow) prepare quoted children instead of the usual semantic
children of the BareQuotedString.
@ghost ghost self-requested a review as a code owner February 26, 2019 13:27
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@nanjekyejoannah
Copy link
Contributor

nanjekyejoannah commented Feb 26, 2019

@SwampWalker put "Fix bug 36041: folding of quoted string in display_name violates RFC" in the PR title.

It is good to add the issue title also. Not just the bpo number.

@bitdancer
Copy link
Member

I haven't tested it, but it seems to me the problem with this is that if you have something like "this is an example of a very long quoted string" <[email protected]>, then what you will get is "this" "is" "an".... which isn't what we want. I think you almost have to implement something like the EW algorithm, where you try to glue back together adjacent quoted strings. It's been long enough since I wrote this code that I'm not sure I'm correct, though :) And like I said I didn't actually test your fix to see what happens in this case, at least not yet.

@ghost ghost changed the title bpo-36041 Fix bug 36041: folding of quoted string in display_name violates RFC Feb 27, 2019
@ghost
Copy link
Author

ghost commented Feb 27, 2019

I was just following the instructions in the PR prefill nanjekyejoannah, you might want to see those instructions updated. I've updated the title.

@ghost
Copy link
Author

ghost commented Feb 27, 2019

Bitdancer, you have misunderstood what the fix does. It does not quote the children, it adds a DQUOTE terminal as the first and last child and escapes the intermediate children. The test I added shows that each individual child is not separately quoted. I had considered refactoring the quote_string() function to extract a separate escape_string() function so I wouldn't have to call quote_string()[1:-1] with the followup slicing to remove the quotes. The reason I did not is because that escaping is somewhat specific to quoted strings and making a escape_string_for_quoted_string() method seemed to add more boilerplate than what I thought the function would deserve.

@ghost ghost changed the title Fix bug 36041: folding of quoted string in display_name violates RFC bug 36041: fix folding of quoted string in display_name violates RFC Feb 27, 2019
@ghost ghost changed the title bug 36041: fix folding of quoted string in display_name violates RFC Fix bpo-36041: fix folding of quoted string in display_name violates RFC Feb 27, 2019
@bitdancer
Copy link
Member

bitdancer commented Feb 27, 2019

Ah, I see. You are right, I did not spend enough time understanding the code (I was trying to give you quick feedback and obviously I was too quick :)

I will try to review this completely soon. but that sounds like the right solution.

@mdavis-xyz
Copy link
Contributor

Does this pull request also happen to fix 40359?

(Which is a ticket I created earlier today, about parsing long attachment file names.)

@wbolster
Copy link
Contributor

I will try to review this completely soon. but that sounds like the right solution.

@bitdancer is there any chance you can can be bothered do another review round on these changes? 🙃 if so, that would be great! 🙏

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants