-
Notifications
You must be signed in to change notification settings - Fork 33
✨ Parse UTF-8 encoded strings, for UTF8=ACCEPT and IMAP4rev2 #111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In RFC3501 (IMAP4rev1): resp-text = ["[" resp-text-code "]" SP] text In RFC9051 (IMAP4rev2): resp-text = ["[" resp-text-code "]" SP] [text] And in RFC9051 Appendix E: 23. resp-text ABNF non-terminal was updated to allow for empty text. In the spirit of Appendix E. 23 (and based on some actual server responses I've seen over the years), I've leniently re-interpreted this as also allowing us to drop the trailing `SP` char after `[resp-text-code parsable code data]`, like so: resp-text = "[" resp-text-code "]" [SP [text]] / [text] Actually, the original parser already _mostly_ behaved this way, because the original regexps for `T_TEXT` used `*` and not `+`. But, as I updated the parser in many other places to more closely match the RFCs, that broke this behavior. This commit originally came _after_ many many other changes. While rebasing, I moved this commit first because that simplified later commits. Also: * ♻️ Add `Patterns` module, to organize regexps. * ♻️ Use `Patterns::CharClassSubtraction` refinement to simplify exceptions. * ♻️ Add `ParserUtils::Generator#def_char_matchers` to define `SP`, `LBRA`, `RBRA`. * ♻️ Add `ParserUtils#{match,accept}_re` to replace `TEXT`, `CTEXT` lex states. * ♻️ Remove unused `lex_state` kwarg from match
The parser update supports both RFC6855 (UTF8=ALLOW, UTF8=ONLY) and the UTF8 requirements of IMAP4rev2 (resp-text). Also updated #enable documentation and method signature: * document `UTF8=ACCEPT` as "supported" * use `*rest` args => flatten => map(aliases) => uniq * add `:utf8` as an alias for `UTF8=ACCEPT`
Wow, that's nice. I wrote the same thing while I was away being teambuilt, and discovered yours while making a pull request. A joyous kind of disappointment, if you see what I mean. Mine has a couple of details I like better (if I've understood the diff correctly), I may make a tiny PR for those. But as it stands, great work, I'm so happy. Once ruby/net-smtp#49 is merged Ruby will have fine support for EAI. |
@arnt Sorry about that! It's a good problem to have, and honestly I might still be sitting on my code in private branch if you hadn't beaten me to the And, I actually just came here to send you a note and ask if you had any thoughts on this PR! In addition to your note on the other PR, I knew from looking up your github profile that you had experience with email UTF-8 encoding. And later when I was searching extra-wg mailing list archives I saw your name in there too! So I figured you had experience that was well worth consulting. 🙂 I look forward to any thoughts you have to offer, and... I'm gonna race you to finish my |
@arnt BTW, if you want to simply make a fake "PR" against the earlier version of the code, I'd be happy to take a look at that, too. |
I'll be glad to review it, preferably next week, when I have two eleven-hour flights. This week I have lots to do. I'll push the branch again, though. |
Looks good to me. Better than my code, which was perhaps overly optimised for being a small diff. I see 'git push' didn't recreate the branch I had deleted. No matter. I made a PR with the bits I had that weren't in your code. Really just details. Thanks for doing this. I think I'll check mikelmail+arabic next. |
One other thing. I can't race you on qresync. I can do favours, and will ;) but my job description says I should make EAI/IDN work better, find and fix interop problems, all that kind of thing. There's enough of it to keep me busy for a long time. |
Split off from #104, this PR updates how
text
andquoted
are handled (without the other bugfixes and updates in that PR).RFC6855
(2013):UTF8=ACCEPT
(recommended byIMAP4rev2
for backward compatibility) #38