Skip to content

✨ Parse UTF-8 encoded strings, for UTF8=ACCEPT and IMAP4rev2 #111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 12, 2023

Conversation

nevans
Copy link
Collaborator

@nevans nevans commented Feb 12, 2023

In RFC3501 (IMAP4rev1):

    resp-text       = ["[" resp-text-code "]" SP] text

In RFC9051 (IMAP4rev2):

    resp-text       = ["[" resp-text-code "]" SP] [text]

And in RFC9051 Appendix E:

     23.  resp-text ABNF non-terminal was updated to allow for empty text.

In the spirit of Appendix E. 23 (and based on some actual server
responses I've seen over the years), I've leniently re-interpreted this
as also allowing us to drop the trailing `SP` char after
`[resp-text-code parsable code data]`, like so:

    resp-text       = "[" resp-text-code "]" [SP [text]] / [text]

Actually, the original parser already _mostly_ behaved this way, because
the original regexps for `T_TEXT` used `*` and not `+`. But, as I
updated the parser in many other places to more closely match the RFCs,
that broke this behavior. This commit originally came _after_ many many
other changes.  While rebasing, I moved this commit first because that
simplified later commits.

Also:
* ♻️ Add `Patterns` module, to organize regexps.
* ♻️ Use `Patterns::CharClassSubtraction` refinement to simplify
  exceptions.
* ♻️ Add `ParserUtils::Generator#def_char_matchers` to define `SP`,
  `LBRA`, `RBRA`.
* ♻️ Add `ParserUtils#{match,accept}_re` to replace `TEXT`, `CTEXT` lex
  states.
* ♻️ Remove unused `lex_state` kwarg from match
The parser update supports both RFC6855 (UTF8=ALLOW, UTF8=ONLY) and the
UTF8 requirements of IMAP4rev2 (resp-text).

Also updated #enable documentation and method signature:
* document `UTF8=ACCEPT` as "supported"
* use `*rest` args => flatten => map(aliases) => uniq
* add `:utf8` as an alias for `UTF8=ACCEPT`
@nevans nevans added the IMAP4rev2 Requirement for IMAP4rev2, RFC9051 label Feb 12, 2023
@nevans nevans merged commit b557d51 into master Feb 12, 2023
@nevans nevans deleted the UTF8-for-IMAP4rev2 branch February 12, 2023 06:02
@arnt
Copy link
Contributor

arnt commented Feb 13, 2023

Wow, that's nice. I wrote the same thing while I was away being teambuilt, and discovered yours while making a pull request. A joyous kind of disappointment, if you see what I mean.

Mine has a couple of details I like better (if I've understood the diff correctly), I may make a tiny PR for those. But as it stands, great work, I'm so happy. Once ruby/net-smtp#49 is merged Ruby will have fine support for EAI.

@nevans
Copy link
Collaborator Author

nevans commented Feb 13, 2023

@arnt Sorry about that! It's a good problem to have, and honestly I might still be sitting on my code in private branch if you hadn't beaten me to the ENABLE PR. 😉 I understand this joyous disappointment well!

And, I actually just came here to send you a note and ask if you had any thoughts on this PR! In addition to your note on the other PR, I knew from looking up your github profile that you had experience with email UTF-8 encoding. And later when I was searching extra-wg mailing list archives I saw your name in there too! So I figured you had experience that was well worth consulting. 🙂

I look forward to any thoughts you have to offer, and... I'm gonna race you to finish my QRESYNC PRs first, too. 😜

@nevans
Copy link
Collaborator Author

nevans commented Feb 13, 2023

@arnt BTW, if you want to simply make a fake "PR" against the earlier version of the code, I'd be happy to take a look at that, too.

@arnt
Copy link
Contributor

arnt commented Feb 13, 2023

I'll be glad to review it, preferably next week, when I have two eleven-hour flights. This week I have lots to do. I'll push the branch again, though.

@arnt
Copy link
Contributor

arnt commented Feb 22, 2023

Looks good to me. Better than my code, which was perhaps overly optimised for being a small diff.

I see 'git push' didn't recreate the branch I had deleted. No matter. I made a PR with the bits I had that weren't in your code. Really just details.

Thanks for doing this. I think I'll check mikelmail+arabic next.

@arnt
Copy link
Contributor

arnt commented Feb 22, 2023

One other thing. I can't race you on qresync. I can do favours, and will ;) but my job description says I should make EAI/IDN work better, find and fix interop problems, all that kind of thing. There's enough of it to keep me busy for a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IMAP4rev2 Requirement for IMAP4rev2, RFC9051
Projects
None yet
2 participants