Consider "Reversing" Email Verification Flow #14048

Closed
dstufft opened this issue Jun 30, 2023 · 1 comment
Labels
needs discussion, security, usability

dstufft commented Jun 30, 2023

Currently we implement a pretty typical email verification flow:

  1. User types in an email address.
  2. Warehouse sends an email to that address.
  3. User clicks on a link that opens up Warehouse and instructs it to mark that email as verified.
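
As a rough illustration of the three steps above, here is a minimal sketch of a signed, expiring verification token that could be embedded in the emailed link. This is not Warehouse's actual implementation; `SECRET`, `make_token`, and `check_token` are hypothetical names, and the one-hour expiry is an assumption chosen for the example.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Assumption: a server-side signing key that never leaves the service.
SECRET = b"hypothetical-server-secret"


def make_token(email: str, now: Optional[float] = None) -> str:
    """Return a signed token binding an email address to its issue time."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}|{issued}".encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def check_token(
    token: str, max_age: int = 3600, now: Optional[float] = None
) -> Optional[str]:
    """Return the verified address, or None if the token is invalid or expired."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    email, _, issued = payload.decode().rpartition("|")
    current = time.time() if now is None else now
    if current - int(issued) > max_age:
        return None
    return email
```

The token only proves that whoever clicked the link could read mail sent to that address, which is exactly the "can receive" property this flow asserts.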

This flow works, it's well known, and it's secure.

Unfortunately it has one major problem: it requires sending emails to addresses that you've never verified you are allowed to send emails to. This becomes a problem because people can enter email addresses that they don't own, causing us to email that person, and that email then often gets marked as spam when the person who really owns the address receives it.

Traditionally, the way to prevent this from happening is to raise the bar for automated use of your verification form, typically by putting some sort of captcha in place, which brings its own problems (captchas aren't foolproof, often have accessibility concerns, are not always available in every country, etc).

This has been a problem for us. We don't send a particularly high volume of email, so the email verification emails take up a disproportionately large share of our outgoing email. When they get marked as spam, that ends up hurting our reputation, and in the past it has even gotten to the point where AWS has blocked us from sending new emails because our spam rate was too high.

#13234 attempts to minimize the impact of these spam reports by just sending more email, so that our email verification emails are a smaller percentage of our overall emails and the couple that get marked as spam don't hurt as much. I think that issue is overall a good idea, as long as the new emails we're sending are emails that provide value (which I think most or all of the suggestions are). However, it doesn't really solve the underlying problem of sending emails to unverified addresses; it just tries to reduce the impact of the spam reports we do get.

@woodruffw had previously blogged about an idea he had for reversing the email verification flow, which could solve this problem.

The whole post is good, but a rough idea of the flow would be:

  1. User types in an email address.
  2. Warehouse generates a mailto link (probably with some temporary email inbox).
  3. User clicks on the link, their mail client (MUA) opens up and they hit send.
  4. Warehouse receives the email, validates it, and marks their email address as verified.
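
To make step 2 of the list above concrete, here is a minimal sketch of generating such a mailto link. Everything specific in it is an assumption made for illustration: the `verify.example.org` inbox domain, the `verify-<nonce>` address scheme, and the function names are hypothetical and do not come from the blog post or from Warehouse.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import quote

# Assumption: a domain whose inbound mail is routed to the verification service.
INBOX_DOMAIN = "verify.example.org"


def make_mailto_link(nonce: Optional[str] = None) -> Tuple[str, str]:
    """Return (nonce, mailto URI) for a one-off verification inbox."""
    nonce = nonce or secrets.token_hex(16)
    recipient = f"verify-{nonce}@{INBOX_DOMAIN}"
    # Header fields in a mailto URI must be percent-encoded.
    subject = quote("Verify my email address")
    body = quote(f"Send this message unchanged.\r\n\r\nCode: {nonce}")
    return nonce, f"mailto:{recipient}?subject={subject}&body={body}"


def nonce_from_recipient(address: str) -> Optional[str]:
    """Recover the nonce from an inbound recipient address, if it matches."""
    local, _, domain = address.partition("@")
    if domain.lower() != INBOX_DOMAIN or not local.startswith("verify-"):
        return None
    return local[len("verify-"):]
```

On the receiving side, the service would look up the pending verification by the nonce and compare the message's sender against the address the user typed in. The sender check is the weak point discussed further down.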

This is an interesting idea, though it's not without its own shortcomings:

  • It's a different flow than what users are used to, so the chances of confusion are higher.
    • Not everyone has their browser or OS set up to correctly handle mailto: links, for instance.
  • It's subtly asserting something different: rather than "can the user receive email at this address", it is instead testing "can the user send email from this address". This could be problematic for people who employ email forwarding or other tactics.
  • Email is extremely spoof-able.

The last one is probably the biggest problem. By default, email has basically no way to detect or prevent anyone from sending an email from any email address, so a malicious user who doesn't control @example.com could craft an email that claims to be from an address at that domain, and the email protocol itself has no real mechanism to validate it.

This is obviously terrible, so there have been things bolted onto the side of email to try and fix it (as Will mentions in his post): namely SPF and DKIM.

Using SPF and DKIM, it is possible to determine that the sender is sending from a mail server that is authorized to send for some particular domain. Unfortunately, as best I can tell, in the general population something like 30% of email domains have SPF/DKIM set up properly and something like 60-70% do not. This means that, to stay secure, we would have to hard fail in those cases, but the user can't actually do anything about the problem unless they happen to operate their own mail server. Fortunately the big mail providers are all within the 30% of domains that do it correctly, so the vast bulk of addresses would likely be fine, but probably not all of them.
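
For a sense of what "hard fail" could look like on the receiving side, here is a minimal sketch that accepts a sender only when the inbound message carries a DKIM pass for exactly the domain in its From header. It assumes the receiving mail server (hypothetically named `mx.verify.example.org`) strips any incoming Authentication-Results headers and adds its own in the RFC 8601 format. The function name and the strict exact-domain match are illustrative choices, and the simple string parsing ignores edge cases such as comments in the header.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from typing import Optional

# Assumption: identifier our own trusted receiving MTA stamps on results.
TRUSTED_AUTHSERV = "mx.verify.example.org"


def authenticated_sender(raw_message: str) -> Optional[str]:
    """Return the From address only if DKIM passed for that exact domain."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    if "@" not in addr:
        return None
    from_domain = addr.rsplit("@", 1)[1].lower()

    for header in msg.get_all("Authentication-Results", []):
        parts = [p.strip() for p in header.split(";")]
        first = parts[0].split()
        # Ignore results not stamped by our own server; they may be forged.
        if not first or first[0].lower() != TRUSTED_AUTHSERV:
            continue
        for result in parts[1:]:
            if not re.match(r"dkim\s*=\s*pass\b", result, re.IGNORECASE):
                continue
            m = re.search(r"header\.d\s*=\s*([^\s;]+)", result, re.IGNORECASE)
            if m and m.group(1).lower() == from_domain:
                return addr
    # Hard fail: no trusted, aligned DKIM pass was found.
    return None
```

Any message from a domain without working DKIM falls through to `None` here, which is the case where the user has no way to fix the failure themselves.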

I suspect that this problem kills this idea dead in the water (particularly when you add the other problems), but I wanted to open the issue just to give a space to consider it and see if I'm wrong, and at the very least to document that we thought about it and decided not to do it.

dstufft added the usability, needs discussion, and security labels Jun 30, 2023
@woodruffw
Member

I'm glad my half-baked ideas are being entertained 🙂

To summarize some of what I sent to @dstufft after he posted this: I think the "reverse" flow is a very cool idea, but I agree with his evaluation about its limitations. In particular, the "reverse" flow relies much more heavily on the spoofable parts of email than the "normal" flow does, and in turn relies much more heavily on integrity/authenticity extensions to email that are not widely deployed (unfortunately).

There's also another argument, which is that doing the "reverse" flow arguably makes a given service a bad participant in the email network: by kicking to the MUA, it pierces any aliases or relay addresses that the user might have and might intend to use with that service. This is arguably a violation of user expectations (e.g. many users may have a relay address registered on PyPI, which in turn is aliased to their real email), and in some cases may even inadvertently undo a user's attempt to remain pseudonymous. This is arguably not PyPI's problem to begin with (it's purely a function of how janky email is), but it's also arguably a case where the sheer amount of user expectation around email means that any change in behavior needs a really strong justification 🙂

TL;DR: Full agreement about it being dead in the water; in a better world where email was less terrible, I think a version of it would have been very cool to use here.

dstufft closed this as not planned Sep 16, 2023