I've accepted that it's best to treat people like grown-ups and if there's '@' and '.' and it's retyped then it passes. Someone can easily submit a fake name or phone number or street address, and e-mail's no different.
If they get it wrong, intentionally or not, then they don't get their receipt, confirmation, validation link, etc. and I believe in most cases the incentive is there for them to get it right.
In the rare case where there's some incentive to circumvent the system and this has some measurable impact on a site, then more validation may be warranted. Otherwise, why worry about it?
Your recommendation is almost as wrong as the regexp in the article, as both will reject the perfectly valid postmaster@ai address (check it out, "dig -t mx ai" will return a result, so that address must exist). The only thing you can be sure about in a non-local email address is that it contains at least one @ sign. More are allowed (for source routing), but hopefully no server is still configured to honor that, so you might get away with requiring exactly one @. Everything else is evil.
Oh, and don't forget to make sure that one component in your spam^W email processing chain correctly encodes Unicode characters in the domain part into punycode.
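For what it's worth, here is a rough Python sketch of those two points: insist on exactly one @, then convert the domain to punycode. The function name to_ascii_address is made up for illustration, and it leans on Python's built-in "idna" codec (which implements the older IDNA 2003 rules), so treat it as a starting point rather than a complete answer.

```python
def to_ascii_address(address: str) -> str:
    """Require exactly one '@' and punycode-encode the domain part.

    Hypothetical helper for illustration; the local part is left untouched.
    """
    local, sep, domain = address.partition("@")
    if not sep or not local or not domain or "@" in domain:
        raise ValueError("expected exactly one '@' with text on both sides")
    # Python's built-in "idna" codec follows IDNA 2003; ASCII domains pass through.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_address("postmaster@ai"))        # postmaster@ai
print(to_ascii_address("user@bücher.example"))  # user@xn--bcher-kva.example
```

Note that a dotless address like postmaster@ai goes straight through, since nothing here assumes the domain contains a dot.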
In which you are assuming you will receive a proper bounce, which is not warranted. Checking bounces is something you can do in addition to checking whether the user didn't screw up his email address.
True, but you have to balance that against the small but nonzero number of people put off by an extra text field. Plus, I would find email repetition more annoying if I didn't always do Cmd-A/Cmd-C/Tab/Cmd-V, and in this case the repeated field won't catch any errors.
The fact that you know the shortcuts for select all, copy, and paste puts you in the top percentile of users. Most people don't even know that's possible, and certainly not with keyboard shortcuts.
(The point being that for most users the faster approach that requires less thinking is to type it twice. Sometimes I do things the "slow" or "long" way when coding because it doesn't require a mental shift from the task at hand.)
The point being that for most users the faster approach that requires less thinking is to type it twice.
Spoken like a touch-typist with a short email address. I feel confident in claiming that annabelle.t.johnson@woodandplaster.co.uk will be looking for that copypaste button. It's not like copypaste is a new technological innovation - it's practically the most used feature of personal computing, after backspace.
That's true. My main point was the first one. The second point was mainly to illustrate that annoyance isn't hypothetical; I'm annoyed by an extra text box.
I agree they are annoying. I guess I'm just less annoyed than you are. Cmd-a Cmd-c Tab Cmd-v has a negligible impact on the number of keystrokes I make in a day. Typing my address again doesn't even make a real difference now that I think about it, 3 seconds or so.
Have you ever abandoned a sign-up because of the e-mail confirmation?
You'd need something more along the lines of a spell checking algorithm to do that sanely. Don't abuse regular expressions for things they're not good at.
Yeah, that Python script (from StackOverflow London - http://norvig.com/spell-correct.html) that learned what was correct from a large number of words would work. Analyse the domain part (after the @) of the email addresses of users in your DB who have registered fully, and provide a "Did you mean..." type response if a new user's email domain was not found and something close to it (within 2 alterations) was.
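Something like this is what I have in mind, as a rough Python sketch. It is not Norvig's script: it swaps in a plain Levenshtein edit distance against a counter of known-good domains, and the names edit_distance, suggest_domain, and the sample counts are all invented for illustration.

```python
from collections import Counter
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def suggest_domain(domain: str, known: Counter, max_edits: int = 2) -> Optional[str]:
    """Return a 'Did you mean' domain, or None if the domain is known or nothing is close."""
    domain = domain.lower()
    if domain in known:
        return None
    # Prefer the smallest distance, then the most frequently seen domain.
    close = [(edit_distance(domain, k), -count, k)
             for k, count in known.items()
             if edit_distance(domain, k) <= max_edits]
    return min(close)[2] if close else None


# Made-up counts standing in for domains of fully registered users.
known_domains = Counter({"gmail.com": 500, "hotmail.com": 200})
print(suggest_domain("gmial.com", known_domains))    # gmail.com
print(suggest_domain("example.org", known_domains))  # None
```

Since a swapped pair of letters counts as two substitutions here, the "within 2 alterations" threshold is just enough to catch transpositions like gmial.com.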
To let users revisit their email addresses without adding a second field for copy-pasting, Russ Unger of UserGlue and Jonathan Knoll found a nice solution:
My routine for dealing with a second e-mail field is shift+tab, ctrl+a, ctrl+c, tab, ctrl+v. Usually auto-fill takes care of the first fields anyway, never of the second one. It doesn't help me in any way.
Most forms asking for an email address use the same field name (or one of a limited number of field names), so the browser remembers what you typed last time, and it'll suggest it to you in a pop-up for autocompletion. All it takes is to type one letter, press down, and enter.
My thoughts exactly. There are so many websites that tell me my valid email address is invalid that it's not even funny. These people then have to deal with phone calls and lost business, because their form won't even submit without a valid email address (and why should I change if they're the ones that suck).
BTW, the email address that doesn't work is "jon-whatever@jrock.us". The .us confuses people and the - confuses people. WTF!?
They have a lot more problems to deal with if they have to deal with all the people that include a stray space in the address.
There are regexes out there that catch all valid addresses that people reasonably use (including those with dashes and ending in .us) and the fact that all kinds of incompetent developers use a homebrewed regex is no reason to lower the best case scenario to "just don't validate". Just do it right.
If they get it wrong, intentionally or not, then they don't get their receipt, confirmation, validation link, etc. and I believe in most cases the incentive is there for them to get it right.
Well, I hope you have a large enough support team, because with any reasonable amount of customer growth, you'll soon be swamped with support emails that say "I didn't get my ..., where is it? You suck!". Validating the email address (which includes checking for common typos in domains) is a service that reduces frustration and the amount of customer support needed.
Even if they've typed it correctly it still can be blocked for various reasons.
You will still need a support mechanism (doesn't have to get all the way to a person) to sort out undelivered verification mails. It's just how it goes.
Still, some rudimentary validation can prevent cases of someone entering the wrong value (e.g., just their username) or an incomplete value (e.g., scott.trudeau@gmail ) -- which I see with some regularity at ~10k signups/month. If this lowers support email volume by even just 10%, that's not insignificant.
Along those lines, I've settled on the following overly permissive regex: /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/ -- it makes sure it looks something like an email address (a@b.cd)
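In case anyone wants to try that pattern outside JavaScript, here is a minimal Python sketch. The name looks_like_email is made up; the pattern itself is unchanged. One assumption worth flagging: Python's $ also matches just before a trailing newline, so the sketch uses fullmatch to keep the behaviour strict.

```python
import re

# Same permissive pattern as above: something@something.xx, no spaces, one @.
LOOSE_EMAIL = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]{2,}$")


def looks_like_email(value: str) -> bool:
    """True if the value is shaped roughly like a@b.cd (hypothetical helper)."""
    # fullmatch avoids accepting a trailing newline, which '$' alone would allow.
    return LOOSE_EMAIL.fullmatch(value) is not None


print(looks_like_email("jon-whatever@jrock.us"))  # True
print(looks_like_email("scott.trudeau@gmail"))    # False
print(looks_like_email("postmaster@ai"))          # False
```

The last line shows the trade-off raised upthread: because the pattern requires a dot in the domain, a dotless TLD address such as postmaster@ai is rejected.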
RFC 5321 section 2.3.5 specifically prohibits TLDs from receiving email. Other RFCs back that up, actually including RFC 5322 (not directly, but in wire format).
That said, we see some TLDs run MX's. I think at least some portion of them sell the mail received there to spammers. Seriously.
You obviously know a lot more on this topic... but where exactly does it say in that section that TLDs are prohibited from receiving emails?
From my reading, it seems to suggest the opposite, e.g. "In the case of a top-level domain used by itself in an email address, a single string is used without any dots."
I've re-read that section many times and forgive me if I missed it — it's late! =)
People have been posting conflicting responses throughout this thread suggesting that TLDs can/cannot receive emails, and I'd really like to know which should be valid... thanks!
interestingly i count 19 TLD's with mx records. but judging from your earlier post you know as well as i do (i use <single character>@w.tf for many purposes) that there are pretty much no web forms on the planet that would take such an address. tis a shame, but such addresses while pretty nifty are also frequently unusable. not to mention trying to explain to someone that that is really your email address and if it doesn't work they should talk to their ISP/mail client author/etc...well, suffice it to say they won't be getting in touch with you that way anyhow. (be that a blessing or curse, it's just reality)
There are many other uses for this regex than registration systems. We have at least two situations where a staff member would type in a customer mail address, and some get it wrong. Confirming won't help, and getting it wrong costs lots of time to fix. Using a better regex costs very little.
There are more reasons to check for an email than simply validation. Perhaps you want to detect emails in a field and let people click/tap on them to send that person an email? Don't want to let people send email off to foo@bar.baz now, do we?
You assume an email regex is only used for validating addresses. It can also be used to parse text for email addresses (for formatting or mining purposes maybe).
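As a rough illustration of the parsing use case, here is a Python sketch that pulls address-like strings out of free text. The pattern and the name find_addresses are my own assumptions, deliberately loose, and it will miss unusual but valid addresses; it only finds candidates and says nothing about whether mail to them would be delivered.

```python
import re
from typing import List

# Loose, illustrative pattern: word-ish local part, '@', then dotted domain labels.
ADDRESS_IN_TEXT = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_addresses(text: str) -> List[str]:
    """Return address-like substrings in the order they appear (hypothetical helper)."""
    return ADDRESS_IN_TEXT.findall(text)


sample = "Write to jon-whatever@jrock.us, or cc sales@example.com. Thanks!"
print(find_addresses(sample))  # ['jon-whatever@jrock.us', 'sales@example.com']
```

Trailing punctuation such as the comma and full stop in the sample is left out of the match, which is most of what you want when turning addresses into clickable links.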
If they get it wrong, intentionally or not, then they don't get their receipt, confirmation, validation link, etc. and I believe in most cases the incentive is there for them to get it right.
In the rare case where there's some incentive to circumvent the system and this has some measurable impact on a site, then more validation may be warranted. Otherwise, why worry about it?
Also: HTML5. ;)