I just tried to sign up for Notifo and wasn't allowed to use my real name because it has a - in it. So now I'm waiting for that to be fixed.
And don't get me started on how profanity filters consider my last name profane. Years ago I had a Hotmail account registered to the name Ivana C. Teens-Give-Head because they wouldn't accept John Graham-Cumming.
edit: Notifo telling me this will be fixed shortly.
We should start a club of people whose names vex software.
I could talk your ear off with how many ways I've broken systems in Japan, but one of my Asian American friends takes the cake. Her name is not Kim Kim, but it could be. Apparently some systems check to make sure you don't enter your first name twice...
Fake: Yes. I can't tell you how many times I've booked an air ticket only to get to the airport and find out they killed my ticket because it goes into the system and the program tosses a ticket that says "fake" on it. Twice I've gone to the counter for a KLM flight through Northwest and have been rejected. They say, "You don't have a ticket." I give them a confirmation and after some investigation I learn my ticket has been cancelled because the system deleted it. For a while I couldn't join Facebook because of my last name. During the registration process I was asked for my real name and when I wrote "Fake" it rejected me. Finally a friend working for Facebook took care of me.
I love Hotmail because my surname (Message) is an "illegal word". Apple shouldn't have targeted IBM in the 1984 adverts; it's actually Microsoft that is instituting Newspeak.
Thank goodness we don't live in a P.G. Wodehouse novel: we'd be getting cranky emails from Cyril Bassington-Bassington, whose name would probably crash three quarters of the systems for having a hyphen and the other three quarters for being too long.
A good friend of mine was blessed by her parents with three middle names and the last names of both parents. Her parents divorced and her mother remarried, adding another last name. In 2008 she married and added her husband's last name.
Her full legal name (in first-middle-middle-middle-last-last-last-last form) no longer fits into any database known to man, nor even on her driver's license, and is a consistent source of amusement.
For any "human" data: trim and escape, and you're done. If you want to validate it, just ask the party that knows for sure (send an email, run a transaction, visit the URL).
This includes names, addresses, phone numbers, emails, URLs, CC/account numbers, user names, passwords (maybe tell them that caps-lock is on or any other weird keyboard state if you can).
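For what it's worth, "trim and escape" can be about this small. A minimal sketch in Python, assuming the value ends up in an HTML page; the function names `clean_human_input` and `render_for_html` are made up for illustration:

```python
import html


def clean_human_input(raw: str) -> str:
    """Trim surrounding whitespace and change nothing else.

    No character whitelist, no length games: hyphens, apostrophes
    and repeated words all pass through untouched.
    """
    return raw.strip()


def render_for_html(value: str) -> str:
    """Escape at output time so any stored value is safe to display."""
    return html.escape(value)


name = clean_human_input("  John Graham-Cumming \n")
print(render_for_html(name))
```

Storing the trimmed value as-is and escaping only when rendering means the same stored name can also go into an email or a CSV without being mangled first.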
Yes! For years I thought I was the crazy one for telling our clients, "we don't really need to validate names. if they give you the wrong name, it's their problem; and it's more work for us; and we'll probably fuck it up and make someone mad". The answer was always "do it anyway, it's what we agreed on." The client feels like we are screwing them if we make the work easier, even though the end result is a higher-quality more usable website. Sigh.
The comment about developers making work for themselves is also spot on. I answer a lot of programming questions, and the questions are always asked because the programmer has reached the end of a twisty maze of his own creation. Turn around, walk, spin around, and try again. You'll find a better solution.
And oh yeah, I do this all the fucking time. Pick any random github project of mine, and you'll see 8 revisions of the API before I finally pick one that's not retarded. Even then. (Side note: I don't change the API after I release.)
> The comment about developers making work for themselves is also spot on. I answer a lot of programming questions, and the questions are always asked because the programmer has reached the end of a twisty maze of his own creation. Turn around, walk, spin around, and try again. You'll find a better solution.
This deserves to be repeated a thousand times.
How often bad code and bad ideas stick around simply because the people who came up with them can't even imagine doing without them.
I have run into this many times with people who try Plan 9: 'where is my pet unix "feature"?!?' Guess what? It was not a 'feature', it causes untold pain, and that is why it is not in Plan 9.
Just last week somebody was on the Go mailing list asking why there is no preprocessor! Sigh.
You know you have done too much work with regular expressions when you think "Hey, wait a second, that can't possibly work" and start trying to debug it in the Ruby console for 10 minutes prior to realizing "Oh, HN is italicizing it because of the asterisks it is silently stripping."
I had this booking a flight. The system mangled a hyphenated surname, so the "pay now" page was wrong. With no way to go back and modify it, we had to return to the start and try all over again. On the third failed attempt the clever system had detected a certain interest in our flights, and socked up the price by $200!
> On the third failed attempt the clever system had detected a certain interest in our flights, and socked up the price by $200!
Or the inventory you were trying to buy was no longer available. Seats are not all the same price; they are divided into "fare buckets" that are usually lettered. There are very few cheap fares on each flight, more medium fares, and even more full-fares. You should check the code as you are booking; if the fare code changed, they ran out of inventory. If the fare code stayed the same, then they raised the price of that inventory.
Just saying -- it wasn't some conspiracy. Someone just bought them out from under you. Most airlines let you hold a reservation for a few hours, so if this ever hits you again, just hold the reservation, call the web services desk (not the general reservations desk), ask web services to fix your name, and then continue the ticketing process online.
(This is the procedure for AA, anyway. Dunno about other airlines, as I've never used them.)
Personally, I always hold, triple-check my plans, and then buy. So I have never had inventory disappear out from under me, and I have never needed to change a non-changeable fare :)
Dammit, Reg -- I thought you were back in business when I saw this. Call me a greedy bastard if you must, but I've been suffering major withdrawal and the methadone I've found out there just ain't cuttin' it anymore.
Very interesting. Normally I don't read blog comments, because usually they're dumb, but these aren't too bad. I think you wrote a rant that everyone can agree with. We have all been burned by validation before, and we have all been forced to write it. It's boring and annoying for everyone. (Watching the clients test the website usually consists of them typing stuff to test the validation rules. They don't check the spelling, they don't check that it works as they specified, but they do check that they can't put 999999 as their zip code. Sigh!)
You also have the right readership -- the people that will disagree with your post don't even know what a "raganwald" is.
A perfect storm, if you will, for constructive blog comments :)
Not only that, but they're only called ZIP codes in the US (this is a peeve of mine). In Switzerland they're "post codes" ("code postal" in French, don't know the Swiss German name). They also write addresses in a different order. As an example, here's the address of a Kebap shop I used to frequent:
Avenue de la Sallaz 29
1010 Lausanne
Suisse
1010 is the post code, identifying La Sallaz (or a part of it?), in Lausanne. My point? All this stuff is very local and if you want to do it right you should just ask for addresses free-form and if you need to extract information from it then you should use a geocoding library (e.g. http://geocoder.rubyforge.org/) to normalize it for you.
(The German term is "Postleitzahl", usually abbreviated as PLZ.)
Considering this and all the other issues related to addresses, I really wonder why we are still trying to store them in separated fields.
Why can't the address just be one big multiline text field where the user types whatever would be needed to receive a postal letter? If we need the data in structured form, we could always write a locale-aware parser that extracts the needed information.
Splitting the address into multiple fields (sometimes even labelled address_line_1, address_line_2 and so on) is probably a relic from the times when databases had nothing but CHAR (with a maximum length of 100 or something) and applications were created for the local market only.
I would need to do some a/b testing, but I really doubt that it'll be easier for a user to fill out the traditional
Street 1: _______________
Street 2: _______________
ZIP: _____
City: _____________
form instead of just one big text field labelled "Postal Address" - personally, I'd probably be WAY faster filling out that one.
We really should go one-big-text box. I've tried to do it as often as clients will let me get away with it.
A recent client demanded the address be broken out into fields, but I at least swayed them into accepting a big-text-box for "international" addresses after showing them a few examples.
My pet peeve is the field labelled "State or province" which can't be left empty. This is surprisingly common.
Most countries on Earth are not federations; dozens of nations are small enough that dividing into "provinces" would be meaningless; and even larger countries that do have regions may not include their names in mailing addresses.
Americans back in the 50s used to do it almost the same, but flipped. For example, the old way to write the address (w/ zip code) for City Hall in Manhattan, NYC:
260 Broadway
New York 17
New York
but then the 17, originally a numbering system for big cities, became 100-07, the postcode 10007. (Compare the postcodes around London, EC1/N1/SW1/etc, which were originally just for sorting mail around the central city of England.)
Oh interesting. I had assumed that the transition to zip codes always added digits at the start (like San Francisco, which prepended 941 to the old codes) and didn't know there were places that inserted new digits in the middle instead.
If the email address was literally "foo+bar@domain", they may not have gone out of their way to screw you; there are lots of web apps that treat "+" as a special character, so all they had to do was pass it over another HTTP connection.
I use these as labels to auto-file registrations in Gmail, and I've seen a few anti-patterns:
1. Reject at data entry even though they make me type it twice and they're going to do a round-trip click-to-verify. Too many to name fail this way.
2. Accept at data entry, convert it to a space (+ on URL means space). Pray that the login routine accepts an email address with a space (it probably won't). Tirerack fails this way.
3. Accept at data entry, then fail to create the account in other internal systems. Allow login using the +, but once logged in, data from internal systems is unavailable, and the portal errors in unusual ways. VMWare fails this way.
Failing using anti-pattern 1 is preferable to pattern 2 (replace with space then fail login) or 3 (accept and allow login but fail to interoperate with other systems internally).
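Anti-pattern 2 is easy to reproduce. A small sketch using Python's standard `urllib.parse` (the address and the `email` parameter name are placeholders), showing how an unencoded "+" turns into a space on the receiving side and how percent-encoding avoids it:

```python
from urllib.parse import parse_qs, urlencode

email = "foo+bar@example.com"

# Wrong: the raw address is pasted into a query string.
# The receiving side decodes "+" as a space.
broken = parse_qs("email=" + email)["email"][0]
print(repr(broken))         # 'foo bar@example.com'

# Right: percent-encode the value before it goes on the wire.
safe_query = urlencode({"email": email})
print(safe_query)           # email=foo%2Bbar%40example.com

round_tripped = parse_qs(safe_query)["email"][0]
print(repr(round_tripped))  # 'foo+bar@example.com'
```

So the front end can accept the address perfectly well and still lose the "+" one hop later if any internal call builds its URL by string concatenation.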
After 45+ emails and calls about anti-pattern 3 over 18 months, VMWare still hasn't successfully delivered a VMWare license to me. By now they're on version 3 (a free upgrade if v2 was bought recently) but still haven't delivered me version 2 or 3. Next email, perhaps I should send them raganwald's article.
This is pretty close to an argument for writing the extra line of code to reject email addresses with "plus" characters in them: the front-end team might not know how the backend team will screw up.
No, they probably didn't actually get up in the morning and decide to do me in. I suspect that particular problem came from either handing it to an overzealous cleaning algorithm or--as you say--passing it over an http connection without escaping it properly. A third possibility is that it was given to a mainframe application written in the 1970s that uses brutal hackery to deal with email addresses. Such kludges are often riddled with broken edge cases.
The problem is that the actual spec for validating email addresses is preposterously long and complex, and can't even be implemented as a regexp since it requires nested parsing. So everyone just writes /^\w+@\w+\.[\.\w]+$/ or something lame.
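To make the "lame" part concrete, here is that exact pattern run through Python's `re` module (a sketch; the sample addresses are made up, and Python's `\w` is Unicode-aware, which makes no difference for these ASCII cases):

```python
import re

# The pattern quoted above, unchanged.
NAIVE = re.compile(r"^\w+@\w+\.[\.\w]+$")


def naive_ok(address: str) -> bool:
    """Return True if the naive pattern accepts the address."""
    return NAIVE.match(address) is not None


samples = [
    "foo@example.com",         # accepted
    "foo+bar@example.com",     # rejected: "+" is not in \w
    "first.last@example.com",  # rejected: "." in the local part
    "foo@my-domain.com",       # rejected: "-" in the domain
    "a@b..",                   # accepted, although it is garbage
]

for s in samples:
    print(f"{s!r}: {naive_ok(s)}")
```

It fails in both directions: perfectly ordinary addresses are thrown out, while a string that no mail server would take is let through.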
No, the problem is people trying to validate email addresses when they shouldn't. Similar to the credit card name example in TFA, you're trying to save a call to the MTA, when in reality, a user that doesn't want to be contacted will enter foo@foo.com, and you have to send it anyway.
You want to validate email addresses, because a surprising number of users are incapable of typing their email address correctly in one try. Validation saves a lot of rework.
The 'spec' for validating an email address is to send mail containing a token to it and require the user to respond with it.
Bang paths are valid but aren't going to be routable these days. Different hosts allow different characters in usernames, and have different meta-replacement rules for stuff like periods and pluses.
You don't validate a domain name with a regex, you use a goddamn DNS resolver. Email addresses are a superset of that! Don't use a fucking regex.
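The send-a-token approach is not much code either. A rough sketch, assuming an in-memory dict stands in for whatever datastore you really use and leaving the actual mail sending out; `start_verification` and `confirm` are hypothetical names:

```python
import secrets

# token -> email address awaiting confirmation.
# An in-memory stand-in for a real datastore (assumption).
pending = {}


def start_verification(email: str) -> str:
    """Create an unguessable token for this address.

    The caller would mail the token (or a link containing it)
    to the address; sending is not shown here.
    """
    token = secrets.token_urlsafe(32)
    pending[token] = email
    return token


def confirm(token: str):
    """Return the confirmed address, or None for an unknown token.

    Tokens are single-use: a second attempt gets None.
    """
    return pending.pop(token, None)


token = start_verification("foo+bar@example.com")
print(confirm("not-a-real-token"))  # None
print(confirm(token))               # foo+bar@example.com
```

Note that nothing here inspects the address at all. If the mail arrives and the token comes back, the address works, whatever characters it contains.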
Ragan is totally right about this being wrong and needing to be fixed.
However, the idea that websites should use the bank's payment gateway for validation is misguided. Your fees will be increased (or your account will be suspended) on many payment gateways if you do this.
I think it depends on what you're validating. If you are trying to validate a name, you had better get it right! For example, my Visa still says Reginald Braithwaite-Lee, but I might register on your site as Reg Braithwaite. Did I misspell my name or is that what's on my Visa? Is the hyphen a typo?
OTOH, some things seem to be more certain, like rules about check sums. I like the approach suggested by many folks: Use JS to validate on the client, and put up a "Are you really, really sure?" message for things that seem unusual like a single name or a funny character.
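For the checksum case, the usual one for payment card numbers is the Luhn check (that's my pick, the comment above doesn't name one). A sketch in Python rather than client-side JS, with `luhn_ok` as a made-up name:

```python
def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits in `number`.

    Spaces and dashes are ignored, so the user can type the
    number the way it is printed on the card.
    """
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit; double every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_ok("4111 1111 1111 1111"))  # True (a well-known test number)
print(luhn_ok("4111 1111 1111 1112"))  # False
```

A failed checksum is a good place for the "are you really sure?" prompt, since it catches typos without making any claim about whether the card actually exists.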
I wonder how much you could get out of their support as an apology, i.e. free flights and so on? I never really try that, but I hear some people extract bonus offers routinely.
Apology? Mwahahaha! What happened was that they sent me an email saying the electronic ticket would follow by email within 24 hours. This was on a Friday for a Monday morning flight at 8:00. When no ticket arrived by Sunday morning, I called them to discover their office was closed. I called again at 5AM on Monday morning and they were still closed, so I waited until their office opened at 8:00.
Wrong move. They charged me for the flight, saying that even though they promised an e-ticket and didn't send it, and even though their offices were closed, I should have shlepped out to the airport where the airline would have resolved my problem. I appealed to Visa but lost.
Many mail daemons will "accept" mail no matter who the recipient is. Internally, unmatched mail may be discarded or forwarded to a "catch-all" account, but all the sender sees is "Recipient ok".
I think this started as a response to spam bots that used to "RCPT TO:" random strings and save a list of valid addresses.