Falsehoods Programmers Believe About Names

Vivtek · on June 17, 2010

This is almost as bad as the address question.

I go by my middle name (my parents thoughtfully gave me my father's first name and even middle initial, which has caused no end of confusion for many a credit bureau over the years). Unfortunately, the State of Indiana's birth certificate system assumes beyond any possibility of override that (1) everybody with children has a first name, (2) that first name has no spaces in it, and (3) the middle name is insignificant. So my kids' birth certificates have my dad's names on them as the father.

But hey, who am I, a mere parent, to say what my name is?

The real world is full of organic detail that is difficult or perhaps impossible to capture in full in a software system. A name should be just a string. If there are business requirements for sorting or form of address (e.g. "Dear Mr. Jones") they should be done with heuristics, with human intervention invoked in sticky situations if necessary. Sure, it's difficult, but you know, a hundred years ago that stuff was all done on an individual case-by-case basis by human beings; if your business assumption is that it can't be done cheaply enough without eliminating human intervention, perhaps you should rethink your assumptions.

Otherwise, let me assure you you're irritating every customer whose name doesn't meet your arbitrary rules - in exactly the same way that Google pisses off everybody who needs human support. Everybody thinks that sucks. So at least you should get people's names right, using human intervention if you have to. (And addresses, too, but we just had an extended thread about that, like last month.)

frossie · on June 17, 2010

A name should be just a string.

Indeed. The real question is, why are you asking for their name in the first place?

Consider these three scenarios:

(a) You are an airline, and the name you collect has to match their government-issue ID

(b) You need to mail something to the user

(c) You wish to create an account for them on your blogging website

How much you need to understand what they are telling you their name is depends very much on the situation (context matters, as I need to have tattooed on my forehead). For (c), screw what their first name, last name, or middle name is, or whether they even have such things. For (b) the only requirement is "write down something the post office in their country will understand" - again, never mind whether you understand it. For (a), how is this going to be matched by TSA? Do what is right to conform to the API.

Even in the US, people have very complicated names - here is a not unusual name from my local paper's birth announcements: Kalamanamananaueikalani Tomoko Namakauluhonuamea'ililamalama Wengler-Ioane

Does anybody really, really need to deal with this in its full glory? It should come down to "What would you like us to call you" and "are you M/F" (if relevant).

gisenberg · on June 17, 2010

On the topic of address questions, I went through a form on Fidelity that disallowed post office boxes from being supplied as an address. At the time, I lived on Newport Way, and the "po" in Newport qualified as a PO box.

So, for a while, I lived on Newp0rt Way.

rmc · on June 18, 2010

In many countries in Africa, the postal service doesn't do deliveries to people's houses. Every address is a "PO BOX ...."

halostatue · on June 17, 2010

I am the third person by my full name. My grandfather goes by our shared first given name; my father by our shared middle given name. When the distinction does not matter, I just use my first given name and surname. I never use the middle given name (if someone called me just by my middle given name, I would not respond).

If the distinction matters, then whatever system is in place had better be prepared to use my ordinal (III) as significant, but had better not put it as part of my last name.

So I will go by: "A-"; "A- Z-"; "A- H- Z-"; or "A- H- Z- III". I never go by "H-"; "H- Z-"; "H- Z- III"; or "A- Z- III".

Vivtek · on June 17, 2010

Well, in my case, in addition to giving me my dad's first name, my parents called me by my middle name to distinguish us (a not-unusual practice in families originating in Kentucky/Tennessee; I'm damned lucky they called me Mike instead of Jim-Mike). So for all intents and purposes, my legal middle name is my actual name. It's what I identify with. That programmers don't get that is a continual source of irritation in my life.

njharman · on June 17, 2010

Falsehoods users believe about my programs/service.

1) A simple ASCII A-Za-z0-9_ identifier unique to my program/service is not required to use my program/service.

1a) My program/service cares deeply about your real world name and all its nuances and variations. It requires full and complete understanding of your name rather than just a simple authentication mechanism.

2) My program/service will correctly handle the world's naming conventions past, present, and future, to the exclusion of other features / actually shipping some day.

3) Instead of following the 80/20 "rule", my program/service will handle names 100% perfectly!

4) My program/service will cater to all languages/cultures/subcultures/niches past, present, and future rather than any sort of target audience.

pyre · on June 17, 2010

What about when your program/service is something where identity matters, like for a credit bureau, a government agency or even a retailer?

Just read the complaints here about airlines screwing up people's names by assuming that everything is ASCII/LATIN1/MISC_CHARSET. That makes a difference when you get to the airport and they won't let you on the plane because your ticket and your ID don't match.

It's also an easily solvable problem. Just include a 'name' field. Make sure that it's sanitized correctly and supports UTF-8 (make sure that your database is properly set up to use UTF-8 as well, or else all manner of problems can happen, not least that queries get bogged down as the database does charset conversions for string comparisons). If you care so little about the user's name, then why bother to have separate firstname/lastname fields?

{edit} With respect to the 'other complaints,' I was referring to the comments on this story: http://news.ycombinator.com/item?id=1438355 (The story which is what Patrick's post was in response to).

MichaelSalib · on June 17, 2010

Just read the complaints here about airlines screwing up people's names by assuming that everything is ASCII/LATIN1/MISC_CHARSET.

Having worked in the airline industry, I can tell you, this is not a case of airlines making a poor assumption. Airline systems are generally incapable of dealing with ASCII, let alone Unicode text. They typically use EBCDIC. That's why you never see lower case characters in your name on a boarding pass.

That makes a difference when you get to the airport and they won't let you on the plane because your ticket and your id don't match.

Does this actually happen? Because given the fact that airlines have been incapable of printing names that can't be described by EBCDIC since, well, forever, I'd assume that they're willing to ignore cases where the name on your ID contains characters that their computer system literally cannot represent.

ido · on June 17, 2010

For what it's worth, I've been allowed on planes with misspelled tickets (multiple times).

A travel agent I once asked about it told me that if 80% or more of the characters in the names match, they assume a typo was made.
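A minimal sketch of that kind of "close enough" check, using the standard library's `difflib.SequenceMatcher`. The 80% threshold comes from the travel agent's anecdote, not from any documented airline or TSA rule, and `names_probably_match` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def names_probably_match(ticket_name: str, id_name: str, threshold: float = 0.8) -> bool:
    """Return True if the two names are at least `threshold` similar.

    SequenceMatcher.ratio() is 2*M/T, where M is the number of matched
    characters and T is the combined length of both strings.
    """
    # Compare the way a boarding pass prints names: upper case, single spaces.
    a = " ".join(ticket_name.upper().split())
    b = " ".join(id_name.upper().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(names_probably_match("JONATHAN SMITH", "Jonathon Smith"))  # True
print(names_probably_match("MATT SMITH", "Xavier Jones"))        # False
```

A single-letter typo in a 14-character name still scores above 0.9, which is presumably why gate agents rarely notice.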

pyre · on June 17, 2010

> Does this actually happen? Because given the fact that airlines have been incapable of printing names that can't be described by EBCDIC since, well, forever, I'd assume that they're willing to ignore cases where the name on your ID contains characters that their computer system literally cannot represent.

Do the screeners know this? I'm willing to bet that a screener would deny you based on a name mismatch. What do the screeners know of EBCDIC?

MichaelSalib · on June 17, 2010

There are two classes of people who can deny you access to aircraft: (1) airline employees (gate agents, check-in agents) and (2) TSA inspectors. No one at an airport will know anything about EBCDIC, but every single airline employee will know that their crappy mainframe system isn't smart enough to handle lowercase characters, let alone spaces, hyphens, umlauts, accents, etc.

As for TSA staff, it is important to realize that this EBCDIC issue is not a problem that only affects one airline: it affects almost every single airline, including all flag carriers that I know of. The airlines have a pretty tight working relationship with TSA; note that they send electronic messages to the TSA describing their customers when people book tickets and just before departure so as to give the TSA the opportunity to prevent them from flying. Given that tight relationship, I'm pretty sure that the TSA knows a great deal about the limitations of EBCDIC and has informed their staff about the limits of what a boarding pass name field can possibly represent.

Keep in mind that screeners will let you fly even if you have no ID on you.

mrduncan · on June 17, 2010

Well said.

Anecdotally, I've never had issues with tickets that had my partial name on them in the past either ("Matt" vs. "Matthew") although this may have changed now that TSA requires full names and birthdays (I always use my full name now).

IgorPartola · on June 17, 2010

I like it. I'll take two.

edw519 · on June 17, 2010

Oh Patrick, as usual you got me to thinking. Then it hit me: the "Name" issue isn't that much different from the "SKU" issue. (SKU is short for Stock Keeping Unit, aka Part Number or Product Number). I have had to deal with this everywhere that has SKUs. No one does it well, but by slowing down and thinking about it, there's almost always a decent solution.

There are 2 ways to assign SKUs, sequentially (start with "1" and increment 1 for every new SKU) or not sequentially. I have never seen anyone do it sequentially. (Although some excellent systems keep 2 SKUs, one of which is sequential and is used as the primary key and for all indexing. This is the best way I've ever seen to handle SKUs that change, but that's another story.)

Almost everyone wants a smart or semi-smart SKU. So that by simply looking at the SKU, anyone can tell what it is without reading the description. You know, the first digit is Commodity Code, 1 for shoes, 2 for pants, etc. Then another digit for color, another for size, etc. This works well until you have ten colors; then you need 2 digits or alphas.

But wait, there's more. Let's put hyphens (or some other delimiter) between the product descriptors and the vendor data, manufacturer data, and customer data.

So now you've covered any possible product with your super slick smart SKU naming system.

Until something comes along that isn't covered. (Now we have military items with 14 other considerations.) So we come up with a second totally different scheme. Then a third. Then a 4th, etc. So now you can tell anything about an item if you know which scheme it falls under.

But wait, there's more. You should be able to enter any SKU into a form field regardless of Smart SKU scheme. (If the first digit is "9", then use Smart Scheme 3. If there's a hyphen in position 3, then use Smart Scheme 8, etc.) Your form logic should be able to intelligently guide the user based on the rules of the template or scheme.
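A rough sketch of that dispatch step, assuming the two example rules from the comment (a leading "9" means Smart Scheme 3; a hyphen in the third position means Smart Scheme 8). The rule table and `detect_scheme` are hypothetical; in a real app the rules would presumably be loaded from user-defined templates rather than hard-coded.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical rules modelled on the two examples in the comment.
# Order matters: the first rule that matches wins.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("Smart Scheme 3", lambda sku: sku.startswith("9")),
    ("Smart Scheme 8", lambda sku: len(sku) >= 3 and sku[2] == "-"),
]


def detect_scheme(sku: str) -> Optional[str]:
    """Return the name of the first scheme whose rule matches, else None."""
    normalized = sku.strip().upper()
    for scheme, matches in RULES:
        if matches(normalized):
            return scheme
    return None  # unknown format: route to a human instead of guessing


for sku in ["9123-44", "AB-1234", "12345"]:
    print(sku, "->", detect_scheme(sku))
```

Returning `None` rather than a default scheme keeps the "human intervention in sticky situations" escape hatch open.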

I have built apps where users can design and build their own Smart SKU templates, which are then used to enforce compliance and guide operators. These have generally worked pretty well.

Is there some way to do the same thing for human names? I dunno, but now you got me to thinking about it. A combination of standard templates and custom templates oughta cover most possibilities. Some basic logic with optional pop-up forms which uses the templates as parameters should work. Something to think about...

rikthevik · on June 17, 2010

This sounds exactly how instruction sets evolve.

32 opcodes? That's plenty! We'll never run out. :)

baltoo · on June 17, 2010

Aren't URIs an attempt to encode this type of hydra-like schema?

Not that URIs are that great for showing to humans, but what if we added a template plug-in system for visualizing parts of them (similar to how many web servers map URLs to renderings of data resources)?

Or is this kinda what you mean with templates?

bruceboughton · on June 17, 2010

Actually, this sounds like more of a problem. It's a name... Just give the user a free text field.

coderdude · on June 17, 2010

Which does you no good if you need the specific components of that person's name. That is the issue with using a free text field for addresses.

bruceboughton · on June 18, 2010

The whole point of the article is that there is no such thing as the specific components of the person's name... that is the assumption that is wrong. A person's name is whatever they or their parents chose. You cannot reliably derive meaning from any part of it.

Nekojoe · on June 17, 2010

A question mark can be part of your name -

http://news.bbc.co.uk/1/hi/sci/tech/8206280.stm

Certain characters like the defunct yogh have had issues in Unicode -

http://en.wikipedia.org/wiki/Yogh

http://news.bbc.co.uk/1/hi/magazine/4595228.stm

Some names can be changed if they're cruel or too unconventional -

http://news.bbc.co.uk/1/hi/7522952.stm

http://news.bbc.co.uk/1/hi/magazine/6939112.stm

http://news.bbc.co.uk/1/hi/world/asia-pacific/6937327.stm

kroger · on June 17, 2010

For those confused about what an article about zombies (first link) has to do with question marks in one's name, here's the relevant quote:

"Professor Robert Smith? (the question mark is part of his surname and not a typographical mistake)"

jcl · on June 17, 2010

Coincidentally, I ran across Scott Shaw!'s homepage today: http://www.shawcartoons.com/bio.php

(...although there are plenty of other names with exclamation points, due to languages like Xhosa.)

kroger · on June 17, 2010

And, of course, his name looks funny when someone is asking a question:

"Is that you Professor Robert Smith??"

And I wonder if he introduces himself as "Robert Smith Question Mark" ;-)

mattm · on June 17, 2010

I wonder how you pronounce ?. Is it a click like Zulu or do you just have a rising intonation?

epochwolf · on June 17, 2010

Rising intonation gets my vote. http://www.youtube.com/watch?v=JTbrIo1p-So (Tim? the Enchanter)

vidarh · on June 17, 2010

I have an uncle that has one name on his birth certificate, one on his passport and uses a third - all different spellings of his first name.

Particularly confusing to people when his passport has the name spelled one way and signed another.

Nobody has been able to get a straight story from him about why.

wisty · on June 18, 2010

Then there's William Shakespeare, who spelled his name about 8 (IIRC) different ways. I guess it was a mark of pride for a literary man to be able to write his name however he wanted to, while less educated folk would have to write their names from memory.

danh · on June 17, 2010

For a good illustration of the possible weirdness of names, see http://en.wikipedia.org/wiki/Pablo_Picasso

RyanMcGreal · on June 17, 2010

I'm sorry, but your name has too many letters. Please enter a shorter name.

mseebach · on June 17, 2010

A contemporary example: http://en.wikipedia.org/wiki/Karl-Theodor_zu_Guttenberg

danh · on June 17, 2010

Even more contemporary is young Brfxxccxxmnpcccclllmmnprxvclmnckssqlbb11116 (pronounced Albin): http://en.wikipedia.org/wiki/Naming_law_in_Sweden

pmjordan · on June 17, 2010

This is all very interesting (and even humorous), but I'd appreciate something more constructive. How do we implement a name field on a website in spite of all this weirdness?

halostatue · on June 17, 2010

* If you have a requirement for a legal name, ask for the legal name as one field. Don't artificially split it into two fields. Make it long enough that you won't truncate.

* If you have restrictions on what you can store, show the user what you will be storing their entered name as and make sure they're okay with that.

* If you must have multiple fields for a user's name (given, familial), allow one of them to be optional.

* If you want to use a familiar name as a nice touch, ask the user what they want to be known as (e.g., a "nickname", but I wouldn't call it that).

* If you have to interface with other systems that do have artificial restrictions, explain this to the user and give them an opportunity to override the name used for those systems if it's important.
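One way to generate the "this is what we'll store" preview for restricted systems is sketched below, assuming a hypothetical downstream system that accepts only upper-case A-Z and spaces (roughly the boarding-pass situation described earlier in the thread). `legacy_preview` is an illustrative helper, not any particular system's rule. Characters with no ASCII decomposition (ø, ł, CJK) are simply lost, which is exactly why the user should see the result and be able to override it.

```python
import unicodedata


def legacy_preview(name: str) -> str:
    """Approximate how a name would look in an upper-case, A-Z-only system."""
    # NFKD splits "é" into "e" + a combining accent, so the accent can be dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that is still not A-Z (hyphens, apostrophes, "Ø", CJK...) becomes a space.
    cleaned = "".join(ch if "A" <= ch <= "Z" else " " for ch in without_marks.upper())
    return " ".join(cleaned.split())  # collapse runs of spaces


print(legacy_preview("José Müller-Lüdenscheidt"))  # JOSE MULLER LUDENSCHEIDT
print(legacy_preview("Søren"))                     # S REN
```

"S REN" is the kind of result a user would want to replace with their own choice, such as "SOEREN".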

Obviously, you don't want to ask too much up front (to fight what I'm coining as "form-fatigue"; you know, where there's so many questions on the form that you just give up signing up), so you may need some decent heuristics to give you reasonable default values for these things that the user can override if they want.

billswift · on June 17, 2010

I would suggest using 2 distinct "names". The first is as you suggest - an unlimited text field for their "real" name which is simply stored as entered. Then have them choose a "unique identifier" for the site to use, like the names used in email systems.

jpr · on June 17, 2010

Huh, isn't it obvious? Accept anything as a name.

smackfu · on June 17, 2010

And then when you get a business requirement to sort by last name, tell them "there is no such thing as last names."

Then when they finish laughing, hack up some system to split the single name field that doesn't work very well.

(It gets even worse when you have unusual sorting rules that really only apply to Western names, like treating Mc and Mac the same.)
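For the Mc/Mac case, a sort key along these lines does the job. `western_surname_key` is a hypothetical helper, and, as the comment says, the rule only makes sense for some Western surnames; applied to names from other traditions it will happily mis-sort them.

```python
import re


def western_surname_key(surname: str) -> str:
    """Sort key that files "Mc" surnames as though they were spelled "Mac"."""
    key = surname.casefold()
    return re.sub(r"^mc", "mac", key)


surnames = ["McArthur", "Madison", "MacDonald", "Mabry"]
print(sorted(surnames))                           # ['Mabry', 'MacDonald', 'Madison', 'McArthur']
print(sorted(surnames, key=western_surname_key))  # ['Mabry', 'McArthur', 'MacDonald', 'Madison']
```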

patio11 · on June 17, 2010

True story: I once told a customer that their requirement to "sort by last name" was likely impossible to satisfy without reworking the system, given what I knew about their user base. It turned out that I was right: the majority of their staff was Japanese, who expect Japanese lexicographic sort. However, they also had visiting professors from overseas, who largely could not understand Japanese lexicographic sort. (There are actually two common lexicographic sorts in Japanese, based on two ways to order the sounds of Japanese. We use the "table of fifty sounds" method, under which Tanaka comes after both Aki and Sato.)

My customers said "Fine, alright, two sort features. One for Japanese, one for foreigners."

They were less than happy when I told them that foreigners haven't agreed on lexicographic sort, either. (To use one example I'm familiar with, in Spanish, "ch" is one letter, so Chisako comes after Consuela.)

Oh, there is a separate right way to order prefectures. (Which I learned about in an email from a coworker saying "Patrick, come on, use some common sense next time. Have you ever seen prefectures listed lexicographically before?!")
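A small illustration of the kana-order point, assuming each record stores a hiragana reading (furigana) alongside the written name, as Japanese forms commonly ask for because kanji alone don't settle the pronunciation. The Unicode Hiragana block is laid out in gojūon ("table of fifty sounds") order, so sorting readings by code point approximates it; voiced and small kana and long-vowel marks need normalization that this sketch skips. `sort_by_reading` is a hypothetical helper.

```python
from typing import Dict, List


def sort_by_reading(readings: Dict[str, str]) -> List[str]:
    """Order names by their hiragana readings (approximately gojuon order).

    Works because the Hiragana block runs a-ka-sa-ta-na-ha...; no
    normalization of voiced/small kana or long-vowel marks is done here.
    """
    return [name for name, _ in sorted(readings.items(), key=lambda item: item[1])]


people = {"Tanaka": "たなか", "Sato": "さとう", "Aki": "あき", "Hayashi": "はやし", "Kato": "かとう"}
print(sorted(people))           # ['Aki', 'Hayashi', 'Kato', 'Sato', 'Tanaka']
print(sort_by_reading(people))  # ['Aki', 'Kato', 'Sato', 'Tanaka', 'Hayashi']
```

The romanized alphabetical sort puts Hayashi second; in gojūon order it comes last, because は follows た.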

suninwinter · on June 17, 2010

In Spanish, at least, the ch is no longer considered as a separate letter for sorting. It is still a separate letter in lists.

http://en.wikipedia.org/wiki/Ch_(digraph)#Collation
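To make the two collations concrete: the sketch below ranks "ch" and "ll" as single letters (the older rule in Patrick's example) next to a plain case-insensitive sort, which matches the current rule for these ASCII-only names. `traditional_spanish_key` is a hypothetical helper; a production system would use a proper collation library instead.

```python
from typing import List

TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
               "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v",
               "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL)}


def traditional_spanish_key(word: str) -> List[int]:
    """Older rule: "ch" and "ll" are single letters after "c" and "l".

    Simplification: characters not in the table (accented vowels, spaces)
    are ranked after "z" by code point instead of being collated properly.
    """
    s = word.lower()
    key: List[int] = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK.get(s[i], len(TRADITIONAL) + ord(s[i])))
            i += 1
    return key


names = ["Consuela", "Chisako", "Carlos", "Diego"]
print(sorted(names, key=str.lower))                # ['Carlos', 'Chisako', 'Consuela', 'Diego']
print(sorted(names, key=traditional_spanish_key))  # ['Carlos', 'Consuela', 'Chisako', 'Diego']
```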

huherto · on June 17, 2010

Yeah, that was an interesting adaptation by the Spanish language academy to the use of computers. At the time there was some public debate similar to the "Pluto is no longer a planet" one.

Semiapies · on June 17, 2010

So, what did you end up doing? I'm very curious.

jerf · on June 17, 2010

"hack up some system to split the single name field that doesn't work very well."

You've correctly identified a problem, but misassigned the responsibility. The problem with the single-name split is that it is impossible, full stop. When you forcibly try to do the impossible, you always get bad results.

You appear to be proposing that we force the users to enter first and last names, then we can trivially sort them by last name. But for the exact same reasons you can't write a name splitter after the fact, you can't have the user break their names into "first" and "last" either. You haven't solved the problem of there being "no last name" by changing your input form. You've just moved it around. Now it irritates the customer instead of you, which is often a bad trade, and sort by last name still doesn't work because customers for whom a first/last split doesn't work have fed you one variety or another of garbage data. Garbage data which is now even harder to find; at least when you had a one-word name you had a clue that there was no first/last split.

rubinelli · on June 17, 2010

In this case, you should dig deeper. If the customer says "give me sort by last name," the actual requirement may be "give me an easy way to find a person by last name." In this case, a general full-text search may be the best solution.
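A naive stand-in for that idea: a case-insensitive substring match over the single stored name, so nobody has to decide which part is the "last" name. `find_people` is hypothetical, and a real deployment would more likely use the database's full-text index than scan a list.

```python
from typing import List


def find_people(names: List[str], query: str) -> List[str]:
    """Case-insensitive substring search over whole stored names."""
    q = query.strip().casefold()
    if not q:
        return []
    return [n for n in names if q in n.casefold()]


directory = [
    "Robert Smith?",
    "Kalamanamananaueikalani Tomoko Namakauluhonuamea'ililamalama Wengler-Ioane",
    "YAMASHITA Taro",
]
print(find_people(directory, "wengler"))
print(find_people(directory, "Yamashita"))
```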

fragmede · on June 17, 2010

This is the right way to do it. Due to user error, I've had problems checking into a hotel until the receptionist realized my first and last names were switched.

The receptionist SHOULD be shown a list but ALSO wants to be able to search for me if I don't show up in the list (and may ask for my ID/get a particular spelling).

(It was a comp'd room in Vegas, so I have no idea where the mistake was made, but this can't be a unique occurrence.)

snprbob86 · on June 17, 2010

My project has a "sort by last name" requirement, but we wanted to accommodate "funky" names. Here's what I did:

There are four fields in the database:

  first_name
  last_name
  display_name
  primary_email

All 3 name fields are optional. The email address field is required, but you could use a customer ID or username if that is more appropriate for your app.

The name fields are never accessed directly, except on the one form where you can edit them. In general, they are accessed by these two helper methods:

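  class User:
      def __init__(self, primary_email, first_name=None, last_name=None, display_name=None):
          self.primary_email = primary_email  # required
          self.first_name = first_name        # optional
          self.last_name = last_name          # optional
          self.display_name = display_name    # optional; wins when present

      @property
      def name(self):
          # How we address the user: their chosen display name if given,
          # else "First Last", else fall back to the required email address.
          if self.display_name:
              return self.display_name
          if self.first_name and self.last_name:
              return self.first_name + ' ' + self.last_name
          return self.primary_email

      @property
      def last_first(self):
          # Sort label: "Last, First" only when both parts exist; otherwise
          # the display name or email is sorted as one opaque string.
          if self.first_name and self.last_name:
              return self.last_name + ', ' + self.first_name
          if self.display_name:
              return self.display_name
          return self.primary_email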

Then we do a culture specific case insensitive sort. This should cover most cases pretty well....... I hope :-)
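The sort step itself could look like this, applied to the strings `last_first` returns. `sort_labels` is a hypothetical helper. `str.casefold` gives case-insensitive ordering but knows nothing about any language's collation rules; the `use_locale` path defers to `locale.strxfrm`, whose behaviour depends on which `LC_COLLATE` locale the process has set and the OS has installed, so treat it as something to test rather than a guarantee.

```python
import locale
from typing import List


def sort_labels(labels: List[str], use_locale: bool = False) -> List[str]:
    """Sort "Last, First" / display-name / email labels case-insensitively."""
    if use_locale:
        # Honors whatever LC_COLLATE was set with locale.setlocale();
        # results vary with the OS and installed locales.
        return sorted(labels, key=lambda s: locale.strxfrm(s.casefold()))
    return sorted(labels, key=str.casefold)


print(sort_labels(["de la Cruz, Ana", "Zhou, Li", "alice@example.com", "Adams, Bo"]))
# ['Adams, Bo', 'alice@example.com', 'de la Cruz, Ana', 'Zhou, Li']
```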

jpr · on June 17, 2010

Well, as far as I can tell, you are screwed any way you do it, so you might as well do the simplest thing.

smackfu · on June 17, 2010

For a Western company, that may be: use a first name and a last name field which works for 95% of our customers and inconveniences the other 5%. But that 5% is already used to it.

Which of course goes against the spirit of the original article.

jpr · on June 17, 2010

Yeah. I think I would personally go for two fields -- first name and last name -- with a hint that 'if unsure, use lastname' so that the rather common(?) 'sort by lastname' would make at least some sense in most cases, and the requirement that at least one of the names must be non-empty.

pmjordan · on June 17, 2010

It's the further processing after that which I'm interested in. Sanitising the input should probably be rejecting/replacing with a space any non-printable code points or weird whitespace characters. But what about sending emails? "Dear <name>"? There are probably other situations where this is a problem.
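A minimal sketch of that sanitising step, assuming the goal is to normalize rather than reject. It NFC-normalizes, maps every kind of Unicode whitespace to a plain space, and drops control and format characters, except the zero-width joiner and non-joiner, which some scripts (Persian, for example) genuinely use inside names. `clean_name` and `greeting` are hypothetical helpers; the greeting falls back to a neutral salutation instead of guessing a title.

```python
import unicodedata

# Zero-width non-joiner / joiner carry meaning in some scripts, so keep them.
KEEP_FORMAT_CHARS = {"\u200c", "\u200d"}


def clean_name(raw: str) -> str:
    """Normalize a user-entered name without rejecting unusual characters."""
    text = unicodedata.normalize("NFC", raw)
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")  # tabs, newlines, no-break spaces, ...
        elif unicodedata.category(ch) in ("Cc", "Cf") and ch not in KEEP_FORMAT_CHARS:
            continue  # control chars, zero-width spaces, bidi overrides, ...
        else:
            kept.append(ch)
    return " ".join("".join(kept).split())  # collapse and trim whitespace


def greeting(call_me: str, legal_name: str = "") -> str:
    """Greet by the name the person asked for; never guess a title."""
    who = clean_name(call_me) or clean_name(legal_name)
    return f"Dear {who}," if who else "Hello,"


print(clean_name("  Mary\u00a0Jane\t O'Brien\u200b "))  # Mary Jane O'Brien
print(greeting("Mike", "James M. Smith"))                # Dear Mike,
```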

JoeAltmaier · on June 17, 2010

And the hated: "It's OK to hash names into a 32-bit table, and then tell the user 'Stieg Eugene Janakowski is already in use'."

olliesaunders · on June 17, 2010

Can anyone explain some of the more esoteric points on this list? Like "People’s names are all mapped in Unicode code points", for instance.

dagw · on June 17, 2010

I'm not sure of all the details, but the Unicode people kind of screwed up when doing Japanese. From what I understand there are several older variations on characters not in common use, but that occasionally show up in names (and old Japanese literature), that can't be encoded in Unicode.

ars · on June 18, 2010

If that's the case, they can't enter it into a computer anyway, so don't worry about it.

rbanffy · on June 17, 2010

I cannot discard the possibility my name was originally expressed in Hungarian rovásírás. AFAIK, it has not been assigned to any Unicode region. I was taught rovásírás as a kid, by my grandpa.

I can live with the romanized version. It's been in use for more than a thousand years and it's a bit late to complain.

philwelch · on June 17, 2010

There's no Unicode code point for the unpronounceable symbol which served as the name of The Artist Formerly Known As Prince, before he again became known as Prince.

jimbokun · on June 17, 2010

Undoubtedly due to all the problems he had filling out forms on the Internet.

alextp · on June 17, 2010

I think he meant the musician formerly known as Prince for this one.

LogicHoleFlaw · on June 17, 2010

Well, once again known as Prince, anyways.

smallblacksun · on June 17, 2010

The artist formerly known as the artist formerly known as Prince.

jplewicke · on June 17, 2010

The former Once and Future Prince?

bokchoi · on June 17, 2010

There is also the guy in the credits of the movie Fargo who uses a Prince symbol on its side.

As far as the Artist Formerly Known As Prince symbol lying on its side with a happy face in the middle, in the credits: "I'm the storyboard artist formerly known as J. Todd Anderson. That's all I can say about that." It's a private joke between J. Todd and the Coens. Prince and the Coen brothers are both from Minneapolis. (Dayton Daily News: 3/22/96)

http://www.imdb.com/name/nm0026824/bio

Grinnmarr · on June 17, 2010

I'm not sure if you are being snarky or just ignorant, but the "The Artist..." is the commonly accepted usage and descriptor http://www.citypages.com/1999-06-23/news/the-people-formerly...

GFischer · on June 17, 2010

Some countries, like mine (Uruguay), have a unique identifier for people - in Uruguay it's called "Cédula de Identidad" which would translate to Identity Card.

http://en.wikipedia.org/wiki/Identity_document

It even has a nice check digit.
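For the curious, the check digit is a weighted mod-10 scheme. The sketch below uses the weights 2, 9, 8, 7, 6, 3, 4 as they are commonly published in open-source validators; that is an assumption worth confirming against an official source before relying on it. `ci_check_digit` and `ci_is_valid` are hypothetical helpers.

```python
WEIGHTS = (2, 9, 8, 7, 6, 3, 4)  # as commonly published; verify before relying on it


def _all_ascii_digits(s: str) -> bool:
    return s != "" and all(c in "0123456789" for c in s)


def ci_check_digit(base: str) -> int:
    """Check digit for the up-to-7-digit number before the hyphen."""
    if not _all_ascii_digits(base) or len(base) > 7:
        raise ValueError("expected 1 to 7 digits")
    digits = base.zfill(7)  # older, shorter numbers are left-padded with zeros
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS))
    return (10 - total % 10) % 10


def ci_is_valid(ci: str) -> bool:
    """Validate a number written like '1.234.567-2' (dots and hyphen optional)."""
    digits = ci.strip().replace(".", "").replace("-", "")
    if not _all_ascii_digits(digits) or not 2 <= len(digits) <= 8:
        return False
    return ci_check_digit(digits[:-1]) == int(digits[-1])


print(ci_check_digit("1234567"))   # 2
print(ci_is_valid("1.234.567-2"))  # True
print(ci_is_valid("1.234.567-3"))  # False
```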

But even that breaks in some cases :) so Patio's point applies.

Still, it makes building local-only apps easier.

techsupporter · on June 17, 2010

Some case #46c: The person is a citizen of a country with a national identity card, yet resides in another country wherein he or she has lost that identity card, cannot return to the country of citizenship to replace said identity card, and country of citizenship will not replace the identity card outside of that country.

:)

(Hello, fellow Uruguayan)

brlewis · on June 17, 2010

For the admiration of your peers, identify the subset of this set of assumptions embodied in the "Leave a Reply" form at the bottom of the article.

patio11 · on June 17, 2010

Yeah, I concede, Wordpress has buggy name validation logic for the SQL injection boxes.

JoeAltmaier · on June 17, 2010

Miss Manners says "People's names are spelled and pronounced exactly as they say they are"

tomjen3 · on June 17, 2010

Yeah, but if it can't be expressed in Unicode you are not going to be able to use it on my system.

And I am not even going to say sorry.

JoeAltmaier · on June 17, 2010

Which is quite consistent with being impolite.

pyre · on June 17, 2010

Does this mean that you view Unicode as perfect and encompassing of all the characters currently in use in the world?

ars · on June 18, 2010

Makes no difference. You have to work with what you have. It's not like the programmer can do anything about it.

pyre · on June 18, 2010

There's a difference between working with what you have, and blaming people for using the 'wrong' characters in their name. From the post I was replying to:

  > And I am not even going to say sorry.

With comments like this, one comes off as a douche, which isn't exactly going to engender trust with potential customers/clients. Whether it's 'your' fault or not, how you deal with your potential customers/clients is what really matters.

wisty · on June 17, 2010

++ People's last name is their "family name".

++ People's first name is their "family name".

++ People have first names and last names.

mattm · on June 17, 2010

I had a friend whose legal name here in Canada was Juliana Juliana. She came from Indonesia and didn't have a family name. They told her she needed to have two names for her visa application so she just doubled up her given name.

coderdude · on June 17, 2010

>>People's first name is their "family name".

I may be mistaken but I believe this is true in Vietnam (and probably other countries as well).

josh33 · on June 17, 2010

Mongolians use their father's name as their surname, but they write it first. So my name would be "Brad's Josh." While you may think that they should just write it backwards, that isn't how they write it, and they shouldn't necessarily be required to translate their names for our systems.

NickPollard · on June 17, 2010

The point he is making is that people make one assumption or the other, depending on the culture in which they live, when actually it varies.

In the western world, we assume that someone's family name is their last name. In countries such as Korea, people assume that someone's family name is their first name.

Either assumption can be wrong if you're dealing with international customers. Some people might not even have a family name.

coderdude · on June 17, 2010

Just so it's clear, I'm not refuting what he said but rather supporting it with a tidbit from my memory, as you have done as well. :)

wisty · on June 18, 2010

Also China, and another reply says Mongolia.

DrJokepu · on June 17, 2010

No, these are falsehoods people who define the requirements believe about names. That's very often not the same as the programmer, especially at large organizations.

Deestan · on June 17, 2010

If the programmers actually adding the constraint to their systems didn't also believe the falsehoods, they would have written better error messages than, in essence, "You typed your name wrong!".

derefr · on June 18, 2010

User-facing error messages are frequently dictated by the requirements as well.

LargeWu · on June 17, 2010

I think the real lesson here is to identify those assumptions that may be incorrect for the culture your app is targeted towards and make reasonable accommodations for those cases.

The most common ones are probably the easiest to deal with...long names, only one name, punctuation in names. Just relax your validation requirements. We just took care of 99% of names not common to western culture. But I don't think if you are designing an application for, say, the Olathe, KS youth rec sports registration website that you need to be too concerned with folks who have names that have characters not mapped in Unicode. If a person's name is so unusual that it doesn't fit even culturally relaxed input requirements, then they've probably already dealt with that problem before.
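To illustrate "just relax your validation requirements", here is a hypothetical over-strict rule next to a relaxed one, checked against names that appear in this thread. `STRICT`, `relaxed_ok`, and the 500-character cap are illustrative assumptions. `str.isprintable` rejects tabs and no-break spaces, so this is meant to run after whitespace has been normalized.

```python
import re

# A hypothetical over-strict rule: ASCII letters only, exactly one space.
STRICT = re.compile(r"^[A-Za-z]+ [A-Za-z]+$")


def relaxed_ok(name: str, max_len: int = 500) -> bool:
    """Accept any non-blank, printable name under a generous length cap."""
    stripped = name.strip()
    return 0 < len(stripped) <= max_len and stripped.isprintable()


thread_names = [
    "Juliana",
    "Robert Smith?",
    "Scott Shaw!",
    "Kalamanamananaueikalani Tomoko Namakauluhonuamea'ililamalama Wengler-Ioane",
    "Karl-Theodor zu Guttenberg",
]
for n in thread_names:
    print(f"{n!r}: strict={bool(STRICT.match(n))}, relaxed={relaxed_ok(n)}")
```

Every one of these real names fails the strict rule and passes the relaxed one.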

patio11 · on June 17, 2010

I heartily agree that "think what you're assuming" is a good take away here, but can't support "think what culture your app is targeting", because I've seen that virtually invariably blow up straight in the teams' faces.

Many Japanese people think racial/cultural homogeneity is practically a national trademark, and this issue has bitten nearly every Big Freaking Enterprise system I've ever seen in Japan. Do you think your global-facing web app or small American town is going to be less culturally diverse than Japan is? That strikes me as highly improbable.

LargeWu · on June 17, 2010

I think you're missing my point. Having relaxed standards, such as I outlined, will account for nearly all special cases for a target audience that is reasonably homogeneous. Of course if you are designing a global facing application, then yes, you need to account for a wider range of possibilities. But for many applications designing around cultural norms, but being liberal about what you accept, is perfectly acceptable where the benefit of accommodating for low-probability cases does not justify the costs.

And, like I mentioned, for applications targeted towards a homogeneous group, people whose names are so far outside the cultural norms that even liberal standards cannot accommodate them have probably encountered the same thing before, and they've come up with a way to adapt.

Allowing a liberal range is like me going to France and asking for ketchup with all of my food. But having a really unusual name, like one with unmappable characters, is like me going to Riyadh and expecting them to serve booze with my meal. Should I be allowed to drink there just because it's acceptable in my culture? No. It's illegal there, and I just have to adapt whether I like it or not.

Again, I emphasize, what I am advocating is for applications with a culturally homogeneous audience, not global, multicultural apps.

autarch · on June 17, 2010

Japanese people living in the US are no doubt perfectly comfortable writing their name out in Latin characters for the benefit of the barbarians they live with.

patio11 · on June 17, 2010

Many Japanese professionals who have call to be in the United States -- but not all of them! -- will write their name something like YAMASHITA Taro, which is your clue that you should be calling him Mr. Yamashita rather than Mr. Taro. Or Dr. Yamashita, in the case of my client who was not actually named that.

Some of them, including at least one prominent politician and several people I represented in a professional capacity, are adamant that reversing the order is incorrect.

(Unsurprisingly, many Americans are unaware of this convention. Sadly, many of them persist in being unaware of it even when it is written on the meeting briefing and explained verbally right before the meeting starts. sigh Twelve corrections in 3 days -- my client was not happy for that trip.)

Does your hotel, car rental service, university, etc, handle this case correctly? We blazed a path of frustration through Chicago and Michigan last time.

Further fun: a Taro Yamashita born and raised in the United States (or any other Taro, for that matter), and present at the same university for the same conference, might request the exact opposite treatment! And if you spell his name as YAMASHITA Taro on the meeting agenda in defiance of his preference, you're doing it wrong!
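
One way to respect the per-person preference described above, sketched in Python under the assumption that you collect given and family names separately: store the preferred display order as data supplied by the person, rather than inferring it from nationality or from how the name looks. `PersonName` and `NameOrder` are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from enum import Enum


class NameOrder(Enum):
    FAMILY_FIRST = "family_first"  # e.g. "YAMASHITA Taro"
    GIVEN_FIRST = "given_first"    # e.g. "Taro Yamashita"


@dataclass
class PersonName:
    given: str
    family: str
    order: NameOrder  # taken from the person's stated preference, never guessed

    def display(self) -> str:
        if self.order is NameOrder.FAMILY_FIRST:
            # Upper-casing the family name mirrors the "YAMASHITA Taro"
            # convention, which signals which part is the family name.
            return f"{self.family.upper()} {self.given}"
        return f"{self.given} {self.family}"

    def formal_address(self, title: str) -> str:
        # "Mr./Dr. X" always uses the family-name field, not word position.
        return f"{title} {self.family}"
```

The two Taro Yamashitas at the same conference then differ only in `order`, and both are correctly addressed as "Dr. Yamashita".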

_delirium · on June 18, 2010

But who's really doing it wrong? If someone expects another culture to conform to their own culture's preferences when a guest there, perhaps they're the ones making the error. I actually attended a conference in Japan that wrote all names, including mine, as "FAMILYNAME Givenname" on the program. I personally write and prefer "Givenname Familyname", but I don't see it as a cause for offense. As a guest in their country, I consider it their prerogative to go by their local conventions.

I mean, wouldn't it be pretty chauvinist for me to get offended at it or demand that they follow Western name ordering? It feels like it'd be in the same category as complaining that they don't serve my favorite American food at the conference, or that the cars are driving on the wrong side of the road.

rmc · on June 18, 2010

> But who's really doing it wrong?

You are.

The canonical source for the correct way to spell/write someone's name is that person. You should never tell someone "No, you're writing your name wrong".

_delirium · on June 18, 2010

I disagree. People's preferences should get some deference, but not ultimate deference. I don't have to accommodate Prince's claim that his name is properly written as a graphical symbol, for example. It's perfectly fine to tell him to pick a name that isn't a graphical symbol.

Between countries, I think generally the right approach is to use the country's naming customs, and adapt foreign names to them, to the extent reasonably possible. In Greece, for example, it's customary to transliterate people's names into the Greek alphabet, especially if the source is going to be read by Greeks (newspapers, etc.). The person known as "George Bush" in the United States is more commonly referred to in Greece as "Τζωρτζ Μπους", for example--- despite it being a rather ugly transliteration in this case, due to a bunch of the consonants not existing in Greek.

Are you really arguing that this Dr. Yamashita (or Mr. Bush) can tell them they can't use their own alphabet in their own country, because that's not how he likes his name written?

GavinB · on June 17, 2010

My first name is spelled with a $10 bill. It's pronounced "Gavin." My last name is a picture of a flower. If you represent it at less than 1000x1000 pixels or with washed out colors, you are insulting my heritage.

My middle name is a musk ox.

How far does politeness dictate that you have to go to accommodate me?

pyre · on June 17, 2010

I think that the bar should at least be set higher than:

  "Your name does not fit into a Western firstname/lastname
  format with only ASCII characters, please choose another
  name."

This is on par with expecting everyone to speak English just because you speak English. By the very setup of the form, you are implying that your way is the 'one true way' or at least that you don't care about people that don't fit into your pre-defined set of expectations (i.e. "Don't have a firstname/lastname? I don't care about your business! Your money is no good here!").

How hard is it to just have a 'Name' field that supports UTF-8? Sure, it's not a 100% solution, but it's a lot better than the 20% (or less) solutions that we have out there right now.
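
One detail worth keeping in mind with a single UTF-8 'Name' field: the number of characters and the number of bytes differ, and the same visible name can arrive in composed or decomposed form. A small Python sketch, assuming NFC normalization is an acceptable storage policy for your system (the `name_sizes` helper is hypothetical):

```python
import unicodedata


def name_sizes(name: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for the NFC-normalized name."""
    nfc = unicodedata.normalize("NFC", name)
    return len(nfc), len(nfc.encode("utf-8"))


print(name_sizes("Gavin"))      # (5, 5)
print(name_sizes("山下太郎"))    # (4, 12): each of these characters is 3 bytes
print(name_sizes("Zoe\u0308"))  # (3, 4): e + combining diaeresis composes to ë
```

If a storage layer limits length in bytes rather than characters, a limit that looks generous for ASCII names can be much tighter for names in other scripts.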

santry · on June 17, 2010

http://vodpod.com/watch/1616920-funny-hugh-laurie-stephen-fr...

spectre · on June 17, 2010

Most of the items on that list are pretty realistic, but there are a few you can safely ignore. For example, imagine the response you'd get trying to sign up for a bank account (in person) if you told them you didn't have a name.

patio11 · on June 17, 2010

Define "safely". If you write code for a hospital that handles this case poorly, when a premature infant is found abandoned in a toilet, you might end up debugging from the ER.

Semiapies · on June 17, 2010

Define "poorly". Hospitals I've worked with just tend to put in codes for unnamed babies (such as "NBM", "NBF", surname if known, etc.) for the name fields of legacy programs. If you relied too heavily on the names in the first place, you'd have a big problem with multiple baby John Smiths in the maternity ward.
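
The "multiple baby John Smiths" point generalizes: a name is neither required nor unique, so it should not be the key. A minimal Python sketch, assuming a generated UUID as the record key; `PatientRecord` and its placeholder label are hypothetical and not how any particular hospital system works:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PatientRecord:
    # The name may legitimately be unknown (e.g. an abandoned newborn).
    name: Optional[str] = None
    # Identity comes from a generated ID, never from the name.
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def label(self) -> str:
        # Hypothetical display placeholder used until a name is recorded.
        return self.name if self.name else f"Unnamed ({self.record_id[:8]})"
```

Two records with the same name, or with no name at all, remain distinct because lookups go through `record_id`.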

smallblacksun · on June 17, 2010

John/Jane Doe

ars · on June 18, 2010

No, they call them Baby Boy, and Baby Girl - it even goes on the birth certificate.

vidarh · on June 17, 2010

That's all well and fine if you're building a system for a bank. Not so much if you're working on a project for the UN High Commissioner for Refugees or similar agency.

So, yes, there are a few you can safely ignore. The problem is the set you can safely ignore is probably different for almost every job you'll have.

joe_the_user · on June 17, 2010

Calling a statement a "falsehood" implies there's some truth that can and should replace it. The statements he lists aren't "falsehoods" or "truths", they are approximations - operating assumptions.

If your operating assumptions are true enough for a given purpose, there's no reason to change them. If they deviate enough from reality to generate problems, they will need to be replaced ... with other approximations (not with "the truth").

-- This is why apparently simple and "finished" programs always need "maintenance" - even the most seemingly simple assumptions need tweaking as the world's conditions change.

fforw · on June 17, 2010

The puritans had "religious slogan names" like

Nicholas If-Jesus-Christ-Had-Not-Died-For-Thee-Thou-Hadst-Been-Damned Barbon

( http://en.wikipedia.org/wiki/Nicholas_Barbon )

JulianMorrison · on June 17, 2010

I'm sorry, but if your name doesn't fit into Unicode, you need a new name. You do not have the right to your own character encoding and font (on someone else's website).

patio11 · on June 17, 2010

Anticipation of this comment was, in a nutshell, the reason the Han unification debate in Unicode got so acrimonious, and why lots of Japanese people carry a chip on their shoulder about it to this day.

"Sorry, grandma, I know you've been sort of attached to your name for the last 80 years, but the white folks find it inconvenient for their computer systems. Don't worry, they promise they'll make something close for you."

Many of the clients of my ex-day job are married to legacy encodings like Shift-JIS precisely because they do think that their customers and students have a "right" to having their names written correctly. (Most of them also make a total hash out of foreigners' names, which I spent a good deal of time correcting. As far as I know my office probably still uses my name as test data, since it screws up about 80% of the systems we had, and it was cheaper to work around or patch than it was to fire me.)

bonaldi · on June 17, 2010

Unicode is incomplete, and this is the fault of people who it doesn't serve? Developers don't have the right to dictate how people write their names just because they have created a poor implementation.

The best solution is suggested in the other link: apologise for the technical limitations and offer a workaround. Don't demand that the world breaks solely to fit into your technical limitation.

nuxi · on June 17, 2010

Is this supposed to be a modern version of "I'm sorry but if your name doesn't fit into 7-bit ASCII, you need a new name"?

pjscott · on June 17, 2010

Similar, except that there are a lot fewer people whose names can't be written in Unicode. Hell, the vast majority of names fit in the Basic Multilingual Plane; you could probably get away with using a fixed 16 bits per character.
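
The "fixed 16 bits per character" idea holds only for the Basic Multilingual Plane. A minimal Python sketch (hypothetical helper names) showing where it breaks: U+20BB7 (𠮷), a variant of 吉 found in some Japanese family names, lies outside the BMP and takes two 16-bit code units (a surrogate pair) in UTF-16.

```python
def in_bmp(name: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in name)


def utf16_units(name: str) -> int:
    """Number of 16-bit code units the name needs in UTF-16."""
    return len(name.encode("utf-16-le")) // 2


print(in_bmp("山下太郎"))               # True
print(in_bmp("\U00020BB7野"))           # False
print(len("\U00020BB7野"))              # 2 code points
print(utf16_units("\U00020BB7野"))      # 3 code units: surrogate pair + 1
```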

If your name doesn't fit into Unicode, get a nickname. Bonus points if your nickname fits into 7-bit ASCII.

nuxi · on June 18, 2010

Right, but what's the point of Unicode then? What it's supposed to be: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." What it is: The above, plus "Well, almost every character. Sorry we couldn't fit your name there, perhaps you should consider a nickname."

Say you want to order a package from somewhere. How does getting a nickname help? How do you explain it to the post office? I know it sounds like nitpicking, and it probably is, until it affects you personally. I've had my share of "name-mangling" and my name does fit into Unicode (not into ASCII though).

pjscott · on June 18, 2010

You have a point, but what I'm most concerned with is balancing these two things:

1. I don't want to inconvenience people with "weird" names.

2. I don't want to burden application programmers too much.

Requiring everybody to have a simple ASCII name would be convenient for programmers, but would be a big hassle for people whose names don't meet those requirements. "Be in the Basic Multilingual Plane or get a nickname" is a policy that, I think, provides a reasonable balance. Of course, supporting all of Unicode isn't really that much harder, so I think that's a better balance.

hopeless · on June 17, 2010

There are people who insist on a particular capitalisation of their names (e.g., all lower-case, irrespective of grammar rules), which is surely no different from using characters outside the Unicode set.

I'd be very careful telling people what "rights" they have around their names.

EDIT: you can't really control what other people call you but you certainly can control what you call yourself.

JulianMorrison · on June 17, 2010

"which is surely no different from using characters outside the Unicode set" - not when you are thinking in terms of making a program that will support it. A program will happily render Mr bob smith's name in lower-case if that's how he entered it. But Ms non-Unicode-squiggle is out of luck, and I don't have much sympathy.

dagw · on June 17, 2010

Yea, how could 80-year-old Ms non-Unicode-squiggle's parents have been so cruel and thoughtless that they didn't check the Unicode spec before naming their child? Screw them and their lack of foresight and time machines.

derefr · on June 18, 2010

I'm sort of wondering how Ms non-Unicode-squiggle manages to type her non-Unicode-squiggle into the form field in the first place.

dagw · on June 18, 2010

There are other encoding systems than Unicode out there.

derefr · on June 18, 2010

Sure; I was reacting to the specific concept of a "non-Unicode-squiggle", implying that the character has no known mechanism for encoding it (i.e. it's not simply an SJIS squiggle) and would likely have to be submitted as a custom bitmap/vector path. That's a good place to draw the line.

dagw · on June 18, 2010

The whole point is that there are characters in use in certain languages that have accepted mechanisms for encoding in some non-Unicode character encoding system used for that language, but not in Unicode.

stcredzero · on June 17, 2010

The artist formerly known as Prince?

Perhaps the actress from "Orlando" could start calling herself ~Swinton? Or maybe just "~"?

RyanMcGreal · on June 17, 2010

Obligatory: http://weblog.raganwald.com/2007/09/you-suck.html

jheriko · on June 17, 2010

This problem isn't handling names, it's validating them and trying to do clever things with them, which is rarely, truly, necessary - especially if it's a customer's name that they see on mail, bills, etc. and that they have the ability to correct... they will validate it for you.

I think the real bad assumption is that validation is necessary or even desirable - applying the technique brainlessly to names is the root cause of this problem - you don't really need to make any assumptions.

colinprince · on June 17, 2010

Also, FYI, place names contain all sorts of archaic spellings and characters, for the same reason: they are identifiers, and identifiers cannot and should not be normalized.

yellowbkpk · on June 18, 2010

Incidentally, my friend RJ just experienced a related problem on Facebook. After he got married, his name (which includes a hyphenated family name) has 4 capital letters in it. Facebook complains that his name has too many capital letters and will not let him use it. He tried all-lowercase, but Facebook automatically capitalizes each word, making it look even worse.

vsync · on June 22, 2010

It gets annoying for handles too. I usually just use "vsync" but certain systems smash the first character upcase and it looks ugly as "Vsync". So I try "VSync" and it smashes all characters but the first downcase.
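
The capitalization complaints above are easy to reproduce. This Python sketch contrasts a naive "fix the capitals" step, using the built-in `str.title()` as a stand-in for whatever rule a given site actually applies (an assumption, not any site's real code), with simply preserving what the user typed:

```python
def naive_display(name: str) -> str:
    # The kind of "helpful" normalization complained about above:
    # upper-case the first letter of each word, lower-case the rest.
    return name.title()


def display(name: str) -> str:
    # Store and show exactly what the user typed, minus surrounding whitespace.
    return name.strip()


for raw in ["vsync", "VSync", "McDonald", "van der Berg"]:
    print(f"{raw!r}: naive={naive_display(raw)!r} preserved={display(raw)!r}")
# 'vsync': naive='Vsync' preserved='vsync'
# 'VSync': naive='Vsync' preserved='VSync'
# 'McDonald': naive='Mcdonald' preserved='McDonald'
# 'van der Berg': naive='Van Der Berg' preserved='van der Berg'
```

Preserving the input costs nothing, and the person who typed it is the best judge of how it should look.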

Not to mention systems that require usernames to have 6 characters or more, sigh...

nradov · on June 18, 2010

The HL7 V3 standard has a fairly good data model for dealing with names. http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2... Combine that with Unicode and you can solve the majority of problems.

mmacaulay · on June 17, 2010

Only somewhat related, but first thing that came to mind:

http://xkcd.com/327/

Goladus · on June 18, 2010

It doesn't matter what programmers believe, it matters what programmers can do with the computers available.

tudorw · on June 17, 2010

what about a number, can't we all just be a number... I'll start things off, call me 1

stcredzero · on June 17, 2010

This is a reference to a short story? Or was it a New Yorker cartoon?

dagw · on June 17, 2010

Or "The Prisoner" TV series.

byoung2 · on June 17, 2010

Maybe a reference to this: http://xkcd.com/327/

roqetman · on June 17, 2010

Should the number be a collection of binary digits, decimal...

eru · on June 17, 2010

No, a number. Not a string that consists of digits. Numbers can have more than one representation.

The real question is: Do we stick to integers? Otherwise I want a supernatural number (http://en.wikipedia.org/wiki/Supernatural_numbers) or at least something like i (where i^2=-1), or perhaps a normal number (http://en.wikipedia.org/wiki/Normal_number).

Semiapies · on June 17, 2010

I admire how tudorw has largely avoided this problem by nabbing "1". :)

tudorw · on June 17, 2010

I might not be a prime, but I know my place; http://primes.utm.edu/notes/faq/one.html

bfung · on June 17, 2010

my name is 2a2318da-c1a7-4416-a5c5-8acb17d9e85d.

Sukotto · on June 17, 2010

Oh good, I always wondered who number 1 was

hugh3 · on June 17, 2010

You are number six.

KevBurnsJr · on June 18, 2010

Let's just tattoo a bar code on everyone's forehead and be done.

DannoHung · on June 17, 2010

Wow, so recording people's names is an absolutely miserable affair with no likelihood of success.

Okay, from now on, all humans will not have names, they will be numbered in ascending order, starting from 1.

Boosh. If you want a 'handle' or 'username', you've gotta type it in whatever the encoding system supports.

Glad that problem has been solved.

stralep · on June 17, 2010

"We want information, information, information."

"Who are you?"

"The new number two."

"Who is number one?"

"You are number six."

"I am not a number, I am a free man."

"HAHAHAHAHAHAHAHA."

Btw, why not use encoded pictures?