I go by my middle name (my parents thoughtfully gave me my father's first name and even middle initial, which has caused no end of confusion for many a credit bureau over the years). Unfortunately, the State of Indiana's birth certificate system assumes beyond any possibility of override that (1) everybody with children has a first name, (2) that first name has no spaces in it, and (3) the middle name is insignificant. So my kids' birth certificates have my dad's names on them as the father.
But hey, who am I, a mere parent, to say what my name is?
The real world is full of organic detail that is difficult or perhaps impossible to capture in full in a software system. A name should be just a string. If there are business requirements for sorting or name-of-address (e.g. "Dear Mr. Jones") they should be done with heuristics, with human intervention invoked in sticky situations if necessary. Sure, it's difficult, but you know, a hundred years ago that stuff was all done on an individual case-by-case basis by human beings; if your business assumption is that it can't be done cheaply enough without eliminating human intervention, perhaps you should rethink your assumptions.
Otherwise, let me assure you you're irritating every customer whose name doesn't meet your arbitrary rules - in exactly the same way that Google pisses off everybody who needs human support. Everybody thinks that sucks. So at least you should get people's names right, using human intervention if you have to. (And addresses, too, but we just had an extended thread about that, like last month.)
Indeed. The real question is, why are you asking for their name in the first place?
Consider these three scenarios:
(a) You are an airline, and the name you collect has to match their government-issue ID
(b) You need to mail something to the user
(c) You wish to create an account for them on your blogging website
How well you need to understand what they tell you their name is depends very much on the situation (context matters, as I need to have tattooed on my forehead). For (c), screw what their first name is, or last name, or middle name, or whether they have such a thing. For (b) the only requirement is "write down something that the post office in their country will understand" - again, never mind whether you understand it. For (a), how is this going to be matched by TSA? Do what is right to conform to the API.
Even in the US, people have very complicated names - here is a not unusual name from my local paper's birth announcements: Kalamanamananaueikalani Tomoko Namakauluhonuamea'ililamalama Wengler-Ioane
Does anybody really, really need to deal with this in its full glory? It should come down to "What would you like us to call you" and "are you M/F" (if relevant).
On the topic of address questions, I went through a form on Fidelity that disallowed post office boxes from being supplied as an address. At the time, I lived on Newport Way, and the "po" in Newport qualified as a PO box.
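The Newport Way failure looks like a bare substring test. Nobody outside Fidelity knows what their form actually did, so the following is only a sketch of the presumed bug next to a stricter alternative; both function names and the regular expression are made up for illustration.

```python
import re


def naive_is_po_box(address: str) -> bool:
    # Presumed bug: any "po" anywhere in the address counts as a PO box.
    return "po" in address.lower()


# Hypothetical stricter pattern: "PO Box", "P.O. Box" or "Post Office Box"
# starting at a word boundary and followed by a box number.
PO_BOX = re.compile(r"\b(?:p\.?\s*o\.?|post\s+office)\s*box\s+\w+", re.IGNORECASE)


def looks_like_po_box(address: str) -> bool:
    return PO_BOX.search(address) is not None
```

Even the stricter version is a heuristic, so a rejected address should still come with a way for the user to say "no, really, that is my street".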
I am the third person by my full name. My grandfather goes by our shared first given name; my father by our shared middle given name. When the distinction does not matter, I just use my first given name and surname. I never use the middle given name (if someone called me just by my middle given name, I would not respond).
If the distinction matters, then whatever system is in place had better be prepared to use my ordinal (III) as significant, but had better not put it as part of my last name.
So I will go by: "A-"; "A- Z-"; "A- H- Z-"; or "A- H- Z- III". I never go by "H-"; "H- Z-"; "H- Z- III"; or "A- Z- III".
Well, in my case, in addition to giving me my dad's first name, my parents called me by my middle name to distinguish us (a not-unusual practice in families originating in Kentucky/Tennessee; I'm damned lucky they called me Mike instead of Jim-Mike). So for all intents and purposes, my legal middle name is my actual name. It's what I identify as me. That programmers don't get that is a continual source of irritation in my life.
Falsehoods users believe about my programs/service.
1) A simple, ascii A-Za-z0-9_ identifier unique to program/service is not required to use my program/service.
1a) My program/service cares deeply about your real world name and all its nuances and variations. It requires full and complete understanding of your name rather than just a simple authentication mechanism.
2) My program/service will correctly handle the world's naming conventions past, present, and future to the exclusion of other features / actually shipping some day.
3) Instead of following the 80/20 "rule", my program/service will get names 100% perfect!
4) My program/service will cater to all languages/cultures/subcultures/niches past, present, and future rather than any sort of target audience.
What about when your program/service is something where identity matters, like for a credit bureau, a government agency or even a retailer?
Just read the complaints here about airlines screwing up people's names by assuming that everything is ASCII/LATIN1/MISC_CHARSET. That makes a difference when you get to the airport and they won't let you on the plane because your ticket and your ID don't match.
It's also an easily solvable problem. Just include a 'name' field. Make sure that it's sanitized correctly and supports UTF-8 (make sure that your database is properly set up to use UTF-8 as well, or else all manner of problems can happen, not least that queries will get bogged down as the database does charset conversions for string comparisons). If you care so little about the user's name, then why bother to have separate firstname/lastname fields?
{edit} With respect to the 'other complaints,' I was referring to the comments on this story: http://news.ycombinator.com/item?id=1438355 (The story which is what Patrick's post was in response to).
> Just read the complaints here about airlines screwing up people's names by assuming that everything is ASCII/LATIN1/MISC_CHARSET.
Having worked in the airline industry, I can tell you, this is not a case of airlines making a poor assumption. Airline systems are generally incapable of dealing with ASCII, let alone Unicode text. They typically use EBCDIC. That's why you never see lower case characters in your name on a boarding pass.
> That makes a difference when you get to the airport and they won't let you on the plane because your ticket and your ID don't match.
Does this actually happen? Because given the fact that airlines have been incapable of printing names that can't be described by EBCDIC since, well, forever, I'd assume that they're willing to ignore cases where the name on your ID contains characters that their computer system literally cannot represent.
> Does this actually happen? Because given the fact that airlines have been incapable of printing names that can't be described by EBCDIC since, well, forever, I'd assume that they're willing to ignore cases where the name on your ID contains characters that their computer system literally cannot represent.
Do the screeners know this? I'm willing to bet that a screener would deny you based on a name mismatch. What do the screeners know of EBCDIC?
There are two classes of people who can deny you access to aircraft: (1) airline employees (gate agents, check-in agents) and (2) TSA inspectors. No one at an airport will know anything about EBCDIC, but every single airline employee will know that their crappy mainframe system isn't smart enough to handle lowercase characters, let alone spaces, hyphens, umlauts, accents, etc.
As for TSA staff, it is important to realize that this EBCDIC issue is not a problem that only affects one airline: it affects almost every single airline, including all flag carriers that I know of. The airlines have a pretty tight working relationship with TSA; note that they send electronic messages to the TSA describing their customers when people book tickets and just before departure so as to give the TSA the opportunity to prevent them from flying. Given that tight relationship, I'm pretty sure that the TSA knows a great deal about the limitations of EBCDIC and has informed their staff about the limits of what a boarding pass name field can possibly represent.
Keep in mind that screeners will let you fly even if you have no ID on you.
Anecdotally, I've never had issues with tickets that had my partial name on them in the past either ("Matt" vs. "Matthew") although this may have changed now that TSA requires full names and birthdays (I always use my full name now).
Oh Patrick, as usual you got me to thinking. Then it hit me: the "Name" issue isn't that much different from the "SKU" issue. (SKU is short for Stock Keeping Unit, aka Part Number or Product Number). I have had to deal with this everywhere that has SKUs. No one does it well, but by slowing down and thinking about it, there's almost always a decent solution.
There are 2 ways to assign SKUs, sequentially (start with "1" and increment 1 for every new SKU) or not sequentially. I have never seen anyone do it sequentially. (Although some excellent systems keep 2 SKUs, one of which is sequential and is used as the primary key and for all indexing. This is the best way I've ever seen to handle SKUs that change, but that's another story.)
Almost everyone wants a smart or semi-smart SKU. So that by simply looking at the SKU, anyone can tell what it is without reading the description. You know, the first digit is Commodity Code, 1 for shoes, 2 for pants, etc. Then another digit for color, another for size, etc. This works well until you have ten colors; then you need 2 digits or alphas.
But wait, there's more. Let's put hyphens (or some other delimiter) between the product descriptors and the vendor data, manufacturer data, and customer data.
So now you've covered any possible product with your super slick smart SKU naming system.
Until something comes along that isn't covered. (Now we have military items with 14 other considerations.) So we come up with a second totally different scheme. Then a third. Then a 4th, etc. So now you can tell anything about an item if you know which scheme it falls under.
But wait, there's more. You should be able to enter any SKU into a form field regardless of Smart SKU scheme. (If the first digit is "9", then use Smart Scheme 3. If there's a hyphen in position 3, then use Smart Scheme 8, etc.) Your form logic should be able to intelligently guide the user based on the rules of the template or scheme.
I have built apps where users can design and build their own Smart SKU templates, which are then used to enforce compliance and guide operators. These have generally worked pretty well.
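For what it's worth, the "which scheme is this?" step described above can be sketched as an ordered list of rules. The scheme labels and the reading of "position 3" as the third character are assumptions made for illustration, not anything from a real SKU system.

```python
from typing import Callable, List, Optional, Tuple

# Each rule pairs a test on the raw SKU with a (hypothetical) scheme label.
Rule = Tuple[Callable[[str], bool], str]

RULES: List[Rule] = [
    # "If the first digit is 9, then use Smart Scheme 3."
    (lambda sku: sku[:1] == "9", "scheme-3"),
    # "If there's a hyphen in position 3, then use Smart Scheme 8."
    # Assumption: position 3 is 1-based, i.e. index 2.
    (lambda sku: len(sku) >= 3 and sku[2] == "-", "scheme-8"),
]


def pick_scheme(sku: str) -> Optional[str]:
    """Return the first matching scheme label, or None if no rule matches."""
    for matches, scheme in RULES:
        if matches(sku):
            return scheme
    return None
```

The `None` case is the interesting one: that is where the form hands control back to a human instead of guessing.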
Is there some way to do the same thing for human names? I dunno, but now you got me to thinking about it. A combination of standard templates and custom templates oughta cover most possibilities. Some basic logic with optional pop-up forms which uses the templates as parameters should work. Something to think about...
Aren't URIs an attempt to encode this type of hydra-like schema?
Not that URIs are that great for showing to humans, but we could add some kind of visualization template plug-in system for parts of them (similar to how many web servers map URLs to renderings of data resources).
The whole point of the article is that there is no such thing as the specific components of the person's name... that is the assumption that is wrong. A person's name is whatever they or their parents chose. You cannot reliably derive meaning from any part of it.
Then there's William Shakespeare, who spelled his name about 8 (IIRC) different ways. I guess it was a mark of pride for a literary man to be able to write his name however he wanted to, while less educated folk would have to write their names from memory.
This is all very interesting (and even humorous), but I'd appreciate something more constructive. How do we implement a name field on a website in spite of all this weirdness?
* If you have a requirement for a legal name, ask for the legal name as one field. Don't artificially split it into two fields. Make it long enough that you won't truncate.
* If you have restrictions on what you can store, show the user what you will be storing their entered name as and make sure they're okay with that.
* If you must have multiple fields for a user's name (given, familial), allow one of them to be optional.
* If you want to use a familiar name as a nice touch, ask the user what they want to be known as (e.g., a "nickname", but I wouldn't call it that).
* If you have to interface with other systems that do have artificial restrictions, explain this to the user and give them an opportunity to override the name used for those systems if it's important.
Obviously, you don't want to ask too much up front (to fight what I'm coining as "form-fatigue"; you know, where there's so many questions on the form that you just give up signing up), so you may need some decent heuristics to give you reasonable default values for these things that the user can override if they want.
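As a rough illustration of the "show the user what you will be storing" bullet, here is a minimal sketch that assumes the downstream system only accepts uppercase A-Z and spaces (roughly the boarding-pass situation described upthread). The function name and the exact folding rules are my assumptions; a real legacy system will have its own.

```python
import unicodedata


def legacy_preview(name: str) -> str:
    """Approximate a name using only uppercase A-Z and single spaces."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, then uppercase (this also turns "ß" into "SS").
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    upper = stripped.upper()
    # Keep A-Z, turn any whitespace into a space, silently drop the rest.
    kept = "".join(
        c if "A" <= c <= "Z" else " " if c.isspace() else "" for c in upper
    )
    return " ".join(kept.split())
```

With this sketch, "José Müller" previews as "JOSE MULLER", while a name written entirely in kanji previews as an empty string. That second case is exactly when the form should stop and ask the user what they want stored instead.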
I would suggest using 2 distinct "names". The first is as you suggest - an unlimited text field for their "real" name which is simply stored as entered. Then have them choose a "unique identifier" for the site to use, like the names used in email systems.
True story: I once told a customer that their requirement to "sort by last name" was likely impossible to satisfy without reworking the system, given what I knew about their user base. It turned out that I was right: the majority of their staff was Japanese, who expect Japanese lexicographic sort. However, they also had visiting professors from overseas, who largely could not understand Japanese lexicographic sort. (There are actually two common lexicographic sorts in Japanese, based on two ways to order the sounds of Japanese. We use the "table of fifty sounds" method, under which Tanaka comes after Aki but before Sato.)
My customers said "Fine, alright, two sort features. One for Japanese, one for foreigners."
They were less than happy when I told them that foreigners haven't agreed on lexicographic sort, either. (To use one example I'm familiar with, in Spanish, "ch" is one letter, so Chisako comes after Consuela.)
Oh, there is a separate right way to order prefectures. (Which I learned about in an email from a coworker saying "Patrick, come on, use some common sense next time. Have you ever seen prefectures listed lexicographically before?!")
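To make the Spanish example concrete, here is a toy sort key in which "ch" and "ll" count as single letters. It is only a sketch of the idea: it ignores accented vowels (they simply sort last), and anything real should use a proper locale-aware collation library rather than a hand-rolled alphabet.

```python
# Traditional Spanish alphabet with the digraphs as letters of their own.
ALPHABET = "a b c ch d e f g h i j k l ll m n ñ o p q r s t u v w x y z".split()
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def traditional_spanish_key(word: str) -> list:
    """Sort key that treats 'ch' and 'll' as single letters."""
    text = word.casefold()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the toy alphabet sort after everything else.
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return key


names = ["Chisako", "Consuela", "Carlos", "Daniel"]
by_codepoint = sorted(names)
by_tradition = sorted(names, key=traditional_spanish_key)
```

Here `by_codepoint` puts Chisako before Consuela, and `by_tradition` puts her after, so the same "sort by name" button gives two defensible answers.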
Yeah, that was an interesting adaptation from the Spanish language academy to the use of computers.
At the time there was some public debate, similar to the "Pluto is no longer a planet" one.
"hack up some system to split the single name field that doesn't work very well."
You've correctly identified a problem, but misassigned the responsibility. The problem with the single-name split is that it is impossible, full stop. When you forcibly try to do the impossible, you always get bad results.
You appear to be proposing that we force the users to enter first and last names, then we can trivially sort them by last name. But for the exact same reasons you can't write a name splitter after the fact, you can't have the user break their names into "first" and "last" either. You haven't solved the problem of there being "no last name" by changing your input form. You've just moved it around. Now it irritates the customer instead of you, which is often a bad trade, and sort by last name still doesn't work because customers for whom a first/last split doesn't work have fed you one or another variety of garbage data. Garbage data which is now even harder to find; at least when you had a one-word name you had a clue that there was no first/last split.
In this case, you should dig deeper. If the customer says "give me sort by last name," the actual requirement may be "give me an easy way to find a person by last name." In this case, a general full-text search may be the best solution.
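A minimal sketch of that kind of search over a single free-form name field, assuming an in-memory list of names; in practice you would lean on your database's full-text search. The helper names are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip accents so 'garcia' can find 'García'."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.casefold()


def search_names(names, query):
    """Return every name containing all query terms, in any order."""
    terms = fold(query).split()
    return [n for n in names if all(t in fold(n) for t in terms)]
```

Because the terms can appear in any order, "taro yamashita" finds "YAMASHITA Taro" without anyone having decided which part is the last name.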
This is the right way to do it, because of user error. I've had problems checking into a hotel until the receptionist realized my first and last names were switched.
The receptionist SHOULD be shown a list but ALSO wants to be able to search for me if I don't show up in the list (and may ask for my ID/get a particular spelling).
(It was a comp'd room in Vegas, so I have no idea where the mistake was made, but this can't be a unique occurrence.)
My project has a "sort by last name" requirement, but we wanted to accommodate "funky" names. Here's what I did:
There are four fields in the database:
first_name
last_name
display_name
primary_email
All 3 name fields are optional. The email address field is required, but you could use a customer ID or username if that is more appropriate for your app.
The names are never accessed directly, except on the one form where you can edit these fields. In general, they are accessed by these two helper methods:
    @property
    def name(self):
        # Prefer the name the user asked to be shown as.
        if self.display_name:
            return self.display_name
        if self.first_name and self.last_name:
            return self.first_name + ' ' + self.last_name
        # Fall back to the one field that is always present.
        return self.primary_email

    @property
    def last_first(self):
        # Sort key: "Last, First" when both parts exist.
        if self.first_name and self.last_name:
            return self.last_name + ', ' + self.first_name
        if self.display_name:
            return self.display_name
        return self.primary_email
Then we do a culture specific case insensitive sort. This should cover most cases pretty well....... I hope :-)
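For anyone who wants to try this, here is a self-contained sketch that wraps the two helpers in a small class and sorts on `last_first`. The `Person` dataclass and the sample data are made up, and `str.casefold()` is only a stand-in for the culture-specific sort mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Person:
    primary_email: str          # required
    first_name: str = ""        # optional
    last_name: str = ""         # optional
    display_name: str = ""      # optional

    @property
    def name(self) -> str:
        if self.display_name:
            return self.display_name
        if self.first_name and self.last_name:
            return self.first_name + " " + self.last_name
        return self.primary_email

    @property
    def last_first(self) -> str:
        if self.first_name and self.last_name:
            return self.last_name + ", " + self.first_name
        if self.display_name:
            return self.display_name
        return self.primary_email


people = [
    Person("z@example.com", first_name="Taro", last_name="Yamashita"),
    Person("j@example.com", display_name="Juliana"),
    Person("anon@example.com"),
]

# Case-insensitive sort; a real app would plug in a locale-aware key here.
ordered = sorted(people, key=lambda p: p.last_first.casefold())
```

Someone with a single name just fills in the display name and still sorts sensibly next to everyone else.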
For a Western company, that may be: use a first name and a last name field which works for 95% of our customers and inconveniences the other 5%. But that 5% is already used to it.
Which of course goes against the spirit of the original article.
Yeah. I think I would personally go for two fields -- first name and last name -- with a hint that 'if unsure, use lastname' so that the rather common(?) 'sort by lastname' would make at least some sense in most cases, and the requirement that at least one of the names must be non-empty.
It's the further processing after that which I'm interested in. Sanitising the input should probably be rejecting/replacing with a space any non-printable code points or weird whitespace characters. But what about sending emails? "Dear <name>"? There are probably other situations where this is a problem.
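The sanitising step could look roughly like this. It is a sketch under my own assumptions: control characters and any kind of whitespace become a plain space, while format characters (category Cf, e.g. the zero-width non-joiner) are left alone because some scripts genuinely need them. The function name is hypothetical.

```python
import unicodedata


def sanitize_name(raw: str) -> str:
    """Normalize a name and replace control/odd whitespace with single spaces."""
    # NFC so that "e" + combining accent and precomposed "é" compare equal.
    text = unicodedata.normalize("NFC", raw)
    cleaned = []
    for ch in text:
        if ch.isspace() or unicodedata.category(ch) == "Cc":
            cleaned.append(" ")
        else:
            cleaned.append(ch)
    # Collapse runs of spaces and trim the ends.
    return " ".join("".join(cleaned).split())
```

Note that this only cleans the string up; it doesn't tell you what to put after "Dear", which is still best answered by asking the user what they want to be called.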
I'm not sure of all the details, but the Unicode people kind of screwed up when doing Japanese. From what I understand there are several older variations on characters not in common use, but that occasionally show up in names (and old Japanese literature), that can't be encoded in Unicode.
I cannot discard the possibility my name was originally expressed in Hungarian rovásírás. AFAIK, it has not been assigned to any Unicode region. I was taught rovásírás as a kid, by my grandpa.
I can live with the romanized version. It's been in use for more than a thousand years and it's a bit late to complain.
There's no Unicode code point for the unpronounceable symbol which served as the name of The Artist Formerly Known As Prince, before he again became known as Prince.
There is also the guy in the credits of the movie Fargo who uses a Prince symbol on its side.
As for the Artist Formerly Known As Prince symbol lying on its side with a happy face in the middle, in the credits: "I'm the storyboard artist formerly known as J. Todd Anderson. That's all I can say about that." It's a private joke between J. Todd and the Coens. Prince and the Coen brothers are both from Minneapolis. (Dayton Daily News: 3/22/96)
Some countries, like mine (Uruguay), have a unique identifier for people - in Uruguay it's called "Cédula de Identidad" which would translate to Identity Card.
Some case #46c: The person is a citizen of a country with a national identity card, yet resides in another country wherein he or she has lost that identity card, cannot return to the country of citizenship to replace said identity card, and country of citizenship will not replace the identity card outside of that country.
There's a difference between working with what you have, and blaming people for using the 'wrong' characters in their name. From the post I was replying to:
> And I am not even going to say sorry.
With comments like this, one comes off as a douche, which isn't exactly going to engender trust with potential customers/clients. Whether it's 'your' fault or not, how you deal with your potential customers/clients is what really matters.
I had a friend whose legal name here in Canada was Juliana Juliana. She came from Indonesia and didn't have a family name. They told her she needed to have two names for her visa application so she just doubled up her given name.
Mongolians use their father's name as their surname, but they write it first. So my name would be Brad's Josh. While you may think that they should just write it backwards, that isn't how they write it, and they shouldn't necessarily be required to translate their names for our systems.
The point he is making is that people make one assumption or the other, depending on the culture in which they live, when actually it varies.
In the western world, we assume that someone's family name is their last name.
In countries such as Korea, people assume that someone's family name is their first name.
Either assumption can be wrong if you're dealing with international customers. Some people might not even have a family name.
No, these are falsehoods people who define the requirements believe about names. That's very often not the same as the programmer, especially at large organizations.
If the programmers actually adding the constraint to their systems didn't also believe the falsehoods, they would have written better error messages than, in essence, "You typed your name wrong!".
I think the real lesson here is to identify those assumptions that may be incorrect for the culture your app is targeted towards and make reasonable accommodations for those cases.
The most common ones are probably the easiest to deal with...long names, only one name, punctuation in names. Just relax your validation requirements. We just took care of 99% of names not common to western culture. But I don't think if you are designing an application for, say, the Olathe, KS youth rec sports registration website that you need to be too concerned with folks who have names that have characters not mapped in Unicode. If a person's name is so unusual that it doesn't fit even culturally relaxed input requirements, then they've probably already dealt with that problem before.
I heartily agree that "think what you're assuming" is a good take away here, but can't support "think what culture your app is targeting", because I've seen that virtually invariably blow up straight in the teams' faces.
Many Japanese people think racial/cultural homogeneity is practically a national trademark, and this issue has bitten nearly every Big Freaking Enterprise system I've ever seen in Japan. Do you think your global-facing web app or small American town is going to be less culturally diverse than Japan is? That strikes me as highly improbable.
I think you're missing my point. Having relaxed standards, such as I outlined, will account for nearly all special cases for a target audience that is reasonably homogeneous. Of course if you are designing a global facing application, then yes, you need to account for a wider range of possibilities. But for many applications designing around cultural norms, but being liberal about what you accept, is perfectly acceptable where the benefit of accommodating for low-probability cases does not justify the costs.
And, like I mentioned, in cases for applications targeted towards a homogeneous group, for those that have names so far outside the cultural norms that even liberal standards cannot accommodate them, then they've probably encountered the same thing already before, and they've come up with a way to adapt.
Allowing a liberal range is like me going to France and asking for ketchup with all of my food. But having a really unusual name, like one with unmappable characters, is like me going to Riyadh and expecting them to serve booze with my meal. Should I be allowed to drink there just because it's acceptable in my culture? No. It's illegal there, and I just have to adapt whether I like it or not.
Again, I emphasize, what I am advocating is for applications with a culturally homogeneous audience, not global, multicultural apps.
Japanese people living in the US are no doubt perfectly comfortable writing their name out in Latin characters for the benefit of the barbarians they live with.
Many Japanese professionals who have call to be in the United States -- but not all of them! -- will write their name something like YAMASHITA Taro, which is your clue that you should be calling him Mr. Yamashita rather than Mr. Taro. Or Dr. Yamashita, in the case of my client who was not actually named that.
Some of them, including at least one prominent politician and several people I represented in a professional capacity, are adamant that reversing the order is incorrect.
(Unsurprisingly, many Americans are unaware of this convention. Sadly, many of them persist in being unaware of it even when it is written on the meeting briefing and explained verbally right before the meeting starts. sigh Twelve corrections in 3 days -- my client was not happy for that trip.)
Does your hotel, car rental service, university, etc, handle this case correctly? We blazed a path of frustration through Chicago and Michigan last time.
Further fun: a Taro Yamashita born and raised in the United States (or any other Taro, for that matter), and present at the same university for the same conference, might request the exact opposite treatment! And if you spell his name as YAMASHITA Taro on the meeting agenda in defiance of his preference, you're doing it wrong!
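If you want software to pick up on the capitalization clue at all, the most it can honestly do is offer a hint. A minimal sketch, with a made-up function name: it returns the family name only when exactly one part is written in capitals, and `None` (meaning "ask the person") in every other case.

```python
from typing import Optional


def family_name_hint(written_name: str) -> Optional[str]:
    """Return the all-caps part of a name if exactly one part is all caps."""
    tokens = written_name.split()
    shouting = [t for t in tokens if len(t) > 1 and t.isupper()]
    if len(tokens) > 1 and len(shouting) == 1:
        return shouting[0]
    # No clue, or an ambiguous one: do not guess.
    return None
```

A stored preference from the person always beats the hint, which covers the American-born Taro who wants the opposite treatment.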
But who's really doing it wrong? If someone expects another culture to conform to their own culture's preferences when a guest there, perhaps they're the ones making the error. I actually attended a conference in Japan that wrote all names, including mine, as "FAMILYNAME Givenname" on the program. I personally write and prefer "Givenname Familyname", but I don't see it as a cause for offense. As a guest in their country, I consider it their prerogative to go by their local conventions.
I mean, wouldn't it be pretty chauvinist for me to get offended at it or demand that they follow Western name ordering? It feels like it'd be in the same category as complaining that they don't serve my favorite American food at the conference, or that the cars are driving on the wrong side of the road.
The canonical source for the correct way to spell/write someone's name is that person. You should never tell someone "No, you're writing your name wrong".
I disagree. People's preferences should get some deference, but not ultimate deference. I don't have to accommodate Prince's claim that his name is properly written as a graphical symbol, for example. It's perfectly fine to tell him to pick a name that isn't a graphical symbol.
Between countries, I think generally the right approach is to use the country's naming customs, and adapt foreign names to them, to the extent reasonably possible. In Greece, for example, it's customary to transliterate people's names into the Greek alphabet, especially if the source is going to be read by Greeks (newspapers, etc.). The person known as "George Bush" in the United States is more commonly referred to in Greece as "Τζωρτζ Μπους", for example--- despite it being a rather ugly transliteration in this case, due to a bunch of the consonants not existing in Greek.
Are you really arguing that this Dr. Yamashita (or Mr. Bush) can tell them they can't use their own alphabet in their own country, because that's not how he likes his name written?
My first name is spelled with a $10 bill. It's pronounced "Gavin." My last name is a picture of a flower. If you represent it at less than 1000x1000 pixels or with washed out colors, you are insulting my heritage.
My middle name is a musk ox.
How far does politeness dictate that you have to go to accommodate me?
I think that the bar should at least be set higher than:
"Your name does not fit into a Western firstname/lastname
format with only ASCII characters, please choose another
name."
This is on par with expecting everyone to speak English just because you speak English. By the very setup of the form, you are implying that your way is the 'one true way' or at least that you don't care about people that don't fit into your pre-defined set of expectations (i.e. "Don't have a firstname/lastname? I don't care about your business! Your money is no good here!").
How hard is it to just have a 'Name' field that supports UTF8? Sure it's not a 100% solution, but it's a lot better than the 20% (or less) solutions that we have out there right now.
Most of the items on that list are pretty realistic, but there are a few you can safely ignore. For example, imagine the response you'd get trying to sign up for a bank account (in person), if you told them you didn't have a name.
Define "safely". If you write code for a hospital that handles this case poorly, when a premature infant is found abandoned in a toilet, you might end up debugging from the ER.
Define "poorly". Hospitals I've worked with just tend to put in codes for unnamed babies (such as "NBM", "NBF", surname if known, etc.) for the name fields of legacy programs. If you over-heavily relied on the names in the first case, you'd have a big problem with multiple baby John Smiths in the maternity ward.
That's all well and fine if you're building a system for a bank. Not so much if you're working on a project for the UN High Commissioner for Refugees or similar agency.
So, yes, there are a few you can safely ignore. The problem is the set you can safely ignore is probably different for almost every job you'll have.
Calling a statement a "falsehood" implies there's some truth that can and should replace it. The statements he lists aren't "falsehoods" or "truths", they are approximations - operating assumptions.
If your operating assumptions are true enough for a given purpose, there's no reason to change them. If they deviate enough from reality to generate problems, they will need to be replaced ... with other approximations (not with "the truth").
-- This is why apparently simple and "finished" programs always need "maintenance" - even the most seemingly simple assumptions need tweaking as the world's conditions change.
I'm sorry, but if your name doesn't fit into Unicode, you need a new name. You do not have the right to your own character encoding and font (on someone else's website).
Anticipation of this comment was, in a nutshell, the reason the Han unification debate in Unicode got so acrimonious, and why lots of Japanese people carry a chip on their shoulder about it to this day.
"Sorry, grandma, I know you've been sort of attached to your name for the last 80 years, but the white folks find it inconvenient for their computer systems. Don't worry, they promise they'll make something close for you."
Many of the clients of my ex-day job are married to legacy encodings like Shift-JIS precisely because they do think that their customers and students have a "right" to having their names written correctly. (Most of them also make a total hash out of foreigners' names, which I spent a good deal of time correcting. As far as I know my office probably still uses my name as test data, since it screws up about 80% of the systems we had, and it was cheaper to work around or patch than it was to fire me.)
Unicode is incomplete, and this is the fault of people who it doesn't serve? Developers don't have the right to dictate how people write their names just because they have created a poor implementation.
The best solution is suggested in the other link: apologise for the technical limitations and offer a workaround. Don't demand that the world breaks solely to fit into your technical limitation.
Similar, except that there are a lot fewer people whose names can't be written in Unicode. Hell, the vast majority of names fit in the Basic Multilingual Plane; you could probably get away with using a fixed 16 bits per character.
If your name doesn't fit into Unicode, get a nickname. Bonus points if your nickname fits into 7-bit ASCII.
Right, but what's the point of Unicode then?
What it's supposed to be: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."
What it is: The above, plus "Well, almost every character. Sorry we couldn't fit your name there, perhaps you should consider a nickname."
Say you want to order a package from somewhere. How does getting a nickname help? How do you explain it to the post office?
I know it sounds like nitpicking, and it probably is, until it affects you personally. I've had my share of "name-mangling" and my name does fit into Unicode (not into ASCII though).
You have a point, but what I'm most concerned with is balancing these two things:
1. I don't want to inconvenience people with "weird" names.
2. I don't want to burden application programmers too much.
Requiring everybody to have a simple ASCII name would be convenient for programmers, but would be a big hassle for people whose names don't meet those requirements. "Be in the Basic Multilingual Plane or get a nickname" is a policy that, I think, provides a reasonable balance. Of course, supporting all of Unicode isn't really that much harder, so I think that's a better balance.
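For what it's worth, the difference between those two policies is small in code. A minimal sketch in Python, where `fits_in_bmp` and `utf16_code_units` are names made up for illustration: the first checks the "Basic Multilingual Plane or get a nickname" rule, the second shows why a fixed 16 bits per character stops being enough once a name leaves the BMP.

```python
def fits_in_bmp(name: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in name)


def utf16_code_units(name: str) -> int:
    """Number of 16-bit units needed to store the name as UTF-16."""
    return len(name.encode("utf-16-le")) // 2


# Illustrative inputs only: U+20BB7 is a Han character in a supplementary plane.
for name in ["Jos\u00e9", "\U00020BB7\u7530"]:
    print(repr(name), fits_in_bmp(name), len(name), utf16_code_units(name))
```

The second name has two characters but needs three 16-bit units, so a system that assumes one unit per character will miscount or truncate it. Storing and passing around the full string avoids having to make that call at all.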
There are people who insist on a particular capitalisation of their names (e.g., all lower-case, irrespective of grammar rules) which is surely no different from using characters outside the Unicode set.
I'd be very careful telling people what "rights" they have around their names.
EDIT: you can't really control what other people call you but you certainly can control what you call yourself.
"which is surely no different from using characters outside the Unicode set" - not when you are thinking in terms of making a program that will support it. A program will happily render Mr bob smith's name in lower-case if that's how he entered it. But Ms non-Unicode-squiggle is out of luck, and I don't have much sympathy.
Yeah, how could 80-year-old Ms non-Unicode-squiggle's parents have been so cruel and thoughtless that they didn't check the Unicode spec before naming their child? Screw them and their lack of foresight and time machines.
Sure; I was reacting to the specific concept of a "non-Unicode-squiggle", implying that the character has no known mechanism for encoding it (i.e. it's not simply an SJIS squiggle) and would likely have to be submitted as a custom bitmap/vector path. That's a good place to draw the line.
The whole point is that there are characters in use in certain languages that have accepted mechanisms for encoding in some non-Unicode character encoding system used for that language, but not in Unicode.
The problem isn't handling names, it's validating them and trying to do clever things with them, which is rarely truly necessary - especially if it's a customer's name that they see on mail, bills, etc. and that they have the ability to correct... they will validate it for you.
I think the real bad assumption is that validation is necessary or even desirable - applying the technique brainlessly to names is the root cause of this problem - you don't really need to make any assumptions.
Also, FYI, place names contain all sorts of archaic spellings and characters, for the same reason: they are identifiers, and identifiers cannot and should not be normalized.
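To make the "don't normalize identifiers" point concrete, here is a rough Python sketch. `changed_by_normalization` and `accept_name` are hypothetical helpers, not anyone's actual intake code; the only library call is the standard `unicodedata.normalize`. It shows that even the mildest Unicode normalization form (NFC) silently swaps some characters for others, and what a "no assumptions" intake function might look like.

```python
import unicodedata


def changed_by_normalization(name: str, form: str = "NFC") -> bool:
    """True if normalizing would store something other than what was typed."""
    return unicodedata.normalize(form, name) != name


def accept_name(raw: str) -> str:
    """Hypothetical intake: require something non-blank, then keep it verbatim."""
    if not raw.strip():
        raise ValueError("name is required")
    return raw  # no case folding, no normalization, no character whitelist


# U+FA10 is a CJK compatibility ideograph; NFC replaces it with U+585A.
print(changed_by_normalization("\ufa10\u7530"))
# U+212B ANGSTROM SIGN becomes U+00C5 under NFC.
print(changed_by_normalization("\u212b"))
print(accept_name("bob smith"))
```

If the person can see and correct what you stored, returning the string untouched is usually all the "validation" you need.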
Incidentally, my friend RJ just experienced a related problem on Facebook. After he got married his name (which includes a hyphenated family name) has 4 capital letters in it. Facebook complains that his name has too many capital letters and will not let him use it. He tried all-lowercase but Facebook automatically capitalizes each word, making it look even worse.
It gets annoying for handles too. I usually just use "vsync" but certain systems smash the first character upcase and it looks ugly as "Vsync". So I try "VSync" and it smashes all characters but the first downcase.
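Both complaints are easy to reproduce with stock string functions. A small Python sketch, assuming the systems in question do something like `str.capitalize()` or `str.title()` (I have no idea what they actually run); the helper names and the sample name "Jo McAdam-Lee" are invented for illustration.

```python
def auto_capitalize(handle: str) -> str:
    """Upcase the first character and downcase the rest."""
    return handle.capitalize()


def title_case(name: str) -> str:
    """Capitalize the first letter of each word and downcase the rest."""
    return name.title()


def count_capitals(name: str) -> int:
    """How many upper-case letters the name contains."""
    return sum(ch.isupper() for ch in name)


print(auto_capitalize("vsync"))        # Vsync
print(auto_capitalize("VSync"))        # Vsync
print(count_capitals("Jo McAdam-Lee"))  # 4
print(title_case("jo mcadam-lee"))     # Jo Mcadam-Lee
```

Whatever the user types, the original capitalisation is gone after either call, and a "too many capitals" rule would reject the correct spelling outright. Storing the string as entered sidesteps both problems.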
Not to mention systems that require usernames to have 6 characters or more, sigh...
I go by my middle name (my parents thoughtfully gave me my father's first name and even middle initial, which has caused no end of confusion for many a credit bureau over the years). Unfortunately, the State of Indiana's birth certificate system assumes beyond any possibility of override that (1) everybody with children has a first name, (2) that first name has no spaces in it, and (3) the middle name is insignificant. So my kids' birth certificates have my dad's names on them as the father.
But hey, who am I, a mere parent, to say what my name is?
The real world is full of organic detail that is difficult or perhaps impossible to capture in full in a software system. A name should be just a string. If there are business requirements for sorting or name-of-address (e.g. "Dear Mr. Jones") they should be done with heuristics, with human intervention invoked in sticky situations if necessary. Sure, it's difficult, but you know, a hundred years ago that stuff was all done on an individual case-by-case basis by human beings; if your business assumption is that it can't be done cheaply enough without eliminating human intervention, perhaps you should rethink your assumptions.
Otherwise, let me assure you you're irritating every customer whose name doesn't meet your arbitrary rules - in exactly the same way that Google pisses off everybody who needs human support. Everybody thinks that sucks. So at least you should get people's names right, using human intervention if you have to. (And addresses, too, but we just had an extended thread about that, like last month.)