"For instance, a database of hobbyist club members could include uniqueness on the two columns first_name, last_name of members. Duplicates are likely unintentional at this scale, and if necessary the constraint can be dropped. Until an actual conflict happens the key is a reasonable precaution."
Absolutely do not do this.
People have names that are duplicates. A situation where someone is unable to join a club because their name clashes with an existing member is not OK.
Expecting a club administrator to be able to drop a uniqueness key from their database in order to resolve this situation is not a reasonable solution!
That's not the only point of the article I really disagree with:
> Here are some values that often work as natural keys:
> login names
This makes user names static. Which would annoy people if they change name (eg marriage, gender change, nationalization change (Chinese name vs English name)) or they just want to update their online handle.
> email addresses
Same problem as above. What happens if someone wants to change email addresses (eg they used a work one and changed jobs, or they got a new email to combat spam on the old one, or a new email address to reflect a name change (see above))
> mac address on a network
This makes more sense than above, but MAC addresses can be spoofed or even just used as a proxy, so you would need to be careful that the information you're uniquely storing against the MAC address is intended to be unique. Most use cases that immediately springs to my mind wouldn't follow his unique rule but I'm sure there will some examples that do.
> (lat,lon) for points on the earth
That might work if you also included an elevation point as well but office and apartment blocks (for example) will often have different tenants at the same latitude and longitude coordinates.
> Some people seem to struggle with the choice of “natural” key attributes because they hypothesize situations where a particular key might not be unique in some given population. This misses the point.
Actually no it doesn't. That's entirely the point as it demonstrates good planning and future-proofing rather than setting arbitrary business rules that are hard to rectify if and when you do run into those particular edge case scenarios.
RFC 2392 Message-ID values are supposed to be unique, but they may not be if the software generating them is buggy. I remember reading somewhere about some email program that assumed they were unique (in effect using them as a natural key). If you were unlucky enough to get messages with identical Message-ID values, all but one would disappear.
> This makes more sense than above, but MAC addresses can be spoofed or even just used as a proxy, so you would need to be careful that the information you're uniquely storing against the MAC address is intended to be unique. Most use cases that immediately springs to my mind wouldn't follow his unique rule but I'm sure there will some examples that do.
As one example, there was a bug in one version of CyanogenMOD that caused many phones to broadcast the same MAC, even when connected to Wifi. In some cases, this would cause different devices to kick each other off of a connection. See e.g. https://forum.xda-developers.com/tmobile-lg-g3/help/2x-lg-g3...
Does it? I can understand why making it a foreign key would make it static, but why would making it a normal key make it static? It seems to me that making login names unique would be preferable, same with emails.
If i understand the article correctly it's discussing using natural keys as primary keys. Eg if you're using login_name as a natural key then that's presumably replacing a more traditional user_id. So it would be hard not to use login_name as a foreign key in that scenario.
Where as if you still had a user_id as your primary key, you could still have user_id as a unique key (most RDBMS I've used support "unique keys") to enable the business logic discussed but without encouraging it's usage as a foreign key.
I think the article did touch on the fact that you can have multiple keys, but given it's heavy emphasis on finding natural keys over the more traditional logic of incremental integers, I can only assume the point of it wasn't about additional unique keys but rather alternatives for primary keys.
> If i understand the article correctly it's discussing using natural keys as primary keys.
I'm not sure about the author's intent here. In the prior section they dismiss the necessity of primary keys. However, the author rails on about unique keys. I'm not familiar with PostgreSQL, but I assume every reference to "key" is effectively an index with a unique constraint. If that is true then all of the discussed issues with unique names are the same even if the author doesn't care about primary key in particular.
From how I understand it the author does not make any suggestions to use natural keys as primary key. He just defines what is natural key. And he does mention that they may change and it has to be considered.
Actually in the summary he suggests to create column with uuid in each table and use it as primary key.
I agree, I can't think of any system off the top of my head that lets you change your handle, or why you would really want to. If it's a forum or has commenting capabilities, your handle is your absolute identity. If it's not a forum, no one ever really sees your username, so there's little motivation to change it.
> I can't think of any system off the top of my head that lets you change your handle, or why you would really want to.
Pretty much any system with a real name policy both does and needs to let you change your handle, because preferred real (and legal) names change. Now, these systems often use email addresses as login names, but they sometimes also support changing the primary email address as well. In effect, identity is most likely managed by a surrogate key that isn't exposed.
Twitter does (though I don't know of anyone who has). As does Google and Facebook. Github too.
Away from web services, gaming platforms like Nintendo's Switch allow for user name changes. As does the more enterprisey databases like ActiveDirectory (Windows logins), ldap and Linuxes / Unix passwd table also allows for login name changes.
To be honest, I've encountered more systems that do let you change your user name than systems that don't.
For extra fun there was talk today on twitter about several people that have to file Japanese tax returns but have originally english names. There are many ways to convert the name to english, and different systems actaully enforce you to convert it in a different way with various limitations.
This results in them having the system believe they are two people, one of who hasn't filed returns and no easy way to fix it (apparently the easiest fix is to possibly create a company?).
This despite there being some kind of unique ID submitted in both systems, but that ID is not currently reconciled.
Not sure 100% on the details myself but I guess the moral of that story is, duplicate checks on the name itself may not even be a guarantee of uniqueness - even assuming there was only 1 person with that name!
Completely agree. Adding a proper primary key e.g. from a sequence involves very little extra overhead and protects against this. Asking them to drop the constraint in the future sounds a bit silly as that may mean two members get deleted when you intended only one. Plus your foreign keys now have to be maintained. So if a member changes their name you have to update all foreign keys. Also if the hobbyist club then builds a website to view members the URLs can use the primary key e.g. a long / bigint, rather than names. The names may introduce further complexity with having to escape the URL etc. I am all for keeping it simple, but all relations should have a guaranteed unique primary key.
I know the author is talking about a small scale project here, but shortcuts like this (in my experience) just lead to a mess later on; irrespective of the complexity of the problem being solved.
To expand a little on your last point: one of the guide lines I try to follow is expect your system to be successful. If your club is popular then it will eventually have two people with the same name. Don't waste people's time by putting in artificial constraints that will eventually become invalid.
> Expecting a club administrator to be able to drop a uniqueness key from their database in order to resolve this situation is not a reasonable solution!
Especially since if you have any FK relationships, and that was the only candidate key for the members relation (and why have such a bad key if it's not the only one), you then need to add new attributes to create a new key, populate it to all existing roles, change all existing FK relationships and associated queries, on top of the easy part, which is dropping the old unique constraint.
Yes, that's the problem. I understand what he wants to say (basically YAGNI), but imagine the reception desk employee having this problem, escalating it until it reaches a developer just to register John Doe.
I agree with you, but I think you did overthink his metaphorical example a little.
I think the point was that for a 40 people club odd are very very poor that two people would have the same name. And even if so a club manager could still differentiate by adding a middle name or a nickname (in a 40 people scenario).
Of course IF your neighborhood club expand and you need to manage a lot of people you'll have to switch to a better technique. His point was that you must fit to you business case. (But hey some people change name when they marry so... problems can happen fast, but that's mutability issue not uniqueness).
Overall very good article but keeping critical mind as you did is needed.
I'm not convinced that the odds are "very very poor". Some names (like Robert Smith or Maria Garcia) are quite common. And a hobby club is not a random selection of people, members would typically tend towards a specific geographic area, social class or culture depending on the subject. Some cultures have a very limited pool of names, depending on tradition.
Just from looking at a 200 member club I'm part of, I can see at least two name collisions in our current club roster. So absolutely agree with your assessment.
I don't have exactly the same name as my father, and we are not members of the same squash club. So, in the sample so far, there's only a 50% chance of that happening. I suspect that if we enlarged the sample, it would fall much further.
A single counterexample does not falsify the theory that the odds are very very poor!
Ah, this brings me back to the time when I was trying to convince a product manager that it was a bad idea to "validate" email addresses with regular expressions. I failed, and the product was rolled out with the following regex: ^[a-z0-9.-]+@[a-z0-9.-]+\.[a-z]{2,4}$
I used to hold this opinion, but this Stack Exchange post https://dba.stackexchange.com/a/165923/34006 changed my mind. In short, HTML5 defines its own specification of an email address here [2], and notes:
> This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
> The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
To quote the DBA Stack Exchange post, "if it's good enough for HTML5, it's probably good enough for you". And if you're using inputs with `type="email"` in HTML5, you already are constraining your emails to that format.
(That said, it sounds like the regular expression you were asked to use was not a good one)
That's an interesting choice of rules. When cutting down the complexity so drastically, do you really need to double the length of your regex just to enforce the rule that dashes only appear in the middle of a domain segment? This version is so much more legible:
I'm guessing he put that in just to annoy people that are easily annoyed. Kind of like how every woodworking video on YouTube is invariably slammed with hundreds of commenters accusing the maker of being literally Hitler over some very small safety detail. After awhile they stop trying and just poke fun.
It does make me appreciate the one guy who can rise above all that, Paul Sellers.
While technically accurate, this is non-standard to such a degree, and represents such a minority percentage of the population, that "don't do this" really is an appropriate response to the user.
I'm curious how often people share e-mail addresses. I used to see it a lot for people who used their ISP's mail system. Most people who are even slightly tech oriented have gmail, hotmail, etc these days, but now I can't help but wonder how many people just stick with the default ISP e-mail.
Mailing lists and shared mailboxes might be used by several people (support@foo.com). Noone uses them to sign up to online services I guess, but the address might be stored in a database and be assocoated with several people somehow.
True people do share. But you have to believe it isn’t. If your system already assumes email must be unique, then your system will make email field as a unique constraint. Otherwise what will you considered as unique?
Phone number, however, should be avoided as an ID. Far more likely to have phone number ownership changed than Google reclaiming your email address and give it to someone else realistically.
If your system already assumes email must be unique...
...then you're not building a system that works with the way users work. Some (very small) percentage of your users will have problems.
I first encountered this problem when I asked my parents to test my first startup. They shared an email account. When they signed up to things they just used the shared account. It was fine until they wanted separate accounts on the same service and it failed. It's not OK to assume everyone has a unique email address and that works as a reference to an individual. The assumption is incorrect. Email addresses are not unique to individuals.
A lot of services fail at this. If you're targeting older people then you shouldn't use a users email address as a username, and nor should you use it as a security factor.
> It was fine until they wanted separate accounts on the same service and it failed.
When it comes to designing a system, I try my best to prioritize this way: security, privacy, and finally usability. A product with the best user-experience may have the most vulnerable design. I say this because we need to balance between security and UX. Pick some constraints which make sense to your system.
If we have to accommodate every corner case, the system will become extremely insecure and unreasonable. In your case, perhaps solve it with "sub-account" if that makes sense for your system. To Netflix, sub-account seems reasonable so Netflix gives that as an option for user to share his/her account with his/her family/buddies.
Actually, sorry if being a little out of bound, you probably should help your parents setting up unique email address for them. They want a separate account for some specific reasons, I assume, so why not help them?
Email was created TO be unique. It is we the users who decide how to use that email address, BUT it is not the system/service designer TO accommodate every use case.
They want a separate account for some specific reasons, I assume, so why not help them?
That's the point - they didn't want different email accounts. They wanted to share one. There will always be edge cases that don't conform to your expectations of how users should be doing things, and your applications probably need to work even in those less-than-ideal circumstances. If your app is going to be used by older people who might share an email address then I wouldn't make the account process require a unique address for every user.
I actually go a step further these days and try to make things that don't even require any email address. The less personal information required to create an account the better. So long as users are aware that they won't be able to recover an account if they forget their password there's no real need to ask for an email in the first place.
How did they want to identify themselves upon using the service? Email plus name? Is it worth making login more complex for everyone else?
Or do you expect old people that share an email account to gracefully handle unique usernames? I'm pretty skeptical of that one.
And "So long as users are aware that they won't be able to recover an account" sounds like a joke. Maybe one in twenty people will truly internalize that, if you're lucky.
The bit about duplicates likely being "unintentional at this scale" is an important part of the use context too though. There are many cases where allowing duplicate names increases errors and confusion for users -- not only because of unintentional duplication, but because users themselves often think of names as unique.
To me, surrogate keys are just about always preferable over natural keys, but it's also important to think about the "naturalness" of natural keys from a user's pov. In some contexts, using a surrogate key while also enforcing unique names (with the rare possibility of a case where an admin has to go in and do something weird like add a genuinely identical name with a "2" after it, say) can be a better trade-off.
Absolutely do not do this.
People have names that are duplicates. A situation where someone is unable to join a club because their name clashes with an existing member is not OK.
Expecting a club administrator to be able to drop a uniqueness key from their database in order to resolve this situation is not a reasonable solution!