When this came up on HN previously, I wasn't sure that all of them were actually falsehoods, so I went through each and listed my reasoning for why I thought they were or weren't:
A few explanations you can add for more-falsehoods.rst:
> 15. Unix time is the number of seconds since Jan 1st 1970.
The section you quote from Wikipedia shows why this one is false: leap seconds. Similar to how the number of hours between 00:00 and 3:00 in most of the United States depends on whether the timezone remains constant, springs forward, or falls back, the number of seconds between two Unix timestamps depends on how many leap seconds were inserted or deleted in that time frame.
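To make the leap-second point concrete, here is a minimal sketch using only Python's standard library. It compares two Unix timestamps on either side of the leap second inserted at the end of 2016-12-31 UTC; the variable names are just for illustration.

```python
import calendar

# Unix timestamps for midnight UTC on either side of the leap second
# that was inserted at 2016-12-31 23:59:60 UTC.
before = calendar.timegm((2016, 12, 31, 0, 0, 0))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))

# Unix time pretends every day has exactly 86400 seconds.
unix_difference = after - before

# One leap second was inserted during that day, so the real elapsed
# time was one second longer than the timestamps suggest.
elapsed_si_seconds = unix_difference + 1

print(unix_difference)
print(elapsed_si_seconds)
```

The subtraction gives a full day of 86400 seconds even though 86401 seconds actually elapsed, which is why "seconds since 1970" is not quite what a Unix timestamp counts.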
> 16. The day before Saturday is always Friday.
This one is true only within the assumption but I'm not sure if the falsehood is literally false. When Alaska did the calendar change it had two consecutive Fridays (October 6th & 18th, 1867). This gives us the day after Friday not being Saturday but technically not the day before Saturday not being Friday. The falsehood still holds in spirit (outside of the assumption), of course, and it's possible somewhere else had it be true in practice as well.
> 27. The weekend consists of Saturday and Sunday.
The weekend differs across the world. Friday and Saturday is another popular one but Brunei gets a special shout out for having a Friday and Sunday "weekend" with Saturday being a working day.
> 59. DST is always an advancement by 1 hour
As far as I know, this is currently true if you aren't dealing with historical datetimes but Singapore will mess you up with historical data. They had a really odd case[1] where they went on DST by advancing 20 minutes and then ended DST without changing the clock at all.
Oh, god. It's like a perfect storm for a crippling epistemological dilemma. "Nothing can be known, not even this" about sums it up. I guess we're stuck with working from probabilities. Carneades would be proud.
Going further, if at all possible, leave validation up to others. For example, don't validate delivery addresses at all if your shipping company has a validation API.
If you must do validation yourself, put in the time to research it and get it right, don't just bang something out based on your own experience of the problem domain.
Edit: one more: if you must validate, keep this separate from how you store the data, so you can more easily fix your mistakes or adapt to changes.
So... what are we supposed to do? Every abstract model of reality has to make some simplifying assumption. Or should we curate some sort of general libraries that can handle all the exceptions to the rules if they are given enough context? Many of these libraries already exist.
This is like trying to accept all valid email addresses according to the RFC. Even when we have a spec, there's a maddeningly complex array of possibilities, but it's ok if for almost everything we accept some subset of it. It would be madness, for example, if we forced government employees of a particular country to be able to deal with all possible scripts of the world, including Prince's symbol, just to account for all possible ways that humans name themselves.
This is remarkably easy if you aren't writing an MDA/MTA: Treat the email address like a blob. The only verification necessary, if any, is whether the user clicks the enrollment link or whatever.
The same could be said about addresses. I love printing out a Fedex shipping label on-line, and then seeing how it mangles French diacritics. Then I have to hope the package will make it to the right place.
The way American (or maybe English speaking) companies handle diacritics is shameful... but don't postal codes serve as a kind of check against incorrect or ambiguous place names?
This annoys me, as I've had to deal with a lot of corrupted data. So when I moved from the UK to "København Ø" I experimented on all the change of address forms, for banks etc.
Many could handle the characters, but where I sent a paper form the person entering the data didn't know how to type it.
This must be a problem for somewhere like Armenia, where I have seen people really struggling with the English alphabet. Education was Armenian and Russian for a long time.
(There are postcodes in Denmark, but Ø is the first letter of East, so is a very broad code, in a way.)
The postal code in France is effectively an area code (which can be quite large), so it does not help you find the house or building; its main purpose is to dispatch mail.
That documents that not all addresses have postal codes. This is another example of the real world existing in order to give programmers kicks in the head.
Most of these are fundamentally rooted in "That understanding how to issue instructions to a computer in some way correlates with high intelligence and excellent general knowledge."
I think that's a bit harsh; it's more like "If you know how to issue instructions to a computer, you can derive anything else about the world from first principles."
I actually think these lists should be called "Falsehoods designers/PO's/etc. believe about x". As a programmer, you may have to neglect these complexities sometimes, even though you know about them and have made the case.
I think some of it too is herd logic. Oh the number of times I've gotten a PM or QA to say "You should validate this field in this particular way..." and the only response to "Why?" is "Because everyone does it that way."
Yes, French indeed, well-spotted ! I also enter the last name in capital case, that's part of writing it like this 1000s of times in school.
By the way, for people reading this, from my own experience, at least 30/40% of French people will put the last name first in the form, whatever the text says, so to get better data, the best thing is to put them in the order (last name / first name) for them to reduce the number of errors.
I've always loved the "Falsehoods Programmers Believe About Xxxx" articles. There's always that junior developer who says, "We need a Xxxx handler! That's easy, I'll write it myself in an hour!" (where Xxxx is either a time, address, name, etc., something with a huge number of tricky potential edge cases). You can just watch the bugs jump into the program as he types. Then you get to testing:
Me: Try this test.
Dev: Oh, didn't think about that one, let's add that case.
Me: How about this?
Dev: What? OK OK, let's support that too.
Me: And this one...?
Dev: Wait, what? How is that even...?
The code is about 200 lines now, and we've blown the afternoon, and it still doesn't properly handle Xxxx, so I'll forward him the "Falsehoods Programmers Believe About Xxxx" article.
Dev: Oh crap, maybe I should have looked for a tried and true library that handles Xxxx....
In general, in Europe "0" is national dialing prefix, leading to numbers written in the format of:
+44(0)870123456, meaning dial 0870123456 from within the UK, and +44870123456 internationally to reach the same number (i.e. "0" goes away in international format).
However in Italy, leading "0" in the national dialing format is an integral part of the phone number itself (only for non-mobile numbers); for example:
(+39)051234567 meaning 051234567 when dialing from within IT, and +39051234567 internationally (i.e. "0" stays in international format).
EDIT: Mobile numbers in Italy don't have leading "0", and could only be dialed in full/international format either within the country, or internationally.
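A rough sketch of what the UK/Italy difference means in code. The `RULES` table and `to_international` function are hypothetical and only cover the two cases described above; a real system would rely on maintained numbering-plan data rather than a hand-written table.

```python
# Hypothetical per-country rules, covering only the two cases discussed above.
RULES = {
    "GB": {"country_code": "44", "drop_leading_zero": True},
    "IT": {"country_code": "39", "drop_leading_zero": False},
}


def to_international(national: str, country: str) -> str:
    """Convert a nationally dialed number into +CC form."""
    rule = RULES[country]
    # In the UK the leading 0 is a dialing prefix and goes away;
    # in Italy it is part of the number and stays.
    if rule["drop_leading_zero"] and national.startswith("0"):
        national = national[1:]
    return "+" + rule["country_code"] + national


print(to_international("0870123456", "GB"))
print(to_international("051234567", "IT"))
```

Blindly stripping a leading zero for every country would turn the Italian example into a different, wrong number.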
I think the +4412345... form, or +39012345... work in all situations. But, outside small countries many people will be unfamiliar with a number in this format.
Additionally some people believe that dialing a +XX, or even regional number will cost more even if they are in the same country/region. Which, at least in the UK, at least with common service providers, is not true.
This is my pet hate. There are too many places which require a phone number and won't take no for an answer. Some of them you can lie to, entering something that's obviously invalid like 0, but others do rigorous validation and simply won't let you enter something that's not a phone number (Google Play is an offender here). I usually end up entering 012345678, but that's actually a valid UK number, and I don't like doing that. But you have to wonder about any online form which requires you to lie to complete...
My usual solution here for the most stubborn web forms that won't take no for an answer is to find the contact number of the organisation that is demanding a phone number and I write their own number on the form. If you want to phone someone, try phoning yourself!
In North America, I've always used "<valid area code>-555-1212", which is one of the numbers for directory assistance. The area code may be varied based on whatever location seems plausible. Unless the service insists on sending a text or verifying the number with your credit card company (the latter seems less common now than it used to be) there are no problems. I wouldn't worry about "012345678"; if they didn't want that number they'd probably have changed it by now. Maybe they want "867-5309"?
The problem with fictitious numbers is that they normally don't look invalid. I don't want to supply them with a contact number which merely doesn't work; I want to supply them with a contact number which is obviously not a real number at a glance. Hence why I use 0 where possible.
Ideally I'd prefer them to allow me to leave the number blank, of course.
The bandwidth of POTS is limited to about one decade, from 300 Hz to 3.3 kHz.
Often you don't notice the poor quality while using POTS. But if you ever listen to an NPR show where a journalist is calling into the studio over POTS, it's quite jarring how bad it is.
Similarly with credit card numbers: most credit cards have spaces between groups of digits in the account number (as it appears on the card), but many web forms will reject the number if it has spaces in it. (I find it much easier to verify that the number I typed into the form is correct if it looks like the number on my card.)
Or web forms will reject a phone number if it has non-numeric characters like "-", which many people would probably think of as an integral part of their number.
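Accepting the separators is usually a one-liner. The sketch below is one possible approach, not a full validator: `strip_separators` and `is_all_digits` are hypothetical helper names, and the set of separators removed (spaces and hyphens) is an assumption.

```python
import re


def strip_separators(raw: str) -> str:
    """Remove the spaces and hyphens people type for readability."""
    return re.sub(r"[ \-]", "", raw)


def is_all_digits(raw: str) -> bool:
    """Check the input only after separators have been removed."""
    digits = strip_separators(raw)
    return digits.isascii() and digits.isdigit()


print(strip_separators("4111 1111 1111 1111"))
print(is_all_digits("555-0100"))
```

The form can keep showing the number the way the user typed it and only use the stripped version for validation.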
Post codes aren't quite the same thing as Zone Improvement Plan codes. ZIP codes are just the peculiar implementation in the US of the general post code concept.
An important difference between ZIP codes and Canadian post codes is that, since there are many more possible post codes than ZIP codes, post codes tend to cover much smaller regions than ZIP codes. In many cases, only a handful of individuals up to a single individual may have a Canadian post code. Thus, publishing Canadian users' post codes can severely harm Canadians' privacy, as you may as well be publishing their exact address.
I was scrolling through, feeling largely unsurprised, when:
14. In Israel, certain advertising numbers start with a *.
Woah! Definitely didn't see that coming; I can certainly imagine my( prior )self storing/passing phone numbers as some integer type.
But along with another user's comment that in Italy, leading zeroes are required, it's clear that (at least in any international application) phone numbers need to be strings.
If you store phone numbers as integers (you normally should not), remember that leading zeroes are significant: 0123 is not the same phone number as 123, but as integers they are normally equal. (And make sure your parser doesn't parse 0123 as octal.)
In general, only store things as integers if it might be meaningful to do math on them.
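A small Python illustration of both hazards, the dropped zero and the octal reading. The value "0123" is taken from the comment above; the variable names are illustrative.

```python
raw = "0123"

# Converting to an integer silently drops the leading zero.
as_int = int(raw)

# Converting back no longer gives what the user typed.
round_tripped = str(as_int)

# A parser that treats a leading zero as "octal" gets a different value again.
as_octal = int(raw, 8)

print(as_int, round_tripped, as_octal)
```

Keeping the value as a string avoids both problems, since nothing ever tries to interpret it.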
I saw a database once that stored phone numbers as ints. Elsewhere in the software, they tried to take the mean of the wrong column. If the phone numbers had been stored as strings the error would have been obvious. As it was, the error survived in production for a disturbingly long time. The users persisted in thinking the output was meaningful even after the problem was discovered.
Also my U.S. phone number used to be 703-6XX-XXXX (digits omitted for privacy). That's not going to work if you store it as a 32-bit integer (when I was a dumb college student I made a similar mistake, and spent a couple of minutes scratching my head over why my number kept changing to 214-748-3647 every time I saved it to the DB).
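Why that particular number keeps appearing: 2147483647 is the largest signed 32-bit integer. The sketch below assumes the database clamps out-of-range values to the column maximum (some do, others reject the write); `store_as_int32`, `format_nanp` and the 703 number used are all hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def store_as_int32(value: int) -> int:
    """Assumed behavior: out-of-range values are clamped to the column maximum."""
    return min(value, INT32_MAX)


def format_nanp(value: int) -> str:
    """Format a 10-digit North American number as NNN-NNN-NNNN."""
    digits = f"{value:010d}"
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


stored = store_as_int32(7035550100)  # hypothetical 703 number
print(format_nanp(stored))
```

Any ten-digit number above the limit, which includes every 703 number, ends up stored as the same Dallas-looking value.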
"Somewhere in Dallas, some poor bastard is wondering why his phone rings off the hook with calls for the Nevada Division of Mental Health & Developmental Services, the Jackson County Florida Chamber of Commerce, a yacht club in New York….."
And NASA, apparently
http://sbir.gsfc.nasa.gov/award_firm_list/selection_nid/3513...
I wrote a VB app named Fax Assistant that used Rightfax codes in MS-Word and had to use phone numbers. I stored the numbers as strings/text because sometimes in order to get a fax machine you had to dial a comma and a 2 for some sort of switchbox that shares a voice line.
St. Louis had the 314 and 636 area codes, and sometimes they needed a 1 in front of them depending on where you called from. 636 was west county, like St. Charles; 314 was St. Louis city and North, East, and West county. They used a different area code to make more numbers due to demand.
I'm from Israel and I never realized it's unique, funny. I think it started about 15 years ago as way for mobile providers to create unique numbers for their own services like support. And since those are also very short numbers, usually 4 digits, it eventually became sort of a 1-800 replacement (on mobile only AFAIK) because it's easy to memorize.
US carriers used to have this, and they may still. I have definitely seen signs to call 77 for roadside assistance. I also remember some radio station having a code (star number?) that related to their frequency.
> A phone number uniquely identifies an individual
As someone who travels a lot I can attest to the annoyance of messaging apps which identify users by phone number. WhatsApp is a good example.
Another terrible design is one time passcodes which are sent over SMS. When a number gets recycled it can become impossible to recover access to a service which you forgot was associated with the number you had at the time of signing up in another country.
Since I often change cell phone numbers because I move frequently (nationwide free long-distance has never caught on in Canada), I like having one number I can re-direct as needed. When I go overseas and buy a local SIM card, I'll re-direct it to that so I can keep in touch.
Telecoms have said they can't take it. The government tax agency can't accept it. Banks won't accept it.
Now when I call in and they ask what my phone number is to verify my identity, I have to rhyme off a string of old phone numbers.
I wouldn't say that nationwide free long-distance hasn't caught on. It seems like most plans from the big three telecom companies now (and have for the past few years) included Canada-wide calling. That's just my anecdotal experience, though.
You also can't assume that the user knows how the number should be formatted/split. When Sheffield moved from six to seven digit numbers, '2' was prepended to all existing numbers, so 0114 XXX XXX became 0114 2XXX XXX. A fairly large number of people interpreted that as an area code change instead of an actual number change (see, e.g. [0]), and I still see signs giving business numbers formatted like "(01142 XXX XXX)".
The first two are actually valid numbers (some of my old numbers, which are still active, but unused).
Btw, an area code doesn't have to actually refer to that area — you can keep phone numbers if moving across area code boundaries in Germany, so a 040 number might actually be used by someone living in 0431.
Or that people format numbers the same way in the same city. In Oslo I see some people writing numbers as XXXX XXXX and others writing them as XXX XX XXX.
Ugh, yes. Just from a UX perspective, I hate it when I enter my number and it's reformatted as I type into a strange (to me) looking arrangement due to this assumption.
Just ran into this the other day. A popular ride sharing app has a huge security hole wherein if you get a recycled mobile number you can easily gain access to the account details of the previous owner of the number and run up a bill on their credit card.
It's just easier to treat phone numbers as a 100-character string in your DB. For validation and formatting, I have just sent it through Twilio's Lookup API (which I believe is powered by Google's libphonenumber).
Another way that a phone number may become invalid is that its area code can change. For example, if an area code runs out of assignable numbers, it can be split into two or more area codes and some numbers in the old area code will be assigned a new area code.
Alternatively, new area codes can be overlaid on top of existing ones - in NYC, Manhattan has 212, 646 and 917. A phone number that's invalid one day can become valid the next day if a new area code is introduced.
When I was a kid, Portland, Ore. outgrew area code 503 and was issued an overlay area code, 971. To try to prevent confusion, the phone provider, which I believe was still USWest at the time (later Qwest, later CenturyLink), made the area code mandatory. All of a sudden, dialing the 7 digits on most advertisements got you a recording: "Ten digit dialing is now required in the Portland area..."
Many years later, a friend borrowed my cellphone in an area where phone numbers were often dialed as seven digit. Not thinking about my being from out of state, he only dialed seven digits, and happened to get some rather confused person back in Portland. I had no idea my cellular carrier would even allow you to do that!
So overlay area codes can combine with number portability to have some odd results. A number may or may not be considered valid by the carrier, and may or may not get you who you expected, and in edge cases this may depend on the carrier.
At least the main part of phone numbers is true: they are scarce, which makes them the main way to prevent sybil attacks. Facebook accounts are a distant second. Throwaway emails make sybil attacks very common on sites that let you sign up with an email.
Phone numbers in most countries are definitely not scarce. US numbers can be obtained in virtually any area code (excluding 212 and a few more) from between $0.40 and $1.00/month. There are even services that provide temporary throw-away numbers.
Yes but you are ignoring the overall picture. If you factor in all those numbers that can be obtained, it is still de facto close to 1 number per person on average. We are talking about numbers for which an SMS can be successfully delivered and the end-user gets some unique code. You'll find all of those "throwaway" SMS services re-use the same numbers, because they are scarce.
I challenge you to produce any country (with a link supporting what you say) where phone numbers with SMS DELIVERABILITY are not scarce, e.g. don't need a device or endpoint to be registered with some carrier, or can be obtained by the hundreds.
US and UK numbers with deliverable SMS are under $0.40/month each, and can be ordered in bulk for virtually any area code through many providers such as:
An issue I don't see raised here is the complications from Mobile Number Portability, where one may use different services to send text messages to different networks in a country and wrongly rely on the number prefix to identify the mobile operator.
I remember being greatly annoyed at a version of Google Hangouts that would pop up a message about certain phone numbers being too long whenever I opened SMSs from them, and then flat-out refuse to let me reply to them.
In South Africa, we have a lot of SMS services that tack five or so digits onto the ends of normal numbers as unique identifiers, so what I assume was a programmer's false belief about phone numbers did some real harm there.
A related falsehood that is built into many systems is the belief that zip codes don't change and that new ones won't be created. A co-worker's area got a new zip code, resulting in many months without properly working health insurance because computer systems kept flashing his zip code as invalid despite the new zip having been issued years ago.
Years ago, North American area codes always had a middle digit of 0 or 1. This led to the practice of storing a phone number efficiently in 32 bits by swapping the first and second digits. It yielded a value that would fit in a 32-bit integer; for example, (912) 350-8000 became 1923508000.
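The digit-swapping trick can be sketched as follows. The function names `pack` and `unpack` are made up for illustration, and the code assumes a ten-digit number whose middle area-code digit is 0 or 1, as under the old rule described above.

```python
def pack(number: str) -> int:
    """Swap the first two digits so the value fits in a signed 32-bit integer."""
    if len(number) != 10 or not number.isdigit():
        raise ValueError("expected exactly ten digits")
    if number[1] not in "01":
        raise ValueError("scheme relies on the middle area-code digit being 0 or 1")
    # After the swap the leading digit is 0 or 1, so the value is below 2,000,000,000.
    return int(number[1] + number[0] + number[2:])


def unpack(packed: int) -> str:
    """Restore the original ten-digit number."""
    digits = f"{packed:010d}"  # re-pad in case the swapped leading digit was 0
    return digits[1] + digits[0] + digits[2:]


print(pack("9123508000"))
print(unpack(pack("9123508000")))
```

The scheme stops working as soon as area codes with other middle digits are issued, which is exactly the kind of assumption these lists warn about.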
> phones in the disputed territory and partially recognised state of Kosovo may be reached by dialing the country calling code for Serbia (+381), Slovenia (+386), or Monaco (+377)
Monaco? WTF? And Slovenia doesn't make much sense either. Why wouldn't it be Serbia and Albania?
For Slovenia it's because the largest operator Mobitel (now part of Telecom of Slovenia) had a rather large mobile base station and service deployment in the region. The base stations were directly managed from Slovenia even before the split and were now assigned the Slovenian prefix.
> In Brazil, to dial numbers internally but across a certain geographical boundary, a carrier code must be explicitly dialed to say which carrier you will use to pay for the call.
Any Brazilians here who can confirm whether this is true?
I know that in Brazil you can select a preferred carrier for a long-distance phone call by dialing a prefix. For example, you'd dial 014 ahead of your phone number to choose Brasil Telecom or 021 to choose Claro. But this is optional as far as I know. If you don't dial any prefix, you get a default carrier.
It's a shame despite the title there were no tips for using libphonenumber in the article. It's a very finicky library. You have to clean your inputs a lot before using it.
There are also a lot of edge cases where I've found it doesn't work. Phone numbers change all the time, so if you are using this library you need to update it constantly, and even then it may still be outdated for new changes.
If I need to validate a phone number is correct I'll send an SMS and ask the user to enter a code, otherwise I'll assume they know their phone number better than me.
This one got me when I ran a Japanese website. Unicode (deriving from Japanese standards) has "fullwidth forms" for Roman letters and numbers. Thus your users can enter a phone number as 0123456789 or ０１２３４５６７８９ (or even a mixture), and your code had better be able to cope with it.
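One way to cope with fullwidth input is Unicode compatibility normalization. This is a minimal sketch using Python's standard `unicodedata` module; `normalize_digits` is a hypothetical helper name, and note that NFKC also rewrites other compatibility characters, which may or may not be what you want for other fields.

```python
import unicodedata


def normalize_digits(raw: str) -> str:
    """Fold fullwidth forms (and other compatibility characters) to their plain equivalents."""
    return unicodedata.normalize("NFKC", raw)


# U+FF10 through U+FF19 are the fullwidth digits zero to nine.
fullwidth = "\uff10\uff11\uff12\uff13\uff14\uff15\uff16\uff17\uff18\uff19"
mixed = "01\uff12\uff13"

print(normalize_digits(fullwidth))
print(normalize_digits(mixed))
```

Running this before any digit check means both spellings, and mixtures of them, reach the rest of the code as ordinary ASCII digits.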
If you're dealing with phone numbers, always normalize to E.164. It doesn't solve every problem, but it does a pretty good job of consolidating phone number issues to one place in your code (namely, the place where the E.164 normalization occurs).
If you're dealing with short codes or other highly regional things... Good luck...
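For reference, an E.164 number is a plus sign followed by at most fifteen digits, starting with the country code. The sketch below only checks that shape; it does not do the actual normalization, which needs numbering-plan data from something like libphonenumber. The pattern and the name `looks_like_e164` are illustrative assumptions.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def looks_like_e164(value: str) -> bool:
    """Shape check only: says nothing about whether the number exists."""
    return E164_PATTERN.fullmatch(value) is not None


print(looks_like_e164("+44870123456"))
print(looks_like_e164("0870123456"))
```

A check like this is useful at the boundary where normalized numbers enter storage, so that un-normalized input cannot slip past the one place that does the conversion.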
Those falsehoods looked less problematic than the prior ones about names and addresses. Maybe because phone numbers still are technical things, numeric, and less culturally coded over hundreds and thousands of years.
1. Phone numbers that are valid today will always be valid. Phone numbers of a certain type today (e.g., mobile) will never be reassigned to another type. [...] Tip: Don’t store properties for a phone number such as validity or type. Check this information again from the library when you need it.
Not sure this suggestion is a truly pragmatic approach, because there is often very little else to determine SMS reachability up front, which is a common want. One might assume mobile until an adequately reliable SMS provider determines that the number is in fact not a mobile reachable SMSC-managed number. Fun fact: in China, there used to be multiple mobile messaging systems including a non-SMS system with weird gateway restrictions and no UCS-2 Unicode support. Another fun fact: UCS-2 kills bytes rapidly, you are better off (ie. get more length per message) using legacy local encodings like TIS in Thailand, etc. from a cost/length perspective. If you want to achieve that today, almost no mobile messaging provider can help you, you need to build custom SMS PDUs through their binary PDU submission interface, which is documented but still something of a black art and typically has very different reachability constraints to standard messaging (whole operators may suddenly become inaccessible).
2. A phone number uniquely identifies an individual [...] It wasn't even that long ago that mobile phones didn't exist, and it was common for an entire household to share one fixed-line telephone number. In some parts of the world, this is still true, and relatives (or even friends) share a single phone number.
This may be true, but on a related note, it is the trend now in many countries to require formal government identity document submission in order to obtain a SIM or make it functional through user registration. Therefore, in some parts of the world, you can skip a whole lot of mucking around with mobile-connected services by re-evaluating the probable bad-actor risk metric for customers from those countries through considering probably relatively strong police identification efficacy of abusive accounts in these markets.
3. An individual has only one phone number. [...] Obviously, this isn't necessarily true.
An individual has zero or more numbers which are not necessarily unique to that user, long term accessible, or of any certain or constant type or origin. Someone else may take these over at any time, so they cannot be relied upon solely as a trusted channel to the user.
5. Each country calling code corresponds to exactly one country [...] The USA, Canada, and several Caribbean islands share the country calling code +1. Russia and Kazakhstan share +7.
This is only half true. They generally have longer prefixes. For example https://en.wikipedia.org/wiki/Khazakstan shows +7-6xx, +7-7xx and much of the +1's can probably be reliably split through prefixes.
6. Each country has only one country calling code
Further example: calling Taiwan via China.
Another point: some endpoints need PABX suffixes to reach, for example voicemail or via corporate systems. There is no real standard for these programs, which tend to incorporate functions like "wait for initial connection", "wait x seconds", "wait for a sound", or "wait for a second dialtone". In some countries people will write 123456 x7 for extension 7, but often multi-level inputs may be required (classic case: almost any telephone-based customer support system), in which case there is no standard.
> ITU says things like "national numbers can not be longer than sixteen digits" but valid numbers in Germany have been assigned that are longer than this.
German doesn't really have long phone numbers. Those are just phone number phrases made of individual small phone numbers, but strung together by the German writing system.
I find the title of this article condescending. The falsehoods given in this list are surely not believed by all programmers (much less the ones who need to work with phone numbers), and moreover I'm sure belief in them is not exclusive to programmers. I might be off-base in this criticism, but to me starting off with "let me show you how you're wrong" is not the best way to engage your readership. Perhaps a better title would be "common misconceptions about phone numbers" or "things to keep in mind when programming with phone numbers".
I can appreciate that it is an aggressive (arguably clickbait) title and can be off-putting, but I think the reason there's a focus on programmers is that most other professions will never need to know or consider most of these facts about phones.
The focus on programmers is because virtually every point is about handling phone numbers as data to be processed and worked with, which is a fairly unique perspective that, as far as I can think, only programmers have. Most of the points are pitfalls that might trip up a programmer when working with code that takes a phone number as an input and needs to validate it.
While I'll grant that your alternative titles are much less aggressive, the original title really isn't condescending unless you read into it as such, in my opinion. It doesn't say "all programmers", just programmers, and all points seem like fairly common misperceptions.
It would probably be more accurate if it were titled "Common Assumptions Programmers Often Mistakenly Make." But, meh, the actual article was not very condescending.
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...