I've always thought that "naming things" refers to the more subtle problem of giving things unique identifiers in distributed systems, rather than coming up with variable names.
"Naming things" includes systems like MAC address allocation, IP address allocation, DNS, URLs for documents, process IDs, name to inode mapping in filesystems, autoincrement primary keys in databases, etc.
Any ideas on what Karlton really meant by the quote?
It seems to me that this quote makes rather standard use of the "zeugma" [1] figure of speech, which consists of putting two things of very different nature next to each other. In this case we have the very technical problem of invalidating caches on the one hand, and the much higher-level problem of naming things on the other.
That produces a rather stylish quote.
But if Phil Karlton was actually referring to the technical problem of assigning unique identifiers, as you suggest, the figure of speech is lost and the quote becomes less stylish.
So I prefer to read "naming things" as the very general problem of naming things.
I always assumed it was the more general "coming up with names" - for variables, hostnames, project names etc.
The non-human-readable things like MACs, IPs and auto-increment keys are easy. But when I have to come up with a short, memorable and non-confusing name for something like a script, that can take me longer than writing the code.
Unique IDs are easy on a single machine, but (just like cache invalidation) become surprisingly complex in a distributed system.
For example, how do you assign unique MAC addresses to network cards without a centralised database that has to be modified for each card manufactured? Vendors get address space blocks, and probably split these further into blocks for factories and production lines.
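To make the block idea concrete, here is a rough Python sketch. The only part I'm sure of is that a MAC address is 48 bits and the top 24 bits are the vendor prefix (the OUI) assigned by the IEEE. The split of the lower 24 bits into a factory field and a serial field, the `make_mac` helper, and the example prefix are all my own assumptions for illustration, not how any real vendor does it.

```python
def make_mac(oui: int, factory: int, serial: int, factory_bits: int = 8) -> str:
    """Compose a 48-bit MAC address from hierarchical blocks.

    oui:     24-bit vendor prefix (assigned centrally, once per vendor)
    factory: hypothetical sub-block the vendor hands to one factory
    serial:  counter kept locally at that factory
    """
    serial_bits = 24 - factory_bits
    if not 0 <= oui < (1 << 24):
        raise ValueError("OUI must fit in 24 bits")
    if not 0 <= factory < (1 << factory_bits):
        raise ValueError("factory number does not fit in its field")
    if not 0 <= serial < (1 << serial_bits):
        raise ValueError("serial number does not fit in its field")

    value = (oui << 24) | (factory << serial_bits) | serial
    # Format as six colon-separated bytes, most significant byte first.
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# 0x001122 is an arbitrary placeholder prefix, not a statement about a real vendor.
print(make_mac(0x001122, factory=3, serial=1))  # 00:11:22:03:00:01
```

The point is that the central registry is only touched once per vendor; every factory can then hand out addresses from its own field without talking to anyone.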
Generating unique autoincrement row numbers in a distributed database or process IDs in a cluster is also a complex problem if it has to be faster than any communication between the nodes.
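One common way around this is to let each node lease a whole block of IDs up front, so it only has to talk to a coordinator once per block instead of once per row. A minimal single-process sketch of that idea follows; `BlockCoordinator` and `Node` are hypothetical names, and a real system would put the coordinator behind a network call and persist its counter.

```python
import threading


class BlockCoordinator:
    """Hands out non-overlapping ID ranges (stands in for a central service)."""

    def __init__(self, block_size: int = 1000) -> None:
        self._block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def lease(self) -> range:
        # The only step that needs coordination: reserve the next block.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class Node:
    """Generates IDs locally and contacts the coordinator only when its block runs out."""

    def __init__(self, coordinator: BlockCoordinator) -> None:
        self._coordinator = coordinator
        self._ids = iter(())  # empty until the first lease

    def next_id(self) -> int:
        for new_id in self._ids:
            return new_id
        # Current block exhausted (or never leased): fetch a fresh one.
        self._ids = iter(self._coordinator.lease())
        return next(self._ids)


coordinator = BlockCoordinator(block_size=3)
a, b = Node(coordinator), Node(coordinator)
print(a.next_id(), b.next_id(), a.next_id())  # 1 4 2
```

The IDs are unique but no longer ordered by creation time across nodes, and a node that dies takes the unused rest of its block with it, so the sequence has gaps. That trade-off is part of why this is harder than it looks.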
Designing the layout of IP addresses, architecture of DHCP and DNS, and the very idea of URLs are fantastic pieces of Computer Science work. Easy to use, but a hard problem for the original designers!
But of course, I have no idea what Karlton had in mind with the quote. "Naming things" as "assigning unique identifiers" always felt like a fitting companion to cache invalidation, as both are problems that are easy on a single core or machine, but very complex in distributed systems.
>The difficulty only arises if the body of a nested function refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call.
Also, the version I know has much more bite:
>There is also a variation on this that says there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
What's the name for the four hard things concurrency, in computer science: cache invalidation, off-by-one errors, naming things, and whatever the last one is called?
> -- Phil Karlton