
> some DNS clients will arbitrarily randomize the case of names to add an element of randomness to thwart DNS spoofing attacks.

I believe this is undefined behavior. It shouldn't be something you count on. The only reference I found in the spec that implies this is:

The question section of the response matches the question section of the query

That's from RFC 1034. It isn't very specific, but it could be interpreted by some in the way you mean.

If you want to secure the request, it's best to randomize the QID and outbound port. If a server responds with the wrong QID, ignore the response.
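
A minimal sketch of that approach (the function names here are my own, not from any particular resolver): generate a random 16-bit query ID when building the query, and discard any response whose ID doesn't match.

```python
import os
import struct

def make_query(qname, qtype=1, qclass=1):
    """Build a minimal DNS query with a random 16-bit ID.
    Returns (packet, qid) so the caller can match the response."""
    qid = struct.unpack("!H", os.urandom(2))[0]  # random query ID
    flags = 0x0100  # standard query, recursion desired
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    question = b""
    for label in qname.rstrip(".").split("."):
        question += bytes([len(label)]) + label.encode("ascii")
    question += b"\x00" + struct.pack("!HH", qtype, qclass)
    return header + question, qid

def response_matches(packet, qid):
    """Reject any response whose ID doesn't echo our query ID."""
    if len(packet) < 12:
        return False
    return struct.unpack("!H", packet[:2])[0] == qid
```

For the port half, binding the UDP socket to port 0 lets the kernel pick an ephemeral source port; modern resolvers go further and draw from a large randomized port range, since the 16-bit QID alone is too little entropy against a flooding attacker.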




From RFC 1034 S. 3.1: "[D]omain name comparisons for all present domain functions are done in a case-insensitive manner, assuming an ASCII character set, and a high order zero bit. When you receive a domain name or label, you should preserve its case. The rationale for this choice is that we may someday need to add full binary domain names for new services; existing services would not be changed."

First, we can't speak of this being undefined in the same sense as undefined behavior in the C standard. The DNS standards weren't that rigorous, and didn't use the consistent MUST/SHOULD terminology that is universal in today's RFCs.

Second, they were explicit that while the existing services (e.g. IN class and A record type) were ASCII-based and case-insensitive, the binary protocol was meant to be 8-bit clean, that some labels might be 8-bit in the future, and it was expected and mandated that this capability be preserved. So strictly speaking, the RFC allowed a server to, e.g., modify the case of an A record label on the wire, but not of some unknown label. In practice it's easier to simply treat all labels in an 8-bit clean manner, and that's in fact what major implementations do. You literally have to go out of your way to do otherwise while still obeying the standard.
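
The comparison rule this implies can be sketched in a few lines (my own illustration, not from any implementation): fold only the ASCII letter range when comparing, and compare every other octet verbatim, so arbitrary 8-bit labels still work.

```python
def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two DNS labels per RFC 1034: fold ASCII letters only;
    any other octet (including 8-bit values) must match exactly,
    preserving the protocol's 8-bit cleanliness."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # fold only 'A'-'Z' down to 'a'-'z'; everything else is verbatim
        if 0x41 <= x <= 0x5A:
            x += 0x20
        if 0x41 <= y <= 0x5A:
            y += 0x20
        if x != y:
            return False
    return True
```

Note this is deliberately not `bytes.lower()` applied blindly: the fold touches only the 26 ASCII letters, so hypothetical binary or UTF-8 labels compare byte-for-byte.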

Caching name servers like BIND and unbound will reply with the identical question label. For example, notice in the following how the TTL is decremented (and thus being pulled from cache) but the query case is preserved:

  % dig -t A google.com                               
  ; <<>> DiG 9.8.3-P1 <<>> -t A google.com
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20838
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
  
  ;; QUESTION SECTION:
  ;google.com.			IN	A
  
  ;; ANSWER SECTION:
  google.com.		105	IN	A	172.217.4.206
  
  ;; Query time: 0 msec
  ;; SERVER: 192.168.2.1#53(192.168.2.1)
  ;; WHEN: Sat Jul  9 00:45:57 2016
  ;; MSG SIZE  rcvd: 44

  % dig -t A GoOgLe.com                               
  ; <<>> DiG 9.8.3-P1 <<>> -t A GoOgLe.com
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7947
  ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
  
  ;; QUESTION SECTION:
  ;GoOgLe.com.			IN	A
  
  ;; ANSWER SECTION:
  GoOgLe.com.		95	IN	A	172.217.4.206
  
  ;; Query time: 0 msec
  ;; SERVER: 192.168.2.1#53(192.168.2.1)
  ;; WHEN: Sat Jul  9 00:46:07 2016
  ;; MSG SIZE  rcvd: 44
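
That case preservation is exactly what makes the "0x20" randomization trick from the parent comment workable. A sketch (function names are mine): randomly flip the case of each letter in the query name, then do a case-sensitive comparison against the echoed question section.

```python
import random

def randomize_0x20(qname: str, rng=random) -> str:
    """Randomly flip the case of each ASCII letter in the name
    (the 'DNS 0x20' trick). A case-preserving resolver echoes the
    exact mix back, adding entropy beyond the 16-bit query ID."""
    return "".join(
        c.upper() if rng.random() < 0.5 else c.lower() for c in qname
    )

def case_matches(sent: str, echoed: str) -> bool:
    # A case-sensitive comparison is the whole point of the check:
    # a forged response is unlikely to guess the per-letter casing.
    return sent == echoed
```

Each letter contributes roughly one bit of entropy, so a longer name gives an off-path spoofer a much smaller window than the QID alone.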

In reality, the core DNS infrastructure was perfectly capable of fully supporting raw UTF-8 labels (though a DJB page suggests that some older versions of Unix gethostbyname stripped 8-bit labels). Unlike other infrastructure, the implementations were fairly homogeneous (until a few years ago BIND absolutely dominated), so ad hoc (and broken) implementations were few and far between. And unlike other infrastructure, there was very little incentive to violate 8-bit cleanliness. The biggest problems were not that some ad hoc implementations modified case, per se, but that some ad hoc caching proxies would reply with the case of a cached record. That's out of sheer laziness, or because they didn't read the standard closely enough. It's telling that BIND, unbound, and other major caching proxies are careful to preserve case in the reply even though that's not necessarily the easiest solution.

The real problem was edge software, like browsers, e-mail clients, etc., that baked in way more assumptions than warranted. Arguably IDNA and punycode took more effort to roll out than would have alternatives based on raw UTF-8. The core infrastructure software wasn't a real barrier, and the IDNA solution required more code at the edges. While the major browsers were facing lots of work regardless, most ad hoc software would have been fine just fixing 8-bit cleanliness problems and then punting on things like glyph security issues, especially if they weren't directly user facing. The vast majority of edge software would have just required some slight refactoring, not huge rewrites with library dependencies for the new compression scheme, etc.
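
To see the difference in wire forms, Python's built-in idna codec (which implements the original IDNA2003 ToASCII transformation) makes a handy comparison; the name here is just the standard punycode example:

```python
# Compare what IDNA/punycode actually puts on the wire against what
# hypothetical raw UTF-8 labels would have looked like.
name = "bücher.example"

idna_wire = name.encode("idna")   # the IDNA ASCII-compatible encoding
utf8_wire = name.encode("utf-8")  # what raw 8-bit labels would carry

print(idna_wire)   # the xn-- prefixed punycode form
print(utf8_wire)   # the UTF-8 bytes, unchanged
```

The punycode form requires every producer and consumer to implement the bootstring compression algorithm; the UTF-8 form is just the bytes the edge software already had.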



