> As an aside, is there a name for this purposeful perspective of strict literalism?
Correctness?
> I say "purposeful" because -- while you're obviously knowledgeable about the subject matter -- it can't have failed to occur to you that this approach cannot succeed outside of a very structured context.
Erm ... which is why I am applying it to the extraordinarily structured context of formal languages, protocol specifications, and computer software?!
> Is this a subcategory of the formal language / langsec efforts? Just standard standards-writing practice? Something else?
I would say the langsec efforts are an attempt to raise awareness that sloppy thinking about semantics is the root cause of a large proportion of vulnerabilities, to establish a label for this problem, and to establish some sort of best practices for avoiding it. Good standards-writing for protocols is, of course, extremely literal, because protocol implementations necessarily will be. Any ambiguity in the standard will result in interoperability problems, and possibly in vulnerabilities. In the long run it also leads to unnecessary complexity, as people try to plaster over the differences in interpretation between implementations to improve interoperability, which increases the probability of vulnerabilities even further.
But you're not correct. You're doggedly and dogmatically wrong, in the only context that matters, which is the one in which this conversation was spawned.
Again, if for the purposes of the spec, you want to apply the label "FQDN DNS name" to "8.8.8.8", that's fine and great. You can also call it a "finalized mapping token" (which has the advantage of being literally correct), or a "turtle" (which would be surprising but not misleading).
But applying a label to the data does not change the nature of the data. In the larger context, the data was created as a text representation of an IP address, was never used in a DNS context, and the concept of "fully-qualified" doesn't have a lot of meaning where there is no process by which to further qualify any partial tokens.
It remains a textual representation of an IP address, even if it is used in a different context. Just as "Alice" remains a first name even if it is mislabeled or misused.
Of course, data can fit the validation criteria for multiple types, and it can be misused. "Alice" is a first name, but it is also a valid hostname. It is not a hostname just by virtue of being validly parseable as such. And if I wanted to know the first names of people here at HN, but asked for their hostnames, I would generally not get the answer I wanted.
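To make "validly parseable as such" concrete, here is a minimal sketch of a purely syntactic hostname check. It assumes the per-label LDH rules (letters, digits, hyphens; 1 to 63 characters; no leading or trailing hyphen; digits allowed first since RFC 1123). `is_ldh_hostname` is a hypothetical helper, not any library's API. Both "Alice" and "1.2.3.4" pass, which is exactly why syntax alone cannot tell you what a string was meant to be.

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending
# with a hyphen. A leading digit is allowed (RFC 1123 relaxed RFC 952).
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_hostname(name: str) -> bool:
    """Hypothetical helper: per-label syntax check only, no semantics."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))

for candidate in ["Alice", "1.2.3.4", "example.com", "-bad-", "a..b"]:
    print(candidate, is_ldh_hostname(candidate))
```

The first three candidates print True and the last two print False.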
If some awful code somewhere misused first names as hostnames, the network guy with very limited context might say "I see a query for hostname 'Alice'", but the people with larger context would ask "Why is this firstname being misused as a hostname?".
This HN thread was never a SNI spec internal debate, and no one here benefits from assuming that highly restrictive context.
I have had discussions with some of the langsec folks in the past. I greatly respect their work, and they are wise enough to know that their context is not useful in general discussion.
Your initial statement was condescending and misleading. As the conversation continues, it becomes clear that this was intentional, and you are willing to die on the hill of tiny irrelevant context. Noted.
> But applying a label to the data does not change the nature of the data.
This is not about applying a label, this is about type-tagging. The label is irrelevant, the type tag is not.
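A minimal sketch of what type-tagging means in code, assuming two hypothetical tagged types (`DnsName`, `IpLiteral`) that are not from any particular library. The tag travels with the value, so the same characters under different tags are different values, and code dispatches on the tag rather than on what the text happens to look like.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Union

@dataclass(frozen=True)
class DnsName:          # hypothetical tag: "this is a DNS host name"
    text: str

@dataclass(frozen=True)
class IpLiteral:        # hypothetical tag: "this is an IP address"
    addr: Union[IPv4Address, IPv6Address]

Identity = Union[DnsName, IpLiteral]

def describe(identity: Identity) -> str:
    # Dispatch on the type tag, never on the shape of the text.
    if isinstance(identity, DnsName):
        return f"DNS name {identity.text!r}"
    return f"IP address {identity.addr}"

as_name = DnsName("1.2.3.4")
as_addr = IpLiteral(ip_address("1.2.3.4"))

print(as_name == as_addr)  # False: same characters, different types
print(describe(as_name))   # DNS name '1.2.3.4'
print(describe(as_addr))   # IP address 1.2.3.4
```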
> In the larger context, the data was created as a text representation of an IP address, was never used in a DNS context
Except that it was. As per the SNI specification, putting something into the SNI hostname field is "using it in a DNS context", which is why doing so is a bug. It's the exact same bug as putting plain text into HTML. The fact that "<" represents a "less than sign" in plain text is irrelevant to what "<" means when it appears in HTML. The semantics of HTML are not governed by the spec of plain text. Using plain text where HTML is expected is a bug. The fact that a human with a larger context might be able to recognize that the string "3 < a / b = 42" appearing in an HTML document was probably not intended to contain a malformed HTML tag does not change that it in fact does.
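The HTML analogy in a few lines, using Python's standard `html.escape`: the plain-text value has to be converted into the HTML context rather than pasted in, because once it sits inside HTML it is read by HTML's rules, not by plain-text rules. The variable names are illustrative only.

```python
import html

plain_text = "3 < a / b = 42"

# Pasted verbatim: the "<" is now interpreted under HTML's rules.
as_is = f"<p>{plain_text}</p>"

# Escaped: the plain-text value is converted into HTML with the same meaning.
escaped = f"<p>{html.escape(plain_text)}</p>"

print(as_is)    # <p>3 < a / b = 42</p>
print(escaped)  # <p>3 &lt; a / b = 42</p>
```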
> Just as "Alice" remains a first name even if it is mislabeled or misused.
Essentialism, anyone?
> "Alice" is a first name, but it is also a valid hostname. It is not a hostname just by virtue of being validly parseable as such.
Exactly! It is by virtue of the context in which it appears. And the context of the SNI host name field makes whatever appears in it a DNS FQDN.
> And if I wanted to know the first names of people here at HN, but asked for their hostnames, I would generally not get the answer I wanted.
Yep. And equally, when the server implementing SNI asks you for a DNS hostname, but you supply an IP address, you are not answering the question being asked, and you should expect whatever your reply is to be treated as a DNS hostname.
> If some awful code somewhere misused first names as hostnames, the network guy with very limited context might say "I see a query for hostname 'Alice'", but the people with larger context would ask "Why is this firstname being misused as a hostname?".
Sure, they might. That doesn't change the fact that as far as the protocol is concerned, it still is a hostname, which is precisely why it will fail to work.
> This HN thread was never a SNI spec internal debate, and no one here benefits from assuming that highly restrictive context.
You claimed that you could specify IP addresses in SNI messages. You still can't.
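RFC 6066 (section 3) states that literal IPv4 and IPv6 addresses are not permitted in HostName, and that the name is sent without a trailing dot. A client following it would therefore omit the extension for an IP literal rather than send the address as text. A minimal sketch of that decision, where `sni_host_name` is a hypothetical helper and IDNA conversion is ignored:

```python
from ipaddress import ip_address
from typing import Optional

def sni_host_name(uri_host: str) -> Optional[str]:
    """Hypothetical helper: value for the SNI HostName field, or None to omit SNI."""
    host = uri_host
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]  # bracketed IPv6 literal from a URI authority
    try:
        ip_address(host)
    except ValueError:
        # Not an IP literal: send it as a DNS host name, without a trailing dot.
        return host.rstrip(".")
    # IP literal: not permitted in HostName, so send no server_name extension.
    return None

print(sni_host_name("example.com."))  # example.com
print(sni_host_name("1.2.3.4"))       # None
print(sni_host_name("[::1]"))         # None
```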
> I have had discussions with some of the langsec folks in the past. I greatly respect their work, and they are wise enough to know that their context is not useful in general discussion.
So, this is not a discussion about whether or not you can specify IP addresses in the SNI hostname field?
> Your initial statement was condescending and misleading. As the conversation continues, it becomes clear that this was intentional, and you are willing to die on the hill of tiny irrelevant context. Noted.
You are still wrong and apparently massively confused.
Let's assume we have a server that has a certificate for the host name 1.2.3.4. That is, a certificate with a subject alternative name of type dNSName, value "1.2.3.4". Now, an HTTP client is instructed to request the URI https://1.2.3.4/foobar, POSTing a valuable secret to that URI. This HTTP client puts "1.2.3.4" into the SNI DNS hostname field, as you seem to believe is correct behaviour according to the RFC, right? Now, the server will correctly respond to that with said certificate, right?
What happens next? Should the client accept that certificate or not? Why should it? Why not?
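For reference, the rule a client typically applies here comes from RFC 2818 (section 3.1): when the URI names an IP address, the certificate must carry a matching iPAddress subjectAltName, and dNSName entries are not consulted. A simplified sketch, assuming a hypothetical `(type, value)` model of the SAN list and a hypothetical `identity_matches` helper, with no wildcard or IDNA handling:

```python
from ipaddress import ip_address
from typing import List, Tuple

# Hypothetical model of subjectAltName entries: ("DNS", "...") or ("IP", "...").
SanList = List[Tuple[str, str]]

def identity_matches(uri_host: str, sans: SanList) -> bool:
    """Simplified reference-identity check (no wildcards, no IDNA)."""
    try:
        wanted_ip = ip_address(uri_host)
    except ValueError:
        # A host name is compared only against dNSName entries, case-insensitively.
        wanted = uri_host.rstrip(".").lower()
        return any(kind == "DNS" and value.rstrip(".").lower() == wanted
                   for kind, value in sans)
    # An IP literal is compared only against iPAddress entries.
    return any(kind == "IP" and ip_address(value) == wanted_ip
               for kind, value in sans)

# The certificate from the scenario: a dNSName whose text is "1.2.3.4".
print(identity_matches("1.2.3.4", [("DNS", "1.2.3.4")]))  # False
print(identity_matches("1.2.3.4", [("IP", "1.2.3.4")]))   # True
```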
> So, this is not a discussion about whether or not you can specify IP addresses in the SNI hostname field?
In fact, no. This is a discussion about whether textual representations of IP addresses can be used as inputs to tools that speak SNI, be used to specify a particular cert on the server side, and be conveniently extracted out of the sniffed network traffic comprising that handshake.
As it happens, the input conversion, the processing for usage in code written for spec implementation, the network stack conversions, and the sniffer capture reconversion back to a text representation for human viewing all transform the data.
But all of these manipulations are predictable and reversible, and if you want to call one of those data stages a "DNS FQDN", that's great, but it isn't inherently correct outside of the context of the spec, which deigns to treat all final mapping tokens as DNS FQDNs and to label them as such, but does not actually make them fully qualified, nor the results of DNS queries.
We might have different opinions about the context of this discussion, but I would suggest that if you were to reread the thread from the beginning, there's really not much opportunity for confusion.
In any event, it's clear that this discursion is not advancing anyone's understanding of anything. Good luck in your future endeavours.
> In fact, no. This is a discussion about whether textual representations of IP addresses can be used as inputs to tools that speak SNI, be used to specify a particular cert on the server side, and be conveniently extracted out of the sniffed network traffic comprising that handshake.
Well, then your original description was pretty misleading. I'm just wondering why you didn't include, I dunno, street names? You can use most street names as input for the same purpose, right?
> But all of these manipulations are predictable and reversible
Nope, that's precisely the problem. You cannot distinguish the hostname "1.2.3.4" from the IP address "1.2.3.4" after they have been encoded in that manner into the SNI request, hence the encoding is not reversible (and leads to collisions).
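The collision can be shown at the byte level. This sketch encodes the server_name extension_data as RFC 6066 defines it (a 2-byte-length ServerNameList holding one ServerName: a 1-byte NameType of host_name(0) followed by a 2-byte-length HostName). `server_name_extension_data` is a hypothetical helper. A string that started life as a DNS name and one produced by formatting an IP address yield identical bytes, so the receiver has no way to recover which one was meant.

```python
import ipaddress
import struct

HOST_NAME = 0  # NameType host_name(0)

def server_name_extension_data(host_name: str) -> bytes:
    """Hypothetical helper: ServerNameList with a single host_name entry."""
    name = host_name.encode("ascii")
    entry = struct.pack("!BH", HOST_NAME, len(name)) + name  # ServerName
    return struct.pack("!H", len(entry)) + entry             # ServerNameList

from_dns_name = server_name_extension_data("1.2.3.4")
from_ip_text = server_name_extension_data(str(ipaddress.ip_address("1.2.3.4")))

print(from_dns_name == from_ip_text)  # True: nothing on the wire records the origin
print(from_dns_name.hex())            # 000a000007312e322e332e34
```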
> We might have different opinions about the context of this discussion, but I would suggest that if you were to reread the thread from the beginning, there's really not much opportunity for confusion.
Well, there is, as it's trivially true and at the same time completely pointless to point out that you can put any string that meets the syntactic requirements of a field of a protocol message into that field and get a syntactically valid protocol message.
So, yes, you can put a string that was produced as a textual representation of an IP address into the SNI hostname field. It just so happens that doing so is essentially guaranteed to lead to certificate validation failure during connection establishment with any non-vulnerable TLS implementation.
Was that really the point that you were trying to make?