The MIME guys: How two Internet gurus changed e-mail (2011)

niftich · on Dec 15, 2017

MIME was hugely influential. Besides the bulk of all email being in MIME format, it also popularized:

* MIME types, now called media types; a compound identifier for formats of data interchange. Media types are officially maintained by the IANA Media Type Registry [1].

* Base64. MIME lifted the encoding from the PEM RFCs [2][3], but it was MIME that brought it to prominence.

* Content-Disposition: a header to specify intended presentation semantics. Later lifted into HTTP.

* Other "Content-" headers, like Content-Location. These were also adopted by HTTP.
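A small Python sketch of the Base64 point, using the standard-library `base64` module. Every 3 input bytes become 4 ASCII characters, and `base64.encodebytes` wraps its output at 76 characters per line, which is the line-length limit RFC 2045 sets for MIME's base64 content-transfer-encoding. The sample payload is arbitrary.

```python
import base64

# Arbitrary binary payload: every byte value 0..255, which is not 7-bit safe for SMTP.
data = bytes(range(256))

# encodebytes() inserts b"\n" after every 76 output characters (the MIME line limit)
# and guarantees a trailing newline.
mime_b64 = base64.encodebytes(data)
lines = mime_b64.splitlines()

# 256 bytes -> ceil(256 / 3) = 86 groups of 4 characters = 344 characters of Base64.
encoded_chars = sum(len(line) for line in lines)
longest_line = max(len(line) for line in lines)

print(len(data), "bytes ->", encoded_chars, "Base64 characters")
print("longest line:", longest_line)

# Decoding ignores the line breaks and restores the original bytes.
assert base64.decodebytes(mime_b64) == data
```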

Furthermore, the various multipart media types imbued adjacent parts with additional semantics, like "multipart/mixed" to signal that the child nodes are independent but ordered, and "multipart/alternative" to signal that the child nodes are alternative versions of the same content, in order of increasing richness.
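One way to see the "increasing richness" ordering is Python's standard-library `email` package (a sketch, not tied to any particular mail client): calling `add_alternative()` on a message that already has a plain-text body turns it into "multipart/alternative" and appends the richer HTML version after the plain one. The message text is made up.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Hello"

# Least rich representation first...
msg.set_content("Hello, *world*!\n")
# ...then a richer alternative. This converts the message to multipart/alternative
# and appends the HTML part after the existing text/plain part.
msg.add_alternative("<p>Hello, <b>world</b>!</p>\n", subtype="html")

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type())  # multipart/alternative
print(part_types)              # ['text/plain', 'text/html']
```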

The thorough design around multiparts meant that a MIME document is a tree, where every node is a subdocument with headers, preamble, body, and epilogue. It was suitable for use as a nested data structure, although most applications opted for binary formats instead. Later, when text-based formats became popular, people often invented XML-based formats to capture a similar tree.
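To illustrate the tree, this Python sketch parses a small hand-written MIME message (boundaries and contents are invented for the example) with the standard `email` parser and prints its nodes by depth. In RFC 2046 terms the preamble and epilogue belong to multipart bodies; Python's parser exposes them as the `preamble` and `epilogue` attributes of the multipart node.

```python
import email
from email import policy

RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="outer"\r\n'
    "\r\n"
    "This is the preamble.\r\n"
    "--outer\r\n"
    'Content-Type: multipart/alternative; boundary="inner"\r\n'
    "\r\n"
    "--inner\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello\r\n"
    "--inner\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<p>Hello</p>\r\n"
    "--inner--\r\n"
    "--outer\r\n"
    "Content-Type: application/octet-stream\r\n"
    "\r\n"
    "data\r\n"
    "--outer--\r\n"
    "This is the epilogue.\r\n"
)

def tree(part, depth=0):
    """Return one indented line per node, walking multipart children recursively."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(tree(child, depth + 1))
    return lines

msg = email.message_from_string(RAW, policy=policy.default)
print("\n".join(tree(msg)))
print(repr(msg.preamble))
print(repr(msg.epilogue))
```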

[1] https://www.iana.org/assignments/media-types/media-types.xht...

[2] https://tools.ietf.org/html/rfc989#section-4.3

[3] https://tools.ietf.org/html/rfc1421#section-4.3.2.4

noncoml · on Dec 16, 2017

All of email is based on an archaic, complicated system, and it is long overdue for disruption.

There are too many RFCs, and they are too long and complicated to get right on the first try.

The SMTP protocol, for example, is specified with ABNF grammars, which are not very machine-friendly, and many implementations end up using regular expressions.

Then there is MIME and the quest for the main content of the message: there is no precise way to tell which of the parts of a MIME message are the main content to be displayed to the user. Some clients use "multipart/alternative" with a "text/plain" and a "text/html" part under it. Others use "multipart/mixed" and put the parts there. Still others use "multipart/related". Then you may have "multipart/alternative" under "multipart/mixed", and so on.
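For what it's worth, Python's `email` package ships one such heuristic, `EmailMessage.get_body()`, which walks the tree, skips parts marked as attachments, and returns the best match from a caller-supplied preference list. A sketch on a message shaped like the nested case described above (contents and filename are made up). Note this is a library convention, not something the RFCs define.

```python
from email.message import EmailMessage

# multipart/mixed -> [multipart/alternative -> [text/plain, text/html], attachment]
msg = EmailMessage()
msg.set_content("plain body\n")
msg.add_alternative("<p>html body</p>\n", subtype="html")
msg.add_attachment(b"\x00\x01\x02", maintype="application",
                   subtype="octet-stream", filename="blob.bin")

# get_body() walks the tree, skips attachments, and returns the part whose
# subtype ranks highest in preferencelist.
html_first = msg.get_body(preferencelist=("html", "plain"))
plain_first = msg.get_body(preferencelist=("plain", "html"))

print(msg.get_content_type())          # multipart/mixed
print(html_first.get_content_type())   # text/html
print(plain_first.get_content_type())  # text/plain
```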

High availability is achieved by publishing multiple MX records in DNS for the mail exchangers. Imagine if today you had to get a list of hosts to try running your REST APIs against.

And of course, achieving end-to-end encryption is not straightforward for users.

IMHO it is time to come up with a new REST-based API for email and short-message exchange.

Spooky23 · on Dec 16, 2017

You’ll never get a replacement system that is truly universal. Every replacement product has been proprietary.

noncoml · on Dec 16, 2017

Yeah, that's the problem. The forces that drive the internet today are more interested in locking in the users.

tonyarkles · on Dec 16, 2017

> Imagine if today you had to get a list of hosts to try running your REST APIs against.

That's... exactly what we do, via DNS:

    $ host google.ca
    google.ca has address 96.63.131.14
    google.ca has address 96.63.131.28
    google.ca has address 96.63.131.26
    google.ca has address 96.63.131.18
    google.ca has address 96.63.131.22
    google.ca has address 96.63.131.24
    google.ca has address 96.63.131.16
    google.ca has address 96.63.131.20

noncoml · on Dec 16, 2017

I don't think so. The client will do a DNS lookup, try the address, and give up if it fails.

Look at the node.js code for example: https://github.com/nodejs/node/blob/master/lib/net.js. Look for "lookupAndConnect".

The high-availability responsibility has been moved to the server side.

stingraycharles · on Dec 16, 2017

DNS is indeed being used for load balancing, not HA. However, piggybacking on the parent’s example of Google, they actually achieve HA on the IP level.

They have multiple data centers announce the same IP prefixes via BGP (anycast), and if one goes down, routing converges on another data center that announces the same IPs.

Having said that, I actually think that using multiple MX records for HA is a very elegant and robust solution for mere mortals that do not own multiple data centers and their own AS. Server-side HA is way more expensive to get right than client-side HA, and if it’s important for the protocol (which, in the case of email, it is), you'd better make damn sure it’s not expensive to implement for service providers.
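A rough Python sketch of that client-side logic, following the ordering rule in RFC 5321 section 5.1: lowest preference value first, with hosts of equal preference tried in random order. It assumes the MX records have already been fetched (the standard library has no MX lookup, so in practice they would come from a DNS library), and `send` is a hypothetical callable standing in for an SMTP delivery attempt. Host names are placeholders, and the implicit-MX fallback to address records when no MX exists is left out.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

def order_mx(records: Sequence[Tuple[int, str]], rng: random.Random) -> List[str]:
    """Return MX hosts in the order a sending client should try them.

    Lower preference values come first; hosts sharing a preference are shuffled
    so that load is spread across equally preferred exchangers.
    """
    by_pref: Dict[int, List[str]] = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    ordered: List[str] = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered

def deliver(records: Sequence[Tuple[int, str]],
            send: Callable[[str], T],
            rng: Optional[random.Random] = None) -> T:
    """Try each MX host in turn; the first successful send wins."""
    if rng is None:
        rng = random.Random()
    failures: List[Tuple[str, OSError]] = []
    for host in order_mx(records, rng):
        try:
            return send(host)
        except OSError as exc:  # connection refused, timeout, unreachable, ...
            failures.append((host, exc))
    raise OSError(f"all MX hosts failed: {failures}")

# Hypothetical records and a fake sender in which only the backup host answers.
records = [(10, "mx1.example.com"), (20, "backup.example.net"), (10, "mx2.example.com")]

def fake_send(host: str) -> str:
    if host != "backup.example.net":
        raise ConnectionRefusedError(f"{host} is down")
    return f"delivered via {host}"

print(deliver(records, fake_send))  # delivered via backup.example.net
```

All of the retry intelligence lives in the sender, which is what makes this cheap for whoever runs the receiving servers.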

saurik · on Dec 16, 2017

How is an ABNF grammar specification somehow not "machine friendly"?!

noncoml · on Dec 16, 2017

Why do you think ABNFs are machine-friendly? Is it so easy to parse ABNFs that we are flooded with libraries?

Just because it is a formal system doesn't mean it's machine-friendly.

saurik · on Dec 16, 2017

How to work with BNF grammars is seriously second year computer science... if you wanted to parse that precise format it should probably take something like twenty minutes using an existing parsing framework, and then you can generate anything you want.

I could see making "parse SMTP" a homework assignment for a compilers class... if only it were challenging ;P.

FWIW, when I implemented the SMTP/IMAP specs, I spent a few hours first writing my own parser combinator framework (to be 100% clear: from scratch, with no reference) and then made short work of getting these specific grammars fully implemented.
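To give a flavor of the parser-combinator approach (this is not saurik's code, which isn't linked here), here is a minimal Python sketch of a few backtracking combinators, each mirroring an ABNF operator, applied to the SMTP reply grammar from RFC 5321 section 4.2. The combinator names are made up for the example; a real implementation would also need ABNF's case-insensitive string literals and useful error reporting.

```python
from typing import Any, Callable, List, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos) on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char_range(lo: int, hi: int) -> Parser:
    """One character whose code point lies in [lo, hi], like ABNF %xLO-HI."""
    def p(s, i):
        if i < len(s) and lo <= ord(s[i]) <= hi:
            return s[i], i + 1
        return None
    return p

def lit(t: str) -> Parser:
    """A literal string (matched case-sensitively here)."""
    def p(s, i):
        return (t, i + len(t)) if s.startswith(t, i) else None
    return p

def alt(*ps: Parser) -> Parser:
    """Ordered choice, ABNF '/'."""
    def p(s, i):
        for q in ps:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

def seq(*ps: Parser) -> Parser:
    """Concatenation; fails (consuming nothing) if any element fails."""
    def p(s, i):
        out = []
        for q in ps:
            r = q(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return p

def many(q: Parser, min_count: int = 0) -> Parser:
    """Repetition, ABNF '*' (or '1*' with min_count=1)."""
    def p(s, i):
        out = []
        while True:
            r = q(s, i)
            if r is None or r[1] == i:  # stop on failure or on an empty match
                break
            v, i = r
            out.append(v)
        return (out, i) if len(out) >= min_count else None
    return p

def opt(q: Parser) -> Parser:
    """Optional element, ABNF '[ ... ]'."""
    def p(s, i):
        r = q(s, i)
        return r if r is not None else (None, i)
    return p

def text(q: Parser) -> Parser:
    """Return the matched substring instead of the nested value structure."""
    def p(s, i):
        r = q(s, i)
        return None if r is None else (s[i:r[1]], r[1])
    return p

# RFC 5321, section 4.2:
#   Reply-line = *( Reply-code "-" [ textstring ] CRLF )
#                Reply-code [ SP textstring ] CRLF
#   Reply-code = %x32-35 %x30-35 %x30-39
#   textstring = 1*(%d09 / %d32-126)
reply_code = text(seq(char_range(0x32, 0x35), char_range(0x30, 0x35), char_range(0x30, 0x39)))
textstring = text(many(alt(char_range(9, 9), char_range(32, 126)), min_count=1))
CRLF = lit("\r\n")
reply_line = seq(
    many(seq(reply_code, lit("-"), opt(textstring), CRLF)),
    reply_code, opt(seq(lit(" "), textstring)), CRLF,
)

def parse_reply(s: str) -> List[Tuple[str, str]]:
    """Parse a complete SMTP reply into (code, text) pairs, one per line."""
    r = reply_line(s, 0)
    if r is None or r[1] != len(s):
        raise ValueError(f"not a valid SMTP reply: {s!r}")
    continuation, code, last, _crlf = r[0]
    lines = [(c, t or "") for c, _dash, t, _ in continuation]
    lines.append((code, last[1] if last is not None else ""))
    return lines

print(parse_reply("250-mail.example.com\r\n250 OK\r\n"))
```

Each combinator is a few lines, and the grammar itself reads almost like the ABNF it came from, which is the appeal of the approach.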

noncoml · on Dec 16, 2017

> when I implemented the SMTP/IMAP specs

Link to the code?

> I spent a few hours first writing my own parser combinator framework (to be 100% clear: from scratch, with no reference)

I am sure you did. Like when you rolled your own crypto in 5 minutes. Or the time you rewrote FB in a weekend.

emmelaich · on Dec 16, 2017

You can read more history from NSB's mime page on his home domain:

http://www.guppylake.com/nsb/mime.html

Including a video of the "...Telephone Chords, the world's premier (=only) all-Bellcore barbershop quartet, singing about MIME."

(note: auto-plays)

dvt · on Dec 16, 2017

Awesome history lesson, I had no idea how MIME originated. A few years ago (gosh I guess it's been like 4 now), I wrote a CORS-enabled MIME-type checker[1][2].

The main issue with MIME (which the article barely touches on, unfortunately) is that the type can be spoofed. It can be a dangerous attack vector, and trust should never be given to external systems that claim x.jpg or y.mov is actually an "image/jpeg" or a "video/quicktime."
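As a rough illustration of not trusting a claimed type (a sketch, not how lecoq itself works), a server can at least compare the declared media type against the file's leading "magic" bytes. The three signatures below are the standard JPEG, PNG and GIF headers; anything else is rejected. A matching prefix proves little on its own, since a file can start with a valid JPEG header and still be malformed, so this complements a strict decoder rather than replacing it.

```python
# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}

def matches_declared_type(declared: str, data: bytes) -> bool:
    """Return True if data starts with a known signature for the declared media type.

    Parameters such as "; charset=..." are ignored, and unknown types return
    False so that the check fails closed.
    """
    media_type = declared.split(";", 1)[0].strip().lower()
    prefixes = SIGNATURES.get(media_type)
    if prefixes is None:
        return False
    return data.startswith(prefixes)  # bytes.startswith accepts a tuple of prefixes

print(matches_declared_type("image/jpeg", b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # True
print(matches_declared_type("image/jpeg", b"<script>alert(1)</script>"))     # False
print(matches_declared_type("IMAGE/PNG", b"\x89PNG\r\n\x1a\n..."))            # True
```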

[1] http://lecoq.herokuapp.com/

[2] https://github.com/dvx/lecoq

zAy0LfpBZLC8mAC · on Dec 16, 2017

You have it all backwards. A MIME type cannot be "spoofed" because a MIME type is a processing instruction, not a certification. If you receive an entity that is labeled as "image/jpeg", that means that you are instructed to treat it as a JPEG image. That is to say, you should only hand it to a JPEG decoder. As with any processing of untrusted input, that JPEG decoder should obviously not have any vulnerabilities, and it should reject any syntactically invalid input. If someone takes a video file and sends it to you labeled as "image/jpeg", there is no spoofing going on, you simply received a syntactically invalid JPEG image which you consequently should reject.

Unfortunately, browsers, in particular IE, had a habit of ignoring the relevant standards and doing what has become known as "content sniffing": they ignore the declared MIME type and instead try to guess the correct decoder based on the content. That was (well, still is, for backwards-compatibility reasons) a huge vulnerability in browsers. If browsers simply followed the relevant standards, there would be absolutely no problem with serving an uploaded "image/jpeg" entity without any verification, as no browser should do anything with it other than notice that it's an invalid JPEG image. That you have to care about this at all is because you have to work around vulnerabilities in browsers, not because anything is being "spoofed".
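The usual server-side workaround for that sniffing behaviour is the `X-Content-Type-Options: nosniff` response header, which asks browsers to honor the declared type instead of guessing. A minimal sketch as a standard-library-compatible WSGI app; `make_upload_app` is a hypothetical helper name, and the upload is assumed to be stored together with its declared type.

```python
from typing import Callable, Iterable, List, Tuple

def make_upload_app(body: bytes, content_type: str) -> Callable:
    """Return a WSGI app that serves one stored upload under its declared media type."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        headers: List[Tuple[str, str]] = [
            ("Content-Type", content_type),
            # Ask browsers to use the declared type instead of guessing from the bytes.
            ("X-Content-Type-Options", "nosniff"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    return app

# Exercise the app without a server by supplying a stub start_response.
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

app = make_upload_app(b"not really a jpeg", "image/jpeg")
chunks = app({}, start_response)
print(captured["status"], captured["headers"]["X-Content-Type-Options"])  # 200 OK nosniff
```

For local testing, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` would serve it; a browser honoring the header should then treat the body strictly as an (invalid) JPEG.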

dvt · on Dec 16, 2017

A MIME type can be spoofed because spoofing doesn't only apply to certifications (e.g. IP spoofing, caller ID spoofing[1], etc.). Spoofing merely means "tricking" or "lying" -- and this can introduce all kinds of complications. There are literally dozens of bugs (in all browsers, not just IE) that exist due to the fact that MIME types can often be misleading[2][3][4].

[1] https://en.wikipedia.org/wiki/Caller_ID_spoofing

[2] https://www.mozilla.org/en-US/security/advisories/mfsa2005-1...

[3] https://blog.mozilla.org/security/2016/08/26/mitigating-mime...

[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1295945

zAy0LfpBZLC8mAC · on Dec 16, 2017

But the thing is that there is no tricking or lying. There is simply a syntactically invalid entity. And idiotic software that does completely irresponsible things when confronted with such syntactically invalid entities, such as feeding syntactically invalid JPEG images to the javascript interpreter.

MIME types also cannot be misleading. MIME types are the authoritative declaration of what something is. If it's not that, then there is nothing misleading, it's simply invalid.

Framing this as "MIME spoofing" is about as sensible as calling a buffer overflow in some font renderer "machine code spoofing". If your font renderer under some circumstances takes pieces of the font description it is interpreting and feeds them to the CPU for execution, that is not "machine code spoofing", it's simply a buffer overflow vulnerability in your font renderer. And just as a font renderer shouldn't feed pieces of the font to the CPU for execution, a JPEG parser shouldn't feed pieces of the image to a javascript interpreter for execution.

interfixus · on Dec 15, 2017

klodolph · on Dec 15, 2017

I knew 1997 sounded too late. I don’t really miss Usenet, but I do feel nostalgic sometimes.

jandrese · on Dec 15, 2017

Yeah, I was thinking that couldn't be right either, because I was struggling with the first MIME encoded emails in the mid-90s and being very annoyed at mailers that would BASE64 the text of the email so they could put in those Microsoft proprietary "smart" quotes.

interfixus · on Dec 15, 2017

Care to explain the downvote? The article is from 2011, dammit.

unwind · on Dec 15, 2017

You might have worded it a bit too tersely. I'd go for something like:

Mods: please add [2011] to the title, since the anniversary was in 2011 (MIME was released in 1991).

That makes it clear what should be adjusted, while sounding a bit more polite. That's my guess about the downvote, anyway.

interfixus · on Dec 15, 2017

Probably. I keep forgetting there are ordinary social niceties in play on HN :)

dang · on Dec 15, 2017

Your comment was just fine because minimalism is a value here too. But of course many readers don't know that. Anyhow, we added 2011 above. Thanks!

pocketarc · on Dec 15, 2017

It’s what makes HN the most civilized community I’ve encountered in a long, long while. These small niceties, though insignificant on their own, make a significant difference in setting the tone for this place.

u801e · on Dec 16, 2017

On a tangential note, the entire subthread starting at [1] was collapsed when I clicked on the comments link for this article. But when I expanded the thread, the text in the post was not greyed out.

Are posts (and their associated subthreads) that were downvoted and subsequently upvoted still collapsed by default?

[1] https://news.ycombinator.com/item?id=15933563

interfixus · on Dec 16, 2017

Yes, it's a bit strange. I'm the author of the 2011 comment, and it's collapsed for me as well, even though by now it has a definitely positive score of +8. And it was never (to the best of my knowledge) downvoted to below zero.

I assume it's an editorial decision, since the omission I pointed out has been rectified, and the thread could thus be deemed noisy and irrelevant.