Some thoughts on security after ten years of Qmail 1.0

pilif · on Jan 17, 2018

Something to keep in mind with regards to qmail is that it's extremely feature-poor and it never got features beyond its initial design goal.

This makes it much easier to keep the bugs out, to the point that making software under such constraints is much more similar to traditional construction projects.

I mean: Nobody ever tells you after you have built a bridge that they are now going to upgrade gravity to gravity 2.0 with 100% more pull. And nobody will ever tell you that your bridge will now get a shopping mall in the middle of it where people can purchase products of their favorite brands.

Software starts to break down when it has to be taken above initial design constraints and when there is not enough time to rewrite subsystems (or all of it) but instead when you have to make the abstractions leaky and compromise.

But back to qmail:

qmail itself is so feature-poor that traditionally, nobody was and is actually running qmail. Instead everybody is running "qmail" which is qmail plus some patches. Sometimes home-grown, sometimes taken from third parties.

But more often than not they are unmaintained and very far removed from the high quality standards of the underlying software.

This is the downside. Yes. You have a bug-free core that totally meets its designers (limited) use-case, but in reality nobody is actually running that.

geocar · on Jan 17, 2018

In contrast, almost nobody runs "exim plus some home-grown patches" or "postfix plus some home-grown patches".

Having the correct architecture, and being able to rewrite the subsystems piecemeal means it is possible for users to experiment with new features organically[1]. That's part of why some qmail installations have features not available in any other mail server even today.

Or to put another way, software purity and homogeny isn't a "good thing", but a trade off: You get to share risk with everyone else who chose like you did, but you're also stuck with the same features and risk everyone else has.

I'd choose "feature-poor download, but highest-feature in production" over "a-few-more-features download and limited-ability-to-upgrade" any day.

[1]: If you're curious, some of the experiments I did are briefly mentioned here: https://news.ycombinator.com/item?id=16166530

vidarh · on Jan 17, 2018

I built a webmail system on top of Qmail back in the day, and I loved it. We rewrote component by component as our needs changed, but the beauty of Qmail was that we could rewrite component by component by sticking to very simple contracts between them and/or start with the Qmail components themselves that for the most part are extremely simple.

Our web frontend for example worked on top of a slightly modified Qmail POP3 server that relied on encoding some message state in the filename, so we need only scan the Maildir without opening the files to be able to e.g. get message read state, and various flags. We also added caching of metadata etc. The changes were tiny and self-contained, and added one extra non-standard command to the pop3 server to retrieve a message list with much more data. The design of Qmail let us start with just changing the pop3 server, and later some tweaks to local delivery, and know we could test those programs in isolation.

The flexibility of Qmail even got us to use it as a messaging middleware later coupled with tinydns - all the routing and retry logic made it very convenient and extremely simple to troubleshoot.

geocar · on Jan 17, 2018

That's great. Do you still have details of it?

I did a webmail "based on qmail" as well. It's still running[1] if you're curious.

[1]: http://demo.internetconnection.net/netmail/

vidarh · on Jan 18, 2018

I have an ancient "backup" sitting somewhere. It was a provider called Nameplanet which provided vanity addresses (lastname.[assorted TLDs] for example), and morphed into Global Name Registry when we launched the .name TLD... The webmail system itself was sold to NetIdentity in 2001 or 2002...

It had a few interesting details - we ran it on ReiserFS for the fast/efficient small file support (at the time it really stood out) which made it great for Maildir's.

We also eventually used a small daemon to poll backends for which server a user belonged to, which had a mechanism to let us mark a user as "busy" so that we could balance accounts between backends by marking it as "busy" on both servers, sync the files over, and then mark it as available again without triggering errors anywhere. qmail on our MX's was modified to look up the right server that way.

The biggest changes were the POP modifications I mentioned. The ones I remember off the top of my head were:

* We modified qmail-local and the pop server to append size changes (from writing a new message or deleting one) to a file used to manage quotas. We appended rather than rewrite because it reduced the need for file locking (we took care to do single writes). We'd lock and coalesce the changes when the file got over a certain size.

* qmail-local was also changed to append the message size, and read-status to the filename. That let us avoid stat() calls for the files for the filesize, and opening and reading the files for unread counts etc. It was one of the first optimizations we did.

* Then we added a cache file that contained subject, sender, size, attachment status etc., for the web frontend, which would be dynamically re-generated automatically as needed.

* We made "+[something]" sort directly into folder "something".

* When we sold it I was most of the way through adding Sieve support to our qmail-local replacement.

These changes were quite small, and each successive changes lowered IO load dramatically (we handled about 2 million accounts before it was sold). Today, we could probably handle the IO load and storage we had with a single NVMe card...

The web frontend would try to use our extended POP3 command, and then fall back to scan the messages (and store a cache locally on the frontend) if it wasn't available, so we could use it as a POP3 client for other backends too. (The frontend is a story in itself - C++ CGI statically linked to shorted load time (it made a big difference at the time) and with "delete" only for really large allocations, to avoid wasting time on deallocation since we knew each process would at most live for a few seconds.

geocar · on Jan 18, 2018

Neat!

It's a shame that more people don't develop software with this kind of... change-for-purpose.

cesarb · on Jan 17, 2018

And you need these patches to fix what is IMO qmail's worst problem: backscatter. As far as I remember, when receiving an email with a forged return address to a non-existent mailbox, qmail first accepts the email, then sends a bounce message to the forged return address. Other MTAs (and patched qmail) reject the email directly in the SMTP session, preventing this issue.

I personally consider this backscatter issue a design bug in qmail.

NoGravitas · on Jan 17, 2018

I worked at a web hosting company 1999-2005 that used qmail, and while there were many things wrong with qmail, due to it not being designed for the realities of email circa 2001, backscatter was by far the worst. We were processing significantly more backscatter than valid email, and to the best of my knowledge, the patches to address it didn't yet exist.

We certainly should have switched mail servers, but qmail was deeply ingrained in our home-grown hosting automation system, and it would have been a big deal to change.

stevekemp · on Jan 17, 2018

That was definitely true, and it was the reason that I personally stopped using qmail in a previous (very long time ago now) job.

There were patches to fix the problem, along with offering useful features, but for whatever reason we went with exim (exim 3.x from Debian).

aplorbust · on Jan 17, 2018

"qmail itself is so feature-poor that traditionally, nobody was and is actually running qmail.

Instead everybody is running "qmail" which is qmail plus some patches."

I ran and continue to run qmail without any patches.

The above quoted statements thus cannot be true.

But maybe "nobody" and "everybody" are figures of speech?

NoGravitas · on Jan 18, 2018

So...are you running an open SMTP relay, or are you refusing to relay mail for your own users? Because I'm pretty sure that with an unpatched qmail, you've got to be doing one or the other.

Also, how are you dealing with backscatter?

aplorbust · on Jan 19, 2018

Not using qmail the way you are (incorrectly) assuming.

I am an end user not an email provider.

For example I use qmail to provide "inter-device email" on a local network of devices all belonging to the same user, and not connected to the internet. Not that I love email but these devices are sometimes "locked down" by default and email is one of the few ways to move files between devices without using the internet.

Another example is using qmail on a tap-based layer 2 overlay (not OpenVPN) to provide encrypted "peer-to-peer email". Each peer is running qmail-smtpd bound to a tap device. This was an experiment to prove encrypted email is easy.

qmail running under curvecpserver is another experiment.

technion · on Jan 18, 2018

qmail won't even compile on modern glibc without the errno patches - you must have at least some patching done.

aplorbust · on Jan 19, 2018

Not using glibc.1

Not using Linux.

Advice for all commenters who make presumptions about others computer use: Please kindly check your assumptions.

1 Is this an issue for musl and the various other alternatives to glibc? I have no idea but seems like only referring to glibc 2.3.x and up is a bit myopic. Its possible some users might not be using that library. I am one such user.

0xbadcafebee · on Jan 17, 2018

I wrote a qmail masquerading plugin when I was 17. Would it have been nice as a feature? Sure. Did my plugin suck? Definitely. But it worked, and the core software stayed secure, while I watched others patch sendmail every year.

pilif · on Jan 17, 2018

> But it worked, and the core software stayed secure

are you sure? How can you be sure that your custom patches didn't affect the security of the core product? qmail wasn't designed to be extensible. It had no plugin interface.

Of course it's possible that you didn't make a mistake back then.

Just as it's possible that I didn't make a mistake when I was 18 and wrote a patch to Cyrus imapd to allow authenticating against an SQL database.

But TBH, when I look back at the code I wrote back then, at least in my case, I'm quite sure I f'ed up in various ways.

Thankfully, I never shared these patches with other people.

0xbadcafebee · on Jan 17, 2018

Oh I'm sure it was bug ridden. But even if my feature introduced a security hole, you would have to find and exploit it, and it would then have to find a way to attack the rest of the app (which qmail makes difficult).

It's kind of like using OBSD as your app platform. You can definitely make it insecure! But it's more secure by default than others, perhaps because of a lack of features, as well as very good security design.

irundebian · on Jan 17, 2018

Are you sure that you understood djb's statement about the principle of least privilege? It's not about attacking the rest of the app, but about violating the user's security requirements.

0xbadcafebee · on Jan 17, 2018

I don't understand what this has to do with my comment.

irundebian · on Jan 17, 2018

> it would then have to find a way to attack the rest of the app (which qmail makes difficult).

It's not necessary to attack the rest of the app as soon as user's security requirements are violated. So if an attacker had been able to have an impact on confidentiality, integrity or availability because of your masquerading patch, user's requirements would have been broken. For an impact on availability controlling control flow isn't necessary, you just need to crash components.

0xbadcafebee · on Jan 18, 2018

Yes, you are right. An exploit is still viable even if it doesn't attack other parts of a system.

My point was that with a default of secure design, even small exploits added via plugins are better defended against than my alternative option, which was sendmail (i'm sure Exim wouldn't have been quite as horrible as sendmail, and Postfix wasn't quite mature yet).

Florin_Andrei · on Jan 17, 2018

> nobody was and is actually running qmail

I had Qmail running circa 1998 and I'm pretty sure it was vanilla, no patches. But after all this time I could be wrong.

Anyway, shortly thereafter I discovered Postfix and that was the end of my relationship with Qmail.

NoGravitas · on Jan 18, 2018

In 1998, it was probably reasonable to run qmail without patches. By 2000 or 2001, it definitely wasn't.

adrianN · on Jan 17, 2018

It happens that bridges get more lanes, or more cars.

pilif · on Jan 17, 2018

In most cases that won't require a fundamental change in the architecture though. And when it does, then the bridge is rebuilt in accordance with the new requirements.

Fnoord · on Jan 17, 2018

Ridiculous example: Qmail refused CRLF e-mails because the headers couldn't be read. It only accepted LF e-mails. In practice, this meant an e-mail client like Outlook did not work. It was solved by a very small fix with a few lines of C, but the patch I found did not apply cleanly either because it was for an older version or because of other patches, so I had to port it. This makes something like autoconf where you gotta specify all kind of options to ./configure a breeze.

geocar · on Jan 17, 2018

You have that backwards. Qmail rejected emails with bare LF because it was in violation of the email spec.

https://cr.yp.to/docs/smtplf.html

pilif · on Jan 17, 2018

It's still a real issue though: As an ISP you can't run a mail server that doesn't accept mails from the one client that's most used among your customers.

Yes. Outlook isn't conforming to the spec, but it's also being actively used and even if you could put pressure on Microsoft to fix it (good luck doing that back in the early 00s), you can't possibly force all your customers to update.

Now you have three options:

1. you switch MTA to one that can deal with the non-conforming clients.

2. you add a proxy server that interfaces between the non-conforming clients and your MTA

3. you patch your MTA

Unfortunately, because of qmail's good reputation ("hey! we're running qmail that never had security issues!") and because of the lack of abilities of your run of the mill ISP to write a scaling SMTP proxy, what people have traditionally have done is option 3.

What they forget about option 3 is that the one big advantage of that solution ("hey! we're running qmail") isn't valid any more because you're not exactly running qmail any more. You're running qmail plus some additional patches that actually touch the public interface of your MTA and are thus exposed to the network. To unauthenticated users.

So IMHO, they should have gone with option 1) or, nowadays where it's easier to write a well-scaling SMTP proxy thanks to the raise of asynchronous event based communication, option 2, but you'd better be sure you're not introducing security flaws in your reverse proxy.

geocar · on Jan 17, 2018

It's not a real issue because those clients don't exist anymore.

Nearly twenty years ago when they did, (4) I could use fixcrio (which is hardly a proxy server): I simply ran it one of the ports that accepted mail from MUAs directly.

JdeBP · on Jan 17, 2018

Indeed.

* http://jdebp.eu./FGA/qmail-myths-dispelled.html#MythAboutBar...

diafygi · on Jan 17, 2018

As we cast about trying to figure out ways to make software more secure or reliable, please remember that in other engineering fields (civil, chemical, mechanical, etc.) prioritizing safety and reliability is a _solved problem_.

(1996) https://www.fastcompany.com/28121/they-write-right-stuff

> It is perfect, as perfect as human beings have achieved. Consider these stats: the last three versions of the program — each 420,000 lines long-had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.

> The process isn’t even rocket science. Its standard practice in almost every engineering discipline except software engineering.

The problem is consequences. We had centuries of people dying in bridge collapses before we got our shit together and started prioritizing safety in civil engineering (i.e. engineers and managers going to prison if they don't).

The same will be true for software. As more people get harmed by thrown together software (e.g. mass panic in Hawaii, state psychological exploitation on social media), we'll start regulating it like other engineering fields.

As a former chemical engineer, I welcome this transition, but I realize it will likely also take centuries of hard lessons.

cageface · on Jan 17, 2018

Follow the money. If people were willing to commit to a specification with the same level of precision as a civil engineering blueprint and then stick to that specification then you’d see a dramatic uptick in software quality, along with a dramatic uptick in price and time to develop. But since hardly anybody wants to commit to that kind of precise design or pay the cost we get what we have today instead.

Come to SE Asia and I’ll show you plenty of civil engineering projects with similarly poor planning and execution.

cuckcuckspruce · on Jan 17, 2018

There are software engineering fields where people commit to precise specifications - they typically interface with humans on some biological level (medical devices, etc), and they are quite expensive.

pas · on Jan 18, 2018

Of course, the same goes for military/infrastructure (self-driving trains) software.

And there are solutions for improving software safety, it starts by improving the process of software development. And for that there's at least one sort of official framework: CMMI.

(Even though DoD stopped requiring CMMI, but it still considers it a big plus, and companies for important stuff all use it.)

Of course, it'll be interesting to see what goes on with the F-35, as a lot of testing is now "skipped". (Sure, that is not software testing, but still, I can't imagine the software dev process is in a better shape after all these delays, cost overruns and probably still ongoing requirement changes.)

LudoA · on Jan 17, 2018

I read an article many many years ago explaining why software engineering can't be compared to mechanical engineering.

I can't find it now, but one of its points of comparison was: if a nut isn't tightened enough on a bridge, you can tighten it a bit more -- turn the nut 3 or maybe 6 millimeters more. In software engineering, if you tighten the nut only 3 millimeters instead of 6, one of the bridge's endpoints now ends in a different dimension.

I think there is a valid point to it. I'm always amazed in housing construction how things are done "approximately". A wall should be vertically straight... give or take a few centimeters. That kind of approximation doesn't work in software engineering. It has a whole other type of complexity to it.

jerf · on Jan 17, 2018

Yeah, I've completely stopped buying this idea that software engineers just need to do what all the other engineers do to improve. Let me know when someone engineers an n-dimensional bridge.

For that matter, let me know when your engineers have built something as complex as the AWS infrastructure, by which I mean, every single service they offer, by information content (something like Kolmogorov complexity). There are some disciplines like power management that certainly have some very large and impressive things that I don't wish to diminish, but the idea that software engineering needs to go learn from all the other engineers is an idea stuck in a very 1970s view of what a "large project" looks like.

I am abundantly confident that if you took the amount of engineering done to build something like the Golden Gate bridge today, then took all those man-hours and saw how much software you could get for the same amount, that you'd be surprised how small the result is. Not that it wouldn't be a good chunk of hours to play with, and it would be much larger than many phone apps or web sites, but if you stack it up to something like a browser or a usable OS kernel I'm pretty sure you'd find that the very visually-imposing bridge is actually orders of magnitude, plural, simpler. I say this not because I don't respect civil engineers, but because I also respect just how quickly big computer projects can chew through the man-centuries. No bridges would get built if each one individually required engineering effort commensurate to a new web browser.

It's past time for programmers to get over their inferiority complex. A sober look at what the programming world does vs the other engineering fields shows that all things considered, we're doing pretty well. Still a huge amount of room for improvement, but we're not doing so badly that we need to go running off to other completely different disciplines to drag in irrelevant, if not actively harmful, practices that are unconnected to the problems we face.

irundebian · on Jan 17, 2018

I think I agree that software engineering is more complex than building bridges if you take two comparable project. But I don't agree that software engineering is more complex than building rockets which requires - besides a lot physics know how - also deep understanding of electronics and hardware systems. And if we look at the Ariane 5, the software engineers failed.

imglorp · on Jan 17, 2018

An even more demonstrative example, the STS might (still?) be the most complex physical system ever.

The orbiter software group managed to not kill anybody with their software although there were some 17 bugs flown. The same cannot be said of the mechanical and human management systems, which killed one crew and possibly two.

Real software projects often fails for similar reasons, when somebody says "we know it's not designed for that condition, fly it anyway".

pas · on Jan 18, 2018

Rockets reliability is largely held back by their extreme trade off toward maintenance vs robustness. When you have to check everything and replace a bunch of things between flights, you're bound to miss something.

And that of course impacts the software side too, the rocket is constantly changing, the software has to change too, bugs are guaranteed.

KozmoNau7 · on Jan 17, 2018

If you're building houses, and your walls are a "few centimeters" off from straight, you'll get a real asswhoopin'.

Especially if you're building something with more than one story, and you expect people to actually pay for it.

someone454 · on Jan 17, 2018

I’ve never found a perfect 90 degree angle in any house, and I’ve built portions of quite a few. Wood expands and contracts with the seasons.

ams6110 · on Jan 17, 2018

Yeah, "a few centimeters" is a lot, but under 1/4 inch is close enough for most residential framing. Windows and doors are always shimmed to fit the rough framing for this reason.

_wldu · on Jan 17, 2018

That's why they make protractors for measuring and cutting external baseboard trim. I've seen a few 90 degree corners, but not many.

KozmoNau7 · on Jan 17, 2018

Which also makes it a really silly choice for actual construction.

Brick and concrete make a lot more sense.

pixl97 · on Jan 17, 2018

Expanding and contracting is good. Steel also expands and contracts. Bricks and un-reinforced concrete crack and crumble.

KozmoNau7 · on Jan 17, 2018

Odd how this 1930s brick building I'm in right now hasn't crumbled apart, or even cracked from settling to any noticeable degree.

pixl97 · on Jan 17, 2018

I'm guessing you've not been alive long enough to see the required maintenance that has been preformed on said building. People tend to miss and ignore things that aren't part of their profession

KozmoNau7 · on Jan 17, 2018

I have bricklayers and carpenters in my friends and family, and I've been on the association board for the building, for the last 10 years.

So I know quite well what kind of maintenance is needed. We replaced the entire 6000sqm tiled roof last year, it was the original roof, 80 years old.

The maintenance needed on a brick building is fraction of what's needed on a wood building.

wiredfool · on Jan 17, 2018

There are automated methods for determining if a bolt has been fully tightened, and they're generally specified for 100% of the bolted connections.

(The most common one is breakoff studs on the bolts, which actually speeds thing up because the wrench can grab the nut and the stud on one side of the connection and torque away till it pops)

Joeri · on Jan 17, 2018

The difference is that while you can’t make a bridge in your bedroom you can make an app.

Should we forbid people from writing code without the proper certification? Should we close down the open internet and replace it with a regulated zone where only compliant software can be run?

I agree that we need a higher standard of engineering in software, but I’m not clear on how to achieve it without draconian measures.

mulmen · on Jan 17, 2018

I made a lot of bridges and cranes in my bedroom. They all failed spectacularly but GI Joe didn't seem to mind that the shear strength of a Lego pin was inadequate to support his tank.

The difference is that I don't then promise a municipality that I can build them a bridge that can take their citizens across a river, even in case of a 100 year event. Or an early warning system that can save their citizens from nuclear holocaust. This is the difference in maturity between software "engineering" and real engineering disciplines.

You're free to experiment with your own resources but as soon as you make a promise to the public or your customers you should be required to meet your promises in all circumstances.

BurritoAlPastor · on Jan 17, 2018

This is a fine sentiment, but it's hardly a bright-line rule. We don't criticize Twitter's handling of abuse or fascists because they promised us a platform free of abusive fascists; Twitter didn't promise us shit except 140 character messages. All the problems were emergent.

So many of the great (financial) success stories of our sector are about startups stumbling into an untapped demand, and then running with it for as long as the money lasts. Nobody sets out to build a bridge – they build a 2x4 plank, and then realize that rather a lot of people want to walk on it.

mulmen · on Jan 17, 2018

[flagged]

oblio · on Jan 17, 2018

In a sense, software engineering is quite mature. It's probably matured a lot faster than most other engineering branches.

We know about static analysis, fuzzing, unit testing, integration testing, functional testing, Continuous Integration/Deployment/Delivery, etc., etc.

It's just that maturity costs a lot of money. 95% of the software produced just doesn't need that and the client can't even afford it.

pjmlp · on Jan 17, 2018

Most people also cannot afford to buy cars or houses in one go, yet regulations don't get ignored because of them.

oblio · on Jan 17, 2018

There is almost no way we can compare cars and houses with the average CRUD software, in terms of damage they can do to a person.

More than that, even in the case where damage does happen, most people would rate physical damage higher than non-physical damage, for example monetary loss.

pjmlp · on Jan 17, 2018

Surely we can, it is all a matter of what CRUD software is managing.

There are institutions that rate what the damage actually means.

pas · on Jan 18, 2018

> This is the difference in maturity between software "engineering" and real engineering disciplines.

That's a difference between contractors, people, groups, teams, projects, etc. It has nothing to do with the "disciple of engineering" software or otherwise.

I've seen shitty mechanical engineering projects, but hey, it was cheap, so I don't blame the customer! (We're fighting the shit someone churned out from the software side.)

No one expects to go to a mechanical engineer and ask for a new car, 4 wheels, blue color, 200 horsepower, and a week later "could you just make it 8 wheeled, thanks" and a week later "could you also make it so it'll be fast even when run on a mix of vomit and vodka", and so on, yet it's the norm in software.

And we somehow compare it to bridges, because a lot of software is "critical infrastructure".

Put the two together, and it's no surprise it seems that software engineering is not mature enough.

It simply means some kid (or a team of expensive consultants), who knew nothing about long term risks of the combination of attention fatigue and shitty UX, said yes. Lowest bidder and all that too. Or because this was mission critical it never occurred to the procurers that a UX guy/gal should look at it.

mulmen · on Jan 18, 2018

I think you're totally right about this but I also think it still speaks to the maturity of the industry and discipline of software engineering. We still have a lot of lessons to learn about building software, even in the cases where it is bid out like a bridge or built like an airplane. We just don't have as much experience under our belt. It's an oversimplification to say that software engineering is immature but I don't think it's wrong. I also don't think the current state of software engineering is even a bad thing or behind in any way, I just like to think we have a lot more to learn.

pas · on Jan 19, 2018

Who is we though?

I mean the problem in other fields were solved by a brutal trade off toward a draconian prohibition. You can't practice medicine/law/civil-engineering without a license in a lot of states/countries.

But of course you can give CPR, you can help your relatives sort out their pills. You can talk about laws/statutes/regulations with anyone.

So where would we draw the line? Maybe official/public procurement projects must have a process requirement clause, describing good engineering discipline (clear problem statement, analysis of the problem, analysis of aspects by the relevant domain experts [such as UI/UX, security, network, hardware, etc.], general specification, implementation plan with risks identified and testing methodology outlined, used development methodology [SCRUM, KanBan, whatever])?

This doesn't require setting up big and bureaucratic a trade guild, but if something bad happens we can look at the [pre]filed papers and determine who fucked up. The procurers, the experts, or the dev team.

diafygi · on Jan 17, 2018

Hmmm, I think it's too early to tell.

After all, we've witnessed move-fast-and-break-things social networks (Facebook, Twitter, Reddit) that had no desire to influence society in a big way get hijacked by state actors to achieve political goals.

We're also witnessing toasters having web servers pre-installed on them with no thought into what happens if the company that pushes patches (if they do at all) goes bankrupt, thus creating vast oceans of botnets that the rest of us now have to deal with.

Again, I think there's a long road ahead of hard lessons when we don't take our own power in software seriously.

geocar · on Jan 17, 2018

In the UK, you can build a bridge without certification as long as you have someone certified review the plans and the implementation before anyone else drives on it. I suspect this is similar for most civilised countries.

A (perhaps short-term) idea would be to make software vendors liable, and do not permit them to sign away that liability.

dcow · on Jan 17, 2018

I completely agree. But the vendor situation looks very different for software.

Imagine being the sole person responsible for migrating Linux from ip/nftables to ebtables. You don't know how your stuff will be used downstream. So you license it with text in all caps reminding people that your software is provided WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. And people still use it to build oil pipelines that bleed toxic goop when sent a really weird keepalive message.

I hope that the first place we see an uptick is security. The list has grown rather long: consumer/citizen identity breaches, hospital ransomware, digital asset theft, remote vehicle control, etc. Most security people I know are of the mindset that systems will be hacked. It's extremely.. pragmatic. You can't really fault the people responsible when many companies simply require good damage control over actual security in order to be successful.

But that's exactly the point. If the consequences were increased, not so many C[I]SOs would be okay with engineers using hot new software every few quarters. And those old boxes running Ruby 1.87 would sure as hell be patched or isolated to oblivion. Companies or projects with good security would flourish. Maybe some would be pressured to operate more like the archetype they're defending against (more red team, more organizational commitment to physical and operational security).

I worked at a security company that had to get rid of a slack bot screen lock game because it hurt some people's feels. So yeah I think some of the priorities in this industry are messed up.

geocar · on Jan 17, 2018

Those people who build oil pipelines should be able to sue whoever sold them the Linux distribution (probably Red Hat or IBM) which was vulnerable to that weird keepalive message.

In that case, a judge (and perhaps a jury) could hear how Red Hat did everything they possibly could to protect from the vulnerability as evidenced by their ISO QA processes and the fact that everyone else was vulnerable to the same "bug" … or from the other side how Microsoft and Apple weren't at-risk, so Red Hat should've caught it.

C[I]SOs would want to be patched, because ISO recommends they would be patched.

> You can't really fault the people responsible when many companies simply require good damage control over actual security in order to be successful.

Which is why I propose legislation, so "good damage control" wouldn't be enough.

You better believe that oil company would want some evidence of testing and proper specifications, and to have them reviewed by a couple independent parties if the government could take them for a percent of gross revenue for the security vulnerabilities alone.

ta76567656 · on Jan 17, 2018

I disagree, unless RedHat certified that the software they're selling is of safety-critical quality. If the oil pipeline company used, let's say, nuts and bolts that didn't meet safety requirements without checking that the parts met the requirements or being assured by the vendor that they did, I would say they're the ones liable and not the vendor.

cageface · on Jan 17, 2018

Maybe it would make more sense for people to be liable for data and not code. There's no need for my cute photo effects mobile app to undergo some kind of exhaustive code review but if I want to store credit card data or social security numbers maybe I should have to pass some kind of review.

viraptor · on Jan 17, 2018

> in other engineering fields (civil, chemical, mechanical, etc.) prioritizing safety and reliability is a _solved problem_.

One big difference between those and programming is that there (outside of military, physical security, and few other exceptions) there are no active adversaries for your project. You design a X that will serve a defined purpose. You design for some parameters (known wind speeds or local natural hazards).

You don't normally design with someone actively trying to destroy your building in mind. You don't design thinking how to avoid the current issues and to isolate the impact of potential future attackers. Your product doesn't have a "well funded attackers will spend undefined amount of time trying to make your project fail" requirement.

_pfxa · on Jan 17, 2018

> You don't normally design with someone actively trying to destroy your building in mind.

You do, actually. When designing a building, one considers many "attacks" it can be subject to, depending on the geography and demography of the site: what are the precipitation patterns like, what is the situation WRT sysmical activity, how is the soil, what is beneath, what is the crime rate in the area, what kind of animals live there, where does the wind tend to come from, how much sunlight is received, etc. etc., about all of these questions are in one way or another is relevant to security of a building, and you have to consider each and every one of them, and design accordingly. You don't design against destruction because that's hardly possible, there isn't much you can do to mitigate someone bombing a building or attacking with heavy machines. Think of that as analogous to using the ME in an Intel CPU.

jerf · on Jan 17, 2018

'When designing a building, one considers many "attacks" it can be subject to'

Your use of scare quotes is appropriate, because there is a qualitative difference between intelligently-driven and unintelligent attacks. Not to mention the intelligently-driven attacks when the attackers are manifestly more intelligent and skilled than the defenders! If computer security experts didn't have to worry about intelligent attackers, computer security would be very nearly a solved problem.

_pfxa · on Jan 17, 2018

I'm no security expert, but this article, and the FastCo article about NASA programmers suggest, at least to me, that computer security is a problem that emerges from incompetence in using tools and designing programs. We're yet to make the differentiation between hacking away amateurishly and building a software solution that will be commercially offered. A showerthough I just though is that only companies registered as software companies should be able to offer software commercially, with relevant regulations implemented on them including mandatory third-party auditing. This is not dissimilar to how any other sector of business works. I don't know US or EU, but in my country, for example if you want to sell foodstuffs, you need a certain certificate; if you want to produce foodstuffs, you need another certificate particularly for that purpose. So if you wanted to open a patissery, you would need a couple licences at least, you'd be subject to some control from the council's related unit, and there are institutions to handle any health problems or bad practices about this sort of commercial entities, however small or big they might be. I can't understand for the life of me why __at least__ the same level of standards are not in place for software that handles my money or my health data or my personal data. It hinders innovation? Well we saw what innovation can do both during the car boom and with Meltdown and Spectre, when nobody's actively, rigidly and methodically thinking about security and integrity.

WRT the house analogy, it's easy to extend that to "intelligent" attackers: intruders of any kind, e.g. robbers, animals, etc. Many install security cams, in my city (Istanbul) many condos have railings that protect the windows of the lower flats, we have locks on doors, alarms, barbed wires, safes, body guards, guard dogs etc., all to stop the intelligent attackers to actually using their intelligence. What's analogous in programming is using the best practices available, and the use thereof must be imposed on any critical systems (e.g. banks, medical institutions, communications tools [e.g. social media] etc.) by the governing bodies.

jerf · on Jan 17, 2018

"I'm no security expert, but this article, and the FastCo article about NASA programmers suggest, at least to me, that computer security is a problem that emerges from incompetence in using tools and designing programs."

While true that accounts for a large amount of the problem, probably the clear majority, computer security would remain a problem even if developers were uniformly highly competent. Competent use of existing crypto systems, which are broken three or four years later, would still be a problem. Meltdown and spectre would still be a problem. Building a safe execution sandbox is legitimately difficult.

But it would be a qualitatively different world than the one we live in.

Certification solutions to the software problem generally face the problem that it is very difficult to imagine any scenario other than one in which people grotesquely incompetent to write the certification rules are the ones writing them. We do not, for instance, want our certification authority to sit there and mandate waterfall design processes, which I would consider at least a 25%-probable outcome, and that's awfully large for something as catastrophic as that would be.

"WRT the house analogy, it's easy to extend that to "intelligent" attackers: intruders of any kind, e.g. robbers, animals, etc."

No, houses are never under such intelligent attack. Even when attacked by humans, they are not attacked by ninja stealth thieves who go in, photocopy your SS card, and get out again without leaving a trace or something that sounds absurd to even use as an example. There's no physical equivalent to breaking into millions of houses at a time and making off with such data. They're attacked by people who smash through the physical security. Anybody can do it. "Anybody" is who does it... above-average IQ people are not generally breaking into houses. (Above-average IQ criminals find much more lucrative and safe criminal hobbies.) Not just anybody can put a tap on a fiber optic line, feed it to a high speed data center, and process it in real time to extract out terrorism threat info, or even just exploit an XSS vulnerability on a website.

viraptor · on Jan 17, 2018

I specifically said that physical security is an exception and "You design for some parameters (known wind speeds or local natural hazards)." These are all initial assumptions and known ahead of time.

pizza234 · on Jan 17, 2018

> As we cast about trying to figure out ways to make software more secure or reliable, please remember that in other engineering fields (civil, chemical, mechanical, etc.) prioritizing safety and reliability is a _solved problem_.

The article makes an unfair comparison; if the solution to the problem is like the one below (extract from the article), the change, reasonably, will never happen in commercial programs (the ones "with 5000 errors").

> Take the upgrade of the software to permit the shuttle to navigate with Global Positioning Satellites, a change that involves just 1.5% of the program, or 6,366 lines of code. The specs for that one change run 2,500 pages, a volume thicker than a phone book. The specs for the current program fill 30 volumes and run 40,000 pages.

jballanc · on Jan 17, 2018

You over-estimate the degree to which the other engineering fields you mentioned have solved safety and reliability. Just to name a few:

- Tacoma Narrows bridge: https://en.wikipedia.org/wiki/Tacoma_Narrows_Bridge

- Thalidomide: https://en.wikipedia.org/wiki/Thalidomide

- Hyatt Regency walkway collapse: https://en.wikipedia.org/wiki/Hyatt_Regency_walkway_collapse

- Challenger disaster: https://en.wikipedia.org/wiki/Space_Shuttle_Columbia_disaste...

- Columbia disaster: https://en.wikipedia.org/wiki/Space_Shuttle_Columbia_disaste...

- Vioxx: https://en.wikipedia.org/wiki/Rofecoxib

...and plenty more. The only thing that separates software engineering with the other engineering disciplines is that there is a structure (usually in the form of code/spec enforcement) for internalizing and learning from disasters when they happen.

minikites · on Jan 17, 2018

The Tacoma Narrows bridge collapse happened 80 years ago, the Hyatt Regency was nearly 40. When did you last read about a software bug?

xenophonf · on Jan 17, 2018

Professional engineering has plenty of recent disasters to point to, whether failures in design, construction, operation, or security:

The Waco fertilizer plant explosion (2013), cause determined to be arson (2016)

San Francisco Bay Bridge seismic bolt failures (2013), caused during construction: https://www.nace.org/CORROSION-FAILURES-San-Francisco-Bay-Br...

The Fukushima Daiichi nuclear disaster (2011), design/regulatory failure

And so forth: https://en.wikipedia.org/wiki/Category:Industrial_disasters_...

And so on: https://en.wikipedia.org/wiki/Category:Engineering_failures

jcranmer · on Jan 17, 2018

Phones that suddenly combust came out in 2016. The Big Dig ceiling collapse was 2006. Rana Plaza collapsed in 2013.

Serious engineering failures are still happening. Sure, the most famous ones may be decades old, but those aren't the only ones.

incompatible · on Jan 17, 2018

From the article (1996):

And it’s the dominant image of the software development world: Gen-Xers sporting T-shirts and distracted looks, squeezing too much heroic code writing into too little time; rollerblades and mountain bikes tucked in corners; pizza boxes and Starbucks cups discarded in conference rooms; dueling tunes from Smashing Pumpkins, Alanis Morrisette and the Fugees. Its the world made famous, romantic, even inevitable by stories out of Sun Microsystems, Microsoft, and Netscape.

It's incredible how much software development has changed over the last 21 years. Change Gen-Xers to Millennials, update the playlist, and replace Sun Microsystems, Microsoft, and Netscape with Apple, Facebook and Google.

gjvc · on Jan 17, 2018

I can't tell if you're being sarcastic or not in your second paragraph. :-)

pjmlp · on Jan 17, 2018

That is my point of view as well, software needs to be liable just like in other fields of engineering, done by people that actually understand its consequences.

I don't do a car inspection at a guy that just happens to know some stuff about mechanics.

fh973 · on Jan 17, 2018

I think the difference is that in engineering you are legally obliged to follow this process and in software you are not.

The leads to a race to the bottom for which both the software vendors and consumers are responsible. Everybody wants the latest stuff as soon as possible, and wants to play less. If you release software only when everything's "right", the market will have moved on.

geocar · on Jan 17, 2018

> If you release software only when everything's "right", the market will have moved on.

... unless we the people demand legislation that protects us from software "vendors" selling us their bugs.

I think it's interesting that people can see a car that doesn't work, and a house that's falling apart, but they can't really see software. Can you imagine someone saying "my house is leaking" and the landlord saying, with all seriousness: "Have you tried turning it off and on?"

Of course, we have legislature for those things, so why not software?

riffraff · on Jan 17, 2018

I am sure other fields have a better development story these days, but isn't it a matter of perceived risk and relative effort?

I.e. we rarely hear about plane crashes caused by software, or power plants accidentally blowing up due to an integer underflow.

Civil engineers make solid bridges, but most software is not the golden gate, it's a shed thrown together by some guy who's not an engineer nor a professional mason.

mpweiher · on Jan 17, 2018

How many people die because of software? How many die because of cars?

We could make cars that don't kill people.

We could make software that doesn't break.

jlgaddis · on Jan 17, 2018

Damn, it's been nearly 20 years since qmail 1.03 was released (June 1998)? It sure doesn't seem like that long!

I recall setting up qmail "toasters" on FreeBSD to do virtual hosting. Maybe I was just too much of a "n00b" but I remember it being a big PITA to get all the services to play well together. There was this hip new outfit named Yahoo! that was using it for their new webmail service, though -- as opposed to sendmail, which pretty much every MTA on the Internet used at the time (and I was proficient enough with sendmail that I would edit my sendmail.cf by hand; pffft, who needs m4!?) -- so I assumed it was certainly capable of handling my volume of mail. (I wasn't running authoritative DNS servers at the time or I probably would've used djbdns over BIND as well.)

qmail, unfortunately, never did become too popular (relatively speaking, of course) and that's really a shame, because, as the quote in the article says:

> "We need invulnerable software systems, and we need them today, ..."

While that was certainly true then, it's even more true now.

On a side note, I'm surprised that the "qmail security guarantee" [0,1] wasn't mentioned in the article:

> "In March 1997, I took the unusual step of publicly offering $500 to the first person to publish a verifiable security hole in the latest version of qmail: for example, a way for a user to exploit qmail to take over another account. My offer still stands. Nobody has found any security holes in qmail. I hereby increase the offer to $1000."

[0]: https://cr.yp.to/qmail/guarantee.html

[1]: https://cr.yp.to/qmail/qmailsec-20071101.pdf (PDF)

geocar · on Jan 17, 2018

> qmail, unfortunately, never did become too popular

At one point, it was the second most popular MTA on the Internet. What pray tell would "too popular" look like?

> I remember it being a big PITA to get all the services to play well together.

When you were thinking about qmail correctly, it was an absolute pleasure to get everything to work together. Promise. Yet whilst the documentation was correct, it probably wasn't very good from the perspective of helping people think about it correctly. André Oppermann[1] (and perhaps Dave Sill[2]) did a much better job of this, so when they came available I would usually have pointed people there and see what kinds of questions they still had.

[1]: http://www.nrg4u.com/

[2]: http://www.lifewithqmail.org/lwq.html

zakki · on Jan 17, 2018

qmailtoaster.org maintained by Eric Broch remains updated regularly. The installation process is easy and you get current email server ‘requirements’ installed as well, i.e. spam filter, dkim, active sync, etc.

SwellJoe · on Jan 17, 2018

While qmail has faded in popularity as it has been sporadically maintained by a random bunch of folks over the years, there has been at least one other MTA written by someone with excellent security cred, and that has been continually maintained and has an excellent security record. We don't really need to mourn what could have been with qmail; we have Postfix, and it's really very good.

JdeBP · on Jan 18, 2018

Making a world-readable, world-searchable, and world-writable drop directory because of a decision to have no set-UID and set-GID executables in Postfix, even appropriate ones; failing to learn the even then well-known lessons of the batch job (at), UUCP, and printing (lpr) subsystems when it comes to world-accessible input directories; was a fairly large blot.

* https://cr.yp.to/maildisasters/postfix.19981221

* https://cr.yp.to/maildisasters/postfix.html

* https://groups.google.com/forum/#!msg/mailing.postfix.users/...

jlgaddis · on Jan 17, 2018

Yep. With a few exceptions, Postfix is the MTA I've used pretty much everywhere for the last 10 years or so.

djsumdog · on Jan 17, 2018

I remember qmail being the first MTA to really push Maildirs. I ran qmail personally back then on my Linux fom Scratch, but I also was a student lab admin and I think on our student e-mail server, we still ran sendmail at the time, on good old Redhat (back before it was split into RHEL and Fedora).

Software like qmail and the dev file system at the time really rubbed a lot of people the wrong way because of the drastic design changes they push. I'm glad that particular dev file system died as it had a lot of weirdly named nodes and a devfs daemon that had to run to create symbolic links to all the known names.

zAy0LfpBZLC8mAC · on Jan 17, 2018

Well, Maildir was invented by djb ;-)

arca_vorago · on Jan 17, 2018

DJB is probably one of my favorite people in the tech world. Ever since I read about the court case he won against the US government while representing himself, he's been a sort of hero of mine.

oblio · on Jan 17, 2018

That's quite impressive, but Wikipedia says that case was dismissed: https://en.wikipedia.org/wiki/Bernstein_v._United_States

tinus_hn · on Jan 17, 2018

The rules he was challenging had changed so what he wanted to do was no longer against the rules. His case was that the rules did not allow him to do something that was allowed by the constitution. As that situation no longer existed the case was dismissed.

pmoriarty · on Jan 17, 2018

My biggest takeaway from qmail has nothing to do with security, but rather that excessively restrictive licensing, highly opinionanted/unusual setup, and unwillingness to collaborate on its development squandered its potential.

If it wasn't for all that, we might well all be using qmail-based mail servers today, as qmail was really ahead of its time in so many ways.

It was kind of like the Amiga of mail servers, back in the day. It could have easily dominated the market, but it wound up a mere historical curiosity.

geocar · on Jan 17, 2018

> …highly opinionanted/unusual setup…

That setup contributed to security.

It also made qmail very easy to extend: I operated a medium-sized mail service a while ago and qmail's pluggability meant I could add features that didn't exist in other mail servers (like postfix or exim) without forking the entire project.

* I had SMTP AUTH when the only other mail server to support it was Netscape; before RFC2554 was written.

* My qmail-popup proxied to another machine so it didn't need root (making me immune to the Guninski vulnerability) so my users only needed to know about a single hostname regardless of where their mail was stored, and without needing to use something icky like NFS.

* I had a web interface with auto-enrolled client certificates for authentication and confidentiality for SMTP, POP, and IMAP.

* My qmail-remote recognised certain suggestions our IP was being blacklisted and would retry immediately with a new IP.

* My qmail-remote recognised certain greylisting error messages and rescheduled retry for that time.

* I had multiple mail queues based on the number of retries the message had seen.

And so on. I didn't start out to make those features, they just grew over a decade or so organically. At no point would I have forked postfix or exim to add any of those features because once you fork it you own it unless you can get your changes upstream. I had shit to do, so the real alternative was simply buy more servers and/or pay for commercial software.

I wish the model had caught on, because it's a superior way to develop software. I didn't understand why though until fifteen years later...

> …and unwillingness to share and collaborate on its development…

Dan absolutely collaborated, and I certainly was using betas back in 1996.

If my memory/anecdote isn't enough: There are a number of explicit points of evidence in the changelog distributed with qmail.

What he doesn't do is let people save face when they say something incredibly stupid and then try to backpedal when it's obvious how wrong they are. This bruised more than a few egos, and contributed to a campaign to actively smear his name and discredit his software.

> It could have easily dominated the market…

I think if Dan had let the peanut gallery have their way, we probably wouldn't have gotten postfix, but then qmail wouldn't have been qmail except in name. What's the value in the qmail brand if it isn't secure anymore?

Steeeve · on Jan 17, 2018

Having strong personal convictions about security and architecture was one of the primary reasons that tinydns and qmail were the staples that they were.

Yes, they have been feature locked for a long time. But let's not pretend that collaboration in 1996 was anything like it is today. And let's not pretend that there haven't been a plethora of security issues in all kinds of software that simply haven't befallen Bernsteins' software as a direct result of how he managed the projects.

I had the distinct impression that "dominating the market" was never a thought in his mind. He wrote the software he wanted to write and shared it. It was good software. It's still being used by many people.

The man isn't infallable. He isn't always right. But whenever I paid attention to him, he was right a lot more frequently than he was wrong.

The fact that he wasn't driven by dollars or market dominance was a good thing.

I, for one, admire the man and his software.

oblio · on Jan 17, 2018

> I wish the model had caught on, because it's a superior way to develop software. I didn't understand why though until fifteen years later...

Can you elaborate?

geocar · on Jan 17, 2018

Which part. The model? The article explained it well enough. I think of it as two main parts: (1) make fewer bugs, and (2) show your users the correct way to do things (which is the inverse of getting the correct requirements).

Or are you asking what took me fifteen years to understand?

Dan basically planned the whole thing. When the whole thing was too big, he would break off a piece that wasn't and develop it as a wholly separate thing.

I understood this strategy worked, but I didn't fully understand why this strategy was successful until I realised source code matters more than I thought[1]: If your program is short enough, there won't be any bugs in it. This is why learning how to read and write dense code[2] can make your programs better -- and that's what I learned how to do[3].

[1]: http://www.jsoftware.com/papers/tot.htm

[2]: http://www.nsl.com/papers/origins.htm

[3]: https://github.com/geocar/dash/blob/master/d.c#L62

hursortue · on Jan 17, 2018

> learning how to read and write dense code

> and that's what I learned how to do

This is good code?

  for(i=b=0;i<r;++i){switch(p[i]){
  case'\n':if(!e)e=i;if(!q){q=kpn(p+w,g-w);if('\r'==(cf=p[i-1]))cf=p[i-2];}else if(!c){w=!!strchr("kK1",cf);if(!o)sc(d,0,sd);else if(w){if('g'==o)cwrite(f,OK("200","keep-alive")BLANK);else cwrite(f,OK("204","keep-alive")END204);}else{if('g'==o)cwrite(f,OK("200","close")BLANK);else cwrite(f,OK("204","close")END204);close(f);}a=k(o?-d:d,"dash",knk(2,q,xD(a,v),0),0,0);b=i+1;if(!o){if(!a){poop(f);R 0;}if(10!=a->t){r0(a);poop(f);R 0;}writer(f,kC(a),a->n);r0(a);if(!w)close(f);}if(b==r)R 1;q=0;o=0;a=ktn(11,0),v=ktn(0,0);}else{if((c-m)==10&& !strncasecmp(p+m,"connection",c-m))cf=p[s];js(&a,sn(p+m,c-m));jk(&v,kpn(p+s,e-s));}w=e=g=s=c=0;m=i+1;break;
  case' ':case'\t':case'\r':if(w&&!g)g=i;if(s==(e=i))++s;break;
  case':':if(!c)s=1+(c=i);break;
  case '/':if(!w)w=i+1;case '?':case '&':if(r>=(i+4)&&p[i+1]=='f'&&p[i+2]=='=')o=p[i+3];default: e=0;break;}}if(a)r0(a),r0(v);if(q)r0(q);R b;}

fiddlerwoaroof · on Jan 20, 2018

That's how people with an APL background write C and it is, in fact, readable to those people: to me, however, it's gibberish

geocar · on Jan 17, 2018

It could be shorter if I spent some more time on it.

irundebian · on Jan 17, 2018

LMFAO

b6 · on Jan 17, 2018

I think you're missing something if that's what you got out of qmail. The main idea I internalized was, if you find yourself making programming mistakes, take some time to write an API that does not allow you to make that kind of mistake and commit to using it everywhere.

The licensing stuff, I'm not super clear on. If memory serves, DJB did not want to worry about bugs introduced by patches he'd never seen or approved, added by package maintainers for OSes he didn't use.

As for the somewhat weird ecosystem (daemontools), I think it wasn't that it wasn't good enough, it was just that people always find reasons to be dissatisfied. I can't even keep track of the latest reinvention of whatever people are using instead of daemontools, but I bet it's a hundred times as complex, and much less reliable.

JdeBP · on Jan 17, 2018

The licensing stuff was simple: There wasn't a licence. Some people creatively misinterpreted M. Bernstein's page, explaining what one could do under the law in the U.S. as it stood even without a copyright licence, as a licence. But that was a misinterpretation.

* http://jdebp.eu./FGA/law-licence-free-softwares.html

Daemontools ironically wasn't weird at all, as evidenced by the fact that over the past almost two decades the daemontools world got us changes to softwares, all of those do not daemonize options that have appeared in that time as well as things like removing mysql_safe and other Poor Man's Dæmon Supervisors, that made it a lot easier to use those softwares with other service managers.

* http://jdebp.eu./Softwares/nosh/mariadb-and-mysql.html#Promp...

Some of the things that people use instead of daemontools are not much less reliable. (-:

* http://jdebp.eu./FGA/daemontools-family.html

viraptor · on Jan 17, 2018

I still don't get djb's distinction between untrusted and minimal privilege code. What he calls "not violating security requirements" is effectively a successful least privilege approach. Very few elements can become hacked without breaking security requirements. If you can't gain anything from hacking a piece of software, then why is it even executed? - it obviously didn't deal with anything the user wants.

In his example, yes, you could change the DNS responses, but you still could not escalate to a higher lever where you can potentially modify stored user data. That is a success in practice.

geocar · on Jan 17, 2018

Something that can respond to a DNS request can put whatever it wants in the response. If there's a bug in that program, then whoever controls that bug can put whatever they want in the response.

The only protection from this is to make the code that does this as small as possible so that us human beings can convince ourselves that it is correct and that the risk of a bug that someone can control is zero (or as close to zero as to make no odds).

When Jim Reid wants to pat himself on the back because "at least they didn't get root on my nameserver box", he misses the point: gethostbyname()'s spec doesn't say "it may or may not return. if it returns it could return anything. don't trust it, don't even use it!" They say gethostbyname() return a structure describing the address of the named Internet host, so people expect that and depend on that. Something that "suddenly" violates that gets in the news[1]. Fortunately, nobody remembers what Jim said so the BBC doesn't ask him for a comment.

Anyway.

"Minimizing privilege" doesn't solve that problem because the DNS server needs the privilege to respond to DNS requests.

It might be easier to think about a better example. Let's talk about zlib.

A program that needs to decompress some text is not concerned with the contents of the compressed text, only the uncompressed text. Resource limits on our program exist to keep some things from getting out of control[2], but what about bugs?

If we could run zlib's decompress() with the permission only to decompress text, then the worst-case impact would either spin the cpu or be equivalent to "getting out of control". What do we need to do that?

• No creating file descriptors can be done with setrlimit() except for the dynamic linker is going to open a shittonne of files. We need to know what the minimum number of files are, and decompress can't ever change that without changing our program anyway.

• No accessing files or the network could be done with a setuid wrapper and iptables. At least on Linux. Most programmers don't do this, and most sysadmins only do what they're told, so in practice this doesn't happen.

• Sandboxing! Google published some clever user-level sandboxing that works on Linux to whitelist each syscall. This "verifier" could do it as long as it's smaller than decompress()!

That sandboxing one is tricky: A tiny inflate routine takes around 500 lines of C done the normal way, but how big is our sandbox? Probably a lot bigger.

• Ask the operating system for help! This is what DJB suggests. Ask for a disablefiles and a disablenetwork system call. OpenBSD is implementing this with their pledge[3] system call.

There's not a portable and satisfying solution here yet, but you can see they all cluster around reducing the privileges of the untrusted program.

Now, what's to prevent decompress from lying? What if someone can produce a content stream that causes a future decompress run from producing invalid results. Maybe something really sneaky[4]. What possible protection could we have?

As you can see, in this case so long as decompress is supposed to produce "text", there's nothing we can do to make sure it produces the "correct text".

That's why DJB doesn't want to focus on the "untrusted" aspect, and instead on trying to solve the problem that we have to solve anyway: How do we write software that is correct?

[1]: http://news.bbc.co.uk/2/hi/technology/7496735.stm

[2]: https://swtch.com/r.gz

[3]: https://man.openbsd.org/pledge.2

[4]: https://cmaurice.fr/pdf/ndss17_maurice.pdf

irundebian · on Jan 17, 2018

That's a great explanation, but I still don't understand why he says that the principle of least privilege is _fundamentally_ wrong. I fully agree that POLP could lead to an illusion of security or doesn't ensure user's security requirements, but that doesn't make it fundamentally wrong. The correct point is, that you shouldn't over prioritize POLP over code correctness. Maybe he is just arguing against the very strict implementation of POLP I could also agree, but in general, I would argue that POLP is fundamentally true and necessary, but that doesn't mean you should implement complex fine-grained solution with a lot of administrative overhead.

As soon as you build non-trivial systems, you have to contain error propagation with POLP, although you are striving to build simple and secure systems.

geocar · on Jan 17, 2018

DJB is drawing a distinction between two designs in his paper.

1. Netscape had a "dns helper" -- which ostensibly could only do DNS lookups, is designed in the principle of least privilege.

2. Ariel Berkman's xloadimage implementation -- which implements every image loader as a separate filter in a separate process who can do nothing but input image data and output image data (in the "common" format), is designed around eliminating trusted code.

The former could (and did) suffer a bug that affected DNS lookups, and was convinced to perform all sorts of network traffic since, it by definition needed to perform network activity to do it's function, and it could access files like resolv.conf because again, it needed to do that to perform it's function. That it couldn't be exploited to "yield root" wasn't really relevant, since most people didn't run Netscape as root. It could read user files and ship them over the Internet which is frankly bad enough.

The latter, is what DJB is recommending.

irundebian · on Jan 17, 2018

I would argue that both are designed following the principle of least privilege. Netscape haven't had the luck of having correct code. So what would have helped in Netscapes case? How would eliminating trusted code work in this case? Netscape has to do DNS lookups. I'm not sure if there was much more left to do as writing secure correct code. And of course you should prioritize writing secure correct code over implementation of least privilege. That doesn't make the principle of least privilege fundamentally wrong.

My opinion is that if you design your software securely threat modeling should result in the decision of implementing the least privilege principle and whether it makes sense and benefits (complexity vs benefit) or not. Of course you better eliminate trusted code so that there are less case where you have to get to these decisions. I assume that soon or later, there are situation, where you can't eliminate trusted code and it makes sense to implement least privilege.

geocar · on Jan 17, 2018

> I would argue that both are designed following the principle of least privilege.

Okay, but that's not what DJB means, and attempting to read his words with the definitions in your head, instead of the definitions in his head won't help you understand him.

I'm not going to humour an argument about mere semantics: For the purposes of this discussion they are not both the "principle of least privilege".

> So what would have helped in Netscapes case?

Writing the DNS client correctly.

DJB's point is that absolutely nothing else would help: You can't realistically put a box around buggy code as long as the code needs privileges.

And all that effort in writing that sandbox? A waste of time; fundamentally the wrong thing to focus on. Writing a DNS client is far less work.

> I assume that soon or later, there are situation, where you can't eliminate trusted code and it makes sense to implement least privilege.

That was what DJB assumed when he wrote Qmail, however he is now convinced that was wrong. His paper gives some explanation why.

If you can't eliminate trusted code, and it's still big enough you think there might be bugs hiding inside, you should rethink your design.

viraptor · on Jan 17, 2018

Right. I think I see the difference he intends. I see this more of a practice -vs- theory issue. (Or in isolation/in deployment) In theory he can work on designing the correct version of gzip and there's a chance he'll succeed. But in practice, I'm still putting a seccomp/pledge-equivalent on it, because if he fails, I'm stopping local root escalation and potential lateral movement, which he doesn't seem to think are interesting consequences.

geocar · on Jan 17, 2018

There's a point of subtlety remaining:

DJB isn't advocating against using seccomp or a pledge-equivalent.

DJB is advocating against stopping there.

viraptor · on Jan 17, 2018

That's definitely not how I understand this DJB quote:

> I have become convinced that this “principle of least privilege” is fundamentally wrong. Minimizing privilege might reduce the damage done by some security holes but almost never fixes the holes. Minimizing privilege is not the same as minimizing the amount of trusted code, does not have the same benefits as minimizing the amount of trusted code, and does not move us any closer to a secure computer system.

By "does not move us any closer" I don't believe he wants us to do it at all.

geocar · on Jan 17, 2018

> By "does not move us any closer" I don't believe he wants us to do it at all.

Then take a look at § 5.1 of the paper which gives a clearer example with which to draw the distinction.

Eliminating trusted code is what you're doing by decorating uncompress with pledge() with any capability to acquire resources; anything beyond stdio (or seccomp)

Minimizing privilege means focusing on finding some other argument for pledge().

twhb · on Jan 17, 2018

I think he intends “privilege” to refer only to filesystem and other OS-level privileges, not more generally to the capabilities of code, and I think he uses “untrusted” to mean minimally-trusted—more restricted than the OS can enforce.

Taking the DNS Helper example, one could imagine a function-like DNS Helper which has the capability only to return a value. This would make libresolv just a bug, not a security hole, because the attacker would only pervert their own request.

joveian · on Jan 17, 2018

My favorite quote from that paper is "I have discovered that there are two types of command interfaces in the world of computing: good interfaces and user interfaces."

As others have pointed out, one thing left out of the paper is not updating the software. qmail doesn't support SPF or other security extensions, which makes it useless these days without patches.

1110001110 · on Jan 17, 2018

Interesting article, the only thing I fail to see how this is related to Meltdown and Spectre. Those are not simple 'bugs', it's multiple good features of modern processors combined to yield an attack vector. My opinion is that with any level of process problems like this will arise sooner or later just because the complexity is so high.

dchest · on Jan 17, 2018

They are caused by performance optimizations with disregard to security (likely not intentional, but caused by not considering security aspects of optimizations carefully).

farnsworthy · on Jan 17, 2018

Nice summarizing article, with some programming concepts—explicit data flow, for example—that are even more generally applicable (though the topics of security and code volume/quality be linked).