Long term support considered harmful (tedunangst.com)
207 points by luu on Jan 27, 2015 | 148 comments



> Frequent upgrades amortize the cost and ensure that regressions are caught early. No one upgrade is likely to end in disaster because there simply isn’t enough change for that to happen.

Oh, how I wish this were true.

For what it's worth, it's pretty true as far as OpenBSD is concerned, in my experience. But OpenBSD is the exception here, not the rule. Everywhere else, developers all seem to have embraced "break early, break often".

Eventually you get burned. For me, it was a routine should-have-been-minor web server update where one of the packages I relied on suddenly became unsupported and every single hosted site stopped working. Since there's no way to roll back server upgrades, I had a marathon night involving building a new server stack and migrating all hosted sites there by 8 a.m.

But you can't yell at anybody when that happens, because the answer's always the same: it's not the developers' fault.

Who really believes sysadmins wouldn't update everything all the time if they could? Old, dodgy, out-of-date servers exist exactly because updates are butthole-puckering, because everyone's been burned at least once by a "minor" update, and because once the damage is done, undoing it is horrifyingly difficult.


Many times you have to hold back software upgrades on things like MRI scanners to wait for multi-year research studies to complete, and often new studies start up in the interim, which locks you down for even more time. Scanner upgrades change all sorts of things in ways that introduce confounds.

Not to mention that in the real world, scanner upgrades often break surprisingly fragile clinical workflows. Technically, the upgrades do improve the scanner's engineering and processing quite a bit in one aspect or another, but old workarounds need to be replaced by new workarounds, etc., and the documentation is very sparse and quite uninformative.


Hmm... what does an MRI scanner update do? I would have thought a system like that would just record whatever it gets from its sensors and any updates needed would only apply to the analysis and visualization software... Do the updates actually modify what the hardware does during scanning? Or, is it all painfully coupled because some sort of interactivity is required during scanning?


MRIs do not work like you think they do.

A single measurement is this: you magnetize the body in a particular pattern, then watch how that magnetic pattern fades. Then do it again in another pattern, and repeat. Think of the patterns as terms in a Fourier series, which you will eventually do a Fourier transform on to get the original thing back.

The name of the game, therefore, is to be able to get away with as few measurements as possible, and to be able to perform measurements here while still recovering from measurements there. Oh, and while we're at it, let's try not to be thrown off by things that tend to move around. Like arteries do every time another heartbeat comes through.

So yes, there is a lot of interactivity in an MRI measurement.
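To make the Fourier picture slightly more concrete (a schematic sketch only; real pulse sequences are far more involved): each measurement samples one spatial frequency k of the object, and the image is recovered by inverting the transform.

    S(\mathbf{k}) = \int \rho(\mathbf{x})\, e^{-i 2\pi \mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x}
    \qquad\Longleftrightarrow\qquad
    \rho(\mathbf{x}) = \int S(\mathbf{k})\, e^{\,i 2\pi \mathbf{k}\cdot\mathbf{x}}\, d\mathbf{k}

Here S(k) is the signal measured with the magnetization pattern at spatial frequency k ("k-space"), and rho(x) is the spin density you actually want to image; the fewer k-space samples you can get away with, the faster the scan.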


You may think they are simple machines which do one job and pretty much never need to be changed (not an unreasonable assumption), but that's not the case. I work in the medical device field and, well, you still have to sell instruments. To do that you need to beat the competition. To do that you need more features which make the doctor's/tech's life easier and the diagnosis more accurate.

That doesn't mean upgrading is easy. At the 510(k)/PMA level it pretty much always requires a re-filing, so you try not to do it often. But you do improve the product over time.


Probably improved image processing - noise removal, sharpness, could be a whole range of things, possibly down to something as seemingly simple as changing motor stepping for some of the actual moving parts.


"Since there's no way to roll back server upgrades"

There is, if you run a modern filesystem like ZFS or btrfs. You just take a cheap snapshot before upgrading (this can be automated) and roll back if there are problems. It even works with LVM.
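A minimal sketch of that workflow with ZFS (the pool/dataset name here is hypothetical; substitute your own root dataset):

    # cheap, nearly instant snapshot of the root dataset before upgrading
    zfs snapshot rpool/ROOT/default@pre-upgrade

    # ... perform the upgrade, test ...

    # if it goes badly, roll back to the snapshot
    zfs rollback rpool/ROOT/default@pre-upgrade

    # if it goes well, discard the snapshot later
    zfs destroy rpool/ROOT/default@pre-upgrade

The whole thing is easy to wrap in a pre-upgrade hook, which is what makes it practical to do every single time.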


Sadly, ZFS isn't as widely available as I'd like, but you can use LVM to provide snapshots.

It's not as friendly as ZFS snapshots, but it is at least available in CentOS 5: https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Ma...

(sadly, some things need long-term support)


Rolling back an LVM snapshot involves dd'ing the data off the snapshot and onto whatever you want your production disk to be (or just running off the snapshot forever, which has... performance consequences with LVM).

Yes, LVM snapshots exist, but they are of limited utility compared to ZFS and the like.

I've been experimenting with CentOS 6 and ZFS on Linux; so far it looks pretty good. It handles failing consumer-grade hard drives vastly better than LVM on md, and snapshots are inexpensive.


You can also roll back LVM snapshots using the merge option:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...
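Roughly, the snapshot-and-merge cycle looks like this (volume group and LV names are made up):

    # reserve space for changed blocks and snapshot the root LV before upgrading
    lvcreate --snapshot --size 10G --name root_preupgrade /dev/vg0/root

    # if the upgrade goes wrong, merge the snapshot back into the origin
    lvconvert --merge /dev/vg0/root_preupgrade
    # for an in-use root LV the merge actually completes on the next activation/reboot

    # if the upgrade is fine, just drop the snapshot
    lvremove /dev/vg0/root_preupgrade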


Nice. I had not seen that before... that makes it a lot more useful.


Oh yes, LVM is abhorrent, along with mdadm.

That's why I'm not so happy about the direction the tools are going with BTRFS. If ZFS isn't around, LVM is your only real option.

However, RHEL usually sets up LVM, so you might as well use it.


Ooh, thanks. That's a good idea. I'd never considered running ZFS on my small servers.


Or perhaps simply make a gzipped archive of the / partition before upgrading. If there are insuperable problems, zap the new / and restore the old known-working one?

Assuming data/sites are on different partitions.


While initially conceptually easier to grasp, that is far inferior to using a snapshot.

Here's a short list of ways in which that may cause you problems:

1) gzip of a path is not point-in-time: files that should be in sync may no longer be, since they were backed up at slightly different times (e.g. I hope you didn't expect database consistency to actually mean anything).

2) gzip of a path will take a while, because it has to actually operate on every file (a snapshot is generally copy-on-write, meaning it's "free" (not quite) for every file until it's changed; throw away the snapshot before a change and there's no need to copy the file).

3) gzip will take more room (see #2)


Excellent reply. The implications of a significant number of transactions per minute had escaped me.


Or, you could try out NixOS/GuixSD, which support transactional upgrades and rollbacks for the full system. No need to take a disk image (outside of your normal backup routine, of course).
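For the curious, a rough sketch of what that looks like day to day on NixOS (the system is rebuilt as a new "generation", and rolling back just means activating the previous one):

    # upgrade: fetch the newer channel and build/activate a new system generation
    nixos-rebuild switch --upgrade

    # something broke? switch back to the previous generation
    nixos-rebuild switch --rollback

    # generations are also listed (and selectable) from the boot menu
    nix-env --list-generations --profile /nix/var/nix/profiles/system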


btrfs still isn't production ready. I went to a talk from a btrfs dev two weeks ago, and it was still "use btrfs with caveats", not "just use btrfs".


Or, if you're using VMs, just take a snapshot and roll back if something doesn't work right. For anything that isn't virtualized, there's ZFS.


>Eventually you get burned. For me, it was a routine should-have-been-minor web server update where one of the packages I relied on suddenly became unsupported and every single hosted site stopped working. Since there's no way to roll back server upgrades, I had a marathon night involving building a new server stack and migrating all hosted sites there by 8 a.m.

The traditional way to handle this in a cluster is with testing. The idea being that you have the upstream repo, the testing repo, and the production repo.

Note, in most cases 'repo' means "directory tree served via http" and "sync" means "Copy, you know, with rsync or something" - This is not complicated.

10% of your servers point at the testing repo, the rest at the production repo.

Every X days, you set a test box against the upstream repo, update, reboot, and run your tests, you know, to catch the obvious stuff. If that works, you sync your testing repo with the upstream repo, and so 10% of your production now runs your new stuff. This is where you are gonna catch most of your problems, in my experience, but if something chokes, you only lose 10% of capacity.

After Y days of the test stuff being on 10% of your boxes, you sync the test repo to the production repo.

Of course, if you are like me, and when 10% of your boxes are down 10% of your customers are down, you want to spend a lot more effort on the 'test before you get customers on it' step.

Also, sometimes there are 'don't sleep until you roll it out everywhere' updates, like this one (oh god, I am so happy that srn is on that now and I didn't have to deal with it) or like shellshock. In that case, well, sometimes you sync the upstream repo straight to production.
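A rough sketch of that promotion flow with rsync (host names and paths here are made up):

    # mirror upstream into a local staging copy
    rsync -a --delete rsync://mirror.example.com/distro/updates/ /srv/repo/upstream/

    # test box passes? promote upstream -> testing (the repo 10% of boxes point at)
    rsync -a --delete /srv/repo/upstream/ /srv/repo/testing/

    # after Y days of quiet, promote testing -> production (everything else)
    rsync -a --delete /srv/repo/testing/ /srv/repo/production/

For the "don't sleep until it's everywhere" class of update, you just run the last two steps back to back.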


Even with OpenBSD that isn't always true. The time_t fix killed binary compatibility between 5.4 and 5.5 on 32-bit systems, which meant that any installed packages had to be uninstalled and reinstalled if one were to perform an upgrade. The OpenBSD project isn't afraid to break compatibility if it means fixing a bug - something that I think is a very good thing, but it has implications in terms of support.

On another note, upgrade hell is a pretty convincing argument for systems like NixOS, where upgrades can be easily rolled back, configurations are declarative, etc.


I also cannot recommend the "frequent upgrades" model.

If systems were more stable and fewer problems occurred on updates, this would be right. But the reality is simply different. My experience is that many updates bring surprises:

- software that was once good has simply gone downhill since the last version

- problems with less common software combinations that the package maintainers didn't catch

- some device drivers that do not cooperate

- desktop environments that no longer support some options they used to, or have simply gotten worse

- legacy data that is not supported by newer application versions, or subtle problems with that data occur ...

- ....

Also, if upgrades were possible without hassle and trouble, that model could work -- but they just aren't. For example: I wanted to install a newer Ubuntu version on my hosted server, but the automatic upgrade process explicitly says it should not be done over a remote session. So for a hosted system, I have to fall back to a completely new install (back up data, fresh install, completely new configuration, restore the backed-up data).

Also, when you get into trouble, it is not possible to easily go back to the last stable version (in this situation, virtualized systems are very useful).

In such a case, it is clear that I don't want to give up my life just to always have the newest stuff on my server.

I'm sorry to say that OpenBSD is not for me (unless OpenBSD never suffers from those troubles).


> and because once the damage is done, undoing it is horrifyingly difficult

That's the one reason why I'm looking at NixOS. There's no inherent reason for it to be hard.


Go for it! I've been using NixOS personally since February 2014 and in production since September, and it's been fantastic.


But OpenBSD is the exception here, not the rule.

It's worth mentioning that OpenBSD doesn't support a lot of the hardware people use, doesn't support a lot of the applications people use, and is a fairly specialist distro. They do some awesome things, but they also benefit greatly from not having to support regular non-techie end-users.

Edit: As an example, the upgrade process that you have to do at least once a year[1] is hardly something that a tech naif would find painless

[1]http://www.openbsd.org/faq/upgrade56.html


And what hardware does it not support? And what applications does it not support? I'm tired of this meme. OpenBSD runs on virtually everything you throw it at, and virtually all open source is supported.

In some cases OpenBSD hardware support is better. E.g. I have a few laptops where suspend only works in OpenBSD, not Linux. Also, a few years back my WiFi cards were supported natively only in OpenBSD, not Linux. For other hardware, it might be the other way around, but overall it's pretty unlikely you'll find something OpenBSD does not run on.

And for software, it's exactly the same. Sure, there's some Linux-specific software out there, but the bulk of it is not.


Hipchat and Skype were two popular applications that I used today that don't run on OpenBSD. I ran them on my laptop with an Nvidia GPU, which isn't wonderfully supported by Linux, but it's even less supported by OpenBSD. Nvidia is a pretty damn common brand. Flash, for all its sins, is still popular and not supported properly. Steam is another popular application that doesn't work on OpenBSD. Admittedly you said open source, but frankly, I didn't. I said 'applications people use'. It's utter bullshit to move the goalposts and then chide me for inaccuracy.

Then there's virtualisation software in general, which is increasingly popular and widespread and which OpenBSD doesn't support well, if at all (KVM, Xen, VMware, VirtualBox, and friends; qemu by itself is s.l.o.w.). Docker and other containers are really taking off at the moment and have a lot of mindshare, though admittedly these are Linux-specific. OpenStack is another significant emerging bit of software that doesn't support BSD as a host.

Then there's plenty of stuff like this http://blog.lxde.org/?p=1111 where OpenBSD could be better supported but is a broken experience. Legitimate reason, sure (not enough eyes), but it's still a broken experience.


I worked at a firm that supported two main branches that forked >5 years apart. It was a nightmare fixing both simultaneously, especially after the fork was 5+ years old. In the end they wound up breaking something with every new piece of functionality.


> Eventually you get burned. For me, it was a routine should-have-been-minor web server update where one of the packages I relied on suddenly became unsupported and every single hosted site stopped working.

I'm not wishing to be inflammatory here, but surely you could have exactly the same problem with something that is updated 'slowly'. If you're relying on the software you get being 100% correct/bug free every time you get it, you're building your house on sand.

This is where having testing (ideally a good set of automated tests) is invaluable. Having a robust set of tests you run before you roll out changes to 'production' is important if your 'production' is "we can't afford for this to go down."

Along with testing, you need to have a backout strategy. What do you do if it all goes wrong? This is usually very similar to your backup/restore strategy, as the problem is generally the same: if your server gets hosed/breaks/fails, what do you do?


>> Frequent upgrades amortize the cost and ensure that regressions are caught early. No one upgrade is likely to end in disaster because there simply isn’t enough change for that to happen.

> Oh, how I wish this were true.

Your anecdote is even more evidence that it _is_ true. Most minor upgrades do not end in disaster; only a few of them do.

Frequent updates still require some diligence on the part of the person or organization updating.


The difference with irregular upgrades is that it's clear when the breakage is likely to occur and you can plan around that.


> Since there's no way to roll back server upgrades,

Apart from the obvious comment about snapshots (at the volume or machine level), really most of the time your problems are because of a specific package.

And while most package managers don't support downgrades per se, you can just remove the offending package and install the old one. Nine times out of ten that would give you more time to fix the problem.

That said, you need to test before you go to production, no matter how trivial the patch may seem. Staged rollouts, a separate environment, or both.

And that's also the elephant in the room in Ted's rant. Sure, if we didn't have to test anything, we could update every 6 months. But we have to, and we can't.


> I had a marathon night involving building a new server stack and migrating all hosted sites there by 8 a.m.

This is because of a shortcut that somebody took when building your server originally, by failing to make the deployment reproducible.

> ...where one of the packages I relied on suddenly became unsupported

I've never heard of this happening in any Linux distribution. Can you be more specific? Did you choose to use some third party source for a package here? If so, how can you expect your distribution's developers to support you going off-piste in a way that they never claimed to support in the first place?

> Old, dodgy, out-of-date servers exist exactly because updates are butthole-puckering, because everyone's been burned at least once by a "minor" update, and because once the damage is done, undoing it is horrifyingly difficult.

You might want to look into "DevOps". The idea is that you script your deployments together with automated tests for them. There are many tools to help you do this now. With this in place, nothing you've stated is true any more.


> I've never heard of this happening in any Linux distribution. Can you be more specific?

It was a couple of years ago. I know somewhere I have notes on it, but I can't find them just this minute. I remember that it had something to do with one of the components in my apache-mpm-worker--php5--libapache2-mod-fcgid--apache2-suexec-custom--libapache-mod-security stack. It was something like, mpm-worker no longer supported libapache2-mod-fcgid or some such thing. At the time, I was really good about doing regular server updates, so when it happened, I spent some time researching it but eventually found the package had been removed during the update and was no longer supported, with no workaround aside from finding a new way to build an apache server.

Either it got fixed or I'm running a slightly different stack now than I was at the time. Sorry I can't be more helpful.

> This is because of a shortcut that somebody took...

> You might want to look into "DevOps"...

That somebody was me, and I'm aware of devops. Pretty big fan of it actually. Unfortunately, I'm just a small MSP, the owner and the senior tech and the sole software developer and the sysadmin, and I don't charge enough. The servers exist as an add-on service for my clients, especially ones that have special needs that other hosting companies can't easily meet. They work, the stacks I built a few years ago are robust and efficient (one of my customers was featured on a popular national radio show with a reputation for killing websites, but their site stayed up and responsive the whole time). Reproducible deployments, centralized management, further improvements to automated security, etc. are all on my to-do list -- along with like a couple dozen other things.

I was supposed to rebuild all of the servers during last December, typically a slow period, but instead it was unseasonably, hair-on-fire, brain-meltingly busy, and it still hasn't let up.


> It was something like, mpm-worker no longer supported libapache2-mod-fcgid or some such thing.

No stable release distribution ever does this within a release, unless there is a security issue that cannot be fixed any other way. In this case would you prefer to have remained vulnerable?

Of course, mistakes can happen, but they would be fixed in a further regression update, or you could have even looked at fixing the bug yourself.

Based on the package names you're likely talking about Debian or Ubuntu. In both cases, you could have just downgraded the packages for a quick (albeit temporary) end to your emergency.
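For reference, that quick downgrade on Debian/Ubuntu looks something like this (package name and version are placeholders; it only works if the older .deb is still available in the archive or in the local apt cache):

    # see which versions apt currently knows about
    apt-cache policy some-package

    # pin back to a specific older version
    sudo apt-get install some-package=1.2.3-1

    # or install a previously downloaded .deb directly
    sudo dpkg -i /var/cache/apt/archives/some-package_1.2.3-1_amd64.deb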


> I've never heard of this happening in any Linux distribution. Can you be more specific?

Not the OP but I run a personal server for mail etc., and I do remember one LTS → LTS upgrade of Ubuntu that removed the DKIM milter that my postfix config depended on.

So it happens.

> You might want to look into "DevOps". The idea is that you script your deployments together with automated tests for them

For a personal server the things that in my experience break are related to configuration files.

This can be subtle things, like some new option enabled by default that conflicts in some edge case, it can be upgrade scripts that butcher existing customized configuration files, or it can be a total restructure of how a package has structured its config.

All of this has happened to me, and I don’t think having scripts able to redeploy my server would have been of any help resolving these issues.


> ...and I do remember one LTS → LTS upgrade of Ubuntu...

That's different. Supported features do change between distribution releases. But a release upgrade never has to be an emergency security update - you have years to plan for it. The grandparent was referring to unexpected emergency breakages during updates within a release, which is an entirely separate thing.


For people with >2 servers, that might be the right solution, but for the others, scripting deployment might be just more overhead.


I can see that, especially if someone is administering their own small site and hasn't had experience in larger shops. But at the minimum (and since it is cheap enough) you should have at least 3-5 servers (esp. if you are making money off them) -- dev, test, staging, production, and failover. Just breaking the mirror to the failover box and upgrading the primary would allow for an easy and quick backout procedure.

Oh, and there is another upgrade trick (I'll have to see if I still have my old writeup on it, and post it as a Gist). You can query the package manager, and make a few lists. First, list the packages (and versions) that are installed. Secondly, get a list of any file owned by a package that has changed since package installation (compare the file MD5 sum to the package's record of same). This should be a small list of files (mostly configuration related) that can be backed up. This gives you a good way to roll back changes if needed, and keep a system documented.

Finally, with Yum, you can roll back updates. Take a look at "yum history", and "yum history undo". This has saved me a couple times.
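A rough version of those lists on an RPM/Yum system (output file names are arbitrary):

    # 1) installed packages and versions
    rpm -qa | sort > installed-packages.txt

    # 2) files owned by packages that differ from what the package shipped
    #    (a "5" in the flags column means the MD5/digest changed - mostly config files)
    rpm -Va > changed-files.txt

    # and the rollback part:
    yum history              # list recent transactions
    yum history undo 42      # undo a specific transaction (42 is just an example id)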


I must acknowledge that I don't have as much administration knowledge as you have. I come more from the development corner, and I really "hate" that stuff. I also have to say that what you write is probably right. But I think there are some people like me, who have one or two servers running for some projects (and I know people who run websites or even servers with even less knowledge), who just want the stuff to work and don't have the time or ambition to optimize things.

Do you know if this rollback stuff is also available for Debian-based systems? Sounds pretty good. I'm afraid I don't know yum (it's for RPM-based systems, I know), but the last systems I used were all Ubuntu and Debian.


I haven't done a lot with Debian based systems -- I've traditionally been a Slackware / DIY / Redhat person. I just looked through the documentation and a bit of the source code for apt-get, and nothing popped out at me for a rollback feature. If I come across anything I'll update this comment for you.

But I know what you mean about hating the sysadmin side of things. Of course there are people that really love it too -- there's sort of a mindset that you have to get into, just like with development. Maybe this could be an idea for a new service -- matching up programmers with side projects, with sysadmins that are looking for side projects to help manage.


Yeah, that would really be a great idea. I have a friend whom I ask for help sometimes, but he also has only limited time.

And I lack the time to really dive into automation and that sort of thing. So many interesting tools are available, but you always have to take the time to learn them first.


> matching up programmers with side projects, with sysadmins that are looking for side projects to help manage.

That sounds like a fantastic idea. Even in my case, I do "OK" as a sysadmin most of the time, but would love some occasional guidance or assistance.


> Do you know, if this rollback stuff is also available for Debian based systems?

Over half the servers I admin are running some flavor of Debian, and have for at least five years or so. To the best of my knowledge there's no rollback method for Debian updates, short of relying on filesystem tricks like using ZFS, discussed upthread.


Thank you for the information! That would really be a fine feature. I love Debian for its stability (or so it is said ... I just lack the time to compare for myself), but the administration tools are still a black box for me and I have the feeling that they could use a brush-up.


> A one year support window isn’t too short; it’s too long.

Wow, I'm in very strong disagreement with the author on this point.

I don't think I could find a single person, technical or otherwise, who didn't use the same operating system for more than a year. In fact, for almost everyone who isn't technically inclined, the default is to upgrade their OS only when they get a new computer.

What about every single car produced today? I doubt you'd want to upgrade them every 6 months.

As an Ubuntu user, I find that I cross my fingers every time I run sudo apt-get upgrade. About half the time I get broken builds, and once a year my OS will just flat out crash from it, or fail to reboot in VirtualBox.

This view, while idealistic, is just so laughable that I can't begin to believe the author is serious.

I've worked with some mission-critical systems (stock exchanges). They take more than a year planning a big OS upgrade. To turn around and tell them that they should be doing this every 6 months is just so out of touch with reality that I don't know how to respond to the author :(

Many companies won't install a new piece of software until it's been proven out in production for 6 months to a year by someone else....


From a different perspective...

... perhaps if we were to actually get serious about doing small/incremental updates all the time, we'd get better at actually doing them without incident.

I think one of the major obstacles is that the software stack has become ridiculously (and hideously) complex over time. If we could converge towards a world more like what's described here[1], then I think we'd be in a better place overall. (Granted, the video is talking about deploying services, but AFAICT there's little essential difference between that and a modern GUI system with all its DBUS interfaces and whatnot.)

[1] https://www.youtube.com/watch?v=GaHzdqFithc


Small, incremental upgrades are something I'm finding are essential to maintaining sanity as the only dev at my company doing what I do. Waiting to upgrade only increases the pain. Now instead of having one potential issue to troubleshoot every now and again, I have dozens when I finally do buck up and upgrade.

So I get in the habit of upgrading the dependencies of all the apps I work on, every time I work on them. Issues happen, but only to one dependency at a time. It's manageable.

What I would love is to eventually have a CI server do it for me. Every single day, it would run bundle update, run the tests, and deploy unless there's a problem. If there is a problem, it drops me an email with the trace and fixing it becomes part of my morning routine.
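A bare-bones version of that job, just to make it concrete (assuming a Bundler-based app; the deploy and notification commands are placeholders):

    #!/bin/sh
    # run once a day from cron, inside the app's checkout
    cd /srv/myapp || exit 1

    bundle update > /tmp/dep-update.log 2>&1
    if bundle exec rake test >> /tmp/dep-update.log 2>&1; then
        git commit -am "daily dependency bump" && ./deploy.sh      # placeholder deploy step
    else
        mail -s "dependency update broke the build" me@example.com < /tmp/dep-update.log
    fi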

If subtler problems surface this way, then I've discovered an oversight in my test coverage, or an overly complicated architecture that I need to remove dependencies from.

I'll probably implement this sometime this year. I'm thinking I'll want to redo deployment instead of relying on Capistrano, and then finally grow my own CI solution. I'm slowly moving away from big monolithic apps to smaller, homegrown solutions that do only what I want them to do. I've already reimplemented provisioning and configuration management. I believe in DevOps as code.


> What I would love is to eventually have a CI server do it for me. Every single day, it would run bundle update, run the tests, and deploy unless there's a problem. If there is a problem, it drops me an email with the trace and fixing it becomes part of my morning routine.

Agreed. A while back, there was a security fix in the default php5-fpm configuration (0666 -> 0660, I believe), and I skipped it when updating. My website went down until I learned to change the owner of the process. And this wasn't even a dist-upgrade.

Sometimes manual intervention is required.

In my case, I run updates almost every day. If I had to wade through 6 months of backlogged updates, I would have wasted a lot more time identifying where it broke. Smaller feedback loops are a win.


Gemnasium provides something similar to what you describe in the context of Ruby applications with Gemfiles. It can be configured to automatically generate pull requests updating outdated dependencies, which could trigger your test suite to automatically check for regressions.


I use Travis CI to maintain my Ruby gems, and one thing that I do is turn on a build that allows failure, where it uses the HEAD version of my upstream dependencies. That way I know within a day when upstream has done something broken, which is especially nice if it's by accident.


The problem is that OpenBSD has had some pretty significant user-facing changes in just the past year. The past few releases have seen Apache replaced with nginx, and the next version is going to get rid of nginx for their own httpd. If you're just using the basic web server, that's some pretty significant change right there. While they're advertising clean and simple, they can and do introduce the potential for heavy breakage in nearly every release.


Maybe that's an indication that the configuration of the web server should be partitioned into "basic configuration" (which they can all support) and a few other categories (which are supported by a subset), i.e. the configuration itself should be "standardized".

I realize that this is a nontrivial proposition, but if we're ever going to realize the dream of "interchangeable parts" we're going to have to learn to do this stuff.


The old code is one pkg_add away, though. Breaking binary compatibility is at least as likely to hurt you - and simply upgrading PostgreSQL without a dump-reload cycle can hurt plenty.


That still requires admin time to update the configuration and change management tools, QA time to run a regression test that will exercise all the potential failure paths, development time if you count on the ability to do configuration updates based on application state, etc. Updates that look small can still require considerable changes.


If you're used to it, you can streamline these processes and automate as much as possible. Even if it's a change you can't easily test or automate, by making just the one change at a time you know what to blame if it breaks. :)


Presumably that's just the default web server, and you can use the webserver of your choice as well?


> ... perhaps if we were to actually get serious about doing small/incremental updates all the time, we'd get better at actually doing them without incident.

That would require more developers to actually give a fuck about the user experience not breaking from release to release.


I simply expect that doing an upgrade of the system (as opposed to grabbing the latest version of applications and libraries) is going to totally destroy my box and I'll spend the day recovering, because that's happened to me multiple times, from multiple OS vendors.


Hopefully (and I realize that this is somewhat wishful thinking), more users doing haphazard upgrades and complaining about them not working would be feedback into the whole cycle.

(I'm not very well placed in the software cycle to actually start this feedback loop, but one can wish, right?)


Given the general quality of response to user reports in free software ("The source is there, fix it yourself", "WONTFIX", "That's a feature", "you're doing it wrong"), ...


I've held jobs at both ends of the spectrum, from bleeding-edge to death-by-legacy. There's a good deal of pain with both approaches, that's the reality. Would your org rather deal with security vulnerabilities and no upstream support, or deal with things blowing up once in a while?

What I like about this article is it's talking about what the global optimum is -- what's the best approach for the entire ecosystem? Not just for one individual or one org. What's Pareto optimal? (Or maybe it's a little less general -- what's the best approach for those who depend on FOSS, and don't pay someone like RedHat for long-term support and CYA insurance).

And basically I agree. The world's upgrade cycles could stand to be a whole lot shorter than they are. Every time I think about all the hardware in this world that'll spend its whole lifetime vulnerable to heartbleed, or shellshock, or a gazillion smaller vulns, my skin crawls.


> They take more than a year planning a big OS upgrade.

That sounds painful.

I think the author is really advocating for a faster feedback cycle. Imagine if instead of a big upgrade every X months, you had a tiny upgrade every day, which didn't require rebooting and happened in the background.

Breaking changes could surface the same day they were shipped and be rolled back or fixed.

Now imagine that conservative players also constantly upgrade, but always stay 6 months behind the bleeding edge, getting only releases marked as "tried and true". Eg, if something had to be rolled back or patched, they skip over the bad version.

Wouldn't that make upgrades much less painful for companies like you describe?


You do realize he's talking about a stock exchange? Even a millisecond downtime is unacceptable. Some things don't even need backup, others need testing on identical hardware in identical configuration before deployment. That's just the way it is.


To be fair, it's _unscheduled_ downtime that's not acceptable for a stock exchange. They have scheduled downtimes all the time. For example, as far as I can tell the NYSE is only "up" 9:30am-4pm local time, Monday to Friday, and not on certain holidays.

Now this does mean that you have to make very sure that the upgrade you start at 4pm on Friday will be totally done and bug free by 9:30am on Monday, of course.

And I agree with your larger point that in a situation like that testing done by someone else in a different configuration is of limited (though nonzero) utility.


I understand, and I admit I haven't done anything like that. But isn't it still easier to thoroughly test a day or week's worth of OS changes at a time than to test 6 month's worth or a year's?


> As an Ubuntu user, I find that I cross my fingers every time I run sudo apt-get upgrade. About half the time I get broken builds, and once a year my OS will just flat out crash from it, or fail to reboot in VirtualBox.

What I'm about to say has been said a thousand times, but I guess we need to keep saying it as long as this myth survives. Are you using PPAs? Are you using PPAs that provide newer versions of software that also ships with the system? Are you using third-party drivers (looking at you, Nvidia)? If so, there's your problem. It's not that difficult to blast away your PPAs with "ppa-purge" prior to running a distribution upgrade. As for regular package updates - "apt-get upgrade" won't break a system that hasn't already been severely broken by the user. Period.

I use Ubuntu on the desktop and the server, and have done many upgrades over many versions. I don't recall a post-upgrade broken system in recent years that wasn't due to third-party stuff. Even then, it's trivial to fix - reboot to recovery mode if you can't get your X11 on, "apt-get purge" the offending stuff, reboot and carry on.

My impression is that Ubuntu screwed up on a major version upgrade several years ago, and their upgrade process is still being dinged by folks due to the wave of bad publicity that that incident created. Which is a shame, because Ubuntu post 12.04 is a rock-solid operating system that is a joy to use (upgrading included). Granted, third-party packages can sometimes reduce the overall stability, but then what do you expect? It helps to carefully vet which PPAs you use - PPAs that are done by people who know what they're doing don't tend to cause trouble.

If you want a rock-solid system, do a clean install of an LTS release, don't set up PPAs, and enjoy. I have a side Thinkpad with 12.04 on it which is exactly that, and it's the most stable system I've ever used. And it's being kept current with security patches and bugfixes with regular "apt-get upgrade".
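For reference, the ppa-purge step mentioned above is about two commands per PPA (the PPA name below is made up):

    sudo apt-get install ppa-purge
    # downgrades the PPA's packages to the official archive versions and disables the PPA
    sudo ppa-purge ppa:example-user/example-ppa
    # then run the release upgrade as usual
    sudo do-release-upgrade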


I know a group who does a REBOOT once a year, just so they don't forget how it is done.


I appreciate what you're saying, but you seem to not be addressing the point of the post: you could reduce risk by moving to rolling upgrades. A year of changes is a big change; smaller updates are not. There's also a huge risk in stasis.

Not that it's relevant, but to respond to your first point: you know people who roll their browser, and a few Linux distros roll their release too. I've heard of some startups who use a Fedora rather than CentOS for that reason.


[deleted]


> That's what you need to do when OS releases are made only once every few years and are big. They don't have to be that way

Ubuntu and Fedora releases happen already in 6 month cycles. This is not some novel unexplored concept.


Ubuntu and Fedora are aggressive about incorporating new stuff. It's possible to have the same release cadence while operating with a more Debian-like attitude toward stability.


I question the scale of just what is getting upgraded if it is "quick and painless."

Note that I do not mean to diminish the accomplishment. Especially as a fan of TeX and someone who often touts that maybe we are learning the wrong software lessons by constantly rewriting everything in the latest safe language/paradigm.

However, to not acknowledge all of the progress that is made in other areas is just as dangerous as not acknowledging the benefits of more deliberate methodologies.


Last I used it, OpenBSD did a good job of compartmentalization. As on your cell phone or tablet, upgrades could be freely burn-the-earth-boil-the-seas, because user data and configuration were well separated from the system files. Additionally, updates were delivered in complete tarball form: you didn't upgrade eighty packages, you just unpacked one complete tarball.

It probably helped that the complexity was low- I remember the default install occupying some 300MB of disk and a few tens of MB of memory, in a time when Ubuntu already occupied some 10GB.

That was more than half a decade ago though.


I question the comparison to cell phones and tablets. In those, upgrades have become painless because I do not value any data on them. That is, there is zero data that I would miss if I completely lost my tablet.

This works because I quit caring about the UI experience. I accept that when I upgrade, things will be dramatically different.

If I adopt a similar policy on my computer, I get similar results. However, as soon as I start actually caring about more and more data on the machine, things get tricky.

And this is where a lot of customizable user interfaces hit ridiculous trouble. Those customizations are rarely treated as important data. Worse, there is often a lack of ability to clean up excess preferences.

Which leads me back to my main point. There have been a lot of advances in consumer user applications that are easy to ignore in server applications. Which makes upgrading much more difficult in the consumer space.


I mostly made the comparison because my Android phone is partitioned similarly to how I remember the recommended scheme in 4.2-4.4 OpenBSD, and in both cases that partitioning scheme lends itself very strongly to painless upgrades.

But you are right about the lower "data risk" on phones, thanks to cloud services and such.


That's a fair point. But I think that we keep far less data on these devices than we ever did on a computer. Certainly not in terms of the size of the data, but more in the types. Preference and general settings data has basically been erased on these.


You find the author's view laughable, but I find your view laughable. Reasonable deployments of Ubuntu or Windows get upgraded with security fixes all the time, browsers update themselves all the time. Waiting a year between security fixes is just foolish.


The problem isn't the security fixes. Everyone agrees that pushing security fixes promptly is generally a good idea. The problem is all the other changes that you may or may not want that get bundled in with those security fixes in a forced-upgrade model. Certain web browsers are among the best examples I know of this user-hostile behaviour.


The entire point of the article is that "all the other changes" include security fixes nobody knows about.


They also include new security flaws nobody knows about.


> I don't think I could find a single person, technical or otherwise who didn't use the same operating system for more than a year.

You can. Think AWS systems (as an example) where you don't log in and patch, you spin up a new instance instead. This can't apply to everything (stock exchanges, MRIs, etc), but for the software affected by glibc, absolutely.


> What about every single car produced today. I doubt you'd want to upgrade them every 6 months.

Tesla cars get automated over-the-air updates far more frequently than every 6 months.


In fact, for almost everyone who isn't technically inclined, the default is to upgrade their OS only when they get a new computer.

Most new computers come with a warranty that is 90 days. Why should software support be 5x longer?


Depends on the country. In most places in Europe that would be 2y minimum.


Most new computers come with a warranty that is 90 days.

Where? In the UK, consumer protection laws would probably force a minimum of two years in most cases. Even after that, if an expensive device mysteriously failed on day 2x365+1 in a way that wasn't reasonably expected, the purchaser would still have some relevant rights.

Why should software support be 5x longer?

It is all about reasonable expectations. No-one paying for basic system software would reasonably expect it to stop working after three months.

Of course if we're talking about Open Source software that you're getting for free then it's your own responsibility to make sure it does the job you need and no-one owes you anything. This is part of the reason that selling support for otherwise free software is a viable, and sometimes highly lucrative, business model.


US, Canada, Mexico. While no one expects hardware or software to stop working after three months, supporting it forever isn't really worth it either.


Because hardware warranties and software support have completely different purposes? Seriously, this is a non sequitur.


Upvoted for some interesting points, but it's quite wrong for the simple reason that on a commercial timescale one year is nothing. There are reasons Microsoft end up being dragged into supporting old OS versions, and it's because running an operating system is just a means to an end, which hopefully is doing something useful. "Upgrades" have, understandably, become synonymous with unnecessary pain and breaking things.

Maybe OpenBSD really is more rigorous about quality control, but (as an example) if you were to just accept every Ubuntu update it wants to install you'd be wasting a significant amount of time just ensuring your system works properly.


> on a commercial timescale one year is nothing.

I've worked in commercial environments where common invocations of `tar` did not work, because `tar` was a decade out of date. I had to learn things and habits that had died before I even started programming. It wasn't pleasant. Do not underestimate the age of some environments, or their stubbornness about upgrading.

Recently, I've helped migrate software from Ubuntu Precise to Trusty, and the amount of differences make things mildly frustrating. We don't get to just run one or the other, and we can't just drop everything and move to Trusty. We have to continue to support the old while we build support for the new, briefly support both, transition to the new, then tear out the support for the old. Migrated. It's a lot of work when the changes are huge, but much more manageable when I can take the changes more piecemeal (it's one if statement, as opposed to many, that I need to manage at any given time). That's in a production environment.

I run Gentoo at home. I much prefer its rolling releases to Ubuntu and Debian which I ran alongside and before, respectively. Things break every now and then. It's a tad annoying, but it gets fixed sooner or later.


[deleted]


This is implying that people actually want to upgrade full releases. Perhaps I would like to build a Linux box and have it last a few years, and only patch it for security fixes since it is working fine. Yes, if I do have to upgrade that system 5 years later it is going to be more painful than every 6 months, but if I did it every 6 months I would have to devote development time and switch context every 6 months.

Small updates are better if you have to upgrade entire systems, but many people do not want to risk destabilizing working systems every 6 months. For that we have LTS, where someone will integrate security fixes in a hopefully non-impacting fashion, and for that I am thankful.


"Considered Harmful" Essays Considered Harmful: http://meyerweb.com/eric/comment/chech.html

I think the OP is being a bit naïve if they expect all users to upgrade to a new OS every year. Upgrades of Ubuntu and OSX are usually quite painful endeavours, fraught with UI-breakage and new bugs to solve, and there's usually little incentive for users to upgrade in every 6-monthly release.

IMHO Ubuntu has a good two-pronged approach - short-term support for most releases, but a LTS every couple of years for those who don't want to handle the pain of upgrading all the time.

Anecdotally, even upgrading to stay up-to-date with the LTS versions can be difficult. A company I deal with is currently scrambling to ditch Ubuntu 10.04 before it loses support in April this year. That's 5 years old now. Companies don't upgrade for fun.


Personally I think a 6 month release cycle is the worst of both worlds. Either give me LTS and I'll do that one big upgrade every few years, or give me rolling release where any breakage should be relatively localized and minor. But with 6 month release cycle you have to constantly be upgrading, but the upgrades are still big "break the world" upgrades.


>Now on the one hand, this forces users to upgrade at least once per year.

The problem with this article is this assumption which is not true in the slightest.

There's no force which makes users upgrade - some users will upgrade when they're told; some will wait 5 or more years - and may have good reason to.

> Nothing kills a bug report faster than “My network card worked in 4.4 but stopped working in 5.6.” Developers aren’t going to bisect five years of changes; you get to do that yourself.

This statement can translate into don't ever upgrade. Is that really the goal?


Is that really the goal?

The goal appears to be something more along the lines of not capitulating to the worst-common-denominator. Designing your product and spending your efforts to carefully accommodate people who upgrade every few decades (give or take) results in an objectively worse product for everybody else.

Unless, of course, that customer is willing to front the money to make it worthwhile.


Does it really result in an objectively worse product, though? Subjectively, I can't but agree. But objectively?

Especially in an industry where we are constantly amazed by what people accomplished years ago. I question whether long term planning and acting is truly objectively worse.


Windows XP is my standard example. We remember it fondly, but things we take for granted today (WiFi, 64-bit, multiprocessor support, advanced power management including frequency scaling) all had to be kludged into XP because it was forced to stick around for so long, and it was never as good at these things as Vista or 7. (I know we all hate Vista, but it did these things well)

A long lifecycle doesn't allow for the rapidly evolving hardware & software environments.


I'm not sure I follow. Of course... I actually don't recall running XP heavily. :(

Are there objective measurements on how these things were kludges back then, but are good features today?


> There's no force which makes users upgrade

Except for auto-upgrade run from cron every few days.


> Except for auto-upgrade run from cron every few days.

This sort of user-hostile behavior is guaranteed to make me take the time to open a shell and nuke your software from orbit with blasts of "rm -rf". Don't do this.


No. Ideally, the goal is that users will upgrade as soon as a fix is available.


The author makes it sound like the developers won't care about the bug because the user waited from 4.4 until 5.6 to upgrade.

In this case the user is better off not upgrading if the developers aren't going to do anything about it.


The author didn't say the developers wouldn't care, but that the extra effort means the developers wouldn't care enough.

Fixing a bug requires figuring out what's broken, and that's a lot easier when you know something broke between 4.7 and 4.8, than in any of the revisions between 4.4 and 5.6. (Especially if there were actually two latent bugs in 4.4, one of which got exposed in 4.8, and one of which got exposed in 5.2)


No, in this case, the user should have been upgrading this whole time, not putting it off for 3+ years.

Downvote me all you want. Tough love is what's needed here. Hiding my comments isn't going to make peoples' negligence any less their fault.


Their network card still would have broken at some point and they'd be in the same bind they are in now: A non-working critical part and at the mercy of the developers to identify and fix the bug.


Exactly this thing happened to me in an OpenBSD upgrade. My network card no longer worked. That prompted me to actually read the release notes, and I quickly found the reason.


I would think ideally, programmers wouldn't break things that are working. We know that ideal doesn't pan out.


"programmers wouldn't break things that are working"

So you want the world's best security experts to manually review every module of every program you use, as well as every piece of every operating system you want it to run on, to make sure there are 100% certainly no exploitable bugs that could require a fix down the line?

That's the same argument, but with reality applied.

Programmers don't "break things that are working", people find design flaws that could lead to remote exploitation. If you don't patch your stuff, enjoy being 0wned, not my problem. For the rest of the world, bleeding edge is the safest place to be.


Yes, yes I would love that. Though, as I noted, nobody expects that ideal to pan out.

In many large organizations, applying reality to the prospect that things may break on an upgrade translates to "don't upgrade."

And really, this isn't necessarily bad. So long as what isn't getting upgraded has been compartmentalized and does its job correctly. Consider, we don't even think about asking companies to upgrade processors on a yearly basis. Monitors? Good for years. Actually, pretty much all of the hardware.

Why do we think software should be different?


Every time this topic comes up, we have people talking about really different things not clearly explaining what they are referring to, and causing confusion and people not understanding the reasoning of others.

I've read the following arguments already, but with portions of them implicit. See if you can determine why the people are talking at cross points:

- "I generate VM server images for all my needs, and deploy to virtualization infrastructure behind load balancers to handle services. I treat he OS like an application like the rest of the stack, and all my data is abstracted to a data layer. I just generate a new image with patches and test it, then deploy if it works in my test suite."

- "I have hundreds of servers in dozens of roles with different software needs, and I need them to be secure and stable in a timely fashion, and I can't spend multiple weeks achieving that. Long term support and back-patching allows me the time to plan needed large changes in infrastructure without having to spend all my time managing updates, software changes and configuration changes multiple steps down the dependency graph."

- "I have a few servers with a few roles, or many servers with one or two roles, and I can manage frequent updates just fine, and it allows me to take advantage of the newest features, get security updates immediately as the software authors fix it, and I don't have to worry about end of life of software."

- "I have an application stack with multiple dependencies, and I just make sure to update my stack as I make changes to the software. I would love/have set up continuous integration software to build and test everything as I go, so I know if it works or not before taking it live."

As someone who's been in all of these situations at different points, often multiple at once, let me be clear: Unless you have an argument that addresses ALL of these situations, you really haven't thought through the issue.


I think the author is advocating that those scenarios should converge. His argument is that the second scenario is an anti-pattern and should be corrected to be more in line with the others. It might be a pipe dream, but it's still his perspective.


They serve different needs. The second scenario may well be better served by moving towards the first, but that's not a quick project, especially when uptime and reliability are important. Even with that, for the third scenario a full VM infrastructure may be overkill. This whole topic can be summed up with "Examine your needs, examine your options, make the best choice you can at the time, try to make your life easier by leaving pathways to change your choice with the least problems based on expected future needs."

Just because someone looks to be doing something similar to you, doesn't mean their needs and constraints aren't quite different. I think this is the big thing most people miss.


It's funny that just today, before I found out about the glibc vulnerability, I was reconsidering whether I really want to upgrade my Ubuntu 10.04 mail and web servers to Ubuntu 14.04 LTS. I was prompted by reading some bad things about 14.04 [1]. I've looked at the M:Tier binpatches and package upgrades for OpenBSD, looked at FreeBSD and at Debian with its experimental LTS, but eventually I'm still in favor of the 5-year support for Ubuntu Server.

I have had bad experiences upgrading production machines in place, whether OpenBSD, Ubuntu or Debian, and I always install a new machine (VPS) side by side, which is really the only stress-free, guaranteed way to go in my opinion. Having to do this only once every 5 years is a lot nicer than at least once a year. Ubuntu's good security backports and minimal breakage (automatic security upgrades on my Ubuntu servers have been working almost flawlessly [2]) make it the lowest-maintenance, most stable and secure setup I can imagine.

OpenBSD having only one year of support, no binpatches for the kernel and no stable security fixes for the packages is the reason I only use it for anything that can be done by the base system (backup host and nameserver). OpenSMTPD looks very promising, but I would need supported amavisd packages, and the same goes for httpd, which needs PHP in my setup. Despite its limited use for me, I still love OpenBSD and the mindset that stewards it. If only they had longer support and binpatches for the kernel and packages :)

[1] https://tim.siosm.fr/blog/2014/04/25/why-not-ubuntu-14.04-lt...

[2] the mail config was overwritten once after an auto security update of dovecot in 10.04, quickly recovered it with etckeeper (/etc in git)


Thanks for the tip on etckeeper! I've been wanting something like that for a while.


His reasoning makes sense but, as all developers tend to do, he spots an error in a system and wants to switch to another or invent a new one. You know the xkcd.

Oftentimes the solution is just to work together and fix the old one. For me the logical fix would be to patch all kinds of malicious, undefined or non-spec behaviors in LTS releases in short cycles, regardless of whether the developer thinks it is security-critical or not. To make this more feasible you could either pay for it or use a minimal base system separated from the user space. Both exist today.


That's both a lot of work and a lot of churn, though - people choose LTS because they don't want things to change!


In a sense, things do not change; they stop being changed from an assumed state.


There are a lot of problems with this argument, especially when you get into the ports section. For example, the version of Go shipped in 5.4, 5.5, and 5.6 is different in each release, which means you're going to have to rebuild packages, run regression tests, etc. every six months to make sure some new feature doesn't break your code or some new bug isn't introduced. Similarly, for the base install, you still have to run regression tests there to ensure that any deprecated or removed features don't impact your workflow.

These things are expensive from a development and ops perspective, which is why most commercial software vendors tend to stick to releasing for a few platforms they know will be supported for longer than a year or two. Yes, OpenBSD upgrades are mostly painless, but bugs can and do happen, and expecting everyone to be able to just drop everything and rebuild is simply unreasonable.


In theory, I agree. I've often likened code development to muscles: you get good at what you do, and what you don't do atrophies. If you push changes every hour, you get very good at making sure that works. If you push them every three months, you get good at pushing them every three months, and panic when something requires you to move within days. And so on. This applies to all sorts of aspects of the development process (test a lot, get good at testing; fail to test, and it becomes impossible to test later; etc).

In practice, it doesn't matter what tedunangst or I think, because once you ship something to a customer in any way, they're not going to upgrade it. Offer automated upgrades and they'll demand a way to turn them off. It's not just developer boxes we have to worry about, unfortunately.


Forcing users to upgrade frequently has one outcome, and one outcome only: they will stop upgrading, period. If you give users only two choices, they will pick the one that requires less effort.

There is more to life than keeping up with the constant upgrade treadmill. It's already bad as it is. Some Linux distributions are on a 3-year support cycle, which is too short. If you have many servers, especially many servers running a lot of different things, upgrading every 3 years means having people who do nothing but upgrades.

Once you fix your applications to support the changes in the next OS release, that release is almost out of support...


they will stop upgrading, period

Has that been observed with Chrome, ChromeOS, iOS, etc.?


That's an apples to oranges comparison. To upgrade Chrome a consumer depends only on himself. To upgrade a server OS, the "user" (the sysadmin) depends on a number of other people with different priorities. And that's _if_ the people he depends on are still with the company...

I've faced innumerable impossible upgrades (nobody cares about upgrading besides you, so nobody cares to fix whatever is preventing the upgrade). And even when the upgrade is possible, it often takes years.

That same user who upgrades Chrome on his laptop is still stuck with IE7 at work, because his company relies on it to pay his salary.


Oh please, all it would have taken for this "GHOST" vulnerability not to happen was for someone at glibc to make the right call about how to treat a buffer overflow in a function dealing with external data. They didn't; so what? The lesson here is that Red Hat and other companies should do additional review if they are shipping something branded as super secure and stable.
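
For what it's worth, the class of bug behind GHOST is mundane: a buffer whose size is computed from external input, where the calculation misses something that is later written into it. A minimal, hypothetical sketch of that class (not the actual glibc code; the function and names here are made up):

    // Hypothetical illustration of the bug class only, not the real glibc code.
    #include <cstdlib>
    #include <cstring>

    // Packs "host\0alias\0" into one heap buffer and returns it.
    char *copy_host_and_alias(const char *host, const char *alias) {
        // BUG: the size calculation forgets alias's '\0' terminator, so the
        // second strcpy writes one byte past the end of buf.
        std::size_t size = std::strlen(host) + 1 + std::strlen(alias);
        char *buf = static_cast<char *>(std::malloc(size));
        if (!buf) return nullptr;
        std::strcpy(buf, host);
        std::strcpy(buf + std::strlen(host) + 1, alias);
        return buf;
    }
    // Fix: size = strlen(host) + 1 + strlen(alias) + 1, or better, track the
    // remaining space explicitly and fail instead of writing past the end.

By most accounts the real overflow was similarly small and fixed-size, which is exactly the kind of thing review of code handling external data is supposed to catch.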

Long term support is great. People screw up sometimes. Additionally, it's about time core software components get more attention from companies with the big bucks (and that are profiting from it).


Personally I'm of the idealistic belief that glibc is really starting to show its corrosion, and should be replaced by a clean-room, standards-compliant and elegant design like musl: http://wiki.musl-libc.org/wiki/Functional_differences_from_g...

Of course, this would be a massive change for the Linux ecosystem at large. It is my hope, though, that projects like Sabotage Linux will eventually push most major software towards musl compatibility.

The source of the GHOST bug was in NSS, which musl lacks by design.


Speaking of corrosion, maybe we should be building systems with a language which avoids most of the problems of C.


The trouble with that otherwise perfectly reasonable position is that no such language exists yet that doesn't have its own significant problems of one kind or another.

I'm hopeful that this situation will finally change within the next decade and that C and C++ can be honourably discharged as soon as possible thereafter. However, I don't see this happening until newer languages with sounder foundations -- Rust comes to mind as a promising example -- have entered the mainstream. Until then, the ecosystem around C and C++ has too much momentum for a lot of projects to switch to more experimental technologies even if those technologies have advantages in terms of robustness.


I think that the OpenBSD team has more than definitively shown that C isn't the problem; programmers' attitudes are the problem. While I disagree with them on the premise of upgrading early and often, I do agree with them on active reviews, active documentation, and enforcing code hygiene, which prevent the sort of problems that proponents of other languages harp on about when talking about C.


> I think that the OpenBSD team has more than definitively shown that C isn't the problem; programmers' attitudes are the problem.

To err is human, no matter what your attitude. While having the correct attitude can help, this line of argument - that it's the programmer's fault and not his tools' - is part of the very attitude that causes problems!

Any decent programmer embraces the fact that they'll make mistakes - and beyond simply trying to improve, they'll also embrace tools that help them compensate. Such as static analyzers. Fuzzing frameworks. Safer programming languages.
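
To make the fuzzing point concrete, a harness can be very little code. A minimal AFL-style sketch (parse_record is a made-up stand-in for any function that parses external data, not anything from a real project):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    // Made-up stand-in for any function that parses external data.
    static void parse_record(const std::uint8_t *data, std::size_t size) {
        if (size == 0) return;
        char name[16];
        // This bound is what keeps the copy safe; remove it and a fuzzer
        // running under AddressSanitizer finds the overflow almost instantly.
        std::size_t n = size < sizeof(name) - 1 ? size : sizeof(name) - 1;
        std::memcpy(name, data, n);
        name[n] = '\0';
    }

    // afl-fuzz (or any file/stdin-based fuzzer) supplies the inputs.
    int main() {
        std::vector<std::uint8_t> input;
        int c;
        while ((c = std::getchar()) != EOF)
            input.push_back(static_cast<std::uint8_t>(c));
        parse_record(input.data(), input.size());
        return 0;
    }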

Perhaps they'll choose tradeoffs that sacrifice some of these options - I certainly do, coding in C++ all day, a language with at least 40 references to "undefined behavior" in the '03 standard alone. But that doesn't mean those tools aren't worth considering.

> I do agree with them on active reviews, active documentation, and enforcing code hygiene, which prevent the sort of problems that proponents of other languages harp on about when talking about C.

I see zero inherent reason to rely on extreme programmer vigilance to catch errors that better tooling can catch without the occasional shared blind spot where five programmers all miss the same uninitialized variable.

You can use reviews and vigilance to help compensate for C's weaknesses if that's a tradeoff you need to make, but that's quite an opportunity cost - it leaves programmers with far less time to catch other logic bugs and ship features.

There's a reason nobody preaches about how it's going to be the year of the OpenBSD desktop (and that's not a knock at OpenBSD!)

Vigilance also requires the right knowledge. A coworker recently revealed to me that he'd only just learned that C++ member variables generally have indeterminate values if you don't initialize them. He had thought other threads were to blame for the garbage values.

Every now and then, I check to see if any of the compilers we're using have added a warning I can enable that will catch leaving members uninitialized after a constructor completes. So I can enable it. And configure it as an error. So everyone who builds the code knows there's a problem. Immediately.

So far, no dice.
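
For anyone who hasn't been bitten by it, a minimal sketch of the pitfall described above (hypothetical code; the default member initializer mentioned at the end is a C++11 mitigation, not the compiler warning being wished for):

    #include <iostream>

    struct Widget {
        int count;     // never initialized anywhere
        Widget() {}    // constructor completes; count is still indeterminate
    };

    int main() {
        Widget w;
        // This read is the bug: the value is indeterminate, so you see whatever
        // happened to be in that memory, which is easy to misattribute to
        // another thread stomping on the object.
        std::cout << w.count << '\n';
    }

    // Mitigations that don't depend on a compiler warning: a default member
    // initializer ("int count = 0;") or initializing every member in the
    // constructor's initializer list.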


...bringing up OpenBSD is interesting. Isn't there an email from de Raadt floating around that says that review, etc. aren't enough, and that's why they use ProPolice, W^X, randomized malloc, and so on? Tools which pretty much only exist as a stop-gap for C's horrible security record.


The downside of this is that OpenBSD is pretty difficult to upgrade in a clean manner, given its "uncompress the tarball over /, then clean stuff up manually" method.

I really like OpenBSD, but it desperately needs a modern package management system. This is frankly my largest sticking point with it - it's faster and much cleaner in most cases to wipe, reinstall, then apply configuration than to try to upgrade.


This is oft-repeated, but how is OpenBSD's package management system "not modern"? It handles upgrades properly, and modified configuration in /etc is preserved. Packages are cryptographically signed as of 5.5, and files are individually checksummed.

I'd say OpenBSD still handles dependencies and upgrades better than FreeBSD's supposedly "next-generation" pkg.

OpenBSD's package system and ports tree are tightly integrated and work together, and despite the historic naming, OpenBSD's pkg_{add,delete,info,create}(1) are unrelated to the now deprecated FreeBSD equivalents.


Package management in OpenBSD is as good or better than any other distro I've used.

Upgrades are: boot bsd.rd, upgrade. Then run sysmerge, and pkg_add -u


As someone who used to help maintain hundreds of OpenBSD boxes, yes.

But don't forget, OpenBSD is designed for the developers.


The recommended and supported way of upgrading the base system is to boot a bsd.rd kernel and type 'U'. If you don't have a console, the manual upgrade guide on the website is an added convenience.


How about Debian with a FreeBSD kernel?


The new pkg tool in FreeBSD is pretty good and meshes nicely with the ports tree. The marginal improvements apt offers for third-party built packages aren't really worth giving up the rest of FreeBSD for. It also doesn't help OpenBSD, which really needs a new package manager.


A problem with the lack of long term support is that you have to keep moving.

The most important part of "long term support" distributions is that breaking changes are kept to a minimum. If a "long term support" distribution releases an update to, for instance, glibc, you can expect that applying that update will change nothing that other parts of your software stack might depend on.

The dependencies might be subtle; for instance, a new version of a database server might have optimized its query planner, which happened to make one particular query your software does a couple of percent slower, which led to it using a few more seconds to do its processing, enough to push it over the timeout limit for a different part of the system. So the only sane way to avoid breaking changes is to avoid all changes.

The opposite would be, as advocated in this post, frequent upgrades. "Frequent upgrades amortize the cost and ensure that regressions are caught early", but that means you are dealing with upgrades and regressions all the time. You arrive at work in the morning, planning to write a new feature; but an upgrade has just arrived, and it needs a few changes to your project. You develop, test, and deploy these changes; in the meantime, another upgrade has arrived, needing more changes to something you changed just a few days ago. The day ends, and you never even started on the new feature. You spend more time chasing the upgrade stream than doing productive work.

Long term releases "batch" the changes. When several changes affect one part of your software, you only have to deal with them once. Sometimes, you can even discard that part of your system and do something else, while with a continuous change stream, you might have wasted time adjusting little by little.

A somewhat relevant post from Joel on Software: http://www.joelonsoftware.com/articles/fog0000000339.html


Spoken like someone who has never had to maintain a large production system.

This notion is handily disproved by the market, though. Red Hat and Canonical enjoy commercial success and mindshare precisely because they provide stable long-term support platforms on which others can build software. Most of Canonical's worth is actually embodied in their commitment to LTS releases, which offer a sweet spot: stable, but without the glacial pace of RHEL.


Isn't a 6-month release cycle with support for the past 2 releases essentially the Fedora release model as well? From what I've heard (and experienced using it), that hasn't been massively successful.

> No one upgrade is likely to end in disaster because there simply isn’t enough change for that to happen.

That might be true for OpenBSD, but in Linux land the rate of change certainly is great enough to cause major breakage even on a 6-month cycle.


The only thing that would make this realistic is if open source developers prioritize backwards compatibility very highly.

Of course, there will always be some bugs anyway. And it varies across language (or other open source affinity) communities. But in many communities, developers aren't really even _trying_ to ensure backwards compatibility.

If developers committed to backwards compatibility across successive versions for X number of years, then you could keep updating to the latest release for that many years while worrying less about breaking things, rather than needing to freeze the versions you use for that long.

Yes, this would significantly increase developer hours on open source projects. Nothing comes for free.

But wouldn't those developer hours be better spent centrally, on ensuring backwards compatibility in the projects themselves, as opposed to every developer of every consumer of the project needing to deal with backwards incompatibility on every upgrade? Or every distro needing to backport patches? (The latter is more debatable, which is exactly why we have the status quo.)


There is a cost to upgrading and there is a cost to staying behind. As soon as the cost to stay behind outweighs the cost to upgrade I upgrade, but no sooner. Otherwise I'd just be throwing away money and time. The bigger problem here is that the cost to upgrade is only known after the fact. What should be a routine job can easily spiral out of control into a marathon of misery.


Reading the title, I assumed this was a rant about how proper ops processes replace the need for LTS releases and the demonstrably bad practices they can encourage. IMHO, that would have been a more interesting and valid point.

The current post is a straw man argument for a few reasons.

First, because vulnerabilities will be exploited whether the window of exposure is 1 day, 100 days, or 1000 days (like this one); likewise, upgrades will be missed, for various reasons, whether or not they are encouraged.

Second, the author spuriously implies that a 6 month sliding upgrade window ("one size fits all") approach is superior, easier and more appropriate for everyone through a combination of broad assumption and vague hand-waving about developer time efficiency. Rubbish.

If you want newer code, then ASAP is the way to go and BSD-style monolithic releases are bad... something closer to the versionless OS ideal of constant, incremental, per-package release processes such as Gentoo's portage (BSD-inspired!), where you can even install a -9999 (git HEAD) version of many packages, would be ideal. Obviously, with such an approach, stability caveats must be considered (as they are). Thus, both FreeBSD and OpenBSD actually represent a middle ground between Gentoo and commercial Linux vendor LTS: OS-wide release processes that incorporate greater collective testing of packages and package interoperability, combined with per-package releases through the ports trees.

What's the real ideal here? A general purpose means to build working, tested, maintainable, secure systems with minimal effort for a broad audience.

How to get there? Clearly, not by whinging about external release window frequencies of volunteer-based projects, as any frequency still creates bugs, and no process is perfect. The answer, I believe, is time-honored. Careful design for failure and minimalism, and good process. Test. Measure. Remove surplus features and components. Iterate.


Can we have a ban on "X considered harmful" titles? Have some originality.


This is not an argument against long term support. It is an argument against using OpenBSD for any project you need to work for more than one year.


Yes, we should all step forward into the wonderful world of running brand new software all the time in production. I'm sure it will all be much more secure than relying on year old software that is being actively maintained.

Is the date on this one supposed to be April 1st?


PostgreSQL has a 5-year support window.

I'm curious what negative consequences the author thinks PostgreSQL suffers as a result of that policy.


It's a lot easier with a 5-year window for PostgreSQL: kernel developers have to support all kinds of weird, semi-buggy hardware that they can't buy in their country, they have to make the kernel work with badly written GPU drivers, and they have to deal with interrupt-level code, which is notoriously difficult to debug.


The author's primary example for the "LTS is bad claim" was glibc, which is not a kernel.

Are you saying the "LTS is bad claim" only applies to kernels and other things very close to the hardware (closer than a database)?


"[Useful popular thing] considered harmful."



