Mender – An open-source OTA software updater for embedded Linux devices

currywurst · on Feb 27, 2017

Neat :)!

In case the Mender folks are here, have you looked into incorporating the concerns addressed by The Update Framework (TUF)? https://theupdateframework.github.io/

eystein · on Feb 27, 2017

Thanks! :)

Yes, we have looked into it and the nice thing is that TUF seems to be quite easy to add as an additional security layer down the road.

One interesting challenge is downgrade attacks. How do you allow rollback of a bad deployment while disallowing an attacker to deploy an old and vulnerable version?

theamk · on Feb 27, 2017

Why? TUF is all about reimplementing SSL and PKI. Since mender can use regular SSL with good-old PKIs, there is no reason to go with weird solutions.

aseipp · on Feb 27, 2017

TUF protects against more attacks than just HTTPS or regular trivial signing methods do (rollback attacks, freezes, mix and match attacks, and helps secure mirrors), and has little to do with HTTPS or raw "transport layer encryption". It absolutely complements and supplements HTTPS if you're using it for your downloads; it is not obsoleted by it. (Though, the subtext on the introduction page probably doesn't help this impression by saying "Like the S in HTTPS...")

theamk · on Feb 27, 2017

Well, the reason TUF has to protect against all of the attacks is because it is choosing to support a varying set of requirements, including lack of SSL and insecure mirrors. Mender simply does not care about them, so it can be dramatically simpler:

- rollback attacks -- impossible since all comms are secure, and there are no untrusted mirrors

- freezes -- impossible, because SSL channel must be re-negotiated every time

- mix and match attacks -- nothing to mix+match, mender only does one file (rootfs)

- helps secure mirrors -- mender does not support 3rd party mirrors, so no need to secure them.

You can see it right on the TUF homepage: it claims to replace application, library package and system package managers. This is a lot of work, which requires a lot of complexity, and there is no need at all to pay that price if you do not need to.

nrclark · on Feb 27, 2017

This is kind of a stupid question, but why not just use .rpm/dnf or .deb/apt and a custom repo?

kristoffer · on Feb 27, 2017

Embedded Linux systems supporting OTA usually employ a dual root file system (RFS) approach, where the upgrade is written to the currently unused partition and, after a successful upgrade, the RFS to boot into is switched.

It is an easy solution that ensures integrity and has few drawbacks for typical embedded systems.

The output of a typical embedded Linux CI build is a complete RFS. Having to care about individual packages would just be a headache when you can replace the entire RFS and be done with it.
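A minimal Python sketch of the dual-RFS idea described above. The slot names, the `BootConfig` class and `install_image` are hypothetical and only model the idea in memory; on a real device this state lives in the bootloader environment and the write goes to a block device.

```python
from dataclasses import dataclass

SLOTS = ("A", "B")  # hypothetical names for the two rootfs partitions


@dataclass
class BootConfig:
    """Stand-in for the bootloader's record of which slot to boot."""
    active: str = "A"

    def inactive(self) -> str:
        # With two slots, the inactive one is simply "the other one".
        return SLOTS[1 - SLOTS.index(self.active)]


def install_image(config: BootConfig, partitions: dict, image: bytes) -> None:
    """Write the full image to the unused slot, then switch the boot target."""
    target = config.inactive()
    partitions[target] = image   # the running system is never touched
    config.active = target       # flipped only after the write has finished
```

Because the boot target is changed as the last step, an interrupted write leaves the old slot active and intact.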

Matthias247 · on Feb 27, 2017

That's true. I'm currently also involved in the development of an embedded Linux project for which we have yet to find the perfect update story. We thought about replacing individual packages too, but it looks like a hard way from various perspectives:

- Build system: Would need to figure out how to build all these packages (with their requirements) in a reliable way.

- Deployment: Deployments should be reproducible. All devices should get the same version of all subcomponents. And a rollback to an older version (with older versions of subcomponents) should be possible. Having some devices floating around with untested configurations (because the device decided to update one package but not the remaining ones) is not desirable, because you want to hand over only fully tested/qualified software versions to the customer.

Building the whole RFS at once, storing it and flashing it at once currently seems like the way to go. Will take a look at Mender to see if it could help.

jononor · on Feb 27, 2017

Yes, I did package-based updates on an embedded system (not enough time to go dual-RFS), and it was painful. The statefulness of such updates means that all transition paths must be tested for each release, which adds up very quickly. This effectively reduced how often and how timely we shipped updates.
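To put rough numbers on "adds up very quickly", here is a small Python sketch. It assumes the worst case, where any older release in the field may upgrade directly to any newer one; both function names are made up for the illustration.

```python
def package_upgrade_paths(releases: int) -> int:
    """Upgrade paths if any older release may jump straight to any newer one."""
    return releases * (releases - 1) // 2


def image_test_targets(releases: int) -> int:
    """A full image has the same end state whatever was installed before,
    so each release is qualified once."""
    return releases


for n in (2, 5, 10):
    print(n, package_upgrade_paths(n), image_test_targets(n))
```

Under that assumption, ten releases mean 45 distinct upgrade paths for stateful package updates, versus ten images to qualify.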

Package systems are also quite fragile, i.e. if a post-trigger/script fails, the package install fails. And sometimes not cleanly either: the partially installed package blocks the install of a new, fixed package. In the worst case the system is left in a broken in-between state.

karmicthreat · on Feb 27, 2017

I'm kind of in this boat now with a couple of products. The catch is I need to do both local and OTA updates. This seems to be a rare feature that nobody is doing.

I also am trying to get Resin.io like containers working. It seems like it would be an easier way to test and deploy.

Matthias247 · on Feb 27, 2017

That's also a requirement for me: local updates (via connected USB stick, triggered via reboot or web interface) should also work in addition to OTA. In fact, local updates would have an even higher priority.

We fiddled around also a little bit with the container route. I liked it for quick iteration times (rebuild a docker container, pull it from local image registry to device and test it there). But we found out that we won't get by with just container updates, and that most updates would also need to contain a new kernel or kernel module versions.

eystein · on Feb 27, 2017

It is not that uncommon for an updater to support both local and remote updates. For example, Mender has two modes of operation: standalone and managed [0].

Like you, many teams are still doing local updates, or transitioning from local to OTA, at least for some products.

[0] https://docs.mender.io/1.0/Architecture/Overview#modes-of-op...

xyzzy_plugh · on Feb 27, 2017

Exactly. I wish Mender was around 4 years ago. I built exactly this!

padelt · on Feb 27, 2017

Interesting! How did you go about falling back to an older version if the update was bad? Is there a nice way to do this automatically? Say I update to a really botched version with the kernel panicking before it reaches userland. Does this need manual intervention?

mpasinski · on Feb 27, 2017

disclaimer: I work for Mender

It is possible to make rollback fully automatic. To do so you need some integration with the bootloader: it needs to be configured so that it can roll back to the previously working partition if the update is broken. In addition, you can add user-space runtime checks that verify the update; if those do not pass (the updated image is broken), you can roll back to the previous one as well.

dividuum · on Feb 27, 2017

As someone that also implemented A/B booting for the Pi: I wonder how you roll back fully automated? I read a bit of the code but wasn't able to find that. Or is that already handled by u-boot?

In my case, the first thing I do once an unverified version boots is to switch back to the other partition (so the known good version is active during the next boot), then run a detached reboot process that forces a reboot in 5 minutes. Once the system is up and it verified that everything is ok, it commits the next version (by switching back to the partition that booted and marking it as confirmed) so it is now active by default. Finally it kills the still running 'reboot' process.
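The sequence in the paragraph above can be sketched in Python roughly as follows. This is an in-memory model only: `Device`, its fields and `on_boot_of_unverified_slot` are hypothetical names, and the 300-second default stands in for the 5-minute forced reboot.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Device:
    default_slot: str                        # slot picked on the next boot
    running_slot: str                        # slot that is currently booted
    confirmed: bool = False
    reboot_deadline_s: Optional[int] = None  # pending forced reboot, if any


def on_boot_of_unverified_slot(dev: Device, other_slot: str,
                               health_check: Callable[[], bool],
                               deadline_s: int = 300) -> bool:
    # 1. Point the next boot back at the known good slot first.
    dev.default_slot = other_slot
    # 2. Arm a forced reboot (models the detached 'reboot' process).
    dev.reboot_deadline_s = deadline_s
    # 3. Verify; on failure leave the timer armed so the device falls back.
    if not health_check():
        return False
    # 4. Commit: make the booted slot the default and mark it confirmed.
    dev.default_slot = dev.running_slot
    dev.confirmed = True
    # 5. Cancel the pending reboot.
    dev.reboot_deadline_s = None
    return True
```

The ordering matters: the fallback is in place before any verification runs, so a hang during verification still ends in a reboot into the old version.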

As far as I understand your update process: You download a complete new version for every update and are able to stream that directly to the new partition? Is there any way to do delta updates? In my experience, most of the disk content is unchanged, unless you do major updates. In my case I download the new version using zsync, verify the downloaded/updated `install.zip` (which is kept on the volatile data partition), then extract that to the new partition. I make sure that `install.zip` is created in a way that it is rsyncable, so updates are pretty small that way. Of course you lose the streaming feature, unless you modify zsync somehow to support that.
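To illustrate why delta updates can be small, here is a simplified Python sketch that compares fixed-size blocks by hash. This is not how zsync works internally (zsync uses rolling checksums so it can also match shifted data); the block size and function names are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """SHA-256 of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> list:
    """Indices of blocks in `new` that differ from `old` or did not exist."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]
```

Only the blocks returned by `changed_blocks` would need to be downloaded; when most of the image is unchanged, that list is short.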

eystein · on Feb 27, 2017

I work on Mender, so I can tell you how automated rollback works there.

The update is written to the inactive rootfs partition, uboot is configured to boot from it and the device is rebooted. Using the bootcount feature of uboot it is possible to roll back automatically if booting fails. Once the mender daemon comes up it will try to report the success of the deployment to the server. If this fails it will also roll back. Only after successfully reporting the success to the server will Mender "commit" the update, meaning it configures uboot to persistently boot from this updated partition.
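A toy Python model of the flow just described, to make the two rollback triggers explicit. It is not Mender's or uboot's actual code; `BootEnv`, its field names and the boot limit of 1 are assumptions chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    """Hypothetical model of persistent bootloader state."""
    committed_slot: str = "A"
    trial_slot: str = ""   # set while an update awaits confirmation
    boot_count: int = 0
    boot_limit: int = 1


def bootloader_pick_slot(env: BootEnv) -> str:
    """Run at every boot: count attempts on a trial slot, roll back past the limit."""
    if not env.trial_slot:
        return env.committed_slot
    env.boot_count += 1
    if env.boot_count > env.boot_limit:
        env.trial_slot = ""   # automatic rollback: booting the update failed
        env.boot_count = 0
        return env.committed_slot
    return env.trial_slot


def daemon_after_boot(env: BootEnv, booted: str, report_ok: bool) -> str:
    """Run by the update daemon: commit only after reporting success.

    Returns the slot that will be booted from now on.
    """
    if booted == env.trial_slot and report_ok:
        env.committed_slot = booted
    env.trial_slot = ""       # the trial ends either way
    env.boot_count = 0
    return env.committed_slot
```

In this model a crash before the daemon runs is caught by the boot counter on the next boot, and a failed report is caught by the daemon itself; a boot that hangs forever is caught by neither.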

Mender already does compression, but you are right that there are optimizations that can be made for application updates, e.g. delta or other types of updates. We are planning to implement this as well. The first priority for Mender is to make it robust, i.e. make sure the update is atomic and that you can always roll back.

dividuum · on Feb 27, 2017

> Using the bootcount feature of uboot it is possible to roll back automatically if booting fails.

I see. Thanks for the info. I suspected that u-boot does have support for that, but I wasn't sure.

> Once the mender daemon comes up it will try to report the success of the deployment to the server. If this fails it will also roll back.

Is there any deadline at all for that? I explicitly spawn a reboot command that ensures that even if everything gets stuck (in software, not in hardware) for whatever reason, the system falls back to the previous version (unless the reboot command gets killed too, in which case a manual restart is required). Any thoughts on that?

ralphmender · on Feb 27, 2017

This is a valid point. The case where booting hangs after the bootloader but before the Mender daemon comes up is actually quite tricky to manage.

We have looked into hardware watchdog for this, but it is in the gray-zone of what an updater should be involved in. This is actually a more generic problem - maybe it hangs even when you did not deploy an update. There is varying support for hardware watchdogs across boards as well, unfortunately.

Most of the time it will not just hang; maybe it will crash or kernel panic, and in those cases Mender will roll back. But the indefinite-hanging case is quite tricky and not yet handled.

Would be open to ideas here.

xyzzy_plugh · on March 1, 2017

Hey! Love the product, but I'm out of the embedded game. Thought I'd give my $0.02:

The first step of our boot process was to enable the watchdog. We extend the timeout periodically during the boot process, but generally if userspace isn't reached within 30 seconds or so we reset. Once in userspace, the daemon validates that things look good (this includes things beyond just application of the update -- did services start up correctly? Is the hardware operating as we expect?) before disabling the watchdog and marking the update as a success, at which point rollback isn't possible. At this point we might consider applying new updates, etc.
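A simplified Python simulation of that boot sequence, assuming time is passed in explicitly instead of using real hardware. The `Watchdog` class, the `boot` function and the stage durations are hypothetical; the 30-second timeout is taken from the description above.

```python
class Watchdog:
    """Toy model of a hardware watchdog; time is passed in explicitly."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.enabled = True
        self.deadline = now + timeout_s

    def expired(self, now: float) -> bool:
        return self.enabled and now >= self.deadline

    def extend(self, timeout_s: float, now: float) -> None:
        # Extending is only possible while the watchdog has not fired yet.
        if self.enabled and not self.expired(now):
            self.deadline = now + timeout_s

    def disable(self) -> None:
        self.enabled = False


def boot(stage_durations, checks_pass: bool, timeout_s: float = 30.0) -> str:
    """Return 'committed' if the update is accepted, 'reset' otherwise."""
    now = 0.0
    wd = Watchdog(timeout_s, now)          # enabled as the very first step
    for duration in stage_durations:
        now += duration
        if wd.expired(now):
            return "reset"                 # a stage hung for too long
        wd.extend(timeout_s, now)          # periodic extension during boot
    if not checks_pass:
        return "reset"                     # watchdog stays armed and fires
    wd.disable()                           # from here on, no rollback
    return "committed"
```

For example, `boot([5, 10, 8], True)` commits, while a 40-second stage or a failed userspace check ends in a reset.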

We also modified our first stage bootloader to be resilient to bootloader update issues, and chainloaded our second stage bootloader from a stub which could rollback.

We also niced the update process to avoid resource contention, allowed the updates to be delayed until the network was quiet, and paused them when it became noisy to make for a good user experience. There was a server-side flag to force updates to apply regardless, with higher priority, as well as one to basically disable all other functionality in the case of an unforeseen serious, perhaps security related, issue.

We actually had a discrete watchdog service which was responsible for petting an always-on watchdog, to rescue the system if it locked up or became unresponsive (if certain processes were not running, or responding, the watchdog would not be pet).
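The conditional petting described here can be sketched in a few lines of Python. The check names and the `pet` callback are placeholders; a real service would write to the watchdog device on a timer instead of counting rounds.

```python
from typing import Callable, Dict


def should_pet(checks: Dict[str, Callable[[], bool]]) -> bool:
    """Pet only if every monitored process reports healthy."""
    return all(check() for check in checks.values())


def supervise(checks: Dict[str, Callable[[], bool]],
              pet: Callable[[], None], rounds: int) -> int:
    """Run a fixed number of supervision rounds; return how often we pet."""
    pets = 0
    for _ in range(rounds):
        if should_pet(checks):
            pet()
            pets += 1
        # Otherwise do nothing: the always-on watchdog times out and resets.
    return pets
```

The key design point is that the failure action is the absence of a pet, so a locked-up supervisor has the same effect as an unhealthy process.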

All of this led to effectively 0 failures in the field and a seamless user experience (except for the 30-second reboot when inactive). I wish everything I owned worked this way.

I could talk ad nauseam about this stuff. It's very cool to see the designs of others. I feel this is an under-appreciated and under-explored problem space.

padelt · on Feb 27, 2017

(Way) back when I was working with OpenEmbedded/Ångström I dreamt up something like this: make uboot/whatever set up a HW watchdog that is retriggered once from kernel mode and then in userspace as usual. The daunting task would've been to implement the (platform specific) watchdog in the bootloader. Looking now, there is support for at least some platforms - nice! Good luck with Mender - I'm sure you are badly needed!

snuxoll · on Feb 27, 2017

Let me tell you a story about the last time I ran `yum upgrade -y` on my oVirt node at home without doing it in `screen`: I decided to upgrade my Fedora desktop at the same time and rebooted after it finished not thinking twice about it, 5 seconds later I said "oh fuck" and prayed that the update on the server finished before I rebooted. It didn't.

I then proceeded to spend the next 2-3 hours reinstalling CentOS on my server, re-configuring oVirt and re-importing all of my VM's from the store.

Now, imagine an embedded device with a potentially flaky power source that could be interrupted in the middle of a system update - you really want to avoid that situation if at all possible. OSTree or Mender are a much better solution when you want an "all or nothing" upgrade, especially as they protect the system from FAILED updates and allow easy rollback.

raverbashing · on Feb 27, 2017

Several embedded distros don't even run a package manager; it is not needed.

You have "disk" and RAM limits. Your Android phone may have 8GB flash and 128MB of RAM, but other devices might have less. And you can't (shouldn't) turn swap on.

You also need a way of ensuring that if the update is b0rked you can recover in some (user friendly) way.

veli_joza · on Feb 27, 2017

Can someone summarize the difference between Mender and OSTree? I see that QtOTA chose OSTree as their underlying mechanism, which is significant in embedded automotive industry.

mwcampbell · on Feb 27, 2017

From my perspective as a curious observer of both projects, OSTree certainly looks attractive, because it doesn't waste space on two rootfs partitions which have to be oversized to accommodate future growth of the image. I initially thought OSTree required btrfs, because Project Atomic used btrfs the last time I looked at it. But according to the docs, while OSTree will take advantage of btrfs features if btrfs is being used, OSTree itself will work with a variety of filesystems including ext4.

Edit: An upside of the alternating rootfs partition approach is that the rootfs can be cryptographically verified at the block level. Chromium OS implemented this, and CoreOS also uses that implementation. This is probably outside the scope of Mender itself, but the updating approach used by Mender enables it.

eystein · on Feb 27, 2017

Cryptographic signing and verification is in scope for Mender [0], and frankly it should be in scope for all updaters -- too many hacks have happened due to lack of codesigning.

[0] https://tracker.mender.io/projects/MEN/issues/MEN-1020

theamk · on Feb 27, 2017

Mender should be pretty immune to power failures in the middle of the updates -- even if the second partition is only half-written, it is still not activated.

OSTree seems to rely on the underlying filesystem to provide protection from power failures. This, in my experience, is not a very reliable mechanism, especially when SD cards are used. Thus, it is likely that if you have a device using the OSTree updater and someone yanks the power cord at a bad time, your device may become unbootable.

ralphmender · on Feb 27, 2017

<disclaimer: I am with Mender>

We have been evaluating OSTree as a potential building block for Mender; however, these are the key challenges we've come across:

- integrating OSTree into an existing device/OS seems quite invasive - https://ostree.readthedocs.io/en/latest/manual/adapting-exis...

- block level signatures are not possible, which we feel is a requirement for an over-the-air updater

- rollback is not built-in and can be quite challenging to implement reliably (bootloader level)

Think of OSTree as more of a building block, like Git is for your development process. We might use it in the future, but robustness and easy integration are our first priorities.

mwcampbell · on Feb 27, 2017

Why is it important to have signatures at the block level? Wouldn't signing an archive or binary diff be good enough?

eystein · on Feb 27, 2017

Signing an archive would probably be good enough for many cases. Block level is a bit simpler (all or nothing) and thus less risk of mixing with unsigned parts (sideloading attacks).

For security-sensitive embedded devices (e.g. payment terminals), block level signatures would allow hardware verification during boot as well (1st stage bootloader verifies 2nd stage, then kernel, etc.) if designed correctly.
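A minimal Python sketch of the "all or nothing" property of signing the whole image. It uses HMAC-SHA256 from the standard library as a stand-in; a real updater would use an asymmetric signature so devices hold only a public key. The function names are made up for the example.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> bytes:
    """Digest over every byte of the partition image."""
    return hashlib.sha256(image).digest()


def sign_image(image: bytes, key: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 over the digest of the whole image."""
    return hmac.new(key, image_digest(image), hashlib.sha256).digest()


def verify_image(image: bytes, signature: bytes, key: bytes) -> bool:
    """Any changed byte anywhere in the image invalidates the signature."""
    return hmac.compare_digest(sign_image(image, key), signature)
```

Because the signature covers the complete image, there is no unsigned region where extra content could be sideloaded.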

oytis · on Feb 27, 2017

If you are interested in OSTree, there is an open-source solution for OSTree updates on embedded devices.

Yocto layer: https://github.com/advancedtelematic/meta-updater

Quickstart project with a nice tutorial: https://github.com/advancedtelematic/garage-quickstart-rpi

And yes, we have chosen it because you don't have to waste twice as much disk space, and, more importantly (for wireless networks at least), you don't have to download the whole image.

PanosJee · on Feb 27, 2017

What are the differences against Resin.io?

mpasinski · on Feb 27, 2017

disclaimer: I work for Mender

Mender is basically a full image update solution, while Resin is container based. Mender is fully open source, both client and server, while only Resin's client is open source. Mender is more lightweight: it provides a thin layer to be integrated with the already existing stack, while Resin provides a full stack that you need to adopt in order to incorporate the update mechanism.

karmicthreat · on Feb 27, 2017

So is there any way to make Mender do local updates that are not OTA?

gregdistefano · on Feb 27, 2017

https://docs.mender.io/1.0/Getting-started/Standalone-deploy...

gregdistefano · on Feb 27, 2017

Mender is full image based, using active/passive partitions, while Resin is container based.

Mender is also on a less restrictive software license.

imrehg · on Feb 27, 2017

> Mender is also on a less restrictive software license.

According to the article Mender releases under Apache 2.0, and all resin.io's open source code is also on Apache 2.0, so it should be the same permissive setup.

(Source: working at resin.io)

gregdistefano · on Feb 27, 2017

my bad!

mwcampbell · on Feb 27, 2017

I'm curious about why you chose Yocto over buildroot for your official integration. I figured buildroot would be better, because Yocto's opkg system is superfluous on a device with full image updates.

eystein · on Feb 27, 2017

Yocto has quite a large community and is growing fast. That said, think of Yocto as the first integration, not the only one; buildroot is surely interesting too, but we had to start somewhere. :)

brightball · on Feb 27, 2017

Is Elixir's Nerves project using anything like this?

pjmlp · on Feb 27, 2017

Cool!

I guess an over-the-air (OTA) software updater for embedded Linux devices could be considered some kind of systems programming...

amq · on Feb 27, 2017

Does Mender work with mbed OS?

amq · on Feb 27, 2017

Figured out the answer: no, you need Linux.

ingve · on Feb 27, 2017

Strictly speaking you don't need Linux. As mentioned in the blog post, Mender also works with IncludeOS [0]. A demo was shown at the OpenIoT Summit last week, the video should be available in the near future.

[0] http://www.includeos.org/