Ahh, /etc/machine-id, we meet again... Last time I met you, you were causing my cloned VM to get the same DHCP address as its original and blow up my VM network, despite libvirt's virt-clone utility properly randomizing vNIC MAC addresses, because netplan defaults to using you as the DHCP client identifier. If only we could meet under more pleasant circumstances.
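For anyone else bitten by this: netplan can be told to key DHCP off the MAC instead, which (as I understand it) sidesteps the machine-id-derived identifier. A rough sketch, with the interface name as a placeholder:

  # /etc/netplan/01-netcfg.yaml (sketch; adjust the interface name)
  network:
    version: 2
    ethernets:
      enp1s0:
        dhcp4: true
        dhcp-identifier: mac   # use the MAC rather than the default DUID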
NixOS has a related problem in that you can't set the machine's MAC address without a hack:
  # Hack: Change the default MAC address after network but before dhcpcd runs
  systemd.services.setmacaddr = {
    script = ''
      /run/current-system/sw/bin/ip link set dev eth0 address ${macaddr}
      /run/current-system/sw/bin/systemctl stop dhcpcd.service
      /run/current-system/sw/bin/ip addr flush eth0
      /run/current-system/sw/bin/systemctl start dhcpcd.service
    '';
    wantedBy = [ "basic.target" ];
    after = [ "dhcpcd.service" ];
  };
If you don't do this, it uses some black magic to decide the MAC address based on various hardware, making system migration and maintenance a nightmare.
Is it NixOS doing it or is it e.g. systemd-networkd? (Sounds like something systemd would do.)
While I’m confused as to why the boot would need to decide on a MAC address (VM or cheap SBC without a burned-in address? the first case might be easier to correct from the outside), the general state of NixOS is that some things are very flexible while others only cover some common cases (ones that the original author needed to solve). Unlike a traditional distro where the package manager will complain if you replace distro-provided stuff, in NixOS it’s entirely possible to override parts that don’t work for you rather than paper over them with programmatic overrides like these. It’s not even hard to upstream your changes if you make them backwards-compatible, although the benefit can be limited because the testing is not particularly thorough so other changes may still inadvertently break them.
I gave up trying to find a root cause for this after a couple of days of rabbit holes and yak shaving. NixOS is simply too impenetrable once you fall off the happy path (which is unnervingly often).
The next time I rebuild this server, I'll go back to Ubuntu or Debian and use build scripts for "good enough" determinism. For the time being, I just run everything important in LXC and Docker containers on top of this delicately balanced NixOS hypervisor for as long as it'll last.
It’s the fate of every config generator, yes, although I find NixOS better than average in that respect (between Arch and Debian in how easy it is to figure out what the hell it’s doing to the underlying software).
Troubleshooting guides for NixOS are non-existent, but the system itself is not all that difficult to inspect: two things you can do are run `nixos-rebuild build` without switching and meditate on ./result/; and inspect `(builtins.getFlake (toString ./.)).nixosConfigurations` in `nix repl` (use import etc. if not using flakes), as that will include every derived setting down to the text of generated config files, not only those you specified explicitly.
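A rough sketch of both, with "myhost" standing in for whatever your configuration is called:

  $ nixos-rebuild build            # builds the system into ./result without activating it
  $ ls result/etc                  # generated config files end up under here
  $ nix repl
  nix-repl> flake = builtins.getFlake (toString ./.)
  nix-repl> flake.nixosConfigurations.myhost.config.networking.interfaces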
But you might’ve just nerd-sniped me, we’ll see.
ETA: Looks like it’s indeed systemd-networkd’s doing[1]. For a static interface, setting `networking.interfaces.${NAME}.macAddress`[2] to the desired value should work.
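Something along these lines, with interface name and address as placeholder values:

  networking.interfaces.eth0.macAddress = "52:54:00:12:34:56";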
We (as far as I can tell) use some parts of systemd-networkd by default (as in, even if you haven't enabled it), as MAC addresses via `networking.interfaces.<name>.macAddress` are set through it: https://github.com/NixOS/nixpkgs/blob/65e07f20cf04f5db9921dc...
Good point. But something has to be setting those MAC addresses...
And it looks like, curiously, explicitly configured interfaces have their setup expressed as .link units even if networkd is not in use[1]. A comment[2] states: “.link units are honored by udev, no matter if systemd-networkd is enabled or not”.
It seems that .link units are nowadays interpreted not by networkd (which NixOS gates with useNetworkd) but by udevd (which it does not). The documentation for them (but not for udevd) even points that out[3] if you’re the kind of person who reads introductions: “link.link: A plain ini-style text file that encodes configuration for matching network devices, used by systemd-udevd(8) and in particular its net_setup_link builtin”.
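For reference, the generated .link file looks roughly like this (name and values are illustrative):

  # /etc/systemd/network/10-eth0.link
  [Match]
  OriginalName=eth0

  [Link]
  MACAddress=52:54:00:12:34:56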
virt-clone only takes care of the libvirt config but won't touch unique identifiers in your disk image.
/etc/machine-id isn't the only thing to worry about when duplicating a VM - think SSH host key, DHCP leases, various filesystem UUIDs, log files, MAC addresses in ifcfg-* files and udev-persistent-net rules...
You can use virt-sysprep[1] to clean up a disk image.
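Something like this, as a sketch (flags from memory, check the man page before relying on them):

  # scrub machine-id, SSH host keys, logs, udev net rules, etc. from an offline image
  virt-sysprep -a clone.qcow2 --hostname clone01
  # to see everything it can reset:
  virt-sysprep --list-operations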
Ansible has a non-trivial transition cost (speaking from experience here). Cloning VMs is a legitimate stopgap measure, because that’s sometimes the best you can do if available engineer hours are tight.
Ansible (as a proxy for infrastructure as code in general) being hard to implement is a warning that your setup is too convoluted. Cloning VMs is the high-interest unsecured loan of tech debt, and when that bill comes due it’s going to be much worse than spending a few days on some scripting.
Completely agree on all points, except for "few days". It took much longer than that, but partly due to learning Ansible as I went. The pain was still worth it, though.
Or better, cloud-init. It's the same thing that cloud images use to specialise VM "clones" to their environment, so it's more likely to do all the right things. If it has a mechanism to detect that it's been cloned (e.g. instance-id on EC2, etc.) then it will deal with ssh key regeneration and everything else.
As another comment pointed out elsewhere, you are not supposed to touch /etc/machine-id to control apt. It has its own APT::Machine-ID setting you can change without breaking anything else.
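e.g. drop something like this into an apt conf snippet (path and value are arbitrary examples; as I understand it the string only seeds the phasing lottery):

  # /etc/apt/apt.conf.d/90fleet-phasing
  APT::Machine-ID "0123456789abcdef0123456789abcdef";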
Back in the NT4/Windows 2k days I recall having to do some extra steps to modify an identifier in the registry (or similar) when cloning Windows images.
Otherwise the clone would not properly register on the network. Perhaps it was only when speaking to the Domain Controller though.
IIRC the later versions of Norton Ghost, which was what we used, did this process for us automatically.
There are some conspiracy theories, fuelled by the fact that Russinovich only pulled the tool after being acquihired by Microsoft.
It's undeniable, in my experience, that the tool did help, despite all the swearing to the contrary. Making the leap from there to believing that it made it too easy to clone Windows machines in a way that Microsoft had no control over, and that Microsoft therefore asked him to pull it, doesn't seem so crazy though.
You just needed to actually read the documentation and use sysprep (ideally with an unattend file). Just remember to image it before sysprepping since you can only run it a few times. There are a ton of things sysprep does that are really helpful and not handled at all by tools like that.
Microsoft doesn’t care if you clone systems (why would they?); they care about the volume of help desk tickets created by doing it wrong.
> You just needed to actually read the documentation and use sysprep [...] There are a ton of things sysprep does that are really helpful
And a lot that are not helpful, iirc. It was probably 10 years ago when I had to deal with this, but my recollection is that Sysprep was messing with a lot of stuff - more than I needed.
> Microsoft doesn’t care if you clone systems, (why would they?)
Lol, licenses, of course. Nowadays they've gotten a bit softer on the issue, but back then they were still very, very twitchy about duplicating and virtualizing systems.
We have very different memories, then. The official docs were really good as far back as 2006 or so, they just required you to actually read them. Sysprep was certainly a powerful tool, but if you told it not to generalize and to skip OOBE, it mostly just reset the SID. Most people I ran into who had issues were using some random poorly written blog posts that were really just content farming for ad money.
Huh? I’ll grant you that Microsoft was picky about being paid for their software, but they produced a ton of tools to support cloning and duplicating. License compliance was handled with audits and CALs, not some weird cabal of anti-imaging.
Ah yes, that was it. I do recall we got error messages preventing the machine from working, and the guys spent some time researching before a solution was found. The error messages went away after changing the SID.
What is the reason for not using the MAC as the identifier?
Trying to think of a use case, I thought of wireless + wired; it would be kind of neat if they had the same IP. But that falls apart completely if you have them connected at the same time (which I often do).
At one point I set up my wired and wireless interfaces as a bond with wired as the primary, and I could do a file transfer and watch the speed go up or down as I plugged and unplugged the wired interface. That was pretty slick.
I did, but that blog was taken offline at the beginning of this year when the company I was a part of when I wrote it went out of business. It was horribly out of step with modern setups anyway; I did that ~20 years ago, and it used a network manager that no longer exists.
I'm not sure how it'd fit into a Network Manager or systemd world. There really wasn't any particular trick to it, IIRC I used link monitoring to detect link failure, though I might have used ARP, and set the ethernet as the primary interface, just using the standard Linux bond driver.
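The iproute2-era equivalent would look roughly like this (interface names and the DHCP client are assumptions, and enslaving a wireless NIC may need extra driver-specific steps):

  ip link add bond0 type bond mode active-backup miimon 100
  ip link set eth0 down && ip link set eth0 master bond0
  ip link set wlan0 down && ip link set wlan0 master bond0
  echo eth0 > /sys/class/net/bond0/bonding/primary   # prefer wired whenever it has link
  ip link set bond0 up
  dhclient bond0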
Can't you just get rid of that file entirely? I would assume that there is a (sane) fallback for DHCP in case this file is missing? What other purposes besides phased updates and DHCP does that file serve?
The article repeats that they would like to set up a canary server but can't. But they also link to an askubuntu answer which mentions these two settings:
Surely the best setup would be to set some canaries to always include phased updates, and the rest of the fleet to never include them (so you get them once they are at 100% rollout and no longer phased)?
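Something like this, if I'm reading the apt.conf options right (untested):

  # on the canaries
  APT::Get::Always-Include-Phased-Updates "true";
  Update-Manager::Always-Include-Phased-Updates "true";

  # on the rest of the fleet
  APT::Get::Never-Include-Phased-Updates "true";
  Update-Manager::Never-Include-Phased-Updates "true";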
True. I wondered the same for the canaries. But they did mention why they didn't want to go with the 100% rollout option.
> We could set some very important machines to only get updates when packages reach 100% and stop being phased updates, but Ubuntu has a good record of not blowing things up with eg OpenSSH updates.
As I understand it, either 'always' or 'never' actually solves OP's problem. (edit: was both/and - both sounds like a bad idea.)
"always include" effectively puts you in phase 0 - if an update is being phased, you're an eager beaver.
"never include" effectively puts you in phase 100 - if an update is being phased, you'll wait until the phasing is complete.
AIUI OP's problem isn't that he wants these updates on day 0, it's that he wants his environments to be consistent with each other, which either of these options would provide.
In a file /etc/apt/apt.conf.d/99phased-updates. I've also tried the Never version of them. Neither one seems to have any effect. The only thing that seems to work is the suggested command-line setting:
FYI: After just throwing a bunch of stuff at the wall, I've finally come up with the following which does work, at least for apt (haven't tried update manager):
First I've heard of this, and as a long-time Debian user it initially sounded like yet another oddity peculiar to Ubuntu.
It sounds like this problem shouldn't / wouldn't affect most 'business' servers iff you were running the LTS (stable) branch, as security updates aren't phased (as per TFA's comment).
LTS Ubuntu, AIUI, is similar to Debian stable -- it only receives security patches, not new (feature) versions of any packages.
Reviewing apt & apt-get man pages here (I'm on 2.5.4) I see no references to 'phase'.
Reviewing Debian's changelog.gz I see the first phase reference in 2.5.1 (2022-07), and a subsequent reference in 2.5.3 (2022-09) - but nothing about 'phased updates' on the Debian wiki, and per Ubuntu's discourse[0] phased updates were first introduced to apt in version 2.1.16 - so I'm guessing it took a while for those to be fed upstream (to Debian - assuming they are still considered upstream for - dare I say it, the canonical owners of - apt/apt-get).
There's an askubuntu [1] post describing the what & why behind this change; it sells the reader on the 'improved stability' claim while downplaying TFA's concerns (potential random inconsistency) and acknowledging poor defaults, tooling, and documentation.
It does feel a bit of an odd solution to me - Debian provides a testing branch which seems to serve this function - some relatively small subset of users will use testing and find your bugs for you. If you seek safety & predictability then you stick with the stable / LTS branch.
In contrast, randomly selecting some subset of your users - with Ubuntu's typical opt-out default - to randomly get an early / deferred updating of some random subset of your installed packages just seems ... well, I can see why TFA was frustrated.
Will Debian upstream even accept the feature? From what the blog post says, the implementation is user-hostile: no logging of what is going on and limited choice, with no way to run a canary.
I've updated my earlier comment to clarify that I was checking the changelog for apt in my Debian system.
On review it doesn't make much sense - searching for 'phase' there's an 'Add support for phased updates' in 2.5.3, but a few months earlier, in 2.5.1, the comment is:
"(Temporarily) Rewrite phased updates using a keep-back approach."
I thought I'd done a case-insensitive search before, but obviously hadn't, as there's one earlier reference in 2021-01 (for v2.1.16), which kind of aligns with the askubuntu post's mention of 2.1.16 on the timeline, but the description certainly doesn't sound like the introduction of the base feature (phased updates):
"Add support for Phased-Update-Percentage, previously used only by update-manager."
Fedora CoreOS has phased updates, too. But since they are atomic updates of the whole distribution, there is no risk of inconsistencies between packages. Either your machine has the new version or not. And the user can choose a priority, i.e. whether they want updates early or late, so you can run a canary. Not a convincing feature from Ubuntu with such poor tooling, limited user choice and no logging.
Given that no bugs exist, it's not possible to get inconsistencies by installing packages in random order because they declare their dependencies. If no bugs existed we would not need updates at all...
Maybe, but there’s still some state in eg config files that persists over upgrades. If there are incompatibilities there, you’ll still have issues, so it’s not a silver bullet.
Ubuntu is known to break things quite often; the last one that comes to my mind was the sudo behaviour:
On Wed, 2019-05-15 at 02:42:56 +0930, Dan Streetman wrote:
> in Ubuntu, sudo retains the calling user's $HOME
>
> this is different from upstream sudo as well as all other UNIXes and
> even the sudo documentation we provide. Should we remove our custom
> patch that adds this behavior?
Well, when Ubuntu was first released 18 years ago, it was the first big distribution with no open ports in the default installation and no root password. Of course there were hardening guides for Debian, which you could use to shut down the fingerd daemon and the ftp server and get rid of the global administrator account. Linux distributions had so many remotely exploitable bugs that whole books were written about them. (Windows was still worse.)
Other distros slowly started to adopt the "secure by default" policy and came up with different approaches.
OpenSUSE for example still uses the root password for sudo. The patch to /etc/sudoers is massive.
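The user-visible core of it, from memory (the actual shipped sudoers carries far more comments and defaults):

  Defaults targetpw    # ask for the password of the target user, i.e. root
  ALL   ALL=(ALL) ALL  # WARNING: only safe in combination with 'Defaults targetpw'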
I wouldn't expect sudo to behave the same across distros, there is a lot of history to it.
Try and install 22.04 on a server that has two exactly identical NVME drives, you're very likely going to be in for a very interesting adventure involving a strange beast called 'multipath' devices.
It was so bad I had to switch back to the legacy text installer for 20.04.
Disabling multipath is pretty easy. That being said, it took months of strange errors and mount issues before I discovered that that was the cause.
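One way to do it, roughly (blacklist the NVMe devices so multipathd leaves them alone; adjust the regex to taste):

  # /etc/multipath.conf
  blacklist {
      devnode "^nvme.*"
  }

  # then restart the daemon, or disable it outright
  systemctl restart multipathd    # or: systemctl disable --now multipathd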
All my attempts have failed in the following fashion:
. I boot the text only installer
. Once it runs I switch to a shell
. I edit the python code of subiquity / curtin to rip out anything that has to do with multipath
. I disable all systemd shite related to multipath
. I kill the installer (and systemd restarts it)
. When I get to partitioning, I finally see my NVME devices instead of the weird multipath stuff
. I create a raid-0 partition on them
. The installer then fails miserably
No, `sudo su -` gives you a shell resembling one you would get when logging in interactively as root, while `sudo -i` applies some of its configuration. Which is not always well suited for interactive uses to put it lightly. For example PATH is set to something smaller than I would like.
It gives you a pretty similar result in the end. From my understanding, with 'sudo -i', you're still using sudo itself to run commands as root (or any other specified user).
'sudo su -' instead executes the 'su -' command, giving you a root shell, as a superuser with 'sudo'. If you left the 'sudo' out, you'd have to type the root password.
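Easy enough to compare on a given box; the output depends on your sudoers and root's shell profile:

  sudo -i env | sort > /tmp/sudo-i.env
  sudo su - -c env | sort > /tmp/sudo-su.env
  diff /tmp/sudo-i.env /tmp/sudo-su.env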
I'm not sure what point you're trying to make, but:
$ sudo /bin/sh -c su -
It's never useful to deny certain commands to a user if that user is allowed to open a shell. Any shell. So you probably want to change that first line to
(ALL : ALL) NOEXEC: ALL
and provide a whitelist for all tools that do spawn children as part of their normal operation (such as apt, dpkg, and probably half of all unix tooling).
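i.e. roughly this shape (username and whitelist are examples; untested, see sudoers(5) before copying):

  alice ALL = (ALL : ALL) NOEXEC: ALL
  alice ALL = (ALL : ALL) EXEC: /usr/bin/apt, /usr/bin/dpkg

The last matching entry wins, so the second line hands exec back to the whitelisted tools.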
I noticed this too starting with 21.10 (I think?), and it eventually bothered me so much I went looking for answers and found out about this.
This was an unusually badly communicated change. I'm fairly attentive to this type of stuff and was caught by surprise.
We're about to start rolling out 22.04 at work. I'll make sure to disable this on servers, non-determinism is not a desirable property for system updates.
Absolutely unrelated to the issue: apt started to print some Reddit reference on the screen. I know, not everything has to be serious, but come on, when I deploy an image to a customer I would prefer not to see random messages.
I don't mind the message itself too much, but I think it was still in particular bad taste because the message was introduced after they started spamming ads into your install logs and people got upset.
"Haha here's how you disable the shitty ads ;)" isn't something I expected, even from Canonical.
The messages are terrible. Software Update tells you that something went wrong with the update, but doesn't tell you what. Synaptic Package Manager says 0 packages are broken. As someone commented in the bug report, "I have seen many people on IRC *very* upset after wasting a lot of time trying to install updates that apt will not let them install. Fixing this is critical to our reputation."
The other big Ubuntu update hassle is a constant string of notifications demanding that you exit applications so they can be updated. However, you have to keep them closed long enough for an updating cycle to notice. And then there's the notification that you need to close the Snap daemon so it can update. The user doesn't start the Snap daemon; it starts at startup and has no desktop presence. Lame. This may have been fixed; I haven't seen that recently.
Is that the reason Ubuntu has been installing partial updates and breaking the system the last few months? Three times now the kernel was updated but not the nVidia modules, leaving the user with a system without working graphics drivers. And once the wrong kernel was installed for an unknown reason (the oem package was installed, but I never selected it).
After a manual update it worked once again. There might be a few days between the update and the fix, though, as the updates were installed automatically and the issue only shows up after a restart.
> Now that I've looked at all of this and read about APT::Machine-ID, we'll probably set it to a single value across all of our fleet
This is not a good idea, the machine ID is supposed to be unique and shouldn't change over the lifetime - a handful of software relies on this property and it's the best identifier for an installation if you can't rely on hostnames.
We struggled a bit and largely succeeded in getting consistency by hand-picking the patches and using automation to drive them.
What we couldn’t do was get patching that was both consistent and really timely, but this was a decade ago, when that was arguably slightly less important.
So, that's what's happening with the held package updates. I was seeing this on a few boxes, but hadn't looked into why, as it hasn't caused any issues.
I understand why there are people here who don't like that behaviour, but I like the idea of non-security phased updates, so I'll be leaving machines with the default behaviour until I encounter a problem caused by it.
I wonder whether this “feature” is a ploy to push more people to their fleet management offering. My own systems are now all out of sync with each other, because they are all in different phases of deployment.
After reading some of the replies here, I think that I ought to disable this feature on all of my systems, but I worry that the whole situation foreshadows the possibility that the Ubuntu maintainers want to play more fast and loose with the “stable” release feeds. You know, move fast and break things.
TL;DR: a decade ago Ubuntu wanted to update a few computers first, then more computers if no issues were detected. Due to missing tooling, this didn't hit mainstream until now.
A practical problem with these phased updates is that you may end up with different versions of some software on your fleet of (until now) identical servers and you have no way of avoiding it.
Why do you think pushing out broken updates to people and causing them to do extra work is a good thing? It's something that should happen as little as possible.
If the OS developers have the ability to save their users thousands of man-hours, they should do so.
I don't believe mirrors and caches affect this behaviour.
How this works is that the package index contains a Phased-Update-Percentage: field giving a number between 0 and 100. Then your client picks a number in the same range, and accepts an update if the repository's value is higher than the client's value.
Whether the fixed value comes from your mirror or Ubuntu's, makes no odds - You still have a fixed value on the server and a variable value on the client, so the outcome is still variable. Caching the index, or mirroring it without regenerating it, still leaves this decision to the end client.
Something else I found interesting looking into this mechanism - the seed for the rng is sourcePackage-version-machineID. So you won't get "that one machine that always updates last" or "that one machine that charges head-first into phased updates"; it should be randomly distributed for each version of each package.
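For the curious, the relevant bit in the archive's Packages index is just an extra field in the package stanza; package name, version and percentage here are made up for illustration:

  Package: openssh-server
  Version: 1:9.0p1-1ubuntu1.1
  Architecture: amd64
  Phased-Update-Percentage: 20

Newer apt versions list anything skipped this way as "deferred due to phasing" in the upgrade output, if I remember right.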
At the scale of OP with their particular needs (and wishes for canary rollouts), shouldn't it already be about time to host your own internal mirrors regardless? Then you get full control.
(c) publish explicit rollout-group channels, and let users pick their group if they want? E.g., stable-stage{1...k}-group{1...j}, and also offer an optional technique for admins to participate in a lottery for which group they draw from?
Then individual computer owners could decide policy for themselves.
I really don't understand how Canonical's views on the proper role of a Linux distro provider could have diverged so far from my own. I think they're the ones who have changed, but I'm not sure.
Early on in the days of OS X (perhaps the first 10 years or so), Apple made a good habit of focusing on one particular subsystem for massive changes, and then tended to only make smaller-scale changes across the rest of the OS. I wonder sometimes if that would be too hard to replicate as a Linux OS vendor. Seems like the approach worked well and it could again.
That's kinda how Linux operates, just without a project manager. A good example is the audio subsystem Linux had 2 or 3 years ago - pretty bad, dropped Bluetooth connections all the time and broke on newer systems. So, the community came together and wrote PipeWire to fix these problems. The end-result is uncharacteristically impressive, in my opinion.
Still though, the last thing I want the Linux community to do is drop everything to focus on $SOMETHING. The people using Linux are a diverse audience with many different use-cases, and naturally not all of them want another MacOS to babysit.
My point is more that distributions don’t have to update everything in a release - they can focus their energy on certain new things rather than everything.
I'm not sure what's going on here. Does it only apply to `apt upgrade` or also to `apt update`? Is it possible that my local dev environment uses different packages to our CI build system?