Ahh, /etc/machine-id, we meet again... Last time I met you, you were causing my cloned VM to get the same DHCP address as its original and blow up my VM network, despite libvirt's virt-clone utility properly randomizing vNIC MAC addresses, because netplan defaults to using you as the DHCP client identifier. If only we could meet under more pleasant circumstances.
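For anyone else bitten by this: netplan can be told to key DHCP off the MAC instead, which (as I understand it) sidesteps the machine-id-derived identifier. A rough sketch, with the interface name as a placeholder:

  # /etc/netplan/01-netcfg.yaml (sketch; adjust the interface name)
  network:
    version: 2
    ethernets:
      enp1s0:
        dhcp4: true
        dhcp-identifier: mac   # use the MAC rather than the default DUID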
NixOS has a related problem in that you can't set the machine's MAC address without a hack:
  # Hack: Change the default MAC address after network but before dhcpcd runs
  systemd.services.setmacaddr = {
    script = ''
      /run/current-system/sw/bin/ip link set dev eth0 address ${macaddr}
      /run/current-system/sw/bin/systemctl stop dhcpcd.service
      /run/current-system/sw/bin/ip addr flush eth0
      /run/current-system/sw/bin/systemctl start dhcpcd.service
    '';
    wantedBy = [ "basic.target" ];
    after = [ "dhcpcd.service" ];
  };
If you don't do this, it uses some black magic to decide the MAC address based on various hardware, making system migration and maintenance a nightmare.
Is it NixOS doing it or is it e.g. systemd-networkd? (Sounds like something systemd would do.)
While I’m confused as to why the boot would need to decide on a MAC address (VM or cheap SBC without a burned-in address? the first case might be easier to correct from the outside), the general state of NixOS is that some things are very flexible while others only cover some common cases (ones that the original author needed to solve). Unlike a traditional distro where the package manager will complain if you replace distro-provided stuff, in NixOS it’s entirely possible to override parts that don’t work for you rather than paper over them with programmatic overrides like these. It’s not even hard to upstream your changes if you make them backwards-compatible, although the benefit can be limited because the testing is not particularly thorough so other changes may still inadvertently break them.
I gave up trying to find a root cause for this after a couple of days of rabbit holes and yak shaving. NixOS is simply too impenetrable once you fall off the happy path (which is unnervingly often).
The next time I rebuild this server, I'll go back to Ubuntu or Debian and use build scripts for "good enough" determinism. For the time being, I just run everything important in LXC and Docker containers on top of this delicately balanced NixOS hypervisor for as long as it'll last.
It’s the fate of every config generator, yes, although I find NixOS better than average in that respect (between Arch and Debian in how easy it is to figure out what the hell it’s doing to the underlying software).
Troubleshooting guides for NixOS are non-existent, but the system itself is not all that difficult to inspect: two things you can do are run `nixos-rebuild build` without switching and meditate on ./result/; and inspect `(builtins.getFlake (toString ./.)).nixosConfigurations` in `nix repl` (use import etc. if not using flakes), as that will include every derived setting down to the text of generated config files, not only those you specified explicitly.
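A rough sketch of both, with "myhost" standing in for whatever your configuration is called:

  $ nixos-rebuild build            # builds the system into ./result without activating it
  $ ls result/etc                  # generated config files end up under here
  $ nix repl
  nix-repl> flake = builtins.getFlake (toString ./.)
  nix-repl> flake.nixosConfigurations.myhost.config.networking.interfaces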
But you might’ve just nerd-sniped me, we’ll see.
ETA: Looks like it’s indeed systemd-networkd’s doing[1]. For a static interface, setting `networking.interfaces.${NAME}.macAddress`[2] to the desired value should work.
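Something along these lines, with interface name and address as placeholder values:

  networking.interfaces.eth0.macAddress = "52:54:00:12:34:56";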
We (as far as I can tell) use some parts of systemd-networkd by default (as in, even if you haven't enabled it), as MAC addresses via `networking.interfaces.<name>.macAddress` are set through it: https://github.com/NixOS/nixpkgs/blob/65e07f20cf04f5db9921dc...
Good point. But something has to be setting those MAC addresses...
And it looks like, curiously, explicitly configured interfaces have their setup expressed as .link units even if networkd is not in use[1]. A comment[2] states: “.link units are honored by udev, no matter if systemd-networkd is enabled or not”.
It seems that .link units are nowadays interpreted not by networkd (which NixOS gates with useNetworkd) but by udevd (which it does not). The documentation for them (but not for udevd) even points that out[3] if you’re the kind of person who reads introductions: “link.link: A plain ini-style text file that encodes configuration for matching network devices, used by systemd-udevd(8) and in particular its net_setup_link builtin”.
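For reference, the generated .link file looks roughly like this (name and values are illustrative):

  # /etc/systemd/network/10-eth0.link
  [Match]
  OriginalName=eth0

  [Link]
  MACAddress=52:54:00:12:34:56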
virt-clone only takes care of the libvirt config but won't touch unique identifiers in your disk image.
/etc/machine-id isn't the only thing to worry about when duplicating a VM - think SSH host key, DHCP leases, various filesystem UUIDs, log files, MAC addresses in ifcfg-* files and udev-persistent-net rules...
You can use virt-sysprep[1] to clean up a disk image.
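Something like this, as a sketch (flags from memory, check the man page before relying on them):

  # scrub machine-id, SSH host keys, logs, udev net rules, etc. from an offline image
  virt-sysprep -a clone.qcow2 --hostname clone01
  # to see everything it can reset:
  virt-sysprep --list-operations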
Ansible has a non-trivial transition cost (speaking from experience here). Cloning VMs is a legitimate stopgap measure, because that’s sometimes the best you can do if available engineer hours are tight.
Ansible (as a proxy for infrastructure as code in general) being hard to implement is a warning that your setup is too convoluted. Cloning VMs is the high-interest unsecured loan of tech debt, and when that bill comes due it’s going to be much worse than spending a few days on some scripting.
Completely agree on all points, except for "few days". It took much longer than that, but partly due to learning Ansible as I went. The pain was still worth it, though.
Or better, cloud-init. It's the same thing that cloud images use to specialise VM "clones" to their environment, so it's more likely to do all the right things. If it has a mechanism to detect that it's been cloned (e.g. instance-id on EC2, etc.) then it will deal with ssh key regeneration and everything else.
As another comment pointed out elsewhere, you are not supposed to touch /etc/machine-id to control apt. It has its own APT::Machine-ID setting you can change without breaking anything else.
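e.g. drop something like this into an apt conf snippet (path and value are arbitrary examples; as I understand it the string only seeds the phasing lottery):

  # /etc/apt/apt.conf.d/90fleet-phasing
  APT::Machine-ID "0123456789abcdef0123456789abcdef";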
Back in the NT4/Windows 2k days I recall having to do some extra steps to modify an identifier in the registry (or similar) when cloning Windows images.
Otherwise the clone would not properly register on the network. Perhaps it was only when speaking to the Domain Controller though.
IIRC the later versions of Norton Ghost, which was what we used, did this process for us automatically.
There are some conspiracy theories, fuelled by the fact that Russinovich only pulled the tool after being acquihired by Microsoft.
It's undeniable, in my experience, that the tool did help, despite all the swearing to the contrary. Making the leap from there to believing that it made it too easy to clone Windows machines in a way that Microsoft had no control over, and that Microsoft therefore asked him to pull it, doesn't seem so crazy though.
You just needed to actually read the documentation and use sysprep (ideally with an unattend file). Just remember to image it before sysprepping since you can only run it a few times. There are a ton of things sysprep does that are really helpful and not handled at all by tools like that.
Microsoft doesn’t care if you clone systems (why would they?); they care about the volume of help desk tickets created by doing it wrong.
> You just needed to actually read the documentation and use sysprep [...] There are a ton of things sysprep does that are really helpful
And a lot that are not helpful, iirc. It was probably 10 years ago when I had to deal with this, but my recollection is that Sysprep was messing with a lot of stuff - more than I needed.
> Microsoft doesn’t care if you clone systems, (why would they?)
Lol, licenses, of course. Nowadays they've gotten a bit softer on the issue, but back then they were still very, very twitchy about duplicating and virtualizing systems.
We have very different memories, then. The official docs were really good as far back as 2006 or so, they just required you to actually read them. Sysprep was certainly a powerful tool, but if you told it not to generalize and to skip OOBE, it mostly just reset the SID. Most people I ran into who had issues were using some random poorly written blog posts that were really just content farming for ad money.
Huh? I’ll grant you that Microsoft was picky about being paid for their software, but they produced a ton of tools to support cloning and duplicating. License compliance was handled with audits and CALs, not some weird cabal of anti-imaging.
Ah yes, that was it. I do recall we got error messages preventing the machine from working, and the guys spent some time researching before a solution was found. The error messages went away after changing the SID.
What is the reason for not using the MAC as the identifier?
Trying to think of a use case, I thought of wireless + wired; it would be kind of neat if they had the same IP. But that falls apart completely if you have them connected at the same time (which I often do).
At one point I set up my wired and wireless interfaces as a bond with wired as the primary, and I could do a file transfer and watch the speed go up or down as I plugged and unplugged the wired interface. That was pretty slick.
I did, but that blog was taken offline at the beginning of this year when the company I was a part of when I wrote it went out of business. It was horribly out of step with modern setups anyway; I did that ~20 years ago, and it used a network manager that no longer exists.
I'm not sure how it'd fit into a Network Manager or systemd world. There really wasn't any particular trick to it, IIRC I used link monitoring to detect link failure, though I might have used ARP, and set the ethernet as the primary interface, just using the standard Linux bond driver.
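The iproute2-era equivalent would look roughly like this (interface names and the DHCP client are assumptions, and enslaving a wireless NIC may need extra driver-specific steps):

  ip link add bond0 type bond mode active-backup miimon 100
  ip link set eth0 down && ip link set eth0 master bond0
  ip link set wlan0 down && ip link set wlan0 master bond0
  echo eth0 > /sys/class/net/bond0/bonding/primary   # prefer wired whenever it has link
  ip link set bond0 up
  dhclient bond0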
Can't you just get rid of that file entirely? I would assume that there is a (sane) fallback for DHCP in case this file is missing? What other purposes besides phased updates and DHCP does that file serve?
The article repeats that they would like to set up a canary server but can't. But they also link to an askubuntu answer which mentions these two settings:
Surely the best setup would be to set some canaries to always include phased updates, and the rest of the fleet to never include them (so you get them once they are at 100% rollout and no longer phased)?
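Something like this, if I'm reading the apt.conf options right (untested):

  # on the canaries
  APT::Get::Always-Include-Phased-Updates "true";
  Update-Manager::Always-Include-Phased-Updates "true";

  # on the rest of the fleet
  APT::Get::Never-Include-Phased-Updates "true";
  Update-Manager::Never-Include-Phased-Updates "true";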
True. I wondered the same for the canaries. But they did mention why they didn't want to go with the 100% rollout option.
> We could set some very important machines to only get updates when packages reach 100% and stop being phased updates, but Ubuntu has a good record of not blowing things up with eg OpenSSH updates.
As I understand it, either 'always' or 'never' actually solves OP's problem. (edit: was both/and - both sounds like a bad idea.)
"always include" effectively puts you in phase 0 - if an update is being phased, you're an eager beaver.
"never include" effectively puts you in phase 100 - if an update is being phased, you'll wait until the phasing is complete.
AIUI OP's problem isn't that he wants these updates on day 0, it's that he wants his environments to be consistent with each other, which either of these options would provide.
In a file /etc/apt/apt.conf.d/99phased-updates. I've also tried the Never version of them. Neither one seems to have any effect. The only thing that seems to work is the suggested command-line setting:
FYI: After just throwing a bunch of stuff at the wall, I've finally come up with the following which does work, at least for apt (haven't tried update manager):
First I've heard of this, and as a long-time Debian user it initially sounded like yet another oddity peculiar to Ubuntu.
It sounds like this problem shouldn't / wouldn't affect most 'business' servers iff you were running the LTS (stable) branch, as security updates aren't phased (as per TFA's comment).
LTS Ubuntu, AIUI, is similar to Debian stable -- it only receives security patches, not new (feature) versions of any packages.
Reviewing apt & apt-get man pages here (I'm on 2.5.4) I see no references to 'phase'.
Reviewing Debian's changelog.gz I see the first phase reference in 2.5.1 (2022-07), and a subsequent reference in 2.5.3 (2022-09) - but nothing about 'phased updates' on the Debian wiki, and per Ubuntu's discourse[0] phased updates were first introduced to apt in version 2.1.16 - so I'm guessing it took a while for those to be fed upstream (to Debian - assuming they are still considered upstream for - dare I say it, the canonical owners of - apt/apt-get).
There's an askubuntu [1] post describing the what & why behind this change; it sells the reader on the 'improved stability' claim while downplaying TFA's concerns (potential random inconsistency) and acknowledging poor defaults, tooling, and documentation.
It does feel a bit of an odd solution to me - Debian provides a testing branch which seems to serve this function - some relatively small subset of users will use testing and find your bugs for you. If you seek safety & predictability then you stick with the stable / LTS branch.
In contrast, randomly selecting some subset of your users - with Ubuntu's typical opt-out default - to randomly get an early / deferred updating of some random subset of your installed packages just seems ... well, I can see why TFA was frustrated.
Will Debian upstream even accept the feature? From what the blog post says, the implementation is user-hostile: no logging of what is going on and limited choice, with no way to run a canary.
I've updated my earlier comment to clarify that I was checking the changelog for apt in my Debian system.
On review it doesn't make much sense - searching for 'phase' there's an 'Add support for phased updates' in 2.5.3, but a few months earlier, in 2.5.1, the comment is:
"(Temporarily) Rewrite phased updates using a keep-back approach."
I thought I'd done a case-insensitive search before, but obviously hadn't, as there's one earlier reference in 2021-01 (for v2.1.16), which kind of aligns with the askubuntu post's mention of 2.1.16 on the timeline, but the description certainly doesn't sound like the introduction of the base feature (phased updates):
"Add support for Phased-Update-Percentage, previously used only by update-manager."
Fedora CoreOS has phased updates, too. But since they are atomic updates of the whole distribution, there is no risk of inconsistencies between packages. Either your machine has the new version or not. And the user can choose a priority, i.e. whether they want updates early or late, so you can run a canary. Not a convincing feature from Ubuntu with such poor tooling, limited user choice and no logging.
Given that no bugs exist, it's not possible to get inconsistencies by installing packages in random order because they declare their dependencies. If no bugs existed we would not need updates at all...
Maybe, but there’s still some state in eg config files that persists over upgrades. If there are incompatibilities there, you’ll still have issues, so it’s not a silver bullet.
Ubuntu is known to break things quite often; the last one that comes to my mind was the sudo behaviour:
On Wed, 2019-05-15 at 02:42:56 +0930, Dan Streetman wrote:
> in Ubuntu, sudo retains the calling user's $HOME
>
> this is different from upstream sudo as well as all other UNIXes and
> even the sudo documentation we provide. Should we remove our custom
> patch that adds this behavior?
Well, when Ubuntu was first released 18 years ago, it was the first big distribution with no open ports in the default installation and no root password. Of course there were hardening guides for Debian, which you could use to shut down the fingerd daemon and the ftp server and get rid of the global administrator account. Linux distributions had so many remotely exploitable bugs that whole books were written about them. (Windows was still worse.)
Other distros slowly started to adopt the "secure by default" policy and came up with different approaches.
OpenSUSE for example still uses the root password for sudo. The patch to /etc/sudoers is massive.
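The user-visible core of it, from memory (the actual shipped sudoers carries far more comments and defaults):

  Defaults targetpw    # ask for the password of the target user, i.e. root
  ALL   ALL=(ALL) ALL  # WARNING: only safe in combination with 'Defaults targetpw'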
I wouldn't expect sudo to behave the same across distros, there is a lot of history to it.
Try and install 22.04 on a server that has two exactly identical NVME drives, you're very likely going to be in for a very interesting adventure involving a strange beast called 'multipath' devices.
It was so bad I had to switch back to the legacy text installer for 20.04.
Disabling multipath is pretty easy. That being said, it took months of strange errors and mount issues before I discovered that that was the cause.
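One way to do it, roughly (blacklist the NVMe devices so multipathd leaves them alone; adjust the regex to taste):

  # /etc/multipath.conf
  blacklist {
      devnode "^nvme.*"
  }

  # then restart the daemon, or disable it outright
  systemctl restart multipathd    # or: systemctl disable --now multipathd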
All my attempts have failed in the following fashion:
. I boot the text only installer
. Once it runs I switch to a shell
. I edit the python code of subiquity / curtin to rip out anything that has to do with multipath
. I disable all systemd shite related to multipath
. I kill the installer (and systemd restarts it)
. When I get to partitioning, I finally see my NVME devices instead of the weird multipath stuff
. I create a raid-0 partition on them
. The installer then fails miserably
No, `sudo su -` gives you a shell resembling one you would get when logging in interactively as root, while `sudo -i` applies some of its configuration. Which is not always well suited for interactive uses to put it lightly. For example PATH is set to something smaller than I would like.
It gives you a pretty similar result in the end. From my understanding, with 'sudo -i', you're still using sudo itself to run commands as root (or any other specified user).
'sudo su -' instead executes the 'su -' command, giving you a root shell, as a superuser with 'sudo'. If you left the 'sudo' out, you'd have to type the root password.
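Easy enough to compare on a given box; the output depends on your sudoers and root's shell profile:

  sudo -i env | sort > /tmp/sudo-i.env
  sudo su - -c env | sort > /tmp/sudo-su.env
  diff /tmp/sudo-i.env /tmp/sudo-su.env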
I'm not sure what point you're trying to make, but:
$ sudo /bin/sh -c su -
It's never useful to deny certain commands to a user if that user is allowed to open a shell. Any shell. So you probably want to change that first line to
(ALL : ALL) NOEXEC: ALL
and provide a whitelist for all tools that do spawn children as part of their normal operation (such as apt, dpkg, and probably half of all unix tooling).
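i.e. roughly this shape (username and whitelist are examples; untested, see sudoers(5) before copying):

  alice ALL = (ALL : ALL) NOEXEC: ALL
  alice ALL = (ALL : ALL) EXEC: /usr/bin/apt, /usr/bin/dpkg

The last matching entry wins, so the second line hands exec back to the whitelisted tools.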
I noticed this too starting with 21.10 (I think?), and it eventually bothered me so much I went looking for answers and found out about this.
This was an unusually badly communicated change. I'm fairly attentive to this type of stuff and was caught by surprise.
We're about to start rolling out 22.04 at work. I'll make sure to disable this on servers, non-determinism is not a desirable property for system updates.
Absolutely unrelated to the issue: apt started to print some Reddit reference on the screen. I know, not everything has to be serious, but come on, when I deploy an image to a customer I would prefer not to see random messages.
I don't mind the message itself too much, but I think it was still in particular bad taste because the message was introduced after they started spamming ads into your install logs and people got upset.
"Haha here's how you disable the shitty ads ;)" isn't something I expected, even from Canonical.
The messages are terrible. Software Update tells you that something went wrong with the update, but doesn't tell you what. Synaptic Package Manager says 0 packages are broken. As someone commented in the bug report, "I have seen many people on IRC *very* upset after wasting a lot of time trying to install updates that apt will not let them install. Fixing this is critical to our reputation."
The other big Ubuntu update hassle is a constant string of notifications demanding that you exit applications so they can be updated. However, you have to keep them closed long enough for an updating cycle to notice. And then there's the notification that you need to close the Snap daemon so it can update. The user doesn't start the Snap daemon; it starts at startup and has no desktop presence. Lame. This may have been fixed; I haven't seen that recently.
Is that the reason Ubuntu has been installing partial updates and breaking the system the last few months? Three times now the kernel was updated but not the nVidia modules, leaving the user with a system without working graphics drivers. And once the wrong kernel was installed for an unknown reason (the oem package was installed, but I never selected it).
After a manual update it worked once again. There might be a few days between the update and the fix, though, as the updates were installed automatically and the issue only shows up after a restart.
> Now that I've looked at all of this and read about APT::Machine-ID, we'll probably set it to a single value across all of our fleet
This is not a good idea, the machine ID is supposed to be unique and shouldn't change over the lifetime - a handful of software relies on this property and it's the best identifier for an installation if you can't rely on hostnames.
We struggled a bit and largely succeeded in getting consistency by hand-picking the patches and using automation to drive them.
What we couldn’t do was get patching that was both consistent and really timely, but this was a decade ago, when that was arguably slightly less important.
So, that's what's happening with the held package updates. I was seeing this on a few boxes, but hadn't looked into why, as it hasn't caused any issues.
I understand why there are people here who don't like that behaviour, but I like the idea of non-security phased updates, so I'll be leaving machines with the default behaviour until I encounter a problem caused by it.
I wonder whether this “feature” is a ploy to push more people to their fleet management offering. My own systems are now all out of sync with each other, because they are all in different phases of deployment.
After reading some of the replies here, I think that I ought to disable this feature on all of my systems, but I worry that the whole situation foreshadows the possibility that the Ubuntu maintainers want to play more fast and loose with the “stable” release feeds. You know, move fast and break things.
TL;DR: a decade ago Ubuntu wanted to update a few computers first, then more computers if no issues were detected. Due to missing tooling, this didn't hit mainstream until now.
A practical problem with these phased updates is that you may end up with different versions of some software on your fleet of (until now) identical servers and you have no way of avoiding it.
Why do you think pushing out broken updates to people and causing them to do extra work is a good thing? It's something that should happen as little as possible.
If the OS developers have the ability to save their users thousands of man-hours, they should do so.
I don't believe mirrors and caches affect this behaviour.
How this works is that the package index contains a Phased-Update-Percentage: field giving a number between 0 and 100. Then your client picks a number in the same range, and accepts an update if the repository's value is higher than the client's value.
Whether the fixed value comes from your mirror or Ubuntu's, makes no odds - You still have a fixed value on the server and a variable value on the client, so the outcome is still variable. Caching the index, or mirroring it without regenerating it, still leaves this decision to the end client.
Something else I found interesting looking into this mechanism - the seed for the rng is sourcePackage-version-machineID. So you won't get "that one machine that always updates last" or "that one machine that charges head-first into phased updates"; it should be randomly distributed for each version of each package.
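For the curious, the relevant bit in the archive's Packages index is just an extra field in the package stanza; package name, version and percentage here are made up for illustration:

  Package: openssh-server
  Version: 1:9.0p1-1ubuntu1.1
  Architecture: amd64
  Phased-Update-Percentage: 20

Newer apt versions list anything skipped this way as "deferred due to phasing" in the upgrade output, if I remember right.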
At the scale of OP with their particular needs (and wishes for canary rollouts), shouldn't it already be about time to host your own internal mirrors regardless? Then you get full control.
(c) publish explicit rollout-group channels, and let users pick their group if they want? E.g., stable-stage{1...k}-group{1...j}, and also offer an optional technique for admins to participate in a lottery for which group they draw from?
Then individual computer owners could decide policy for themselves.
I really don't understand how Canonical's views on the proper role of a Linux distro provider could have diverged so far from my own. I think they're the ones who have changed, but I'm not sure.
Early on in the days of OS X (perhaps the first 10 years or so), Apple made a good habit of focusing on one particular subsystem for massive changes, and then tended to only make smaller-scale changes across the rest of the OS. I wonder sometimes if that would be too hard to replicate as a Linux OS vendor. Seems like the approach worked well and it could again.
That's kinda how Linux operates, just without a project manager. A good example is the audio subsystem Linux had 2 or 3 years ago - pretty bad, dropped Bluetooth connections all the time and broke on newer systems. So, the community came together and wrote PipeWire to fix these problems. The end-result is uncharacteristically impressive, in my opinion.
Still though, the last thing I want the Linux community to do is drop everything to focus on $SOMETHING. The people using Linux are a diverse audience with many different use-cases, and naturally not all of them want another MacOS to babysit.
My point is more that distributions don’t have to update everything in a release - they can focus their energy on certain new things rather than everything.
I'm not sure what's going on here. Does it only apply to `apt upgrade` or also to `apt update`? Is it possible that my local dev environment uses different packages to our CI build system?