Systemd: Enable indefinite service restarts (stapelberg.ch)
126 points by secure 9 months ago | 78 comments



> Why does systemd give up by default?

> I’m not sure. If I had to speculate, I would guess the developers wanted to prevent laptops running out of battery too quickly because one CPU core is permanently busy just restarting some service that’s crashing in a tight loop.

sigh … bounded randomized exponential backoff retry.

(exponential: double the maximum time you might wait each iteration. Randomized: the time you actually wait is a random amount in [0, current maximum] (yes, zero counts). Bounded: you stop doubling at a certain point, like 5 minutes, so you never wait longer than 5 minutes; otherwise you're eventually waiting ∞ seconds, which I guess is like giving up.)
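(A minimal sketch of that loop in bash, in case it helps; the retried command and the 5-minute cap are just placeholders:)

    max=1    # current maximum wait, in seconds
    cap=300  # bound: never let the maximum grow past 5 minutes
    until some_flaky_command; do             # placeholder for whatever is being retried
        sleep "$(( RANDOM % (max + 1) ))"    # randomized: anywhere in [0, max], including 0
        max=$(( max * 2 ))                   # exponential: double the maximum each iteration
        [ "$max" -gt "$cap" ] && max=$cap    # bounded: clamp instead of doubling forever
    done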

(The concern about logs filling up is a stronger objection. Backoff won't directly solve it, but a high enough maximum wait usually slows the rate of log generation enough that it stops mattering. Also, rotate your logs on size.)


Arguably, this logic should live in another place that monitors the service.

Especially since service startup failure is usually not something that gets fixed on its own, unlike a network connection (where exponential backoff is (in)famous). A bad config file or a failed disk won't recover in 10 minutes on its own, so systemd's default makes sense here, I believe.


systemd is a service monitor. It wouldn't be nearly as useful if it wasn't!


From the server's perspective, external problems typically do get fixed on their own. It is nice when resolving the primary issue is sufficient to fix the entire system, instead of needing to resolve the primary issue and then fix all the secondary and tertiary issues as well.

At my work, we have a simple philosophy for this. The tester is allowed to (on the test system) toggle servers' power, move around network cables, input bad configuration, etc., in any permutation he wants. So long as everything is set up correctly at the end of the exercise, the system should function nominally (potentially after a reasonable delay).

There should, of course, be a system-level dashboard that notifies someone there is a problem; but that is unrelated to the server-internal retry logic.


Q: Why is the optimal lower bound zero and not "at least as long as you waited last time"?


edit: I did some more investigation, and I missed something crucial: The distribution of requests over time becomes very wonky if the lower bound isn't 0. It stabilizes given enough time, but that time seems very long. Whereas lower-bound 0 quickly becomes uniform.

See the following figure, where I plot the # of requests over time: https://i.imgur.com/PNUFhjc.png

And that is why you should use 0.

---

I am not an expert, but I got nerd-sniped.

I have made some simulations[1] and done some napkin math[2], and I would summarize as follows:

I don't think there is any globally optimal choice; it depends on your exact circumstances and which of {excess waiting time, excess load} you want to optimize. Defaulting to 0 is a lot easier to implement and is at most a factor of 2 worse than the optimal lower bound w.r.t. waiting time vs. load. You may also consider [0.5 * maximum, maximum], which trades some excess waiting time for less load. Your suggestion is a similar heuristic one might use depending on their exact circumstances.

[1] https://gist.github.com/Dryvnt/1984d9389ae7386127f5e8998bf52...

[2] Consider a family of random bounded exponential back-off strategies with ultimate upper bound U as follows:

  Strategy X(z): Random of [z * U, U], where z is constant, 0 <= z <= 1
There are other families of back-off algorithms with other characteristics, and I am not considering or comparing those, just this family. Note that the strategy OP suggests is X(0). Consider that X(1) is non-random, which is undesirable.

Simple probability tells us

  avg(X(z)) = (1 + z) * U / 2
Server load ~= frequency, frequency is inverse duration, so

  load(X(z)) ~= 1 / avg(X(z)) = 2 / ((1 + z) * U)
The difference in load between any two X strategies is

  load(X(z1)) / load(X(z2)) = (1 + z1) / (1 + z2)
Since z1 and z2 are constant, this relative load is constant. Due to the bounds on z, the largest possible relative load is

  load(X(1)) / load(X(0)) = 2 / 1 = 2
Now consider your suggested strategy

  Strategy Y: Random of [L, U], where L is the last choice
Note that

  Y = X(y), where y = L / U.
In fact, Y approaches X(1) exponentially fast, since the difference between L and U is halved each step, on average. So your suggestion still falls within this at-most-factor-2 difference. Exactly where just depends on the outage length.


Yeah, all fair. What piqued my curiosity is that you could wait _less_ time than you did before (potentially not at all!), which feels like the opposite of what you want to do in such situations.


Indeed, it was a good question. That's why I couldn't let it go either. I personally find the result quite interesting. Counterintuitive, but the most uniformly distributed load is probably the most likely to recover.

As always, https://xkcd.com/356/


Regardless, all these opinionated settings should be set by OS maintainers or similar. I don't see why a low-level init system tries to make decisions for others. Yes, it may be with good intentions, but don't.


Seems like OS maintainers can set those settings, what exactly is the problem?


Systemd has to pick something as a default. Distributions are more than capable of changing the default on their builds of Systemd if they want to. Feel free to file a bug with Redhat, or Debian, or whoever maintains your distribution and see if they want to change the default on their system.


The number of times I had to fight and debug systemd compared to any other init system is at least 10x.

Yes, it does a lot of stuff for you, and with others I had to write custom scripts, but those were much more understandable and maintainable long term. Sadly systemd won, and now I build my own OS without it.


Even for basic service running, complex dependencies are so much more manageable in systemd.

I'm glad systemd "won"; it's much more maintainable IMO than shell scripts written once and forgotten about (until they break).


Agreed, I have some friends that don't like systemd, and I respect their opinions greatly, but my experience with systemd has been quite good over the last decade. I really like it much more than the old init.d scripts.

With one exception... I tried to set up an inetd-like service where a script was run on each TCP connection with the output going out to the socket. It was tricky to get set up, but eventually I got it working... or so I thought. A few days later the system was super sluggish and I found that every connection that had been made to this service was left open and consuming resources. Switched that to socat running under systemd and it's been fine since. Never could figure out what exactly the deal was.
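(For the curious, a rough sketch of what such a socat unit can look like; the port and handler script are made up:)

    [Unit]
    Description=Run a handler script per TCP connection, inetd-style via socat
    [Service]
    # port and handler path are placeholders; socat forks one handler per connection
    ExecStart=/usr/bin/socat TCP-LISTEN:7000,reuseaddr,fork EXEC:/usr/local/bin/handle-conn.sh
    Restart=always
    [Install]
    WantedBy=multi-user.target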


With all due respect, I just don’t believe that. Perhaps it’s just rosy glasses on you, or the modern complexity of services/their dependencies.


> I would guess the developers wanted to prevent laptops running out of battery too quickly

And I would guess sysadmins also don't like their logging facilities filling the disks just because a service is stuck in a start loop. There are many reasons to think a service that has failed to start multiple times in a row won't start on the next attempt either. Misconfiguration is probably the most frequent.


Exactly. If a service crashes within a second ten times in a row, it's not going to come up cleanly an eleventh time. The right thing to do is stay down, and let monitoring get the attention of a human operator who can figure out what the problem is. Continually rebooting is just going to fill up logs, spam other services, and generally make trouble.

I'm sure there are exceptions to this. For those, set Restart=always. But it's an absolutely terrible default.
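(For those exceptions, the opt-in is a per-unit override roughly like this; the unit name and the 5s delay are placeholders:)

    # /etc/systemd/system/foo.service.d/restart.conf
    [Unit]
    # disable the start rate limiting that makes systemd give up
    StartLimitIntervalSec=0
    [Service]
    Restart=always
    # leave a gap between attempts so a crash loop doesn't spin
    RestartSec=5s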


It might actually, if a network connection is temporarily down.


Or a disk not attached yet. Or another service it depends on being slow to finish starting up.


So, you two know how systemd gets heat for doing too much, right?

This is one of those things.

The 'After=' and 'Requires=' directives address this.

Depends on a mount? Point those directives at a '.mount' unit.

Depends on networking, perhaps a specific NIC? Point those directives at 'systemd-networkd-wait-online@$REQUIRED_NIC.service'

Point being: declare these things, don't wait for entropy to eventually become stable.
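(Sketched as directives; the mount path and NIC are placeholders, and the per-NIC wait-online template assumes a reasonably recent systemd-networkd:)

    [Unit]
    # wait for /srv/data to be mounted and eth0 to be up before starting
    Requires=srv-data.mount systemd-networkd-wait-online@eth0.service
    After=srv-data.mount systemd-networkd-wait-online@eth0.service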


After and Requires are only when starting the service though. If a service (stupidly) crashes when a network connection is temporarily down (someone tripped over the router's power cord?), it needs to restart until the network connection is back up.


Sure, but now we're kind of back where we started: 'Restart='

With the requirements properly laid out we've avoided restarting in a loop and gained a bit of robustness.

There's also 'PartOf=' which can help make the relationship bidirectional


I get your point, but these features are the bare minimum any boot system should have. If someone calls that “bloat”, they should go back and hit rocks together.


Agreed. Relationships in 'init' are fundamental.

Back on point though: don't expect the 11th restart to work when the last 10 didn't.

Contrived examples are contrived; this is solved by declaring dependencies.


Interestingly, the kubernetes approach is the opposite one. Dependencies between pods / software components are encouraged to be a little softer, so that the scheduler is simpler.

Starting up, noticing that the environment doesn't have what you need yet and dying quickly appears to be The Kubernetes Way. A scheduler will eventually restart you and you'll have another go. Repeat until everything is up.

The kubelet operates the same way afair. On a node that hasn't joined a cluster yet, it sits in a fail/restart loop until it's provisioned.


Heh. We used syslog at one place, with it configured to push logs into ELK. The ingestion into ELK broke … which caused syslog to start logging that it couldn't forward logs. Now that might seem like screaming into a void, but that log went to local disk, and syslog retried as fast as the disk would allow, so instantly every machine in the fleet started filling up its disks with logs.

(You can guess how we noticed the problem…)

Also logrotate. (And bounded on size.)
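(For reference, a size-bounded logrotate stanza looks roughly like this; the path and limits are arbitrary:)

    /var/log/myapp/*.log {
        # rotate as soon as a file exceeds 100M, regardless of age
        size 100M
        # keep at most 5 rotated files, compressed
        rotate 5
        compress
        missingok
        notifempty
    }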


It's wild how easy it is to misconfigure (or not configure) logrotate and have a log file fill up the disk. Out of memory and out of disk are the two error cases that have led to the most pain in my career. I think most people who started with Docker in the early days (long before there was a `docker system prune`) had this happen, where old containers/images filled up the disk and wreaked havoc at an unsuspecting moment.


I used to joke that if VMware engineers couldn't figure out the logrotate configuration for their own product for a few releases, what chance do I have?


I've seen badly designed services with e.g.

Before=systemd-user-sessions.service

This means that as long as systemd is trying to (re)start the service, nobody can log in. Which is a problem with infinite restarts.

It's still pretty easy to accidentally set up an infinite restart loop with the default settings if your service takes more than 2s to crash.


This must be a different philosophy. When I see something like this happening, I investigate to find out why the service is failing to start, which usually uncovers some dependency that can be encoded in the service unit, or some bug in the service.


I think the author's specified use case is to address transient conditions that drive failures.

When the given (transient) condition goes away (either passively, or because somebody fixed something), then the service comes back without anyone needing to remember to restart the (now dead) service.

By way of example, I've run apps that would refuse to come up fully if they couldn't hit the DB at startup. Alternatively, they might also die if their DB connection went away. App lives on one server; DB lives on another.

It'd be awfully nice in that case to be able to fix the DB, and have the app service come back automatically.


Imagine you use systemd to manage daemons in a large distributed system. Crashes could be caused by a failure in a dependency. Once you fix the dependency, you want all your systems to recover as quickly as possible; you don't want to go through each one of them to manually restart things.

This doesn't mean that you don't investigate, it just means that you have an additional guarantee that the system can automatically eventually recover.

If you set a limit on the number or timing of restarts, what's a reasonable limit? That will be context dependent, and as soon as it's more than a few minutes, it may as well be infinite.


If your server has a bug that makes it crash every two hours you still want it up the rest of the time until you fix it.


That's exactly why systemd should blindly attempt to restart the service infinitely. Separation of concerns. An init system should simply start and monitor services; that is what an init system is meant to do. The fact that systemd is overengineered and tries to do multiple things causes headaches for a lot of us. Busybox-init is one of the best alternatives; I would use it everywhere if I could.


It's trivial to make systemd do that if that is what you want, but there are also plenty of cases when that is not what you want and you then end up trying to write crash-proof startup scripts to provide backoff instead of just changing a flag in a unit file.

(And if you want a dumb unit system, there are plenty of options which will run just fine under systemd as a single unit so you never have to actually use systemd for your own services even if you're forced to use systemd for the overall system for whatever reason)


Of course you understand you can do both, like I do.


...and now you know why I don't run systemd. I believe their thought process is: what would Windows do? This is one example. On Windows, the desktop shell still crashes often. In the old days, this would lock up the keyboard and mouse, and you'd have to power cycle. But MS "fixed" it by simply adding infinite restarts to the system. Now we have systemd: when something crashes, there's no need to fix the bug, just restart it.

My favorite new misfeature is PulseAudio. These geniuses actually built code for a multi-user, multi-tasking OS...which will only run for ONE user, and then only if that user is logged in. So forget running cron jobs, and sounding an alert if something needs attention.

This is all code produced by FreeDesktop[.]org. Thanks to them, your industrial strength, mission-critical server OS is now only suitable for single-user desktop systems.


I can understand avoiding infinite restarts when there is something clearly wrong with the configuration, but I can't figure out why they made the "systemctl restart" command subject to the same limit. For services which don't support dynamic reloading, restarting them is a substitute for that. This makes "systemctl restart" extremely brittle when used from scripts.

Nobody accidentally runs "systemctl restart" too fast; when such a command is issued, it is clearly intentional and should always be respected by systemd.


systemctl just uses dbus, as far as I understand, and someone can easily send dbus commands too fast


While making a monitoring script, I recently discovered that systemd exposes a few properties that can be used to alert on a service that is continuously failing to start if it's set to restart indefinitely.

    # Get the number of restarts for a service to see if it exceeds an arbitrary threshold.
    systemctl show -p NRestarts "${SYSTEMD_UNIT}" | cut -d= -f2

    # Get when the service started, to work out how long it's been running, as the restart counter isn't reset once the service does start successfully.
    systemctl show -p ActiveEnterTimestamp "${SYSTEMD_UNIT}" | cut -d= -f2

    # Clear the restart counter if the service has been running for long enough based on the timestamp above
    systemctl reset-failed "${SYSTEMD_UNIT}"
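
(Stitched into a naive threshold check, for illustration; the threshold is arbitrary:)

    restarts="$(systemctl show -p NRestarts "${SYSTEMD_UNIT}" | cut -d= -f2)"
    if [ "${restarts:-0}" -gt 5 ]; then
        echo "ALERT: ${SYSTEMD_UNIT} restarted ${restarts} times since last reset-failed"
    fi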


It would be nice if `RestartSec` weren't constant.

Then you could have the default be 100ms for one-time blips, but (after a burst of failures) fall back gradually to 10s to avoid spinning during longer outages.

That said, beware of failure chains causing the interval to add up. AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.


> AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

You can use mandatory access control for this.

AppArmor or SELinux are examples.

Unfortunately, they are hard and not sexy, and sysadmins (the people who tend to do not-sexy, hard things) are a dead/dying breed.


There's `RestartSteps` and `RestartMaxDelaySec` for that, see the manpage `systemd.service`.
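(Roughly like this, assuming systemd ≥ 254:)

    [Service]
    Restart=on-failure
    # first retry is quick, then back off towards the ceiling over 10 steps
    RestartSec=100ms
    RestartSteps=10
    RestartMaxDelaySec=10s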


Ah, not in the man page on my system.

Available since systemd 254, released July 2023 (only one release since then). Huh, has the release rate slowed down that much?


> AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

For startup, I’d argue the proper way is for the process to bind the socket before forking as a daemon.

With such a design, one can launch a list of dependent processes without worrying how long they each take to start up. No polling loops needed!

Of course, that requires some careful design — “fork and forget” is too appealing. The process would be responsible for creating its PID file after forking but the socket before…

Alternatively, an IPC notification could be used but would require some sort of standardization to be generally useful.


> AFAIK there's no way to have the kernel notify you of when a different process starts listening on a port.

Would the ExecCondition be appropriate here, minimally, with a script that runs `lsof -nP -iTCP:${yourport} -sTCP:LISTEN`?


I mean: once your process has started, how do you wait for a process you depend on?

Obviously if systemd opens the port for you it's easy enough (in this case, even across machines), but otherwise you have to do a sleep loop. And I'm not sure how dependency restarts work in this case.

ExecCondition just moves the spin to systemd, and has more overhead than doing it in your own process. There's no point in gratuitously restarting, after all.


I've always preferred daemontools and runit's ideology here. If a service dies, wait one second, then try starting it. Do this forever.

The last thing I need is emergent behavior out of my service manager.


Systemd can do exactly that, it just doesn't by default. But if that's what you want, it's trivial.


... and how many of us knew this before the article?


Anyone who read its documentation, which is comprehensive and clear.


Indeed, if I hadn't written my own .service file then I probably wouldn't know, but when you write a service file I don't think it's unreasonable to expect you to look up the possible settings to make sure you're configuring it optimally. It's fairly easy to find a list, and (IMHO) the parameter names are very self-descriptive.


Comprehensive, yes.


Is it possible to do this system wide? Or do I have to do it for each individual service? It may be a trivial amount of work but if the configuration is fragile, I've gained nothing.


It's literally described in the article.
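(For reference, the system-wide defaults live in systemd-system.conf(5); a sketch, noting that Restart= itself still has to come from the unit or a drop-in:)

    # /etc/systemd/system.conf (or a drop-in in /etc/systemd/system.conf.d/)
    [Manager]
    # never give up on a unit due to start rate limiting, system-wide
    DefaultStartLimitIntervalSec=0
    # default delay between automatic restarts
    DefaultRestartSec=5s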


They're literally rhetorical questions. So, yes, you can set /defaults/, but you can't /force/ the configuration globally. Which means you still need to examine every single configuration file to understand the behavior of your system as a whole.

Hence.. why I called it a fragile mechanism.


It takes a tenth of a second to run over every unit file in the system and change/set the restart policy.


It's funny how some people call systemd overcomplicated and here we have someone complaining because it only has two levels (default/specific) for config instead of three (default/specific/forced default).


You've got the argument wrong. I'm suggesting you don't need two, you only need one: an unconfigurable forced default.


That would be a bit hard to swallow for a general purpose supervisor.


It's trivial to grep all system service files for the relevant lines to see if they override the default. It is also easy to override that override to set it to whatever you like, if so desired.


Then watch for changes. Or remember to do this on more than one system. Or work with a team of people all credentialed to make work-related, system-level changes to the fleet.

Yes, there are ways to deal with anything, but less is more; this is unix, after all. I can't love having a bunch of options, in several places, that may override each other, most of which I would never use for any practical reason anyway.

Reminds me of the type of systems I abandoned in favor of linux.


I believe this allows you to have cascading restart strategies, similar to what can be done in Erlang/OTP: only after the StartLimit= has been reached does systemd consider the service failed. Then services that have Requires= set on the failed service will be restarted/marked failed as well.

I think you can even have systemd reboot or move the system into a recovery mode (target) if an essential unit does not come up. That way, you can get pretty robust systems that are highly tolerant to failures.

(Now, after reading `man systemd.unit`, I am not fully sure how exactly restarts are cascaded to requiring units.)


You can trigger units explicitly on failure with OnFailure=someservice as well (and since you can parameterize service names, you can have e.g. a single failure@.service that'll do whatever you prefer once a service fails).

OnFailure makes it easy to implement more complex restart or notification logic.
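(A sketch of that pattern; the template name and notify command are made up:)

    # failure@.service (template; %i expands to the unit that failed)
    [Unit]
    Description=Failure handler for %i
    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/notify-failure %i
    # and in any unit you want watched:
    [Unit]
    OnFailure=failure@%n.service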


I’ve been bitten by the restart limit many times. Our application server (backend) was crash-looping; the newest build fixed the crash, but systemd refused to restart the service due to the limit. A subtle but very annoying default behavior.


are you saying systemd was refusing to restart after manual intervention?


Correct, because the startup limit had been reached: `service start request repeated too quickly, refusing to start`.


That's terrifying; systemd shouldn't pretend to be smarter than manual intervention.

That violates everything I ever enjoyed Linux for. I left Windows because it thought it knew better than me.


Well, it's behaving as documented and following the semantics of your unit file.

Though I agree the default is poor, if it bites you more than once, write a standard template to reuse for your unit files and/or write a wrapper that calls reset-failed for you. Systemd is far from perfect, but this is a minor nuisance.


SystemD's philosophy is incompatible with UNIX principles. This statement shouldn't be controversial, yet we live in a world where SystemD criticisms are treated like heresy, and Wayland is unnecessary and unusable according to some X11 users directly contradicting most X.org maintainers and contributors.


It's not terrifying, it's mildly annoying. It's also fixed with 'systemctl reset-failed'. SysD doesn't know if 'systemctl start' was emitted by the operator or by a badly running script.


I don't think it should matter if 'systemctl start' was issued by an operator or an external script, it should try to start no matter what. SysD itself should use a different start command or flag that is subject to the limit when trying to restart after it detects a failure to start.


Hello, welcome to an init vs systemd rehash! :)


It doesn't; the user just fucked up. You can always run `systemctl reset-failed whatever.service`.


Did your deployment process/script not include restarting the service?


It does, but systemd refused to start the service because of the startup limit.
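(The usual deploy-script workaround, sketched with a placeholder unit name:)

    # clear the failed state and the start rate counter, then restart
    systemctl reset-failed backend.service
    systemctl restart backend.service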


Seems reasonable if the service is failing due to a transient network issue, which takes many minutes to resolve.


> And then you need to remember to restart the dependent services later, which is easy to forget.

You missed the other direction of the relationship.

I posted elsewhere in the thread on this: don't rely on entropy. Define your dependencies (well).

After=/Requires= are obvious. People forget PartOf=.
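(A sketch with made-up unit names; restarting backend.service then propagates to its worker:)

    # worker.service (made-up names)
    [Unit]
    Requires=backend.service
    After=backend.service
    # stopping or restarting backend.service propagates to this unit too
    PartOf=backend.service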



