init scripts were originally designed to be "dumb" scripts that only performed one simple task. other scripts could then be written around them reliably, without wondering what the script might be doing in the background other than loading some options and running a program.
with modern service management, a lot of extra "intelligence" has been added to how programs are executed and managed. this leads to levels of complexity and uncertainty that eventually lead administrators to hacking the hell out of the system to be able to use it reliably, usually disabling advanced features so they can control such features themselves.
on top of that, automation of service restart is a solved problem since... decades ago? it's trivial to have a service restart if it's killed. unfortunately, once such a function is added to your init system, people use it all over the place and don't consider the consequences of services restarting automatically. eventually they get bitten by an unforeseen problem and build in limits to the service restart, etc etc. system automation is a lot harder than most people think.
> it's trivial to have a service restart if it's killed. unfortunately, once such a function is added to your init system, people use it all over the place and don't consider the consequences of services restarting automatically. eventually they get bitten by an unforeseen problem and build in limits to the service restart, etc etc. system automation is a lot harder than most people think.
All the more reason to solve it once instead of in a billion different and differently awful addon packages.
Plus, systemd is actually better at this than supervisord, daemontools, etc., already because of its cgroup usage- it can reliably kill all children before restarting, even if the process forks.
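As a sketch of what that looks like in practice- this is roughly the shape of a unit relying on the default cgroup-wide kill behaviour. The directive names are from systemd.kill(5) and systemd.service(5); the daemon name and paths here are made up:

    # hypothetical unit; "example-daemon" and its path are invented
    [Service]
    ExecStart=/usr/local/bin/example-daemon
    # control-group is the default KillMode: on stop or restart, SIGTERM (then
    # SIGKILL after the timeout) is sent to every process still in the unit's
    # cgroup, so double-forked children can't escape the cleanup
    KillMode=control-group
    KillSignal=SIGTERM
    TimeoutStopSec=30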
they are quite nice, and show that, in fact, thought has gone into it.
> unfortunately, once such a function is added to your init system, people use it all over the place
I'm not super on board with this view- for almost everyone, if apache goes down, they want something to restart apache because the problem was probably transient and apache being down is a Big Problem. Plus, most modern supervisors know about things like "restart throttling".
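For a rough sketch of what that throttling looks like as a unit file- the directive names are from the systemd man pages (their exact spelling and placement vary a bit across systemd versions), and the apache path and the numbers are just illustrative:

    [Unit]
    # stop retrying if the service fails more than 5 times within 2 minutes
    StartLimitIntervalSec=120
    StartLimitBurst=5

    [Service]
    # illustrative; the real httpd path and flags depend on the distribution
    ExecStart=/usr/sbin/httpd -DFOREGROUND
    Restart=on-failure
    RestartSec=5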
see, this is the problem. people don't seem to understand the paradigm of system automation fully. okay, let's take driving a car automatically as an example.
where are you going? let's assume we know that all the time. now how do we get there? we have to know where we are to figure out how to get there. and on the way, will we need to stop for gas or a bathroom break? once you figure out those things (and a few more) you can get to the meat of the "code", which is how to maneuver the streets without running into anyone. that part is the easy part, because our variables are based on concrete laws of physics, traffic, etc.
the hard part is all those other variables that change based on each trip you take. did we get a flat tire? did someone run into us? is there an unexpected detour? all of these things and more may or may not happen, and you really can't account for them all.
so while it's nice to have intelligent tools, the tools should be designed to make it easy for people to customize their services to their needs, and not attempt to solve all the problems themselves. in my experience, the simpler the tool is, the fewer assumptions there are.
this leads you to build in things like monitoring, trending, alerting, quotas, resource limits, and generally design your system better to detect, withstand and prevent fault, instead of just reflexively killing and restarting anything when the eventual problems happen.
> Plus, systemd is actually better at this [..] because [..] it can reliably kill all children before restarting
... so that when the state of your database/index/locking/etc goes wacky because some job was killed before it could release the lock, you can then write a lock-cleaner-upper and a post-killing-script-execution add-on to systemd. i'm sure they already thought of that (or encountered it on a running system) so it's probably already a feature, but if not: yikes.
(side note: i did investigate systemd initially and found tons of things they either didn't think of, or hadn't released as features yet. i wouldn't have been able to use systemd as a replacement for my current system without some of them!)
> [simple shell scripts lead] you to build in things like monitoring, trending, alerting, quotas, resource limits, and generally design your system better to detect, withstand and prevent fault
Simple shell scripts might lead people in that direction, but in practice, few people actually go in that direction, and those that do don't go all the way. Systemd might not have all of those features, but it has most of them (which is a massive improvement for most services), and it doesn't stop you from adding the rest yourself (which is fine for services that care about going all the way).
> so while it's nice to have intelligent tools, the tools should be designed to make it easy for people to customize their services to their needs, and not attempt to solve all the problems themselves. in my experience, the simpler the tool is, the fewer assumptions there are.
I'm unimpressed by this argument. I've used sysv-init and upstart (but, to be honest, mostly using the sysv emulation) with Debian and Ubuntu for years, and often see problems with the packaged init shell scripts. Bad 'restart' and 'reload' logic, unreliable 'stop', inconsistent output, etc. Upstart's configuration files have been a huge boon- people just screw them up less than shell scripts. Systemd service files are the same.
And really, for most things you don't need anything complicated. You want to listen on a socket. If something bad happens, there needs to be an alert and the thing should be restarted. If it can't restart, give it a couple seconds. Eventually a meat person will wander over and figure it out.
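Something along these lines covers that whole list- a socket-activated pair with an on-failure alert hook. ListenStream=, OnFailure=, Restart= and RestartSec= are documented directives; the unit names, port and notify unit are invented, and the daemon has to be able to accept a socket handed to it by systemd for the .socket half to work:

    # mydaemon.socket (hypothetical)
    [Socket]
    ListenStream=8080

    # mydaemon.service (hypothetical)
    [Unit]
    # fire an alerting unit (not shown) whenever this service ends up failed
    OnFailure=notify-admin@%n.service

    [Service]
    ExecStart=/usr/local/bin/mydaemon
    Restart=on-failure
    RestartSec=2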
> this leads you to build in things like monitoring, trending, alerting, quotas, resource limits, and generally design your system better to detect, withstand and prevent fault, instead of just reflexively killing and restarting anything when the eventual problems happen.
systemd is great for quotas and resource limits, because it's such a heavy user of cgroups. It's an easy win. For the rest- do you build that into your server processes?
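To give a rough idea- hedged, because the exact directive names depend on the systemd and cgroup versions in play (older setups use e.g. MemoryLimit= instead of MemoryMax=)- the limits end up as a few lines in the unit:

    # sketch only; the worker path and the numbers are invented
    [Service]
    ExecStart=/usr/local/bin/worker
    MemoryMax=512M
    CPUQuota=50%
    TasksMax=100
    LimitNOFILE=4096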
Look, at bottom, all init systems run an executable with arguments. If you need to run a shell script to launch your daemon, you can still run a shell script to launch your daemon. The things that are important are:
* Do they do the OS bootstrapping well?
* Are they reliable?
* What tools do they provide to manage your daemons?
SysV and Upstart seem OK at the first two. SysV provides almost no tools, Upstart has a bunch of improvements, though some are arguably busted. Systemd has done a really impressive job at the third.
The general response to that is the infinitely disappointing "I have an unspecified unhandled corner case and you are violating the unix philosophy by not using 10,000 poorly written shell scripts".
> ... so that when the state of your database/index/locking/etc goes wacky because some job was killed before it could release the lock, you can then write a lock-cleaner-upper and a post-killing-script-execution add-on to systemd. i'm sure they already thought of that (or encountered it on a running system) so it's probably already a feature, but if not: yikes.
I posted a link to the man page already, and if you go to it and search for 'ExecStart', you will see a whole list of things- you can specify commands to start, reload, or restart your daemon, to run after stopping, as watchdogs, etc. If you have these problems they can be handled just fine.
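As a rough sketch of those hooks (directive names are from systemd.service(5); the daemon paths and the lock-cleanup script are invented for illustration):

    [Service]
    ExecStart=/usr/local/bin/mydaemon
    # reload config without a restart
    ExecReload=/bin/kill -HUP $MAINPID
    # runs after the service stops, even when it died uncleanly- i.e. the
    # "lock-cleaner-upper" from the comment above, as a hypothetical script
    ExecStopPost=/usr/local/bin/release-stale-locks.sh
    # the daemon must ping systemd (sd_notify WATCHDOG=1) within this interval,
    # or it gets killed and, with Restart=on-failure, restarted
    WatchdogSec=30
    Restart=on-failure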
> Plus, systemd is actually better at this than supervisord, daemontools, etc., already because of its cgroup usage- it can reliably kill all children before restarting, even if the process forks.
That's exactly the wrong thing for some services, for example sshd.
Which is why it doesn't do that for some services, for example sshd :P Each login session gets a separate cgroup, so the ssh daemon and all the users using it can be managed separately.
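For reference, the stock sshd units on most distributions look roughly like this (exact contents differ by distribution)- the point being that the cgroup-wide kill is opt-out per service:

    [Service]
    ExecStart=/usr/sbin/sshd -D
    ExecReload=/bin/kill -HUP $MAINPID
    # only the main sshd process is signalled on stop/restart; user logins live
    # in their own per-session cgroups managed by logind and are left alone
    KillMode=process
    Restart=on-failure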