Beyond Ctrl-C: The dark corners of Unix signal handling (sunshowers.io)
165 points by PuercoPop 65 days ago | 71 comments



My favorite signal surprise was running nginx and/or httpd in the foreground and wondering why on earth it quit whenever I resized the window.

Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

It's a silly, silly problem.
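For anyone who hasn't met SIGWINCH before: its usual job is to tell a full-screen program to re-read the terminal size. A minimal Python sketch of that conventional use, assuming a Unix system; `on_winch` and `sizes` are names made up for illustration.

```python
import shutil
import signal

sizes = []  # hypothetical log of the sizes seen after each resize

def on_winch(signo, frame):
    # SIGWINCH carries no payload; the handler has to re-query the size itself.
    sizes.append(shutil.get_terminal_size())

# The default disposition of SIGWINCH is "ignore", so a program that never
# installs a handler simply does not notice resizes.
signal.signal(signal.SIGWINCH, on_winch)
```

Because the default action is to ignore the signal, a daemon that repurposes it only surprises people once it is run attached to a terminal.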


> Turns out, they use SIGWINCH (which is sent on WINdow CHange) for graceful shutdown.

That’s … that’s even worse than people who send errors with an HTTP 200 response code.


Disagree. Annoyingly, there is a reasonable case for 200 with an error: if HTTP is your transport but not your application, then 200 says "yes, the message was transferred and understood correctly, here is your response", which may be an error response from the application.
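To make the "transport, not application" position concrete, here is a rough Python sketch of the envelope style it implies. The `envelope` helper and the field names `ok`, `result` and `error` are invented for illustration, not taken from any particular API.

```python
import json

def envelope(result=None, error=None):
    """Build a (status, body) pair; the HTTP status stays 200 either way."""
    body = {"ok": error is None}
    if error is None:
        body["result"] = result
    else:
        body["error"] = error  # the application-level failure travels in the body
    return 200, json.dumps(body)

status, body = envelope(error={"code": "insufficient_funds"})
print(status, body)
```

The trade-off is that intermediaries which only look at the status line see every such response as a success.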


If you’re using HTTP for something other than transferring hypertext — i.e., if your application is not a hypermedia application — then you are doing something just as wrong as encoding IP in DNS packets or email messages. Don’t do that. It’s wrong, even if it is technically interesting.

If, OTOH, your application is a hypermedia application, then returning a success status for errors is just wrong.


Every JSON API under the sun disagrees, but I do agree in principle. People very much like using HTTP as a JSON (or XML) transfer protocol


This ship sailed the day the first HTTP proxy was installed, and likely well before that.


Sorry, what? HTTP is perfectly fine for APIs which are not hypermedia.


For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi. You have to use a 2xx (except for 204) to get a relevant error message back out.


> For example: Apache (httpd) replaces the 4xx and 5xx response body with its own content instead of whatever you'd returned from an external handler like wsgi.

This is the default behavior. Apache httpd can be configured to produce different responses by way of ErrorDocument[0]. From the documentation:

  Customized error responses can be defined for any HTTP
  status code designated as an error condition - that is,
  any 4xx or 5xx status.
HTH

0 - https://httpd.apache.org/docs/trunk/custom-error.html


Even with custom error documents configured in the web server, you still lose the application-specific (and probably request- and error-specific) message generated by the application itself.


Yeah, this is how we ran across it - whoever originally wrote a particular feature was trying to do the right thing by using an HTTP error code, but with a message that would be presented to the user about why that operation failed. A generic response wouldn't work, there were multiple possible reasons all fixable by the user, and tying a whole error code to one specific feature would've probably been a bad idea anyway.


Which is why "you resized the terminal window, clearly you meant to shut down this web server" is even crazier, yes


Indeed. That is particularly good at violating the principle of least astonishment


That's ... not what most people are doing. People send _application_ errors on HTTP 200 response codes, because HTTP response codes are for HTTP and not applications. Most "REST" libraries and webdev get this wrong, building ever more fragile web services.


Applications using status codes is useful because it can tell browsers and load balancers to not cache the page in a uniform way.


I don't think the distinction is as clear-cut as you're making it out to be.

For example, HTTP 409 Conflict generally means an application-level conflict (e.g. an optimistic concurrency mechanism detected a conflict).

HTTP 422 Unprocessable Entity is also usually an application-level error (e.g. hash validation failure, or identifier not recognized by the server).


Task failed successfully


y'know...what really is an error, anyway?


For what is an error, if not a success at failing?


Exactly. Gotta be happy you got a response at all!


In my day, successful commands output nothing at all, so it would seem that a blank page is the only truly error-free result.


Why? That's what SIGTERM is for.


No clue what the decision making process was.

There's a bug report for httpd dating back to 2011[0]. The nginx mailing list also has a grumpy person contemporary with that[1].

My guess is someone thought "httpd is a server running somewhere without a monitor attached, why on earth would it get a SIGWINCH!? surely it's available to use for something completely different", not considering users running it in the foreground during development. Nginx probably followed suit for convention, but that's pure speculation on my part.

Also that was before Docker really took off (I'm not sure if it was around in 2011 yet; still in its infancy maybe). Running it in the foreground didn't happen as much yet. People were still using wamp or installing it via apt and restarting via sudo.

[0] https://bz.apache.org/bugzilla/show_bug.cgi?id=50669

[1] https://mailman.nginx.org/pipermail/nginx/2011-August/028640...


> why on earth would it get a SIGWINCH!?

Reminds me of those "/* not reached */" stories.


They use SIGWINCH for gracefully shutting down workers but not the main process [0]. SIGQUIT is used for a graceful shutdown and SIGTERM for a sort of graceful shutdown (with timeouts).

SIGWINCH is apparently used for an online upgrade [1]. Because it only shuts the workers down you can quickly transition back to the old binary and old configuration if there's a problem, even after upgrading the binary or config stored on the hard drive.

I'm sure there are other ways to get a similar capability, but this set of signals is apparently what they came up with.

[0] http://nginx.org/en/docs/dev/development_guide.html#processe...

[1] https://www.digitalocean.com/community/tutorials/how-to-upgr...


I tried to find out why.

Unfortunately the change that introduces it predates the official release by a few months. And predates the mailing list by about a year:

https://trac.nginx.org/nginx/changeset/5238e93961a189c13eeff...


ok, I found a commit in 2005, coming about because linuxthreads was interfering with the SIGUSR1 signal.

It looks like they wound up making it platform specific, so BSDs and other Unix-like operating systems might still use SIGUSR1.

https://github.com/apache/httpd/commit/395896ae8d19bbea10f82...


I don't know whether to laugh or cry.


definitely laugh! life's too short, you'll never get out alive :)


> Another common extension is to use what is sometimes called a double Ctrl-C pattern. The first time the user hits Ctrl-C, you attempt to shut down the database cleanly, but the second time you encounter it, you give up and exit immediately.

This is a terrible behavior, because users tend to hit Ctrl-C multiple times without intending anything different than on a single hit (not to mention bouncing key mechanics and short key repeat delays). Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).


If you don't know about it, sure, but I find it's kind of convenient to get a safe shutdown and then be able to easily say "I don't care, just stop this program" without needing a separate kill -9 command or something.


Kids these days. Try resetting server windows on an SGI.

Subject: -42- How can I restart the X server? Date: 10 Sep 1995 00:00:01 EST

  To restart the X server (Xsgi) once, do any one of the following
  (in increasing order of brutality):

  - killall -TERM Xsgi
  - hold down the left-Control, left-Shift, F12 and keypad slash keys
    (this is fondly known as the "Vulcan Death Grip")
  - /usr/gfx/stopgfx; /usr/gfx/startgfx
  - reboot

  To restart the X server every time someone logs out of the console,
  edit /var/X11/xdm/xdm-config, change the setting of
  "DisplayManager._0.terminateServer" from "False" to "True" and do
  'killall -HUP xdm'.


As I wrote, Ctrl-\ should do the trick. And it’s just not practical having to know which program applies the double pattern, and having to train yourself to not accidentally hit Ctrl-C twice.


My brush with the double-ctrl-C pattern was in a place that wrote a lot of Java. It was generally frowned on to write any code that relied on signals which Windows users can't send, and if I recall, Java made it quite difficult anyhow.

Windows does have a tradition of using ctrl-c to quit though, so SIGINT ends up being one of the few that you can use in both places. It's not pretty, but giving it a different meaning based on how many times you've ordered it seems like a somewhat natural next step, if a hacky one.


In the Meson build system's test harness, a single Ctrl-C terminates the longest running test with a SIGTERM; while three Ctrl-C in a second interrupt the whole run as if you sent the harness a SIGTERM. This was done because it's not uncommon that there are hundreds of tests left to run and you have seen what you want, and it's useful to have an intuitive shortcut for that case.

However, in both cases it's a clean shutdown: all running tests are terminated and the test report is printed.


> Unclean exits should be reserved for SIGQUIT (Ctrl-\) and SIGKILL (by definition).

I don't know how it works on your keyboard, but on a French layout Ctrl-\ is a two-hand, three-finger shortcut that is very unpleasant on the wrist. Not a chance I'd use that for such a common operation.


The byte that sends SIGQUIT is very much configurable with stty quit ^X , but unfortunately X has to be a-z or one of \]^_ (that is, 0x41 through 0x5F except 0x5B = [ which would conflict with other uses of ESC = ^[ = 0x1B) because of how the Ctrl modifier traditionally works. Looking at a map of AZERTY, I don’t see any good options, but you may still want to experiment.
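The "how the Ctrl modifier traditionally works" part boils down to keeping the low five bits of the key's ASCII code. A small Python sketch of that mapping; the `ctrl` helper is made up for illustration and ignores special cases such as Ctrl-? (DEL).

```python
def ctrl(ch: str) -> int:
    """Return the byte a traditional terminal sends for Ctrl-<ch>."""
    code = ord(ch.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control byte for {ch!r}")
    return code & 0x1F  # keep only the low five bits

for key in "@[\\]^_":
    print(f"Ctrl-{key} -> 0x{ctrl(key):02X}")
```

This is also why Ctrl-[ cannot be told apart from ESC: both arrive as the byte 0x1B.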


Curiously, on many terminal emulators the following work:

Ctrl-2 = Ctrl-@ = NUL byte

Ctrl-3 = Ctrl-[ = ESC

Ctrl-4 = Ctrl-\ = default for SIGQUIT

Ctrl-5 = Ctrl-] = jump to definition in vim

Ctrl-6 = Ctrl-^ = mosh escape key

Ctrl-7 = Ctrl-_ = undo in Emacs

I think these probably originate in xterm.


I map SIGQUIT to ^Q because that's the easiest to remember.


I suppose you never hit CTRL+S by accident?


stty -ixon

Make sure that thing is disabled


I like that Konsole defaults to disabling that thing. And also that there is a visual sign of the terminal being stopped.


Ctrl-S / Ctrl-Q was super useful in the dialup modem days.


Rarely enough that needing to open another terminal and use kill to send a signal doesn't bother me.


I think the point is that it is not to be a common operation.


well I don't know, it feels like I must mash ctrl-c twenty times per day on average at least


While on UK keyboards it's the opposite "problem" - the left Ctrl key and the \ key are right next to each other (making it potentially a one-finger operation), which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).


> which is the opposite of how a US keyboard is laid out (where Ctrl-\ was presumably intended to need to be a two-handed, two-finger operation).

We have a right Ctrl, so one-hand two-finger.


When using a keyboard "properly" how are you gonna manage that?


two handed operations shouldn't exist.


I completely agree - they're very inaccessible. That's why I quoted the word "problem"; it's not actually a problem at all.


stty quit ^] ?


It's worse, because there are languages that encode interruption into the error handling functionality, so it's common that people mismanage their errors and programs require several Ctrl-C presses to actually reach the interruption handler.

Which means that you have to memorize a list of "oh, this program needs Ctrl-C 3 times; oh, this program must only receive Ctrl-C once!"... I don't know of any "oh, this program needs Ctrl-C exactly 2 times", but it's an annoying possibility.


Any software I've come across that uses intentional double ctrl-c shows a message after the first ctrl-c. Something to the effect of "shutting down gracefully, press ctrl-c again for immediate shutdown".

Hence you can just press it once and wait half a second, if no message to this effect appears you can spam ctrl-c.


Yep, this is generally the pattern.


That shouldn't matter. Your database should be consistent in the face of an unclean exit. ACID has been around for a long time.


Programs can print a message stating that they are attempting to quit cleanly but can be forced to quit by pressing Ctrl+C again. Unison does this.


While I agree in spirit, I also want to meet users where they are.


The article doesn't mention the most useful of all signals: SIGINFO, aka "please print to stderr your current status". Very useful for tools like dd and tar.

Probably because Linux doesn't implement it. Worst mistake Linus ever made.

Also, it talks about self-pipe but doesn't mention that self-socket is much better since you can't select on a pipe.


> self-socket is much better since you can't select on a pipe.

This needs further explanation. Why can’t you select on a pipe? You certainly can use select/poll on pipes in general and I’m not sure of any reason in particular they won’t work for the self pipe notification.

It's even right there in the original: https://cr.yp.to/docs/selfpipe.html


Oops, brainfart. Sadly it's too late for me to edit that comment.

Yes, you can select just fine on pipes. What I was thinking of is that recv and send don't work on pipes, and asynchronous I/O frameworks typically want to use send/recv rather than write/read because the latter don't have a flags parameter.
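To show that selecting on a self-pipe works, here is a minimal Python sketch. It leans on CPython's `signal.set_wakeup_fd`, which writes the signal number into the pipe from the C-level handler; `wait_for_signals` is a name made up for this example, and SIGUSR1 is an arbitrary choice.

```python
import os
import select
import signal

# Self-pipe: the signal side writes a byte, the event loop selects on the
# read end alongside whatever other file descriptors it watches.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.set_blocking(write_fd, False)  # set_wakeup_fd requires a non-blocking fd

# A Python-level handler must be installed, otherwise the default action
# (terminate) applies and nothing is written to the pipe.
signal.signal(signal.SIGUSR1, lambda signo, frame: None)
signal.set_wakeup_fd(write_fd)

def wait_for_signals(timeout):
    """Return the signal numbers that arrived, or [] on timeout."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return []
    return list(os.read(read_fd, 64))
```

Only plain read and write are used here, so the missing flags parameter never comes up.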


Thanks for the feedback! As the talk and the post both mentioned, I was focusing on signals that work on all Unix platforms. Within the constraints of a 30 minute talk there must be material left on the cutting room floor. (If I started talking about the specifics of various Unix lineages I could fill up a whole day...)

For most users in the real world, self-pipes are sufficient. This includes mio (Tokio's underlying library)'s portable Unix implementation of wakers (how parts of the system tell other parts to wake up).


SIGSTOP and SIGCONT are very useful as well.

SIGSTOP is the equivalent of Ctrl-Z in a shell, but you can address it to any process. If you have a server being bogged down, you can stop the offending process temporarily.

SIGCONT undoes SIGSTOP.

The cpulimit tool does this in an automated way so that a process can be limited to use x% of CPU. Nice/renice doesn't keep your CPU from hitting 100% even with an idle priority process, which may be undesirable if it drains battery quickly or makes the cooling fan loud.
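A quick Python sketch of pausing and resuming another process this way. The sleeping child is a stand-in for the "offending process", and the variable names are made up for illustration.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the offending process: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

os.kill(child.pid, signal.SIGSTOP)               # pause it; cannot be caught or ignored
_, status = os.waitpid(child.pid, os.WUNTRACED)  # returns once the child has stopped
stopped = os.WIFSTOPPED(status)

os.kill(child.pid, signal.SIGCONT)               # let it run again
child.terminate()                                # clean up the demo child
child.wait()
print("child was stopped:", stopped)
```

Alternating the two signals on a timer is, in spirit, how a duty-cycle limiter can keep a process under a CPU budget.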


Note Ctrl-Z is actually SIGTSTP, which is basically "SIGSTOP except the process can install a signal handler for it".
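The difference is easy to see from Python, at least on the Unix systems I know of: installing a handler for SIGTSTP is accepted, while trying the same for SIGSTOP is refused by the kernel. A small sketch, where `on_tstp` is a placeholder handler:

```python
import signal

def on_tstp(signo, frame):
    pass  # e.g. restore terminal state before actually stopping

signal.signal(signal.SIGTSTP, on_tstp)      # allowed: SIGTSTP can be caught

try:
    signal.signal(signal.SIGSTOP, on_tstp)  # the kernel refuses this
    stop_is_catchable = True
except OSError:
    stop_is_catchable = False

print("SIGSTOP catchable:", stop_is_catchable)
```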

I have a very exciting blog post about debugging a nasty bug with how SIGTSTP works, coming very soon.


dd prints out status when sent SIGUSR1, but yeah that would be cool if other utilities did that as well off SIGINFO.


And does ^T map to SIGUSR1? That's the other thing which makes it so useful in BSD.


You wouldn’t want it to, because the default behavior for SIGUSR1 is to terminate.


Exactly. Whereas on BSD hitting ^T is (a) very likely to print useful information, and (b) if it doesn't do that, won't do anything at all.
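A program can support both conventions with very little code. A Python sketch, assuming the long-running work keeps some progress state around; `progress`, `status_line` and `report_status` are hypothetical names.

```python
import signal
import sys

progress = {"done": 0, "total": 100}  # hypothetical state updated by the main loop

def status_line():
    return f"{progress['done']}/{progress['total']} items processed"

def report_status(signo, frame):
    print(status_line(), file=sys.stderr)

# Python only exposes signal.SIGINFO on platforms that have it (the BSDs and
# macOS, where ^T sends it); elsewhere fall back to SIGUSR1.
status_signal = getattr(signal, "SIGINFO", signal.SIGUSR1)
signal.signal(status_signal, report_status)
```

With the fallback, `kill -USR1 <pid>` from another terminal plays the role of ^T, minus the convenience.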


I recently wrote a little data transfer service in Python that runs in ECS. When developing it locally it was easy to handle SIGINT: try write a batch, except KeyboardInterrupt, if caught mark the transfer as incomplete and finally commit the change and shut down.

But there’s no exception in Python to catch for a SIGTERM, which is what ECS and other service managers send when it’s time to shut down. So I had to add a signal handler. Would have been neat if SIGTERM could be caught like SIGINT with a “native” exception.


  from signal import SIGTERM, raise_signal, signal
  import sys # for excepthook
  # Derive from BaseException so a blanket "except Exception" does not swallow it.
  class Terminate(BaseException):
      pass
  def _excepthook(type, value, traceback):
      if not issubclass(type, Terminate):
          return _prevhook(type, value, traceback)
      # If a Terminate went unhandled, make sure we are killed
      # by SIGTERM as far as wait(2) and friends are concerned.
      signal(SIGTERM, _prevterm)
      raise_signal(SIGTERM)
  _prevhook, sys.excepthook = sys.excepthook, _excepthook
  def terminate(signo=SIGTERM, frame=None):
      # Restore the previous disposition first, so a second SIGTERM
      # is not turned into another exception during cleanup.
      signal(SIGTERM, _prevterm)
      raise Terminate
  _prevterm = signal(SIGTERM, terminate)

I mean you can just have the signal handler throw StopRequested in your Python boilerplate and never think about it again.

One common pattern is raising KeyboardInterrupt from your handler so it's all handled the same.
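A minimal sketch of that second pattern, assuming the existing cleanup already lives in an `except KeyboardInterrupt` block; `run_batch` and its `write_batch` argument are made up for illustration.

```python
import signal

def sigterm_as_keyboard_interrupt(signo, frame):
    # Reuse the exception Python already raises for SIGINT.
    raise KeyboardInterrupt

signal.signal(signal.SIGTERM, sigterm_as_keyboard_interrupt)

def run_batch(write_batch):
    """Hypothetical wrapper: returns 'complete' or 'incomplete'."""
    try:
        write_batch()
        return "complete"
    except KeyboardInterrupt:
        return "incomplete"
```

The downside is that the code can no longer tell a Ctrl-C from a service manager's shutdown request, which may or may not matter.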



